₹15L – ₹25L /yr
Software Engineer I - Continuous Learning
DigitalOcean · 24 hours ago
Full Time
**About the Role**

DigitalOcean's Agentic AI organisation provides a powerful inference cloud, Managed Agents, and robust Feedback systems that enable customers to run AI inference confidently at scale. We are looking for a Software Engineer I to join our Feedback Systems team. This team is responsible for building the scalable backend infrastructure that tests and evaluates AI agents safely and reliably.

In this role, you will help develop and maintain high-throughput backend systems that orchestrate complex execution workflows, interface with isolated execution environments, and process evaluation signals. You will solve distributed systems problems at the intersection of infrastructure orchestration and asynchronous data flows, ensuring our platforms are robust, scalable, and highly available.

**What You’ll Do**

- Build, test, and maintain robust backend services and highly concurrent asynchronous workflows, primarily in Python and Go.
- Integrate backend control planes with isolated, secure execution environments to safely run agents and capture execution artifacts.
- Develop and maintain scalable APIs (gRPC, REST) that serve as the connective tissue across data pipelines, evaluation engines, and internal platforms.
- Collaborate closely with cross-functional teams to ensure the infrastructure reliably supports evaluation scenarios and outcome metrics.
- Contribute to engineering best practices by participating in code reviews, writing comprehensive tests, and helping with technical design documentation.
- Monitor system performance, investigate bugs, and ensure high reliability of orchestration systems.

**What You’ll Add to DigitalOcean**

- 1+ years of software engineering experience building concurrent, fault-tolerant distributed systems.
- Proficiency in Python and Go (or strong experience in C++/Java with a willingness to master Go and Python quickly).
- Familiarity or practical experience with workflow orchestration engines for managing distributed state and asynchronous tasks.
- Working knowledge of containerization, virtualization, or workload isolation technologies (e.g., Docker, Kubernetes, or microVMs).
- Experience building and consuming resilient APIs and handling event or message data.
- A strong sense of ownership, excellent communication skills, and the ability to work effectively in a globally distributed team.
- AI/ML ecosystem awareness: you do not need to be an ML researcher, but you should have a strong foundational understanding of, or interest in, working with LLM APIs, prompt constraints, and the architectural challenges of testing non-deterministic systems.