As a Senior Software Engineer (L4/L5) on the Model Serving Systems team, you will design, build, and operate robust, scalable, and highly available systems that serve machine learning models for a wide range of applications across the organization. Your work will directly impact how our members discover content, how our studio produces original films and series, and how we optimize our streaming infrastructure.
Our team provides the core infrastructure that enables ML engineers and data scientists to deploy, serve, and manage their models efficiently. We focus on building generic, self-serve solutions that adapt to diverse model types and serving patterns, from real-time recommendations to offline analytics. We are passionate about reducing friction for ML practitioners, ensuring high reliability, and optimizing for performance and cost.
What you will do:
- Design and implement highly scalable, performant, and reliable model serving systems.
- Collaborate with ML engineers and data scientists to understand their serving needs and translate them into generic platform capabilities.
- Develop tools and frameworks that simplify the deployment, management, and monitoring of ML models in production.
- Optimize existing systems for latency, throughput, and resource utilization.
- Participate in on-call rotations to support production systems.
- Contribute to the long-term vision and strategy of the ML Platform team.
Who you are:
Experience:
- 5+ years of experience designing and building large-scale distributed systems.
- Strong proficiency in Java or another JVM language (e.g., Scala, Kotlin).
- Experience with cloud platforms (e.g., AWS, GCP, Azure) and containerization technologies (e.g., Docker, Kubernetes).
- Familiarity with machine learning concepts and lifecycle (training, deployment, inference, monitoring).
- Nice to have: experience with real-time data processing and streaming technologies (e.g., Apache Kafka, Flink).
- Experience with performance tuning and optimization of distributed systems.
Skills:
- Excellent problem-solving, debugging, and analytical skills.
- Strong communication and collaboration skills, with the ability to work effectively with cross-functional teams.
- Ability to operate independently and drive projects from conception to completion.
- Passion for building robust, scalable, and user-friendly infrastructure.
Education:
- Bachelor’s or Master’s degree in Computer Science or a related field, or equivalent practical experience.