Evaluation Engineer

The Opportunity

We are seeking a highly skilled and passionate Evaluation Engineer to join our team and play a pivotal role in ensuring the quality, reliability, and safety of our AI models. In this role, you will design, implement, and execute comprehensive evaluation strategies for large language models (LLMs) and other advanced AI systems.

This is a unique opportunity to work at the forefront of AI development, contributing to the responsible and robust deployment of intelligent technologies.

What you’ll do:

Develop and implement rigorous evaluation strategies and frameworks for our AI models, with a focus on LLMs.
Design and execute comprehensive test suites and benchmarks to assess model performance, reliability, robustness, safety, and ethical risks.
Analyze evaluation results, identify model strengths and weaknesses, and give research and engineering teams actionable recommendations for improvement.
Automate evaluation processes, build robust data pipelines, and develop tools to streamline and scale our evaluation efforts.
Collaborate closely with AI researchers, engineers, and product managers to understand model capabilities, define evaluation criteria, and integrate feedback into model development cycles.
Stay up to date with the latest advancements in AI evaluation techniques, benchmarks, and responsible AI practices.

What you’ll need:

Bachelor’s or Master’s degree in Computer Science, Machine Learning, Statistics, or a related quantitative field.
2+ years of experience in quantitative evaluation, data science, machine learning engineering, or a related field.
Demonstrated experience in evaluating machine learning models, particularly large language models (LLMs).
Proficiency in Python and experience with relevant ML frameworks (e.g., PyTorch, TensorFlow).
Strong analytical skills and experience with statistical analysis and data visualization.
Excellent problem-solving skills, attention to detail, and a commitment to rigorous testing.
Strong communication and collaboration skills, with the ability to convey complex technical information clearly to diverse audiences.
Ability to work independently and as part of a fast-paced, interdisciplinary team.
Experience with cloud platforms (e.g., AWS, GCP, Azure) and MLOps practices is a plus.

What we offer:

The opportunity to make a significant impact on the development of state-of-the-art AI.
A collaborative and intellectually stimulating work environment.
Competitive salary and benefits package.
Flexible remote work arrangements.
Opportunities for professional growth and development.
