Senior AI Engineer – Grafana Ops, AI/ML

We are seeking an experienced Senior AI Engineer to join our AI/ML team. In this role, you will design, develop, and deploy AI-driven solutions that enhance Grafana’s operational intelligence and monitoring capabilities. Your work will directly shape how users interact with and gain insights from their data, extending what is possible in observability.

Responsibilities:

  • Design, develop, and deploy AI/ML models and systems for operational intelligence, anomaly detection, predictive analytics, and root cause analysis within the Grafana ecosystem.
  • Collaborate with product managers, engineers, and researchers to identify key opportunities for AI integration, translate business requirements into technical specifications, and deliver impactful solutions.
  • Build and maintain robust, scalable, and high-performance data pipelines for AI model training, inference, and continuous improvement.
  • Lead the evaluation and selection of appropriate AI/ML technologies, frameworks, and tools.
  • Drive the entire AI/ML lifecycle, from ideation and experimentation to production deployment, monitoring, and maintenance.
  • Mentor junior engineers, foster a culture of technical excellence, and contribute to the overall growth of the AI/ML team.
  • Stay current with advances in AI/ML, particularly in time series analysis, large language models (LLMs), and explainable AI (XAI), and apply relevant innovations to our products.
  • Contribute to the broader AI/ML community through internal presentations, blog posts, or open-source contributions.

Prerequisites:

  • 5-7 years of professional experience in AI/ML engineering, with a strong focus on deploying production-grade systems.
  • Deep expertise in machine learning algorithms, deep learning, statistical modeling, and data science principles.
  • Proficiency in Python (including libraries such as TensorFlow, PyTorch, and scikit-learn) and/or Go.
  • Solid understanding of cloud platforms (AWS, GCP, Azure) and experience with MLOps practices (CI/CD for ML, model versioning, monitoring).
  • Experience with time-series databases (e.g., Prometheus, InfluxDB, ClickHouse) and/or observability platforms (Grafana, OpenTelemetry).
  • Familiarity with containerization technologies (Docker, Kubernetes).
  • Excellent problem-solving, analytical, and communication skills.
  • Ability to work effectively in a fast-paced, collaborative, and distributed environment.
  • A Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, Data Science, or a related technical field is highly preferred.
  • Experience with large-scale data processing frameworks (e.g., Spark, Flink) is a plus.
  • Familiarity with LLM orchestration frameworks (e.g., LangChain, LlamaIndex) is a plus.

Job Type: Hybrid
Job Location: USA
Organization: Job Hunting U