As the Engineering Manager for Database Reliability, Scalability & Operations, you will be a key leader in ensuring the robustness, performance, and scalability of our core database infrastructure. You will lead a talented team of database reliability engineers, driving strategic initiatives and fostering a culture of excellence in database management within GitLab.
What you’ll do
- Lead and mentor a team of database reliability engineers, fostering a culture of technical excellence, collaboration, and continuous improvement.
- Drive the strategic direction and roadmap for database reliability, scalability, and operations, aligning with the overall engineering and product goals.
- Oversee the design, implementation, and maintenance of robust, scalable, and highly available database systems (e.g., PostgreSQL, Redis, ClickHouse).
- Develop and implement proactive monitoring, alerting, and incident response strategies to ensure optimal database performance and minimize downtime.
- Collaborate closely with product, development, and infrastructure teams to ensure database best practices are integrated throughout the software development lifecycle.
- Manage database capacity planning, performance tuning, and optimization efforts to support GitLab’s growth and evolving needs.
- Evaluate and implement new database technologies and tools to enhance efficiency, scalability, and reliability.
- Drive continuous improvement initiatives, including automation, disaster recovery planning, and the adoption of database security best practices.
- Participate in on-call rotations and provide leadership during critical incidents, ensuring swift resolution and thorough root cause analysis.
- Foster a strong learning environment, encouraging professional development and knowledge sharing within the team.
What you’ll bring
- Minimum 5 years of experience in database reliability engineering, operations, or a related field, with a strong focus on large-scale distributed systems.
- Minimum 2 years of experience in a leadership or management role, leading a team of database engineers or SREs.
- Extensive experience with PostgreSQL, including advanced administration, performance tuning, replication, and high availability solutions.
- Solid understanding of cloud platforms (e.g., GCP, AWS, Azure) and their database offerings.
- Experience with infrastructure-as-code tools (e.g., Terraform) and configuration management (e.g., Chef, Ansible).
- Proficiency in scripting languages (e.g., Python, Go, Ruby) for automation and tooling.
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack).
- Demonstrated ability to drive complex technical projects to successful completion.
- Strong leadership, communication, and interpersonal skills, with the ability to influence and collaborate effectively across all levels of the organization.
- Deep understanding of database internals, query optimization, and data modeling best practices.
- Familiarity with database technologies beyond PostgreSQL, such as Redis, ClickHouse, or NoSQL databases.
- Experience with continuous integration/continuous deployment (CI/CD) pipelines and methodologies.
- Knowledge of database security best practices and compliance requirements.
- A passion for open-source and a willingness to contribute to the community.
- Ability to thrive in a fast-paced, remote-first environment.