AI/ML/DevOps Engineer

Abu Dhabi, UAE

Own the MLOps platform that powers enterprise AI at scale.

 

A leading Abu Dhabi-based holding group is hiring an AI/ML/DevOps Engineer to architect, operate, and continuously improve the end-to-end MLOps and LLMOps platform for a flagship enterprise AI programme. You'll be the technical authority reviewing, governing, and signing off on CI/CD, data and model pipelines, infrastructure, deployment, security, and observability, ensuring secure, scalable, and compliant delivery across environments. The role reports to the AI Product Manager within the AI Excellence Centre.

 

What you'll own:

  • Own the end-to-end MLOps/LLMOps reference architecture: ingestion → validation → feature and embedding pipelines → training and fine-tuning → evaluation → registry → deployment → monitoring — including RAG and agentic workflows.

  • Architect, review, and approve CI/CD for ML and LLM systems: code, data, prompt, and model artifact versioning; build and release pipelines (Azure DevOps / GitHub Actions); automated unit, integration, and contract testing; and promotion/rollback (blue-green / canary) across dev, test, and production.

  • Define and govern AI platform foundations on Azure: IaC (Bicep/Terraform), AML workspaces, AKS GPU node pools and scheduling, private networking (VNet integration / Private Link), identity (Managed Identities / PIM), secrets (Key Vault), and encryption and data residency controls.

  • Review and approve production deployment patterns for model and LLM serving (AKS / KServe / AML online endpoints), including containerization, inference optimization (batching, quantization where applicable), API management, autoscaling, resiliency, and RAG runtime components (vector store, retriever, re-ranker, cache).

  • Own observability and reliability for AI services: OpenTelemetry tracing, prompt and inference logs (with PII controls), latency/throughput/cost metrics, SLOs/SLIs, model performance monitoring, data and model drift detection, and LLM evaluations (quality, hallucination checks, toxicity and safety guardrails) with incident playbooks.

  • Establish and enforce MLOps/LLMOps governance: dataset lineage, data quality validation (schema and tests), feature store and model registry standards, artifact provenance (SBOM/SLSA), vulnerability scanning, approval gates for model and prompt releases, and compliance-aligned documentation for model risk (intended use, limitations, evaluation results).

  • Enable delivery squads — including the primary delivery partner — with "golden path" templates (AML pipelines, RAG blueprints, evaluation harnesses), reusable IaC modules, and coding standards; run deep technical design and architecture reviews and sign off production readiness (capacity, security, observability, DR) for all AI releases.

  • Support the Run & Operate model by enabling issue triage and minor enhancement workflows (ticket intake → fix → controlled release), ensuring changes follow the same release governance and quality gates.

  • Own the Operational Acceptance Gate: no production release without runbooks, monitoring dashboards, incident playbooks, access model, and DR test evidence.

 

Scope clarity: you provide platform standards, review, and sign-off — you do not replace the delivery partner's engineering, but you enforce the "golden path" and production readiness bar.

 

What you bring:

  • 8–10 years of experience across DevOps, SRE, and/or ML Engineering, with production systems on Azure.

  • Hands-on experience with Azure ML, AKS, Azure DevOps or GitHub Actions, IaC, and containerization.

  • Bachelor's in Computer Science, Engineering, or equivalent experience.

 

Core skills required:

  • Python, YAML, Docker, Helm, KQL; familiarity with GitOps (Argo CD / Flux).

  • Security in CI/CD: SAST/DAST, supply-chain security (Sigstore), secrets management (Key Vault).

  • Performance testing (k6 / JMeter), contract testing, and E2E testing.

  • Cost optimization and capacity planning for GPU and CPU workloads.

  • Strong grasp of model serving, inference optimization, and observability tooling.

 

Required certification:

  • Microsoft Certified: DevOps Engineer Expert (AZ-400)

 

Preferred certifications:

  • Microsoft Certified: Azure Administrator Associate (AZ-104) or Azure Solutions Architect Expert (AZ-305)

  • CKA or CKAD (Kubernetes)

 

Location: Abu Dhabi, UAE

Employment Type: Permanent, Full-time

Experience: 8–10 years

Salary Range: AED 25,000–33,000 per month
