AI/ML/DevOps Engineer
Abu Dhabi, UAE
Own the MLOps platform that powers enterprise AI at scale.
A leading Abu Dhabi-based holding group is hiring an AI/ML/DevOps Engineer to architect, operate, and continuously improve the end-to-end MLOps and LLMOps platform for a flagship enterprise AI programme. You'll be the technical authority reviewing, governing, and signing off CI/CD, data and model pipelines, infrastructure, deployment, security, and observability, ensuring secure, scalable, and compliant delivery across environments. The role reports to the AI Product Manager within the AI Excellence Centre.
What you'll own:
- Own the end-to-end MLOps/LLMOps reference architecture: ingestion → validation → feature and embedding pipelines → training and fine-tuning → evaluation → registry → deployment → monitoring, including RAG and agentic workflows.
- Architect, review, and approve CI/CD for ML and LLM systems: code, data, prompt, and model artifact versioning; build and release pipelines (Azure DevOps / GitHub Actions); automated unit, integration, and contract testing; and promotion/rollback (blue-green / canary) across dev, test, and production.
- Define and govern AI platform foundations on Azure: IaC (Bicep/Terraform), AML workspaces, AKS GPU node pools and scheduling, private networking (VNet integration / Private Link), identity (Managed Identities / PIM), secrets (Key Vault), and encryption and data residency controls.
- Review and approve production deployment patterns for model and LLM serving (AKS / KServe / AML online endpoints), including containerization, inference optimization (batching, quantization where applicable), API management, autoscaling, resiliency, and RAG runtime components (vector store, retriever, re-ranker, cache).
- Own observability and reliability for AI services: OpenTelemetry tracing, prompt and inference logs (with PII controls), latency/throughput/cost metrics, SLOs/SLIs, model performance monitoring, data and model drift detection, and LLM evaluations (quality, hallucination checks, toxicity and safety guardrails) with incident playbooks.
- Establish and enforce MLOps/LLMOps governance: dataset lineage, data quality validation (schema and tests), feature store and model registry standards, artifact provenance (SBOM/SLSA), vulnerability scanning, approval gates for model and prompt releases, and compliance-aligned documentation for model risk (intended use, limitations, evaluation results).
- Enable delivery squads, including the primary delivery partner, with "golden path" templates (AML pipelines, RAG blueprints, evaluation harnesses), reusable IaC modules, and coding standards; run deep technical design and architecture reviews and sign off production readiness (capacity, security, observability, DR) for all AI releases.
- Support the Run & Operate model by enabling issue triage and minor enhancement workflows (ticket intake → fix → controlled release), ensuring changes follow the same release governance and quality gates.
- Own the Operational Acceptance Gate: no production release without runbooks, monitoring dashboards, incident playbooks, access model, and DR test evidence.
Scope clarity: you provide platform standards, review, and sign-off — you do not replace the delivery partner's engineering, but you enforce the "golden path" and production readiness bar.
What you bring:
- 8–10 years across DevOps, SRE, and/or ML Engineering with production systems on Azure.
- Hands-on experience with Azure ML, AKS, Azure DevOps or GitHub Actions, IaC, and containerization.
- Bachelor's in Computer Science, Engineering, or equivalent experience.
Core skills required:
- Python, YAML, Docker, Helm, KQL; GitOps (Argo/Flux) awareness.
- Security in CI/CD: SAST/DAST, supply-chain security (Sigstore), secrets management (Key Vault).
- Performance testing (k6 / JMeter), contract testing, and E2E testing.
- Cost optimization and capacity planning for GPU and CPU workloads.
- Strong grasp of model serving, inference optimization, and observability tooling.
Required certification:
- Microsoft Certified: DevOps Engineer Expert (AZ-400)
Preferred certifications:
- Microsoft Certified: Azure Administrator Associate (AZ-104) or Azure Solutions Architect Expert (AZ-305)
- CKA or CKAD (Kubernetes)
Location: Abu Dhabi, UAE
Employment Type: Permanent, Full-time
Experience: 8–10 years
Salary Range: AED 25,000–33,000 per month

