Own end-to-end architecture for GenAI apps (RAG, agents, fine-tuning) on GCC-hosted cloud. Translate use cases into secure, performant designs across model hosting, vector DBs, data pipelines, and observability.
Key responsibilities
Design reference architectures for LLM inference/fine-tuning (managed endpoints, GPUs, autoscaling, token/cost controls).
Deliver data-residency-compliant stacks (object stores, vector DBs, feature stores) and govern security (KMS, VPC, zero-trust).
Build retrieval (RAG) pipelines: chunking, embeddings, guardrails, eval frameworks; set SLOs for latency/cost/quality.
Benchmark models (open/commercial), select providers, and negotiate GPU/compute reservations with FinOps discipline.
Create IaC blueprints (Terraform) and golden paths for app teams; hand over to platform/SRE for production.
Set up monitoring and evals (hallucination, toxicity, grounding) and author incident runbooks.
Candidate profile
8–12+ yrs cloud architecture; 2–3+ yrs hands-on GenAI/LLM delivery.
Depth in one major cloud + GPU/accelerator stacks; strong with vector DBs, RAG patterns, model gateways.
Proven track record of security, compliance, and cost controls in regulated MENA contexts.
Compensation
Tax-free base + bonus; housing/transport allowances; family medical; visa/relocation.