VinSmart Future — VinGroup
May 2026 – Present
Middle MLOps Engineer
Large-scale LLM inference infrastructure & AI platform
- Operate and maintain an enterprise-scale H200 GPU cluster (one of the largest in Vietnam) powering production LLM services
- Optimize LLM inference throughput and latency using vLLM; own the end-to-end vLLM production serving stack
- Design GPU resource scheduling, multi-tenant isolation, and capacity planning strategies for high-demand AI workloads
- Integrate DevSecOps practices across the AI/ML development lifecycle, from model CI/CD pipelines to container image security and supply-chain hardening
- Collaborate with AI research teams to productionize and reliably serve large language models at scale