Huynh Thien Tung

MLOps / Cloud / DevOps Engineer

Building and operating large-scale AI/ML infrastructure, from GPU cluster management and LLM inference optimization to cloud-native DevOps, with a security-first mindset.

Ho Chi Minh City, Vietnam | huynhthientung.dev@gmail.com | LinkedIn | GitHub

Explore

Docs

Technical references, runbooks, and engineering notes

docs.huynhthientung.com

Blogs

Writing on DevOps, cloud, and software engineering

blogs.huynhthientung.com

EnglishPod

Curated audio resources for English listening practice

englishpod.huynhthientung.com

Experience

VinSmart Future — VinGroup

May 2026 — Present

Middle MLOps Engineer

Large-scale LLM inference infrastructure & AI platform

Operate and maintain an enterprise-scale H200 GPU cluster (one of the largest in Vietnam) powering production LLM services
Optimize LLM inference throughput and latency using vLLM; own the end-to-end vLLM production serving stack
Design GPU resource scheduling, multi-tenant isolation, and capacity planning strategies for high-demand AI workloads
Integrate DevSecOps practices across the AI/ML development lifecycle — from model CI/CD pipelines to container image security and supply-chain hardening
Collaborate with AI research teams to productionize and reliably serve large language models at scale

ITRVN

Jun 2025 — Apr 2026

Cloud / DevOps Engineer

US-based digital health & wearable medical devices platform

Operated EKS clusters and CI/CD pipelines for a US production healthcare platform
Built a Golang-based AI tool that detects CVEs and assigns remediation tasks to developers
Led AWS account and data migration for a $20K/month production environment
Deployed AI/ML workloads on Kubernetes; ran an internal seminar on agentic AI systems
Maintained cloud data pipelines and campaign delivery systems in production
Enforced security and compliance practices; optimized performance and cloud cost

Splus Software

Oct 2022 — May 2025

Backend / Cloud / DevOps Engineer

Semiconductor machine control system — outsourcing for a Japanese customer

Implemented REST and WebSocket APIs for a next-gen semiconductor control platform
Designed serverless AWS architecture and CI/CD pipelines; cut pipeline cost by ~50%
Built CI/CD infrastructure on AWS CodeBuild and CodePipeline, from high-level design (HLD) to production
Automated internal tooling: budget alerts, CloudTrail-to-Slack alerts, and software installers
Managed AWS cost and resource hygiene across 6 production accounts
Maintained legacy C++ desktop GUI software for machine control reliability

Splus Software

Jun 2022 — Apr 2024

Team Leader

Web scraping system for Japanese real estate and maps data

Defined system architecture, crawling strategy, and database schema for the platform
Built an extensible, maintainable codebase shared across all crawlers
Engineered techniques to bypass rate limiting and bot detection at scale
Delivered a scalable system collecting Japanese real estate and maps data

Skills

MLOps & AI Infrastructure: vLLM, LLM Serving, GPU Cluster Management, NVIDIA H200, CUDA, Model CI/CD, Inference Optimization, Multi-tenant GPU Scheduling
Cloud (AWS): EC2, EKS, EFS, Lambda, S3, RDS, Aurora, DynamoDB, TimestreamDB, CloudFront, WAF, IAM, Cognito, Pinpoint, SQS, SNS, API Gateway, CodeBuild, CodePipeline
Containers & Orchestration: Docker, Kubernetes (EKS), Helm, Karpenter, ArgoCD
Infrastructure as Code: AWS CDK (Golang, Python), CloudFormation, Pulumi, Terraform
CI/CD & Automation: GitHub Actions, Jenkins, Bash scripting, AWS CodePipeline
Languages: Golang, Python, C++ (11/14/17), Bash, JavaScript / Node.js
Databases: PostgreSQL, MySQL, MongoDB, DynamoDB, TimestreamDB, Aurora, RDS
Observability: CloudWatch, Prometheus, Grafana, ELK Stack

Education

University of Information Technology — VNU

Sep 2019 — Nov 2023

Bachelor of Computer Science, specialization in Machine Learning & AI

Graduated with a "Good" degree classification, GPA 87.1%
Awarded three merit-based scholarships for academic excellence
Full-tuition scholarship for one semester (2021, COVID-19 relief)

Certifications

AWS Solutions Architect — Associate

Amazon Web Services

In progress

CKA, AWS SAP, IELTS 6.0

English

TOEIC — Reading & Listening 690, Speaking & Writing 250