Siemens - LLM Infrastructure & AWS Bedrock Integration
→
Summary
Designed and deployed a scalable, secure LLM infrastructure on AWS, integrating with Bedrock for advanced AI capabilities and optimizing performance for diverse user scenarios.
Senior MLOps Engineer with over 5 years of experience designing, deploying, and optimizing AI/ML infrastructure, including Large Language Models (LLMs), GPU resource management, and CI/CD pipelines. Leads complex AI initiatives, improves development environments, and delivers scalable, production-ready solutions that keep customer deployments stable. Experienced with MLOps tooling and frameworks for streamlining workflows and improving AI system performance.
Senior MLOps Engineer
Cairo, Egypt
→
Summary
Currently leading MLOps initiatives and developing advanced AI infrastructure, including LLM and multi-agent systems, to enhance development environments and deployment stability.
Highlights
Spearheaded MLOps initiatives across the DVT business unit, improving development environments and stabilizing customer deployments for critical AI projects.
Architected and managed GPU infrastructure for AI/ML workloads, establishing GPU-enabled environments with Docker and Docker Swarm and a Jenkins pipeline for resource reservation.
Developed a Python Flask-based backend to collect GPU performance data from Docker containers, integrating with Prometheus and Grafana for real-time visualization and alerting.
Deployed and fine-tuned various LLMs, including a production-ready model on AWS with Auto Scaling Groups (ASG) and Application Load Balancers (ALB), ensuring scalability and high availability.
Contributed to an internal Retrieval-Augmented Generation (RAG) framework, leveraging HashiCorp Consul for service mesh and HashiCorp Vault for secure secret management.
Implemented CI/CD pipelines for RAG-based projects and established a central LLM proxy using LiteLLM, optimizing endpoint management, RBAC, token allocation, and rate limiting.
Deployed an internal model registry using MinIO with automated Hugging Face pull pipelines and an LLM data registry with Langfuse, streamlining fine-tuning and evaluation experiments.
Collaborated on building a multi-agent system for tool orchestration using the OpenAI Agents SDK and Model Context Protocol (MCP), integrating various LLMs (Qwen, DeepSeek, Claude Sonnet) for specialized tasks.
MLOps Engineer
Cairo, Egypt
→
Summary
Led a 6-engineer team in designing and implementing end-to-end machine learning project lifecycles and developing automation frameworks for continuous training and deployment.
Highlights
Led a team of 6 engineers in designing and implementing end-to-end machine learning project lifecycles, using DVC, MLflow, JFrog Artifactory, MinIO, and CVAT for model and data management.
Developed an automated annotation framework for CVAT and a continuous training framework built on Airflow, MLflow, and Jenkins, speeding up data preparation and model iteration.
Engineered and optimized automation pipelines using Jenkins and Azure DevOps Pipelines, streamlining deployment processes.
Computer Vision Engineer
Cairo, Egypt
→
Summary
Developed and optimized computer vision systems for driver safety, focusing on data collection, deep learning model training, and performance benchmarking on edge devices.
Highlights
Developed a computer vision system for driver safety, covering literature review, data collection, and data filtering.
Trained and optimized deep learning models for driver behavior detection, benchmarking performance on edge devices such as NVIDIA Jetson AGX and Jetson Nano for real-world deployment.
Utilized TensorFlow for model development, W&B for experiment tracking, and TensorRT for optimization, improving model efficiency and deployment readiness.
Computer Vision Engineer
Cairo, Egypt
→
Summary
Co-developed vision-based solutions and optimized deep learning systems for edge inference, contributing to advancements in person detection and classification.
Highlights
Co-developed vision-based solutions for age and gender classification and person detection for AI-driven analytics.
Implemented a pipeline for generating synthetic datasets, expanding the volume and diversity of training data for deep learning models.
Optimized deep learning systems for edge inference, improving real-time performance on resource-constrained devices.
→
B.Sc.
Electronics and Electrical Communications Engineering
Grade: Very Good
MLOps, LLMOps, RAG, DevOps, Machine Learning, Computer Vision.
Python, C++, SQL, Bash.
TensorFlow, MLflow, vLLM, LiteLLM, Prometheus, Grafana, HashiCorp Consul, HashiCorp Vault, MinIO, Jenkins, DVC, Ansible, Proxmox, Airflow, W&B, Azure DevOps Pipelines, hawkBit, BentoML, Flask, Docker, Kubernetes, Artifactory, CVAT, Git, OpenVINO, TensorRT, dbt, PostgreSQL, Trino, Apache Superset.
→
Summary
Led the deployment and continuous improvement of an AI-based smart identification system across 25+ stations in the UAE, enhancing operational efficiency and management.
→
Summary
Developed and optimized a road quality inspection system, focusing on packaging, containerization, and OTA updates for a fleet of vehicles to ensure continuous system performance.