Data Engineer · ML · Analytics
Writing about machine learning engineering, data systems, and whatever else I find interesting. 3+ years building production data pipelines at P&G. M.Sc. Applied Mathematics & Statistics.
Latest
A hands-on walkthrough of deploying a 738M-parameter model to a SageMaker GPU endpoint, covering instance selection, model configuration, and the IAM setup that will silently ruin your day if you get it wrong.
Archive
Instance types, memory limits, cold start behavior — the gaps in AWS docs that cost me hours.
How to connect a Lambda function to a SageMaker endpoint with least-privilege IAM and actually handle timeouts.
CORS, throttling, and API keys — finishing the serverless MLOps pipeline so the outside world can hit your model.
More articles on Databricks, Delta Lake, medallion architecture, and production pipeline design.
Deep dives into CareerPulse and other personal projects — architecture decisions, failures, and what I learned.
Essays on whatever I find interesting — outside the data world.
Work
End-to-end medallion lakehouse pipeline that ingests live job-posting data via REST APIs with incremental loading and feeds a downstream XGBoost forecasting model tracked in MLflow.
Deployed a 738M-parameter model to SageMaker and built a fully serverless inference pipeline via Lambda and API Gateway.
Led end-to-end migration of legacy enterprise pipelines to Databricks with Delta Lake, PySpark, and ACID-compliant distributed processing at scale.
I'm a Data Engineer with a background in Statistics who spent 3+ years at Procter & Gamble building production-grade data infrastructure. I write about what I'm building and learning — from MLOps and lakehouse architecture to whatever rabbit hole I fall into.
Currently seeking remote roles in ML Engineering, Data Engineering, and Analytics Engineering, as well as freelance projects.