Data Engineer · AI Builder · Analytics Storyteller
I build data pipelines and AI systems that ship to production. Previously, at Five EDC Architects and Tata Consumer Products, I cut reporting time by 65%, reached 99.9% forecast accuracy, and improved recommendation accuracy by 75%.
I'm a Data Engineer with 2+ years of experience building ETL pipelines and AI systems in production. Currently completing my MS in Business Analytics at UMass Amherst (GPA: 4.0).
My engineering stack is Databricks, Apache Spark, Delta Lake, and Airflow. On the AI side I work with LangChain, RAG pipelines, OpenAI API, and MLflow. I write clean SQL, Python, and PySpark and care about the full pipeline — from raw ingestion to the dashboard a stakeholder actually uses.
At Five EDC Architects I designed SQL/Databricks ETL workflows and built an AI knowledge graph on SCM data. At Tata Consumer Products I processed 1M+ sales records and deployed ThoughtSpot dashboards that cut reporting time by 45%.
End-to-end pipeline on Databricks: K-Means gap analysis across 140+ countries, GPT-4 recommendations via LangChain, a Streamlit Q&A chatbot, and a Power BI DAX dashboard for content investment strategy.
Multi-label activity classification on the ExtraSensory dataset (60 users, 300K+ samples). 94% accuracy with Random Forest, outperforming Logistic Regression by 17% F1. Statistical QA via t-test, Chi-square, ANOVA.
Advanced statistical analysis of ~9.6K records. Log-log regression for price elasticity (β₁ ≈ −1.8), K-Means segmentation, COVID-19 time-series analysis, and geospatial COGS heatmaps revealing a $180M cost-reduction opportunity.
Data-driven phased deployment model for Amazon Prime Air across Massachusetts. Mapped 56 hubs across 14 counties covering ~5.1M residents using GIS, FAA airspace analysis, and Python geospatial modeling.
IJCA-published research classifying 7 skin cancer types from 10K+ dermoscopic images. ResNet18 at 85% accuracy outperforms MobileNet (83.1%). CLAHE preprocessing, GLCM features, Grad-CAM interpretability.
Custom CNN classifying 4 Alzheimer's stages from 5,121 MRI images. 93.7% accuracy, outperforming DBN (91%) and Multi-Kernel Learning (93.5%). SMOTE class balancing, SeparableConv2D for efficiency.
HR analytics dashboard with advanced DAX — SWITCH() age banding, SUMX() bonus expense, pay equity area charts. KPI cards for headcount, salary, compensation. Enables DEI monitoring and strategic workforce planning.
Automated inventory system — YOLOv8 at 97.26% mAP for product detection, Siamese Network at 95% accuracy for SKU classification, 93% shelf emptiness detection on SKU110K (1.7M+ annotations).
What sets Shruti apart is not just technical expertise, but a genuine curiosity and a problem-solving mindset. She is always ready to dive deep into a challenge, bring fresh ideas to the table, and follow through with consistent execution. Any team would be lucky to have her on board.
Shruti has been a brilliant performer in the field of Content Marketing at Step Up Student. Her genuine interest and involvement in every task have helped us grow socially! I believe this recommendation would help her stay creative and inspired throughout her whole career.
If you have a role, project, or research opportunity you think I'd be a good fit for, feel free to reach out. I check email daily.