Hello, I'm Mayank Vyas
Brewing Software with AI Solutions.
Work Experience

ML Research Engineer, Multimodal Systems at Coral Labs
Tempe, Arizona
- Engineered a distributed data pipeline over 160K+ tables (1.2TB) using Apache Spark and BM25 indexing with row-level chunking — reduced retrieval latency 3× and improved recall from 84% → 93% through custom tokenization and contrastive reranking
- Built SEAR, a 3-stage meta-reasoning engine that dynamically routes LLM queries (CoT, PoT, Decomposition), outperforming 13 baselines across 8 datasets with a 92.5% HCS score on GPT-4o, Gemini, and LLaMA 70B. Accepted at AACL-IJCNLP 2024.
- Designed TRIM-QA, a noise-aware row pruning system using adaptive confidence thresholding — improving downstream LLM grounding with 93% Recall@10. Submitted to ACL Rolling Review
No Universal Prompt: Unifying Reasoning through Adaptive Prompting for Temporal Table Reasoning
AACL 2025
TRIM-QA: Noise-Aware Row Pruning for Table QA
Coming Soon — arXiv
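For readers curious about the mechanics, here is a minimal sketch of row-level chunking with BM25 ranking. It is an illustration only, not the Spark pipeline described above: the helper names (`row_to_chunk`, `tokenize`, `bm25_rank`), the whitespace tokenizer, the toy table, and the `k1`/`b` defaults are all assumptions.

```python
import math
from collections import Counter


def row_to_chunk(header, row):
    # Serialize one table row as "column value" pairs so each row is its own chunk.
    return " ".join(f"{col} {val}" for col, val in zip(header, row))


def tokenize(text):
    # Plain lowercase whitespace split; a stand-in for any custom tokenizer.
    return text.lower().split()


def bm25_rank(query, chunks, k1=1.5, b=0.75):
    """Return chunk indices ordered from best to worst Okapi BM25 score."""
    docs = [tokenize(c) for c in chunks]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many chunks each term appears.
    df = Counter(term for d in docs for term in set(d))
    scored = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scored.append((score, i))
    # Highest score first; ties broken by original row order.
    return [i for score, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Toy table (made-up data) to show the flow end to end.
header = ["team", "season", "wins"]
rows = [["alpha", "2021", "7"], ["beta", "2021", "8"], ["beta", "2022", "10"]]
chunks = [row_to_chunk(header, r) for r in rows]
ranking = bm25_rank("beta 2022", chunks)
```

Here `ranking` puts the row that matches both query terms first. Treating each row as its own chunk means a retriever can return individual rows rather than whole tables.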

Founding Software Engineer at JobMatch-AI
Tempe, Arizona (Self-Employed)
- Architected a hybrid search platform combining Elasticsearch (BM25), FAISS (ANN semantic search), and Neo4j (knowledge graph traversal) — serving <82ms median latency via GCP Cloud Run with 30+ FastAPI endpoints and a live waitlist across 2 countries
- Built end-to-end: resume parsing, LLM-based job description alignment, explainable match scoring, and an invite system with custom email templates — deployed full-stack with React frontend and Dockerized backend
- Achieved NDCG@10 = 0.81 across 1,283 job postings using LambdaMART reranking over hybrid BM25+SBERT retrieval. Submitted as first author to ACL 2026 and COLM 2026
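As a reference for the metric quoted above, NDCG@k can be computed along these lines. This is a sketch under assumptions: it uses the exponential-gain form of DCG and normalizes against the ideal ordering of the same retrieved list; the exact variant used in the evaluation is not stated here, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    # Exponential gain, log2 position discount (rank 0 is the top result).
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k=10):
    """relevances: graded relevance labels in the order the system ranked them."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


perfect = ndcg_at_k([3, 2, 1, 0])   # already in ideal order
swapped = ndcg_at_k([0, 1])         # the only relevant item is ranked second
```

A perfectly ordered list scores 1.0, and pushing a relevant posting down the list lowers the score because of the logarithmic position discount.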

Software Engineer, Machine Learning Architecture at Indian Institute of Information Technology
Chennai, Tamil Nadu
- Optimized C++ inference kernels for TinyML on Raspberry Pi — achieved 35% latency reduction (0.15ms), enabling real-time anomaly detection at 99.97% accuracy with live streaming to AWS
- Designed predictive edge filtering that reduced fog-node data transmissions by 95% and energy consumption by 40% — deployed on LoRa hardware across smart agriculture field sites
- Published 3 papers at IEEE/Springer (17+ citations) on scalable distributed IoT-ML inference — covering data aggregation, fog computing, and edge filtering algorithms
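The idea behind predictive edge filtering can be sketched in a few lines. This is a simplified illustration, not the deployed algorithm: it assumes the receiver predicts "same as the last value sent" and that a node transmits only when a reading deviates from that prediction by more than a threshold. The function name, threshold, and readings are hypothetical.

```python
def filter_readings(readings, threshold=0.5):
    """Return the (index, value) pairs an edge node would actually transmit.

    Assumption: the fog node uses the last transmitted value as its prediction,
    so a reading is sent only when it differs from that value by > threshold.
    """
    sent = []
    last_sent = None
    for i, value in enumerate(readings):
        if last_sent is None or abs(value - last_sent) > threshold:
            sent.append((i, value))
            last_sent = value
    return sent


# Made-up sensor trace: mostly stable, with two real changes.
trace = [20.0, 20.1, 20.2, 21.0, 21.1, 25.0]
transmitted = filter_readings(trace)
```

On this trace only three of six readings are transmitted. How much traffic is saved in practice depends on the signal and the threshold.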
Projects

Real-time speech + text scam detection on edge hardware, no cloud dependency. Built a multimodal fusion pipeline combining Whisper Tiny (audio transcription) and XGBoost (text classification) with federated learning for on-device model updates.
- Built a multimodal fusion pipeline running at <50ms latency on constrained hardware with zero cloud calls
- Integrated federated learning for on-device model updates — preserving user privacy by ensuring no raw audio ever leaves the device
- Evaluated against adversarial and real-world audio distributions; measured false-positive rate across diverse scam speech patterns
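To make the federated learning step concrete, here is a minimal sketch of weighted federated averaging (FedAvg). The aggregation rule used in the project is not specified above, so FedAvg is an assumption, as are the function name and the flat list representation of model weights.

```python
def federated_average(client_updates):
    """Combine client weight vectors, weighting each by its local example count.

    client_updates: list of (weights, num_examples), where weights is a list of
    floats of equal length. Only these weights are shared, never raw audio.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(weights[j] * n for weights, n in client_updates) / total
        for j in range(dim)
    ]


# Two hypothetical devices: one trained on 1 example, one on 3.
global_weights = federated_average([([1.0, 2.0], 1), ([3.0, 4.0], 3)])
```

The device with more local examples pulls the global weights further toward its own update.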

A 4-agent orchestration pipeline using LangChain + LangGraph that autonomously prepares candidates for interviews — parsing resumes, researching companies, generating tailored questions, and building personalized study plans.
- Architected planner/executor agent loops with shared memory and tool routing across 4 specialized agents
- Agents autonomously scrape company data, parse resumes, generate role-specific questions, and build study plans
- Configurable fallbacks and kill-switches for reliable multi-source knowledge routing
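The fallback and kill-switch behavior can be illustrated without any agent framework. The sketch below is plain Python, not LangChain or LangGraph code, and every name in it (`route_with_fallbacks`, `KillSwitchTripped`, the two sources) is hypothetical.

```python
class KillSwitchTripped(RuntimeError):
    """Raised when too many sources fail and the run should stop."""


def route_with_fallbacks(query, sources, max_failures=2):
    """Try each (name, fetch) source in order and return the first success.

    Stops early with KillSwitchTripped once max_failures sources have failed.
    """
    failures = 0
    for name, fetch in sources:
        try:
            return name, fetch(query)
        except Exception:
            failures += 1
            if failures >= max_failures:
                raise KillSwitchTripped(f"{failures} sources failed for {query!r}")
    raise LookupError(f"no source could answer {query!r}")


def flaky_scraper(query):
    # Stand-in for a live scrape that times out.
    raise TimeoutError("simulated timeout")


def cached_profile(query):
    # Stand-in for a slower but reliable cached source.
    return f"cached notes about {query}"


result = route_with_fallbacks(
    "example company",
    [("scraper", flaky_scraper), ("cache", cached_profile)],
)
```

Here the scraper fails once, the cache answers, and the kill-switch is never reached. Two consecutive failures would abort the run instead of trying further sources.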

Transform educational questions into interactive, story-based visualizations using AI. Features a 4-layer pipeline that intelligently routes content from documents (PDF/DOCX) to 18 distinct game templates with intelligent caching and real-time progress tracking.
- 1st Place Winner at HackASU 2025 (Anthropic Sponsored)
- Built 18 game templates with template-aware story generation
- Implemented intelligent caching reducing processing time by 80%
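One common way to get this kind of caching is to key results on a hash of the input content. The sketch below assumes that approach; the actual cache design is not described above, and the `ContentCache` class and its parameters are illustrative.

```python
import hashlib


class ContentCache:
    """In-memory cache keyed by a hash of (template id, document bytes)."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, document_bytes, template_id, compute):
        # Identical content for the same template maps to the same key.
        key = hashlib.sha256(
            template_id.encode("utf-8") + b"\0" + document_bytes
        ).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        value = compute(document_bytes)
        self._store[key] = value
        return value


cache = ContentCache()
first = cache.get_or_compute(b"What is osmosis?", "quiz", lambda d: d.upper())
second = cache.get_or_compute(b"What is osmosis?", "quiz", lambda d: d.upper())
```

The second call returns the stored result without running the expensive step again, which is where the time savings come from on repeated uploads.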

Research project investigating the efficiency, scalability, and linguistic adaptability of fine-tuned LLMs for code generation. Explores LoRA rank optimization, data scaling effects, and cross-language generalization using GPT-2.
- Identified an optimal LoRA rank of 16, achieving a 30% syntax pass rate
- Discovered "Complexity Trap" in data scaling behavior
- Demonstrated language-agnostic learning across Python, Java, JavaScript
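For context on what a LoRA rank of 16 means in parameter terms, here is a small worked calculation. It assumes the adapter is applied to the fused attention projection of GPT-2 small (768 inputs, 2304 outputs); which modules were actually adapted is not stated above, so treat the numbers as an example only.

```python
def lora_trainable_params(d_in, d_out, rank):
    # LoRA learns B (d_out x rank) and A (rank x d_in) instead of a full d_out x d_in update.
    return rank * (d_in + d_out)


d_in, d_out = 768, 2304          # assumed: GPT-2 small fused q/k/v projection
full_params = d_in * d_out       # parameters in the frozen weight matrix
lora_params = lora_trainable_params(d_in, d_out, rank=16)
fraction = lora_params / full_params
```

Under these assumptions the rank-16 adapter trains 49,152 parameters against 1,769,472 in the frozen matrix, about 2.8% of that layer. Doubling the rank doubles the adapter size.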

Python pipeline to convert inspection JSON data into populated TREC (Texas Real Estate Commission) HTML reports. Features smart mapping across 6 TREC sections, automatic empty section removal, and proper formatting for comments, images, and videos.
- Automated mapping of line items to TREC sections I-VI
- Smart filtering removes empty sections automatically
- Proper media embedding with images and video controls
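The mapping and empty-section removal can be sketched as follows. The JSON shape (`section` and `comment` keys) and the function names are hypothetical, and the HTML is reduced to a heading and a list per section; the real report template is more involved.

```python
import html

SECTIONS = ["I", "II", "III", "IV", "V", "VI"]


def group_by_section(line_items):
    """Group line items by TREC section, keeping form order and dropping empty sections."""
    grouped = {section: [] for section in SECTIONS}
    for item in line_items:
        section = item.get("section")
        if section in grouped:
            grouped[section].append(item)
    return [(s, grouped[s]) for s in SECTIONS if grouped[s]]


def render_report(line_items):
    parts = []
    for section, items in group_by_section(line_items):
        parts.append(f"<h2>Section {section}</h2>")
        parts.append("<ul>")
        for item in items:
            # Escape inspector comments so they cannot break the markup.
            parts.append(f"<li>{html.escape(item['comment'])}</li>")
        parts.append("</ul>")
    return "\n".join(parts)


items = [
    {"section": "IV", "comment": "Slow drain at kitchen sink"},
    {"section": "I", "comment": "Hairline crack <1mm in slab"},
]
report = render_report(items)
```

Only sections I and IV appear in `report`, in form order, and the `<` in the second comment is escaped.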

Power BI dashboard with DAX measures and advanced data modeling for actionable business intelligence.
- Achieved 99.8% accuracy in YoY growth calculations
- Reduced query time by 40% through star schema optimization
- Revealed $1.2M revenue opportunity via geo-spatial analysis
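The dashboard itself uses DAX, but the year-over-year calculation is easy to show in Python. This is a sketch of the standard formula with made-up revenue figures; the function name and input shape are illustrative.

```python
def yoy_growth(revenue_by_year):
    """Map each year to (current - previous) / previous.

    Years with no prior-year figure, or a prior-year figure of zero, are skipped.
    """
    growth = {}
    for year in sorted(revenue_by_year):
        previous = revenue_by_year.get(year - 1)
        if previous:
            growth[year] = (revenue_by_year[year] - previous) / previous
    return growth


example = yoy_growth({2022: 100.0, 2023: 120.0, 2024: 90.0})
```

In this toy series 2023 grows by 20% and 2024 shrinks by 25%. 2022 has no entry because there is no 2021 figure to compare against.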

Automated data extraction and real-time visualization pipeline for Intel's retail edge computing platform. Built Python scripts to extract metrics from results logs and publish to an MQTT broker, integrated with Grafana dashboards via the MQTT plugin. Created custom Docker images for Grafana and MQTT configured to communicate on the same Docker network using Docker Compose.
- Reduced MTTR by 73% through custom alerting
- Implemented JWT-based OAuth 2.0 with RBAC for SOC2 compliance
- Automated data extraction pipeline with real-time MQTT streaming

This project is part of a Bachelor's Research Thesis, aiming to detect and segment primary roots in plant images using a customized version of the Mask R-CNN model adapted for TensorFlow 2.0 and Keras 2.2.8. The original codebase from Matterport's Mask R-CNN was modified for compatibility and to support training and inference on annotated root datasets.
- Achieved 96.5% IoU accuracy through transfer learning
- Reduced annotation workload by 90%
- Published in Springer's CV in Plant Phenotyping conference
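For reference, intersection over union (IoU) between a predicted and a ground-truth mask can be computed like this. The sketch uses nested lists of 0/1 values rather than the arrays a Mask R-CNN pipeline would produce, and the function name is illustrative.

```python
def mask_iou(pred, target):
    """IoU of two binary masks given as equally sized nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks agree completely by convention.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou = mask_iou(pred, target)
```

The two toy masks share 2 pixels out of 4 covered by either mask, giving an IoU of 0.5.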

This project is an implementation of a Multi-Layer Perceptron (MLP) from scratch using Python. It demonstrates the fundamental concepts of building and training a neural network, including forward propagation, backward propagation, and parameter optimization.
- Achieved 92% accuracy on MNIST using only NumPy
- Implemented automatic differentiation for gradients
- Created interactive weight matrix visualizations
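Forward and backward propagation for a one-hidden-layer network can be sketched as below. This is not the project code: it uses plain Python lists instead of NumPy so it runs without dependencies, handles one example at a time, hand-derives the gradients rather than using automatic differentiation, and assumes sigmoid activations with a squared-error loss. All names are illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_params(n_in, n_hidden, seed=0):
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(params, x):
    # Hidden activations, then a single sigmoid output unit.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(params["W1"], params["b1"])]
    y = sigmoid(sum(w * hi for w, hi in zip(params["W2"], h)) + params["b2"])
    return h, y


def loss(params, x, target):
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backward(params, x, target):
    """Gradients of the squared-error loss via the chain rule."""
    h, y = forward(params, x)
    delta_out = (y - target) * y * (1 - y)                  # dL/d(output pre-activation)
    delta_h = [delta_out * w * hi * (1 - hi)                # dL/d(hidden pre-activations)
               for w, hi in zip(params["W2"], h)]
    return {
        "W1": [[d * xi for xi in x] for d in delta_h],
        "b1": delta_h,
        "W2": [delta_out * hi for hi in h],
        "b2": delta_out,
    }


def sgd_step(params, grads, lr=0.1):
    # Plain gradient descent update on every parameter.
    params["W1"] = [[w - lr * g for w, g in zip(wr, gr)]
                    for wr, gr in zip(params["W1"], grads["W1"])]
    params["b1"] = [b - lr * g for b, g in zip(params["b1"], grads["b1"])]
    params["W2"] = [w - lr * g for w, g in zip(params["W2"], grads["W2"])]
    params["b2"] -= lr * grads["b2"]
```

A quick way to check a hand-written backward pass like this is to compare each gradient against a finite-difference estimate of the loss.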
About Me
A glimpse into my journey, passions, and the adventures that shape who I am
Hey there! I'm Mayank, a Master's student in Data Science at Arizona State University. My story is one of curiosity-driven pivots and bold decisions.
I started my academic journey with a Bachelor's in Electrical Engineering from IITRAM, where I got hands-on experience with electrical machines and power systems. But somewhere along the way, I found myself increasingly fascinated by Machine Learning, AI, and distributed systems.
Before ASU, I spent one and a half years at IIITDM Kancheepuram as a research intern, diving deep into computer vision and deep learning. Then came a crossroads: a full-time offer from Micron as a Process Engineer. It was a secure path, but my heart was set on something different.
I took the leap—declining the offer to pursue my Master's at ASU, betting on myself to validate and deepen my expertise in the AI/ML domain. And honestly? It's been the best decision I've made.
Languages
AI/ML
Web & Frameworks
Infrastructure
Databases
Tools
I build production ML systems — and then write papers about what I learned building them.
On the engineering side: a hybrid job-matching platform live on GCP (30+ endpoints, <82ms latency), a 1.2TB table retrieval pipeline achieving 93% Recall@10, and an on-device scam detection system running under 50ms on edge hardware without cloud dependency.
On the research side: 5 publications across IEEE, Springer, AACL, and ACL venues. 17+ citations. Thesis defense May 2026.
My stack: Python, C++, TypeScript — FastAPI backends, React frontends, Dockerized deployments on GCP and AWS. On the ML side: PyTorch, HuggingFace, LangChain, FAISS, Neo4j, Elasticsearch.
Building innovative solutions under pressure - here are some memorable hackathon moments.
Gamify
Zoom App Hackathon
Developed a Zoom application leveraging real-time transcription to automatically generate interactive quizzes using Gemini AI with seamless platform integration.
TwinGenius
Devils Invent - Honeywell & ASU
Revolutionized industrial digital twin creation by generating complete environments from natural language prompts in under 60 seconds using Gemini AI and AWS IoT TwinMaker.
Life isn't just about algorithms and neural networks (though I do love those!). I believe in living fully and finding joy in diverse experiences.
Hiking & Trekking
Arizona's trails are my weekend therapy
Pool & Golf
Precision sports that clear my mind
Photography
Capturing moments and landscapes
Road Trips
Exploring the American Southwest

Golf days

8-ball enthusiast

Mountain adventures

Good times with great people
"युक्तः कर्मफल त्यक्त्वा।"
"Give your best without obsessing over results. Let go and trust."
— My guiding philosophy
When I'm not coding, I'm exploring the beautiful landscapes of the American Southwest. From the majestic Grand Canyon to the iconic Golden Gate Bridge, every destination teaches me something new.

Grand Canyon, Arizona

That's me!

San Francisco, California

Walking the iconic bridge

Water Wheels Bridge, Payson
Education

Master of Science in Data Science
Aug 2024 - May 2026
Tempe, Arizona

Bachelor of Science in Electrical Engineering
Aug 2020 - May 2024
Ahmedabad, India
What People Say
Testimonials from colleagues and collaborators I've had the pleasure of working with

"Mayank is an outstanding software engineer with strong hands-on experience in Python, distributed systems, and applied AI/ML. I’ve closely mentored him over the past year and consistently seen his ability to translate theory into real-world systems. His recent publication at AACL, “No Universal Prompting,” highlights his depth in prompt engineering and his practical understanding of modern AI workflows. Beyond research, Mayank excels at rapid execution, most notably winning HackASU, where he built a full-fledged education platform leveraging prompt engineering and software engineering skills in a single night, securing first place.He is highly adaptable, receptive to feedback, and consistently adds value to any team he works with. I strongly believe Mayank has the technical ability and mindset to thrive in fast-paced, high-impact engineering environments."

"I had the pleasure of working with Mayank Vyas during the Intel Open Source Hackathon, and I couldn't have asked for a better teammate. We were tackling an issue that involved visualizing real-time machine configuration data using MQTT and Grafana Docker, and Mayank jumped right in with his problem-solving mindset and enthusiasm. What really stood out to me was his curiosity and dedication—even after the hackathon ended, he kept working on the issue, not because he had to, but because he genuinely wanted to learn more. That kind of passion is rare and speaks volumes about his approach to technology and innovation. Beyond his technical skills, Mayank is a fantastic collaborator—always open to ideas, eager to experiment, and ready to help. I'd highly recommend him to anyone looking for a proactive, skilled, and passionate team player!"

"I've worked with Mayank on several projects during my master's program and on current work, and it's been a great experience. He's strong with data tools, handles analysis and communication confidently, and is especially skilled in machine learning. What stands out is his ability to understand models deeply and apply them thoughtfully to real problems. He's reliable, collaborative, and easy to work with, making him a valuable addition to any data or ML-focused team."

"It was a good experience working with him. He was easy to talk to, responded quickly, and delivered things on time. I was impressed by how fast he was able to experiment with five prompting techniques across eight datasets and three models in a very short span of time. One piece of feedback I would share, based on our single project together, is that it might help to slow down slightly at times and not rush through the work."

"Mayank was a standout teammate during our hackathon, bringing together strong AI insight and solid software engineering skills. What impressed me most was his refusal to give up. He consistently went the extra mile to ensure the project worked end to end."

"Working with you has been a great experience. You are professional, dependable, and communicate with clarity and ease. You’re always approachable and open to feedback, which makes collaboration smooth and effective. Your reliability and positive attitude truly stand out."

"Mayank is genuinely one of the most helpful people I’ve worked with. He’s dependable, collaborative, and always ready to step in whether it’s brainstorming, debugging, or guiding the team through challenges. Working with him has been a smooth and positive experience, and I’d happily collaborate with him again."

"I had the pleasure to work on a research project with Mayank. He was always thinking outside the box and going the extra mile to create tangible solutions. He is also a great leader and easy to get along with."



