Hello, I'm Mayank Vyas
Brewing Software with AI Solutions.
As a Machine Learning Researcher, I leverage Natural Language Processing and Large Language Models to build impactful AI solutions.
Professional Experience

Arizona State University
Developing an efficient table-retrieval RAG pipeline to reduce query latency and improve inference quality.
- Served 500+ users with a RAG prototype over 160K+ NQ tables, with a 10 s query time and 98% retrieval accuracy.
- Built a ranking algorithm using Sentence-BERT (SBERT) to rank query-specific gold tables with 98% accuracy.
- Conducting research on improving document question-answering pipelines using learned sparse embeddings (SPLADE) and contrastive learning for more accurate retrieval and reduced context noise.
- Working on hierarchical chunking methods to optimize embeddings and improve information retrieval recall.
- Designing pruning algorithms to discard irrelevant table segments for better recall and fewer hallucinations.
Improved data processing and analysis for IoT applications.
- IoT Infrastructure Development: Engineered a LoRa-based fog computing framework for smart agriculture, reducing sensor energy consumption by 40% and optimizing data transmission using regression models.
- Data Efficiency: Deployed APAEs (Analytical Prediction Algorithm) across edge-fog-cloud layers, cutting data transmissions by 93.6% while maintaining <10% MAE.
- System Integration: Streamlined sensor data collection (temperature, humidity, soil moisture) using Arduino and LoRa, achieving 98% irrigation efficiency.
Publications:


Indian Institute of Information Technology Design & Manufacturing Kancheepuram

Developed an IoT machine learning framework using TensorFlow Lite and Decision Trees, enabling real-time actuation during internet outages with minimum transmission costs.
- Designed a Regressive Prediction Data Forwarding Model (RPDM) using TensorFlow Lite, reducing bandwidth usage by 85% in IoT networks.
- Achieved 99.97% prediction accuracy with Decision Trees, enabling real-time actuation on edge devices during internet outages.
- Implemented lightweight model compression for deployment on Raspberry Pi/Arduino, reducing power consumption by 82.89%.
Developed IoT data aggregation and real-time monitoring systems, optimizing efficiency and publishing findings in IEEE AINA 2023.
- Designed a Ward's method clustering algorithm to compress IoT sensor data by 57.39%, deployed on fog nodes to reduce cloud transmission costs by 38%.
- Integrated with The Things Network, achieving 1.1 s latency for real-time field monitoring, improving response time by 35% over traditional cellular networks.
- Published in IEEE AINA 2023 and tested on a 20-acre testbed, cutting energy consumption by 82.89% at a tolerance threshold of ε = 1.0.

Featured Projects
Showcasing innovative solutions that blend cutting-edge technology with real-world impact
Enterprise Sales Analytics Dashboard
Developed an enterprise-grade Power BI dashboard implementing DAX measures and advanced data modeling techniques to transform raw sales data into actionable business intelligence. The solution features multi-dimensional analysis capabilities with drill-through functionality for granular insights.
Intel Automated Checkout System (OSS Contribution)
Engineered a microservices-based observability solution for Intel's retail edge computing platform that processes real-time computer vision data. Implemented comprehensive telemetry capturing CPU/GPU utilization, inference latency, and throughput metrics critical for retail deployment reliability.
MaskRoot: Computer Vision for Agricultural Phenomics
Engineered an instance segmentation pipeline utilizing Mask R-CNN architecture to automate root phenotyping at scale. The system overcomes occlusion challenges through a custom-designed loss function and transfer learning from MS COCO weights to compensate for limited agricultural training data.
DASA: Distributed Agricultural Sensing Architecture
Designed a hierarchical IoT architecture leveraging LoRaWAN's low-power wide-area network capabilities for agricultural monitoring in remote areas. Implemented a novel fog computing layer using edge devices to perform data preprocessing, anomaly detection, and compression before cloud transmission.
Deep Reinforcement Learning for Urban Traffic Control
Developed an adaptive traffic signal control system using Deep Q-Networks (DQN) in the SUMO traffic simulation environment. The system leverages vehicle-to-infrastructure (V2I) communication to optimize traffic flow based on real-time density and waiting time metrics.
Multi-Layer Perceptron Implementation from First Principles
Built a neural network framework from mathematical foundations without reliance on deep learning libraries. Implemented forward propagation, backpropagation, gradient descent optimization, and regularization techniques to demonstrate core principles of neural computation.
RPDM: Resource-efficient Predictive Decision Model for IoT
Designed an ultra-lightweight machine learning inference system for resource-constrained IoT devices that optimizes when to transmit sensor data based on predictive value. The framework uses model quantization and pruning techniques to enable ML on microcontrollers with severe memory constraints.
Scalable Data Processing Pipeline for Time-Series Analytics
Architected a distributed ETL pipeline for processing high-frequency sensor data from industrial equipment. The system handles data ingestion, cleansing, transformation, and aggregation while maintaining data lineage for regulatory compliance and audit purposes.
Geospatial Market Intelligence Platform for Tucson Businesses
Developed a comprehensive market intelligence platform integrating geospatial, demographic, and economic data sources to identify growth patterns and market opportunities in Arizona's urban centers. Utilized advanced spatiotemporal analysis to reveal hidden business patterns.
Hackathon Adventures
Agentic AI Hackathon
Software Development Club at ASU
Built an agentic AI system using LangChain and LangGraph for automated, personalized interview prep, cutting manual effort by 90% through modular orchestration, relevant question generation, and evaluation with feedback.
Achieved 90% user satisfaction by generating personalized prep plans and using LLMs for response evaluation, creating a comprehensive interview preparation ecosystem.
Implemented intelligent question generation algorithms that adapt to user skill level and target role requirements, ensuring relevant and challenging practice sessions.
Developed automated feedback mechanisms that provide detailed analysis of responses, highlighting strengths and areas for improvement with actionable insights.
Technologies:
Designed an end-to-end NLP candidate search engine using BERT (Hugging Face Transformers) and FAISS to convert natural-language queries into embeddings, enabling real-time semantic matching across 10,000+ profiles with <100ms latency.
Engineered a scalable data pipeline using BeautifulSoup to scrape, clean, and structure 10,000+ GitHub profiles, extracting features such as project complexity, commit frequency, and tech stack relevance, which improved candidate-match accuracy by 40% for hiring teams.
Developed a holistic applicant evaluation portal (React frontend + FastAPI backend) where candidates showcase GitHub activity (stars, forks, PRs) alongside resumes. Integrated a Popularity Index algorithm to auto-rank talent, cutting recruiter screening time by 60% while boosting candidate visibility for niche roles.
Technologies:

DevHacks x Stratergy Hackathon
DevHacks and Stratergy

Zoom App Hackathon
Zoom
Developed an innovative Zoom application that leverages real-time transcription of lecture content to automatically generate interactive quizzes for students.
Integrated Zoom's Realtime Media Streams (RTMS) to capture and process lecture transcripts as they happen, ensuring immediate content relevance.
Used Gemini to analyze the transcripts and generate contextually appropriate quiz questions based on the lecture material.
Built an intuitive user interface using React, TypeScript, and Tailwind CSS that seamlessly integrates with the Zoom platform as pop-up quizzes.
Created a backend infrastructure with Supabase for user authentication, quiz storage, and performance analytics.
Technologies:
Developed a system that generates complete industrial digital twin environments from natural-language prompts in under 60 seconds.
Integrated Gemini to interpret complex prompts like "Build a 3D model of a 10-assembly-line factory" and translate them into actionable outputs.
Engineered automatic Boto3 script generation that dynamically builds assets, hierarchies, and 3D scenes within AWS IoT TwinMaker and AWS IoT SiteWise.
Implemented real-time data monitoring through AWS IoT SiteWise telemetry integration, with LLM interaction for instant insights.
Reduced digital twin setup time from hours to seconds through complete end-to-end automation.
Technologies:

Devils Invent Hackathon
Honeywell & Arizona State University
Interactive Dashboards
Explore my data visualization projects and analytical insights through interactive dashboards
Comprehensive data visualization and analysis dashboard showcasing project metrics, performance indicators, and key insights.
Video Content
Explore my video content showcasing projects, tutorials, and technical insights
Showcasing how this Google Chrome extension improves your productivity by tracking job applications directly in Google Sheets.
Showcasing how this AI-powered technical recruitment platform improves hiring by finding the best-matched candidates.
About Me
As a Data Science master's student at ASU, I architect intelligent systems by specializing in RAG (Retrieval-Augmented Generation) pipelines for LLMs and developing sophisticated AI Agents. My core expertise lies in Natural Language Processing, where I design high-performance retrieval algorithms to power next-generation AI applications.
I translate complex theory into real-world impact. My project experience includes analyzing time-series data to build robust IoT pipelines for smart agriculture and engineering a production-ready, Dockerized pipeline for Intel's automated self-checkout system to visualize critical data in Grafana.
I also engineered a Mask R-CNN pipeline to detect the primary root length of plant species such as wheat, Brassica napus, and Arabidopsis thaliana, enabling biologists to study the root phenome more effectively.
Education

Master of Science in Data Science
Aug 2024 - May 2026
Tempe, Arizona

Bachelor of Science in Electrical Engineering
Aug 2020 - May 2024
Ahmedabad, India