This page documents my applied AI engineering and research projects, spanning LLM systems, retrieval-augmented generation, model compression, and production-grade ML pipelines.
Text-to-3D mesh generation using diffusion models
I built Tesseract as my first flagship end-to-end ML system, a deliberate move beyond notebook-driven experiments to understand what separates production-grade AI systems from typical college-level projects.
Early-stage 3D asset creation is slow and labor-intensive, often forcing artists and developers to start from scratch before any meaningful iteration can begin. While text-to-3D diffusion models exist, most are released as fragile scripts or demos that are difficult to integrate into real workflows, lack reproducibility, and are not designed for deployment or scaling.
I wanted to explore how a research model could be wrapped into a reliable, scriptable, and scalable system, usable through real interfaces rather than isolated experiments.
How can text-conditioned 3D mesh generation be exposed as a reliable, production-ready system suitable for batch workflows, service-based integration, and reproducible experimentation—rather than remaining a research demo or notebook artifact?
Tesseract v1 is a modular, production-oriented ML pipeline that wraps a diffusion-based 3D generation model and exposes it through multiple interfaces:
The system manages prompt ingestion, latent generation, mesh decoding, file export, and optional preview rendering in a unified pipeline. It is intentionally designed to be stateless at the service layer, device-aware (GPU with CPU fallback), and structured to be compatible with containerized deployment patterns.
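The stage flow above can be sketched as a thin, stateless orchestration layer. This is a minimal illustration under assumptions: the stage functions here are trivial placeholders standing in for the real diffusion and decoding calls, and `select_device` is a hypothetical helper, not Tesseract's actual API.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile

def select_device() -> str:
    """Prefer GPU when available, fall back to CPU (hypothetical helper)."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"

# Placeholder stages standing in for the real model calls.
def generate_latent(prompt: str, device: str) -> list:
    return [float(ord(c)) for c in prompt]  # stand-in for diffusion sampling

def decode_mesh(latent: list, device: str) -> str:
    # Stand-in for latent -> mesh decoding; emits a toy OBJ string.
    return "o mesh\n" + "\n".join(f"v {x} 0 0" for x in latent)

@dataclass
class GenerationJob:
    prompt: str
    out_dir: Path
    device: str = field(default_factory=select_device)

def run_pipeline(job: GenerationJob) -> Path:
    """One stateless run: prompt -> latent -> mesh -> exported file."""
    latent = generate_latent(job.prompt, job.device)
    mesh = decode_mesh(latent, job.device)
    out_path = job.out_dir / "mesh.obj"
    out_path.write_text(mesh)  # file export stage
    return out_path
```

Keeping each run stateless at this layer is what makes the same code usable from a CLI, a batch loop, or a service endpoint.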
Rather than focusing on model novelty, I centered this project around system reliability, iteration ergonomics, and realistic integration constraints:
While initially attempting to extract only the minimal components required for text-to-3D generation, I found that Shap-E's research-oriented codebase was highly coupled, with core functionality spanning a large dependency graph. Rather than risk subtle breakage or silent correctness issues, I chose to vendor the full Shap-E project into the core and build clean abstraction layers around it instead of aggressively pruning internals.
This decision favored correctness, stability, and debuggability over premature modularization.
The focus was on keeping the system simple to reason about, easy to debug, and reproducible, rather than optimizing prematurely for scale.
Evaluation focused on system correctness and operational behavior, rather than benchmark scores:
The emphasis was on ensuring the pipeline behaved predictably under iteration and failure, rather than optimizing for output aesthetics alone.
Several engineering choices emerged directly from repeated friction during development.
I introduced batch processing not only to increase the likelihood of obtaining useful outputs from stochastic diffusion, but also to intentionally stress-test the pipeline and observe how resource usage and failure modes surfaced under load.
Similarly, latent cache saving was added after encountering multiple cases where decoding or rendering failures—often due to small bugs or misconfigurations—forced the entire diffusion process to restart. Persisting intermediate latents significantly reduced wasted compute and made debugging faster and less frustrating.
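The latent-caching idea above can be sketched as a small reuse-or-generate helper. This is a hedged illustration: the key scheme, file layout, and pickle serialization are assumptions for the sketch, not Tesseract's actual cache format.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

def cache_key(prompt: str, seed: int) -> str:
    """Deterministic key so identical (prompt, seed) runs hit the same file."""
    return hashlib.sha256(f"{prompt}|{seed}".encode()).hexdigest()[:16]

def load_or_generate_latent(prompt, seed, generate, cache_dir=Path("latent_cache")):
    """Reuse a persisted latent if present; otherwise run diffusion and save it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(prompt, seed)}.pkl"
    if path.exists():
        # A downstream decoding/rendering bug no longer forces a full
        # diffusion rerun -- the expensive latent is simply reloaded.
        return pickle.loads(path.read_bytes())
    latent = generate(prompt, seed)  # the expensive sampling step
    path.write_bytes(pickle.dumps(latent))
    return latent
```

The payoff is that the cheapest stages (decode, export, render) can be retried as often as needed without paying for diffusion again.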
Integrating a research codebase like Shap-E also highlighted the tradeoff between ideal modularity and practical correctness. This project reinforced that, in applied ML systems, stability and recoverability often matter more than architectural purity.
As my first flagship, end-to-end ML system, Tesseract helped solidify my understanding of what distinguishes robust engineering from experimental code: careful state management, clear boundaries, and designing for things to fail gracefully.
Completed (v1) — a production-oriented architecture with clear scope for future improvements in model quality, evaluation rigor, and modular refinement.
July 2025
LLM-based user persona generation from Reddit activity
I built Reddit-Persona to explore how large language models can synthesize coherent behavioral personas from noisy, unstructured, real-world text data. This project originated as a pre-interview internship assignment, but I intentionally extended it beyond initial requirements to understand how LLMs behave when tasked with higher-level reasoning over fragmented user activity rather than direct question answering.
I was particularly interested in how design choices around chunking, context density, and output structure affect the quality and interpretability of persona-level outputs.
How can we transform a Reddit user's scattered posts and comments—spanning multiple topics, tones, and timeframes—into a structured, interpretable persona, without overwhelming the model or losing important behavioral signals?
Reddit-Persona is a modular pipeline that:
The system supports both a CLI workflow for scripted runs and a Streamlit UI to make persona generation accessible to non-technical users.
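The chunking design mentioned above can be sketched as a simple budgeted grouping step. This is a minimal sketch under assumptions: the function name and the character-based budget are hypothetical (a token-based budget would work the same way); each resulting chunk would then be sent to the LLM for a partial persona pass.

```python
def chunk_activity(items: list, max_chars: int = 4000) -> list:
    """Group posts/comments into context-sized chunks instead of one giant prompt.

    Keeping chunks small preserves local behavioral signal that a single
    aggregated prompt tends to wash out.
    """
    chunks, current, size = [], [], 0
    for text in items:
        if current and size + len(text) > max_chars:
            chunks.append(current)   # close the full chunk
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        chunks.append(current)       # flush the final partial chunk
    return chunks
```

A design note: oversized single items are kept whole rather than split mid-text, since truncating a post destroys exactly the context the persona pass needs.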
This project emphasized LLM interaction design and reasoning quality, rather than infrastructure scale:
These decisions were driven by the goal of understanding how to reason with LLM outputs, not just generate text.
I chose LLaMA-3.3-70B for this task because persona synthesis requires semantic abstraction, consistency, and narrative grounding, rather than factual recall or short-form completion.
Compared to smaller models, LLaMA-70B demonstrated:
Using Groq's inference API allowed fast, low-latency processing of multiple chunks per user, making iterative experimentation feasible without compromising output quality. This was particularly important given the chunk-based design of the system.
The project was intentionally kept simple from an infrastructure perspective to focus on reasoning quality, modularity, and clarity.
Evaluation was primarily qualitative and interpretive:
The emphasis was on interpretability, traceability, and reasoning fidelity, rather than numerical benchmarks.
This project surfaced several important lessons.
Aggregating all user activity into a single prompt consistently degraded persona quality, reinforcing the importance of chunk-level reasoning. However, generating multiple personas introduced a new ambiguity: deciding which persona to trust. This motivated the addition of an experimental ranking step that re-evaluates persona blocks and selects the most insight-rich candidate, along with an explanation for the choice.
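The ranking step described above can be sketched as a small selection loop. Hedged heavily: in the real system the scorer is an LLM judging each persona block for insight richness; here it is an injected callable so the selection logic itself is visible, and the function name is hypothetical.

```python
def select_best_persona(personas: list, score) -> tuple:
    """Re-evaluate candidate persona blocks and keep the most insight-rich one.

    `score` is a callable returning (numeric_score, explanation). In the
    actual system this would wrap an LLM re-evaluation prompt; injecting it
    keeps the ranking logic testable in isolation.
    """
    if not personas:
        raise ValueError("no persona candidates to rank")
    ranked = sorted(personas, key=lambda p: score(p)[0], reverse=True)
    best = ranked[0]
    return best, score(best)[1]  # keep the explanation for traceability
```

Returning the explanation alongside the winner is what makes the choice auditable rather than a silent argmax.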
From an engineering standpoint, this was my first project where I became comfortable with clean Python project structuring, modular design, and environment isolation using Conda. While it does not yet include advanced logging or configuration separation, it laid the groundwork for how I structure larger systems today.
Finally, the project highlighted ethical and practical considerations around persona generation from real user data. For this reason, the system is not publicly hosted and requires users to supply their own API keys.
Completed — a functional, reasoning-centric system that explores LLM-driven behavioral analysis and serves as an early foundation for more advanced applied and research-oriented work.
2025 - Present
Research-oriented LLM system for DevOps incident reasoning
Designing a research-oriented LLM system to study retrieval-augmented reasoning and parameter-efficient adaptation under realistic system constraints. Emphasizes reproducible evaluation over application demos.
RAG Systems, LoRA, Model Quantization, Evaluation Frameworks, System Optimization
Smart India Hackathon
Radar-vision research prototype for bird vs drone discrimination
An exploratory radar–vision research system for bird vs drone discrimination using micro-Doppler signatures
SkySentinel-X is a research-oriented prototype built to investigate whether micro-Doppler signatures extracted from FMCW radar signals can be effectively transformed into a visual learning problem for modern deep learning models.
Low-altitude airspace monitoring systems frequently struggle to distinguish biological targets (birds) from small UAVs, leading to false alarms and unreliable threat assessment. Traditional radar pipelines rely on handcrafted features and heuristics, which often fail under real-world variability.
This project explores a different question: Can time–frequency representations of radar returns (STFT spectrograms) serve as a robust intermediate representation for CNN-based classification of aerial targets?
SkySentinel-X was developed as part of the Smart India Hackathon (SIH) problem statement on micro-Doppler–based target classification and successfully cleared the college-level selection round, validating its research direction and technical grounding.
Objective: To experimentally evaluate whether deep convolutional models trained on STFT spectrograms can learn discriminative micro-Doppler patterns that separate birds from drone-like aerial objects.
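The STFT-to-image step at the heart of this objective can be sketched directly in NumPy. This is an illustrative sketch, not the project's actual preprocessing: the signal below is a synthetic sinusoidally modulated tone standing in for rotor/wing-beat micro-Doppler structure, and the frame/hop parameters are assumptions.

```python
import numpy as np

def stft_spectrogram(signal, n_fft=256, hop=64):
    """Short-time Fourier transform -> log-magnitude 'image' for a CNN."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft + 1, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames), axis=1)).T  # (freq, time)
    return 20 * np.log10(spec + 1e-8)  # dB scale gives a bounded image

# Synthetic radar-like return: a carrier whose frequency is modulated at 5 Hz,
# a crude stand-in for periodic micro-motion (not real radar data).
fs = 8000
t = np.arange(0, 1.0, 1 / fs)
sig = np.sin(2 * np.pi * (1000 + 200 * np.sin(2 * np.pi * 5 * t)) * t)
img = stft_spectrogram(sig)  # 2-D array, directly usable as CNN input
```

The key move is that after this transform, the radar problem looks exactly like an image classification problem, so standard CNN tooling applies unchanged.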
Key research challenges addressed:
SkySentinel-X implements a signal-processing → vision-learning pipeline, designed explicitly for experimentation and evaluation:
Radar Signal Processing
Representation Learning
Hierarchical Inference
This structure prioritizes interpretability and experimental control, not deployment.
Why STFT Spectrograms?
Why CNN Transfer Learning?
Handling Dataset Bias
This project intentionally avoids over-engineering and instead focuses on clean experimental design.
Dataset
Results
Observations
These results support the hypothesis that micro-Doppler spectrograms are a viable visual representation for aerial target classification.
Signal Processing
ML & Experimentation
Analysis
This project is positioned as a research probe, not a finished product.
Potential downstream relevance:
Its primary contribution is methodological: demonstrating a clean bridge between radar signal processing and deep visual learning.
SkySentinel-X represents an important shift in my work toward research-driven system design:
While currently implemented as a notebook-based research prototype, the pipeline is intentionally structured to be refactored into modular training and inference systems in future iterations.
Type: Research prototype
Stage: Experimental validation complete
Current form: Jupyter-based exploratory pipeline
Future work:
College Research Project
Low-cost wearable computer vision glasses aid
Omni-Purpose Real-time Computer-vision Assistant (College Research Project)
Assistive smart glasses for visually impaired users exist, but they are typically proprietary, expensive (₹1–3 lakh / ~$1,200–$3,600), and tightly coupled to closed hardware and cloud-based inference.
ORCA v0 was built as a college research project to explore a fundamental question: Can modern open-source computer vision models be composed into a low-cost, real-time, wearable vision aid with meaningful assistive capabilities?
The project targeted a novel computer vision glasses concept, using commodity hardware and open-source models to prototype a system that could assist users with environmental awareness, navigation, and social interaction, while remaining affordable and privacy-preserving.
How can we design a multi-purpose wearable computer vision system that:
...rather than being a closed, single-purpose demo or a high-cost commercial device?
ORCA follows a hybrid edge-compute architecture designed for prototyping wearable vision systems:
The system intentionally supports multiple use cases, not just one, making it closer to a general-purpose vision aid than a task-specific demo.
ORCA was designed to support and experiment with:
This multi-capability design was deliberate — most college projects focus on one vision task, whereas ORCA explored system-level integration across tasks.
1. Multi-Model, Task-Specific Pipelines
Instead of forcing all tasks into a single model, ORCA uses specialized models per capability:
This allowed independent benchmarking of accuracy vs latency, faster iteration and model swapping, and clear separation of concerns across pipelines.
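The per-capability separation described above can be sketched as a small task registry. This is a hypothetical structure for illustration (the registry names and placeholder functions are not ORCA's actual code); the real pipelines would wrap the YOLOv10/NanoDet, YuNet+SFace, and depth models respectively.

```python
# Hypothetical capability registry: each task gets its own specialized
# pipeline, so models can be benchmarked and swapped independently.
PIPELINES = {}

def register(task):
    """Decorator that exposes a pipeline under a task name."""
    def wrap(fn):
        PIPELINES[task] = fn
        return fn
    return wrap

@register("object_detection")
def detect_objects(frame):
    return []  # placeholder for a YOLOv10 / NanoDet call

@register("face_recognition")
def recognize_faces(frame):
    return []  # placeholder for YuNet detection + SFace embedding

def run(task, frame):
    """Dispatch a frame to the pipeline registered for `task`."""
    return PIPELINES[task](frame)
```

Because each capability sits behind the same `run(task, frame)` interface, swapping NanoDet for YOLOv10 (or disabling depth when latency matters) touches one registration, not the whole system.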
2. Low-Cost Hardware Constraint
A strict constraint was using ESP32-CAM instead of high-end sensors or proprietary cameras. This forced realistic engineering trade-offs around frame resolution, network bandwidth, model size and inference speed, and end-to-end latency budgets. This constraint is what makes ORCA meaningful as a wearable assistive CV project, not just a desktop demo.
3. Privacy-Preserving, Local Inference
All inference runs locally: no cloud APIs for face recognition or object detection, and no external transmission of sensitive visual data, making the system suitable for assistive and accessibility-related contexts. This mirrors real-world constraints in assistive technology design.
4. Quantization for Real-Time Performance
Models were tested with INT8 quantization, reducing memory usage by ~40–50% while maintaining usable accuracy. This enabled ~8 FPS end-to-end processing, sub-second latency, and feasibility on consumer-grade hardware.
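The memory-reduction mechanism behind INT8 quantization can be shown with a minimal affine-quantization sketch in NumPy. This is a conceptual illustration, not the deployment path used in ORCA (which relied on the frameworks' own quantization tooling); note the raw weight-storage saving is 4x, while the ~40-50% end-to-end figure also reflects unquantized layers and activations.

```python
import numpy as np

def quantize_int8(w):
    """Affine INT8 quantization: float32 weights -> int8 + scale/zero-point."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 or 1.0          # step size over the int8 range
    zero_point = np.round(-lo / scale) - 128  # maps lo near -128
    q = np.clip(np.round(w / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float32 weights for inference-time use."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(1000, 1000).astype(np.float32)
q, s, z = quantize_int8(w)
ratio = w.nbytes / q.nbytes                      # 4.0: int8 is 1/4 of float32
err = np.abs(dequantize(q, s, z) - w).max()      # bounded by a few quant steps
```

The accuracy story in the text follows from `err` being tiny relative to the weight range: the network sees almost the same weights, at a quarter of the storage.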
Languages & Frameworks: Python, OpenCV, PyTorch
Models: YOLOv10, NanoDet (object detection), YuNet, SFace (face detection & recognition), MiDaS / Depth Anything V2 (depth estimation)
Hardware: ESP32-CAM (~₹800 / ~$10), Consumer PC / edge device
System Components: wireless video streaming (HTTP), parallel inference pipelines, audio alert system, visual debugging overlays
ORCA was tested across indoor and outdoor environments, focusing on practical usability rather than benchmark chasing.
Approximate observed performance:
These results validated the feasibility of a low-cost, multi-purpose wearable vision aid, though not production readiness.
This project marked an important transition from single-model demos to system-level computer vision engineering.
Key takeaways:
ORCA also revealed gaps — configuration management, logging, and abstraction — which directly informed the design of later projects like Tesseract and MÍMIR.
Current limitations: performance degrades in low-light conditions, depth estimation adds significant latency, and configuration is partially hard-coded
Future improvements: unified config and logging layers, better low-light preprocessing, spatial audio feedback, and more capable edge hardware (Jetson / Raspberry Pi)
Status: Completed (College Research Project)
Nature: Experimental research & prototyping
Outcome: Demonstrated feasibility of a low-cost, multi-purpose computer vision glasses aid
College Project & Hackathon
Edge-deployed contamination detection system
Low-cost, field-deployable computer vision and IIoT system for chemical contamination screening using color-based sensing and lightweight machine learning.
Monitoring chemical contamination in the field is often expensive, slow, and inaccessible, especially in rural or resource-constrained environments. In India alone, laboratory-grade chemical tests can cost ₹600–₹1200 ($7–$15) per sample, making large-scale or frequent monitoring impractical.
Vulkyrie was built as a low-cost, field-deployable computer vision and IIoT system to explore whether color-based sensing + lightweight machine learning can serve as an early-warning mechanism for chemical contamination.
This project specifically targeted Diclofenac contamination, a chemical linked to a catastrophic 99% collapse of India's vulture population, with severe downstream ecological and public-health consequences. The goal was not laboratory-grade accuracy, but rapid, accessible, and scalable screening.
How can chemical contamination be detected cheaply, in real time, and in the field, using hardware that costs under ₹1000 ($12–$15), while still producing meaningful, interpretable outputs that can guide further investigation?
Vulkyrie is an end-to-end research-to-prototype pipeline spanning:
Rather than focusing purely on model accuracy, the system emphasizes deployment realism, hardware constraints, and operational usability.
Random Forest Regression was chosen over deep models due to:
Edge-first design:
Model-to-hardware translation:
Data augmentation over brute-force modeling:
Centralized IIoT backend:
Input: RGB values from TCS3200 color sensor
Preprocessing:
Model:
Outputs:
Inference:
Interaction:
Cost:
This makes Vulkyrie orders of magnitude cheaper than conventional lab testing for preliminary screening.
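The RGB-to-concentration model described above can be sketched with scikit-learn. Hedged throughout: the data below is synthetic (an assumed illustrative relationship where contamination shifts the red/blue balance), not Vulkyrie's real calibration dataset, and the hyperparameters are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for TCS3200 sensor readings. The linear relationship
# between channels and concentration is an assumption for this sketch only.
rng = np.random.default_rng(0)
rgb = rng.uniform(0, 255, size=(500, 3))                 # R, G, B values
conc = 0.4 * rgb[:, 0] - 0.2 * rgb[:, 2] + rng.normal(0, 2, size=500)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(rgb[:400], conc[:400])                         # train split
pred = model.predict(rgb[400:])                          # held-out predictions
r2 = model.score(rgb[400:], conc[400:])                  # interpretable R^2
```

A Random Forest here has the properties the project needed: small enough to run on modest edge-adjacent hardware, robust to sensor noise, and easy to inspect via feature importances, none of which a deep model would have offered at this data scale.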
Interactive geographic contamination maps, centralized monitoring of distributed devices. Designed to simulate IIoT-style deployment, not just a standalone device.
Dataset:
Evaluation focused on:
The system is explicitly positioned as early-warning & screening, not definitive diagnosis.
This project was intentionally ambitious for its scope and constraints. The most challenging aspects were:
Vulkyrie was also one of the first projects where I deeply engaged with:
Stage: Experimental / Prototype
Context: College project & hackathon prototype
Readiness: Not production-grade, but technically validated
Positioning: Proof-of-concept for low-cost, AI-assisted contamination screening