AI + Data + RAG Engineer

Building Reliable
AI Systems with
RAG, Data & LLMs

From raw data to intelligent insights — using production-grade pipelines, semantic retrieval, and thoughtfully integrated language models.

View Projects · Get in Touch
Python · RAG Pipelines · LLM Integration · MERN Stack · Vector Databases · Data Engineering
About
Harshal Shilwant
AI Systems Engineer
Experience: 2 Years
Domain: Cognitive Research Market
Previously: TechnoNexis
Focus: RAG · LLMs · Data Pipelines

I don't just integrate APIs — I architect systems that make AI actually work in the real world. That means obsessing over data quality before a single prompt is written, understanding retrieval semantics before vector embeddings are configured, and thinking about failure modes before deployment.

At TechnoNexis, I built end-to-end RAG pipelines for the cognitive research market — ingesting messy, real-world data from PDFs and spreadsheets, cleaning it, chunking it intelligently, embedding it, and serving it through LLMs that returned accurate, grounded answers.

My philosophy: garbage in, garbage out. Before any LLM touches data, it must be clean, structured, and semantically meaningful. I care about reducing hallucination, improving retrieval precision, and building backends that are reliable under production load.

🧬
Data-First Thinking
Every AI system starts with data quality — cleaning and schema design before any model work.
🎯
Grounded Outputs
Reducing hallucination through retrieval design, not prompt hacks.
⚙️
System Design
APIs, pipelines, and services built for scale — not just demos.
📐
Eval-Driven Dev
Output quality measured and improved systematically, not by feel.
Featured Work

Projects That Ship

Real systems solving real problems. Each project reflects a full engineering loop — from data to deployment.

01 / 04 RAG · NLP
AI Market Research RAG System
End-to-end pipeline ingesting PDF & Excel research reports. Chunks, embeds, and retrieves context for LLM-powered Q&A with drastically reduced hallucination versus vanilla GPT prompting.
Python · LangChain · Pinecone · OpenAI API · FastAPI · PyMuPDF
68%
Hallucination Reduction
~200ms
Query Latency
System Highlights
  • Semantic chunking with overlap strategy to preserve context across section boundaries
  • Hybrid retrieval: BM25 sparse + dense vector search, reranked via Cohere
  • Per-query citation extraction — every answer grounded to source document & page
  • Async FastAPI backend with request queue and rate-limiting for multi-user load
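The merge step in a hybrid BM25-plus-dense setup can be illustrated with reciprocal rank fusion, one common way to combine two rankings before a reranker sees them. This is an illustrative sketch, not the project's exact fusion logic:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    Each list is ordered best-first; a document earns 1/(k + rank + 1)
    per list it appears in, so agreement across retrievers is rewarded.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

sparse = ["d3", "d1", "d7"]  # BM25 order
dense = ["d1", "d5", "d3"]   # dense vector-search order
fused = reciprocal_rank_fusion([sparse, dense])
# "d1" ranks first: it scored highly in both lists
```

A fused list like this is then handed to a cross-encoder reranker (Cohere, in the project above) for final ordering.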
02 / 04 Analytics · AI
Excel Analytics + AI Insight Platform
Upload any structured Excel file and receive auto-generated visualizations, data summaries, and natural-language insights. AI detects trends, outliers, and business signals automatically.
React · Node.js · Python · Pandas · Recharts · GPT-4o
3s
Avg. Insight Time
12+
Chart Types
System Highlights
  • Schema inference engine auto-detects numeric, categorical, and temporal columns
  • AI-generated chart recommendations based on data shape and column types
  • LLM summarizes each chart with business-level language, not technical output
  • MERN full-stack with file streaming — handles Excel files up to 50MB
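A minimal version of that schema-inference step might look like the following pandas sketch. The thresholds and category names here are illustrative assumptions, not the platform's actual rules:

```python
import pandas as pd

def infer_schema(df: pd.DataFrame, cat_ratio: float = 0.7) -> dict:
    """Classify each column as numeric, temporal, categorical, or free text."""
    schema = {}
    for col in df.columns:
        series = df[col].dropna()
        if pd.api.types.is_numeric_dtype(series):
            schema[col] = "numeric"
        # if >90% of values parse as dates, treat the column as temporal
        elif pd.to_datetime(series, errors="coerce").notna().mean() > 0.9:
            schema[col] = "temporal"
        # few distinct values relative to row count: categorical
        elif series.nunique() <= cat_ratio * len(series):
            schema[col] = "categorical"
        else:
            schema[col] = "text"
    return schema

df = pd.DataFrame({
    "revenue": [10.5, 20.0, 15.2],
    "region": ["NA", "EU", "NA"],
    "date": ["2023-01-05", "2023-02-05", "2023-03-05"],
})
schema = infer_schema(df)
```

The inferred schema then drives chart recommendation: numeric vs. temporal vs. categorical columns suggest very different visualizations.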
03 / 04 Search · Vector DB
Semantic Search Engine
Vector database-backed retrieval system replacing keyword search. Uses dense embeddings to match intent, not just vocabulary — significantly improving result relevance for domain-specific corpora.
Sentence Transformers · Qdrant · FastAPI · Docker · MongoDB
91%
Recall@10
40ms
P99 Latency
System Highlights
  • Fine-tuned bi-encoder on domain-specific query-document pairs for higher precision
  • HNSW index in Qdrant for sub-50ms approximate nearest neighbor search
  • Faceted filtering: combine semantic score with metadata filters in one query
  • Dockerized deployment with horizontal scaling support via load balancer
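Conceptually, combining a semantic score with metadata filters in one query works like this toy NumPy version. Qdrant does the equivalent server-side against its HNSW index; the vectors and field names below are made up for illustration:

```python
import numpy as np

def filtered_search(query_vec, doc_vecs, metadata, filters, top_k=3):
    """Cosine-similarity search restricted to documents whose metadata
    matches every key/value in `filters` (the facet constraint)."""
    mask = np.array([all(m.get(k) == v for k, v in filters.items())
                     for m in metadata])
    idx = np.where(mask)[0]
    if idx.size == 0:
        return []
    vecs = doc_vecs[idx]
    sims = vecs @ query_vec / (
        np.linalg.norm(vecs, axis=1) * np.linalg.norm(query_vec))
    order = np.argsort(-sims)[:top_k]
    return [(int(idx[i]), float(sims[i])) for i in order]

doc_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
metadata = [{"lang": "en"}, {"lang": "en"}, {"lang": "de"}]
hits = filtered_search(np.array([1.0, 0.0]), doc_vecs, metadata, {"lang": "en"})
# only the two "en" documents are candidates; the closest ranks first
```

Filtering before scoring, rather than post-filtering a top-k result set, is what keeps recall stable when filters are selective.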
04 / 04 Backend · LLM Ops
LLM API Backend Architecture
Production-grade backend for serving multiple LLM providers behind a unified API, with model routing, fallback logic, cost tracking, prompt versioning, and response caching.
Node.js · Express · Redis · PostgreSQL · OpenAI · Anthropic
60%
Cost Reduction (cache)
99.9%
Uptime (fallback)
System Highlights
  • Unified API gateway — swap providers (OpenAI → Anthropic → Mistral) via config
  • Semantic response caching with Redis: similar-intent queries hit the cache, not the provider API
  • Prompt version registry — rollback, A/B test, and track prompt performance
  • Per-tenant cost tracking and token-budget enforcement in real time
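The fallback behavior behind that uptime figure reduces to "try each provider in order, record failures, fall through." A minimal sketch, where the provider names and `ProviderError` type are placeholders rather than the real client code:

```python
class ProviderError(Exception):
    """Raised when an LLM provider call fails (timeout, rate limit, 5xx)."""

def call_with_fallback(prompt, providers):
    """Try each (name, callable) provider in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append((name, str(exc)))  # record and fall through
    raise ProviderError(f"all providers failed: {errors}")

def flaky(prompt):  # stands in for a rate-limited primary provider
    raise ProviderError("429 rate limited")

def stable(prompt):  # stands in for a healthy secondary provider
    return f"answer to: {prompt}"

used, answer = call_with_fallback("hello", [("openai", flaky), ("anthropic", stable)])
# the primary fails, so the request falls through to the secondary
```

A production version would add retries with backoff before falling through, and a queue as the terminal fallback rather than an exception.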
System Design

Architecture Thinking

These are the core systems I design and reason about. Clean flows, defined responsibilities, observable outputs.

Pipeline 01
RAG Pipeline — Ingestion to Answer
📄 Raw Document
🧹 Extraction & Cleaning
✂️ Chunking Strategy
🔢 Embeddings
🗄️ Vector Store
🔍 Retrieval + Rerank
🤖 LLM + Context
✅ Grounded Answer
The key design decision: chunking strategy determines retrieval quality more than model choice. I use semantic chunking with sliding overlap (128-token overlap on 512-token chunks) to preserve cross-boundary context. Retrieval uses hybrid BM25+dense, reranked before LLM injection.
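The sliding-overlap part of that strategy, reduced to its simplest fixed-window form: a real pipeline would count tokens with a tokenizer such as tiktoken and split on semantic boundaries first, but plain list slicing shows the overlap mechanics.

```python
def chunk_with_overlap(tokens, size=512, overlap=128):
    """Split a token list into windows of `size`, each sharing
    `overlap` tokens with the previous window so context survives
    chunk boundaries."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

tokens = list(range(1000))
chunks = chunk_with_overlap(tokens)
# 3 chunks; chunk 0 ends with the same 128 tokens that chunk 1 starts with
```

The overlap costs some index size (each boundary token is embedded twice) but buys retrieval that does not miss facts straddling a split.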
Pipeline 02
Data Processing Flow — Raw to Production-Ready
📊 Excel / CSV / PDF
🔎 Schema Inference
🧹 Dedup + Nulls
🔧 Type Normalization
✅ Validation Layer
🗃️ Clean Store
🚀 Downstream AI
Data quality gates catch problems before they propagate. Schema inference auto-detects column types; validation rules flag statistical anomalies (outliers beyond 3σ) and structural issues (missing required fields) before any AI system touches the data.
Pipeline 03
LLM Request Flow — Optimized for Cost & Reliability
📥 API Request
🔐 Auth + Rate Limit
💾 Semantic Cache?
📝 Prompt Builder
🤖 Model Router
⚡ LLM Provider
📊 Log + Track Cost
📤 Response
The model router selects provider based on task type, cost budget, and latency SLA. Cache hit rate of ~60% achieved by embedding incoming queries and checking cosine similarity against recent responses — not exact string match. Fallback chain: primary → secondary → queue.
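A semantic cache of that kind reduces to "embed the query, compare against stored embeddings, return the cached response above a similarity threshold." A minimal in-memory sketch, where the toy bag-of-words embedder stands in for a real embedding model and the 0.92 threshold is illustrative:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Cache keyed by embedding similarity, not exact string match."""
    def __init__(self, embed, threshold=0.92):
        self.embed = embed        # callable: text -> vector
        self.threshold = threshold
        self.entries = []         # list of (vector, response) pairs

    def get(self, query):
        qv = self.embed(query)
        for vec, response in self.entries:
            if cosine(qv, vec) >= self.threshold:
                return response   # similar-intent query seen before
        return None               # cache miss: go to the LLM provider

    def put(self, query, response):
        self.entries.append((self.embed(query), response))

VOCAB = ["price", "plan", "weather", "today"]
def toy_embed(text):  # placeholder for a real sentence embedder
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

cache = SemanticCache(toy_embed)
cache.put("price of the plan today", "cached answer")
```

In production the linear scan is replaced by a vector index (Redis vector search or similar), and entries carry a TTL so stale answers age out.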
Capabilities

Skills & Tools

AI / LLM
RAG Pipelines
LangChain
OpenAI API
Prompt Engineering
LLM Evaluation
Data Engineering
Python / Pandas
Data Cleaning
ETL Pipelines
Vector DBs (Pinecone, Qdrant)
MongoDB
Backend
Node.js / Express
FastAPI
REST API Design
Redis (Caching)
Docker
Frontend
React.js
JavaScript (ES6+)
Tailwind CSS
Data Visualization
Tools & Ecosystem
Git & GitHub · VS Code · Postman · Jupyter · Vercel · Render · PyMuPDF · Cohere Reranker · Sentence Transformers · HuggingFace · HNSW Index · PostgreSQL
Engineering Perspective

How I Think

The mental models and design principles I apply when building AI systems.

01
How do I design a RAG system from scratch?
I start with the query, not the documents. What does a "good answer" look like? That drives everything — chunking size, retrieval strategy, and how much context the LLM actually needs.
1. Define answer quality first (what's a good vs bad response?)
2. Audit source documents — types, sizes, structure
3. Design chunking strategy around semantic boundaries
4. Choose retrieval type (dense, sparse, hybrid) based on query diversity
5. Add reranking — always improves precision for minimal cost
02
How do I reduce hallucination in LLM outputs?
Hallucination is mostly a retrieval problem, not a prompting problem. If the right context doesn't reach the LLM, no prompt will fix it. I focus on retrieval precision first.
Improve chunk quality — semantic coherence over arbitrary splits
Add citation constraints in system prompt — force source grounding
Use reranking to filter irrelevant retrieved context
Measure faithfulness score (RAGAs) on every release
03
How do I evaluate LLM output quality systematically?
You can't improve what you don't measure. I build eval pipelines that run on every code change — not manual vibe-checking before release.
Build a golden dataset of 50-100 query-answer pairs per domain
Track: faithfulness, answer relevance, context precision (RAGAs)
Use LLM-as-judge for semantic similarity scoring
Alert on metric regression — treat evals like unit tests
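"Treat evals like unit tests" can be made concrete as a regression gate run in CI: fail the build whenever any tracked metric drops below its baseline. The metric names follow the RAGAs ones above; the scores and tolerance are hypothetical:

```python
def check_regression(current, baseline, tolerance=0.02):
    """Return {metric: (baseline, current)} for every metric that dropped
    more than `tolerance` below its baseline -- a failing test, in effect."""
    failures = {}
    for name, base in baseline.items():
        score = current.get(name, 0.0)  # missing metric counts as a failure
        if score < base - tolerance:
            failures[name] = (base, score)
    return failures

baseline = {"faithfulness": 0.90, "answer_relevance": 0.85, "context_precision": 0.80}
this_run = {"faithfulness": 0.91, "answer_relevance": 0.80, "context_precision": 0.81}
failures = check_regression(this_run, baseline)
# only answer_relevance regressed beyond tolerance
```

Wiring this into the pipeline (exit nonzero when `failures` is non-empty) is what turns eval scores from a dashboard into a gate.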
Get in Touch

Let's Build
Something Intelligent

Have an AI system to build? Let's talk architecture first.

Available for AI & Data Engineering Projects