Kshitiz Dhakal  ·  Data & AI Engineer  ·  Kathmandu

Kshitiz Dhakal.

I am a Data & AI Engineer, a product thinker, a pipeline architect, a mentor, a lakehouse builder, and a researcher in AI and LLMs.

An engineer who works with data, machine learning, and the craft of moving information from where it is to where it can be useful.

Currently: Data Engineer at Maitri Services  ·  AI / Healthcare
Based in: Kathmandu, Nepal  ·  UTC+05:45
Available for: research, engineering, consulting & speaking
3+ yrs in production data engineering
5 companies & fellowships
1 peer-reviewed paper
ETL pipelines, give or take
§ I  ·  What I'm doing now Section 01
Currently at Maitri Services

Mapping clinical text to structured medical codes.

I am working on an NLP/LLM pipeline that reads unstructured clinical notes and lifts the right ICD-10 and HCC codes out of them — quietly, reliably, and within HIPAA's lines.

January 2026 — present  ·  Remote / Kathmandu

  • 01  Multi-stage inference: Named Entity Recognition → context modelling → embeddings → re-ranking → date-of-service detection.
  • 02  Fine-tuned LLMs for clinical entity recognition and automated medical code prediction.
  • 03  Asynchronous orchestration with AWS SQS + Celery: distributed, fault-tolerant, idempotent.
  • 04  HIPAA-compliant handling of PHI, with practical training in US healthcare data privacy.
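
To illustrate what "idempotent" means for a queue consumer of this kind, here is a minimal sketch in plain Python. It is not the production code: the message shape, the `process_note` handler, and the in-memory `seen` cache are hypothetical stand-ins, and a real deployment would key on the queue's message ID and use a durable store.

```python
from typing import Callable, Dict


def make_idempotent(handler: Callable[[dict], str]) -> Callable[[dict], str]:
    """Wrap a handler so a redelivered message is processed only once.

    Results are cached by message id. The in-memory dict stands in for
    a durable store (hypothetical, for illustration only).
    """
    seen: Dict[str, str] = {}

    def wrapper(message: dict) -> str:
        key = message["id"]
        if key in seen:          # duplicate delivery: return cached result
            return seen[key]
        result = handler(message)
        seen[key] = result       # record only after the handler succeeds
        return result

    return wrapper


calls = []


def process_note(message: dict) -> str:
    """Hypothetical handler: pretend to code one clinical note."""
    calls.append(message["id"])
    return f"processed:{message['id']}"


safe_process = make_idempotent(process_note)
first = safe_process({"id": "note-1", "body": "example text"})
again = safe_process({"id": "note-1", "body": "example text"})  # simulated redelivery
```

The handler runs once even though the message arrives twice, which is the property that makes retries safe.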
§ II  ·  The path so far Section 02

My career, step by step.

Each role here marks a bench, a problem worth solving, and what I learned by moving on.

  1. 2026 — present Jan onwards

    Data Engineer, AI & healthcare

    Maitri Services

    Owning the inference architecture that turns clinical narrative into ICD-10 and HCC codes — NER, context modelling, embeddings, re-ranking, DOS detection — wrapped in HIPAA-grade data hygiene.

    LLMs · NER · AWS SQS · Celery · HIPAA
  2. 2024 — 2025 Aug → Dec

    Data Engineer

    GrowByData

    Built a batch processing framework for Google Feeds, optimised an existing data acquisition stack, integrated new agencies, migrated Dremio from AWS to Kubernetes, and co-led the data team. Mentored interns. Wrote shell scripts at unholy hours.

    Spark · Dremio · Kubernetes · GraphQL · Leadership
  3. 2023 — 2024 Aug → Aug

    Associate Data Engineer

    GrowByData

    Designed Spark/Scala ETL pipelines, built a data lakehouse on S3 + Glue, ran OAuth2 integrations with multi-threaded producer-consumer patterns, and stood up a Flask-based BI reporting backend with Redis caching and per-client anonymisation decorators.

    Scala · Apache Spark · S3 / Glue / EMR · Flask · Redis
  4. 2023 May → Jul

    Data Engineering Fellow

    GritFeat Solutions

    A summer immersed in modern data tooling: PySpark on Databricks for weather forecasting, dbt for a hotel data project, Pentaho ETL, Snowflake, end-to-end cleaning to dashboards in Tableau and Power BI.

    PySpark · dbt · Snowflake · Tableau
  5. 2022 Jul → Aug

    Data Visualization Intern

    GlobalShala

    First professional brush with data: visualising advertising performance with pandas, pruning campaigns with weak ROAS. The first time SQL felt like a power, not a chore.

    pandas · SQL · Visualisation
§ III  ·  Education Section 03

Where the toolbelt was first assembled.

2018 — 2023 Five years
78.9% cumulative average

Bachelor in Electronics, Communication & Information Technology

Pulchowk Campus  ·  Institute of Engineering, Tribhuvan University

Five years where the foundations were poured — signal processing, machine-learning fundamentals, hands-on research. Where I first learned that data is most interesting at the edges where the model breaks.

  • Capstone — Text Summarisation

    Latent Semantic Analysis paired with transformer encoders, evaluating where each technique covers the other's blind spots.

  • Research — Sentiment Classification

    A weighted ensemble of SVM and Naïve Bayes; later published in MAT Journals (2025).

DSP · ML fundamentals · NLP research · Communication systems · Embedded
§ IV  ·  Selected work Section 04

A field guide to things I have built.

Spanning marketing intelligence, healthcare NLP, and a few stubborn personal projects that wouldn't leave my head.

№ 01  ·   Healthcare NLP

ICD-10 & HCC code prediction.

Multi-stage clinical NLP — NER, context modelling, embedding generation, re-ranking, date-of-service detection — built into a fault-tolerant async pipeline at Maitri Services.

LLMs · NER · AWS · HIPAA 2026 →
№ 02  ·   Marketing science

Media Mix Modeling.

A statistical regression engine that helped marketing specialists plan spend across channels. SLSQP optimisation for revenue, time-series analysis, and forward-looking forecasts that actually got read.

Python · regression · SLSQP GrowByData
№ 03  ·   LLM tooling

LLM-based insights.

OpenAI integration into reporting — turning numerical anomalies into one-paragraph narratives the account team could send to clients without rewriting.

OpenAI · Python · BI GrowByData
№ 04  ·   Research project

Weighted ensemble for sentiment.

A hybrid SVM + Naïve Bayes approach to sentiment classification — weighted to coax higher final accuracy than either model alone. Published in MAT Journals, 2025.

SVM · NB · ensembling Published
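
The weighting idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the published implementation: it assumes both models expose per-class probabilities (for an SVM that usually means calibrated scores), and the function name, weights, and scores below are made up.

```python
from typing import Dict


def weighted_ensemble(
    probs_a: Dict[str, float],
    probs_b: Dict[str, float],
    weight_a: float,
) -> str:
    """Blend two models' class probabilities and return the top label.

    weight_a is the weight on model A; model B gets 1 - weight_a.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be in [0, 1]")
    labels = set(probs_a) | set(probs_b)
    blended = {
        label: weight_a * probs_a.get(label, 0.0)
        + (1.0 - weight_a) * probs_b.get(label, 0.0)
        for label in labels
    }
    # sorted() makes tie-breaking deterministic
    return max(sorted(blended), key=blended.get)


# Hypothetical scores for one review; the two models disagree.
svm_probs = {"positive": 0.45, "negative": 0.55}
nb_probs = {"positive": 0.80, "negative": 0.20}

label_equal = weighted_ensemble(svm_probs, nb_probs, weight_a=0.5)
label_svm_heavy = weighted_ensemble(svm_probs, nb_probs, weight_a=0.9)
```

With equal weights the confident Naïve Bayes score wins; shifting most of the weight to the SVM flips the decision. In practice the weight would be tuned on held-out data.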
№ 05  ·   Research project

Summarisation with LSA + transformers.

A custom transformer for abstractive summary, paired with LSA for extractive baselines. Heading generation by frequency normalisation; significant-word extraction as a side dish.

Transformers · LSA · NLP Undergraduate
№ 06  ·   Personal project

Co-occurrence of medical procedures.

An Apriori-driven exploration of which tests tend to follow which — a small bet on whether classical association rules still have something to say in clinical data.

Apriori · health data Personal
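
For a flavour of the approach, here is a small sketch of the first two Apriori levels (single items, then pairs) in plain Python. It is an assumption-laden illustration rather than the project code: the visit data and procedure names are made up, and it treats each visit as an unordered set, so it captures co-occurrence but not which test came first.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_pairs(
    visits: List[Set[str]], min_support: float
) -> Dict[FrozenSet[str], float]:
    """Return pairs of items whose support meets min_support.

    Follows the Apriori pruning idea: a pair is only counted if both of
    its items are frequent on their own.
    """
    n = len(visits)
    item_counts = Counter(item for visit in visits for item in visit)
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}

    pair_counts: Counter = Counter()
    for visit in visits:
        kept = sorted(visit & frequent_items)
        for pair in combinations(kept, 2):
            pair_counts[frozenset(pair)] += 1

    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}


def confidence(visits: List[Set[str]], antecedent: str, consequent: str) -> float:
    """Estimate P(consequent in visit | antecedent in visit)."""
    with_a = [v for v in visits if antecedent in v]
    if not with_a:
        return 0.0
    return sum(consequent in v for v in with_a) / len(with_a)


# Made-up visits, each a set of procedure names (illustrative only).
visits = [
    {"cbc", "lipid_panel"},
    {"cbc", "lipid_panel", "ecg"},
    {"cbc", "ecg"},
    {"xray"},
]
pairs = frequent_pairs(visits, min_support=0.5)
conf_lipid_to_cbc = confidence(visits, "lipid_panel", "cbc")
```

Support says how often a pair shows up at all; confidence says how often the second procedure appears given the first, which is the number a rule such as "lipid_panel implies cbc" rests on.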
§ V  ·  What I work with Section 05

A cloud of tools and tongues.

Sized by how often I reach for them.

§ VI  ·  Writing & awards Section 06

Publications, and a few small honours.

2025

A Weighted Ensemble Approach to Sentiment Classification using SVM and Naïve Bayes.

MAT Journals  ·  August 2025

A study in how a weighted blend of two classical classifiers can edge past either alone, with attention to where one model's failures are covered by the other.

Cert.

Standing certificates.

AWS · Coursera · IBM

AWS Academy Cloud Foundations  ·  Deep Learning Specialization  ·  IBM Data Science Specialization.

§ VII  ·  References Section 07

People who'll vouch for me.

Three people who have signed off on shipped work and weathered hard quarters with me.

Kyle Tremblay

Director of Product Development  ·  Agital USA

Two years of partnership across BI reporting, ML-driven features, and LLM-based analysis on the proprietary platform. Watched the role grow from individual contributor to co-managing the backend team — both shipping new third-party integrations and mentoring the people building them.

Frank Kjaersgaard

Chief Product Officer  ·  Agital USA

Sponsor of the move to a modern lakehouse end-to-end — feed pipelines, query optimisation, model and LLM enhancements. Notes especially the times during team transitions when critical workloads were absorbed independently and operations kept running, quietly, without dropping a beat.

Ambika Adhikari

Engineering Manager  ·  Maitri Services

Supervised the pivot from an Electronics & Communication background into industrial data work — ETL, integrations, statistical modelling, and a Media Mix Modeling forecaster built from scratch. Possesses a researcher's mind: explore several approaches, evaluate honestly, then own the result either way.

§ VIII  ·  Get in touch

Let's talk.

If something here lines up with what you're building — a healthcare data problem, an LLM pipeline, a stubbornly slow ETL — I'd love to hear about it.

also

If your work lives in research — papers waiting to be co-authored, datasets begging to be wrangled, AI & data ideas worth chasing — that letter is welcome too.

Kathmandu, Nepal  ·  +977 9860 113482