From writing business rules on enterprise stacks to orchestrating autonomous AI agents under regulatory constraints — each phase built the intuition the next one needed.
Data Scientist and applied ML researcher with 15 years building production AI systems at the intersection of large language models, graph analytics, and enterprise compliance. Currently architecting multi-agent LLM orchestration and knowledge graph infrastructure for regulatory workflows in financial services — and independently researching in-context learning theory and model compression. I write about the gap between ML research and production reality: what the papers don't tell you, what breaks at scale, and what's worth building from scratch.
Below: the arc that got me here — not a career history, but a sequence of transformations where each identity was prerequisite to the next.
Distributed agent system on AWS coordinating specialized LLMs across Neo4j knowledge graphs and Bedrock inference for enterprise compliance workflows.
AI-enabled system that ingested, curated, and refined data quality rules across disparate source systems — translating heterogeneous rule definitions into standardized, configurable Great Expectations YAML for enterprise-wide validation.
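A rule standardized into this form might look like the following sketch. The expectation names are real Great Expectations expectation types; the suite name, column, and thresholds are illustrative, not the actual enterprise configs:

```yaml
# Illustrative only: a source-system rule ("balance must be present and
# non-negative") translated into a Great Expectations-style suite.
expectation_suite_name: deposits.core_quality
expectations:
  - expectation_type: expect_column_values_to_not_be_null
    kwargs:
      column: account_balance
  - expectation_type: expect_column_values_to_be_between
    kwargs:
      column: account_balance
      min_value: 0
      mostly: 0.999   # tolerate rare documented exceptions
```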
Neo4j-based fraud detection using community detection (Louvain, k-core, label propagation) and structural entropy measures across transaction networks.
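In production this ran as Neo4j GDS procedure calls; the same idea can be sketched offline with networkx. The toy graph, node names, and seed below are illustrative stand-ins for the real transaction network:

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Toy "transaction network": two dense account clusters joined by one edge.
# In the real system, nodes are accounts and edges are transactions in Neo4j.
G = nx.Graph()
G.add_edges_from((f"a{i}", f"a{j}") for i in range(5) for j in range(i + 1, 5))
G.add_edges_from((f"b{i}", f"b{j}") for i in range(5) for j in range(i + 1, 5))
G.add_edge("a0", "b0")  # single bridge between the two dense groups

# Louvain maximizes modularity, so the dense groups fall out as communities.
communities = louvain_communities(G, seed=42)

# k-core pruning keeps only structurally embedded nodes
# (every node in the core has degree >= k within the core).
core = nx.k_core(G, k=3)
```

The fraud signal comes from anomalies against this structure: accounts bridging otherwise-disjoint communities, or transacting far outside their core.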
Decomposed federal deposit insurance mandates into 110 structured IT controls using an 8-pass LLM extraction pipeline — each control anchored to its source regulatory provision, enabling automated gap analysis and audit-ready compliance traceability.
Mined and curated 12+ security event types from raw, previously untapped enterprise logs; the work evolved into a no-code/low-code detection platform that centralized event observability org-wide through YAML-driven pattern configuration.
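The pattern-driven core can be sketched in a few lines. The event names, fields, and regexes here are illustrative, and the dict stands in for a parsed YAML file, not the real detection configs:

```python
import re

# Stands in for a parsed YAML config: each entry is a named detection pattern.
PATTERNS = {
    "failed_login": {
        "regex": r"auth failure .* user=(?P<user>\S+)",
        "severity": "medium",
    },
    "privilege_escalation": {
        "regex": r"sudo: .* COMMAND=(?P<cmd>.+)",
        "severity": "high",
    },
}

def detect(log_lines):
    """Scan raw log lines against every configured pattern; yield events."""
    compiled = {name: (re.compile(p["regex"]), p["severity"])
                for name, p in PATTERNS.items()}
    for line in log_lines:
        for name, (rx, severity) in compiled.items():
            m = rx.search(line)
            if m:
                yield {"event": name, "severity": severity, **m.groupdict()}

events = list(detect([
    "Jan 12 host sshd: auth failure from 10.0.0.7 user=svc_batch",
    "Jan 12 host sudo: alice : COMMAND=/bin/systemctl restart auditd",
]))
```

Adding a new event type is a config change, not a code change, which is what made the platform usable beyond the security team.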
Multi-pass relationship extraction system using the Claude API with pronoun resolution and iterative refinement across unstructured text corpora.
Formalizing ICL success conditions through an information-theoretic lens — mutual information between prompt features and output correctness, tractable bounds, and empirical predictions.
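One plausible way to state the object of study (my notation here, a sketch rather than a settled result): with prompt features $F$ and a binary correctness indicator $C$, the quantity of interest is

```latex
% Mutual information between prompt features F and output correctness C:
I(F;C) = H(C) - H(C \mid F)
       = \sum_{f} p(f) \sum_{c \in \{0,1\}} p(c \mid f) \,
         \log \frac{p(c \mid f)}{p(c)}
```

High $I(F;C)$ means success is predictable from prompt structure alone, which is what makes tractable bounds and testable empirical predictions possible.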
Empirical study of how memorization concentrates across transformer layers — identifying which layers disproportionately contribute to verbatim recall, using GPT-2 on an A100 GPU.
Hands-on exploration of post-training quantization on 7B-parameter models — per-layer error accumulation, accuracy/compression tradeoffs, and what GPTQ actually looks like from scratch.
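The baseline that per-layer error is measured against can be sketched with no model at all: symmetric per-channel int8 quantization of a weight matrix, dequantize, compare. This is a NumPy stand-in for round-to-nearest; GPTQ proper additionally uses second-order (Hessian) information to compensate the rounding error, which this sketch omits:

```python
import numpy as np

def quantize_per_channel(W, bits=8):
    """Symmetric per-output-channel quantization of a weight matrix."""
    qmax = 2 ** (bits - 1) - 1                    # 127 for int8
    scale = np.abs(W).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)      # guard all-zero rows
    Wq = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int8)
    return Wq, scale

def dequantize(Wq, scale):
    return Wq.astype(np.float32) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)  # toy "layer" weights
Wq, scale = quantize_per_channel(W)
err = np.abs(W - dequantize(Wq, scale)).mean()    # per-layer reconstruction error
```

Summing `err` layer by layer across a 7B model is exactly where the accumulation story (and the case for error compensation) shows up.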
Open to research collaborations, technical discussions, and consulting on LLM systems, graph analytics, or compliance AI in financial services.