Projects
Selected research and systems work — each is a short case study with figures, numbers, and links.
LLM evaluation · Benchmark auditing
Do truth benchmarks test reasoning, or surface cues? A six-feature audit (SURFACE6) detects surface-cue leakage, and a pruning method yields a cleaned TruthfulQA.
Paper · arXiv link soon
Statistics · Algorithms
A multi-dimensional Kolmogorov–Smirnov distance that is a true metric, paired with the first stable, finite-sample two-sample test and computable in near-linear time in 2–4D.
Wearable sensing · Distributed systems
A privacy-first phone and BLE-wristband platform for an NIH physical-activity study, running reliably at ~7.7M records per day.
Spatial statistics · Python
Open-source spatial scan statistics in Python: detects anomalous regions in geographic data and is fast enough for real-world datasets.
Paper · arXiv link soon