Nathan Price
Evaluation and benchmarking specialist with a PhD in statistics. Covers metrics, testing, and the empirical science of AI. Leads "Metrics & Reality," a strand focused on rigorous measurement.
Nathan Price covers evaluation, benchmarking, and the empirical science of AI systems. With a PhD in statistics and a stint inside an applied research group responsible for model eval at scale, he has spent years arguing about metrics, test sets, and what "better" actually means. His instinct is to ask, before anything else: under which distribution, against which baseline, and compared to which alternative?
Nathan's reporting digs into benchmark design, synthetic versus human-written test data, robustness testing, and the growing gap between leaderboard scores and real-world performance. He reads new eval frameworks with the same suspicion others reserve for marketing copy, looking for data leakage, underspecified tasks, and hidden assumptions about users. His work frequently dissects widely cited metrics and explains where they fail: brittleness to prompt changes, cultural bias, and sensitivity to domain shift.
He also covers the emerging ecosystem of third-party eval providers, red-teaming firms, and open eval suites. At AI-Telegraph, Nathan runs the "Metrics & Reality" strand, aimed at readers who are tired of hand-wavy claims and want a clear view of how to compare models, choose thresholds, and design tests that actually predict behavior in production. His pieces are written for practitioners who know that in AI, measurement is where rigor either starts or dies.