Language Model Evals

Sobering up on AI Progress w/ Dr. Sean McGregor

Dr. Sean McGregor on why AI benchmarks fail, how safety gets measured wrong, and what real evaluation should look like.

Dec 29, 2025

Tailored Truths: Persuasive Capabilities of LLMs

Landing page for the "Tailored Truths" research paper.

Feb 11, 2025

Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts

Landing page for the Benchmark Inflation research paper.

Oct 11, 2024