Research

Data and analysis from the Complement team.

March 17, 2026 · retention · spaced-repetition · llm

LLM-Assisted Elaboration Reduces Flashcard Re-Lapse Rates by 48%

We analyze 34,179 lapse events across 18 months of spaced repetition study to measure whether AI chat at the moment of forgetting improves subsequent retention. Using a within-card paired design in which 468 cards serve as their own controls, we find a 48% relative reduction in re-lapse probability (p < 0.001), with a dose-response relationship that peaks at 3–5 message exchanges.

February 15, 2026 · benchmark · usmle · llm-evaluation

Frontier LLM Performance on USMLE-Style Medical Questions

We evaluate 7 frontier language models on 40 expert-written USMLE Step 1 and Step 2 CK questions under zero-shot, temperature-0 conditions. Results include per-model accuracy with Wilson score confidence intervals, a topic-level breakdown, and a comparison against medical student performance.
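
For readers unfamiliar with the Wilson score interval mentioned above, here is a minimal Python sketch of the standard formula for a binomial proportion. The function name `wilson_interval` and the example counts are hypothetical and are not taken from the study; the 95% level (z = 1.96) is also an assumption, since the summary does not state which level was used.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 95% interval (an assumed default).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")

    p_hat = correct / total
    z2 = z * z
    denom = 1 + z2 / total
    # The interval is centered on a value shrunk toward 0.5, not on p_hat.
    center = (p_hat + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / total + z2 / (4 * total * total)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Hypothetical counts for illustration only: 30 correct answers out of 40.
    low, high = wilson_interval(30, 40)
    print(f"accuracy = {30 / 40:.3f}, 95% Wilson interval = [{low:.3f}, {high:.3f}]")
```

With only 40 questions per model, the Wilson interval is a sensible choice over the simpler normal-approximation interval because it stays within [0, 1] and behaves reasonably when accuracy is close to 0% or 100%.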