Research
Data and analysis from the Complement team.
Tags: retention · spaced-repetition · llm
LLM-Assisted Elaboration Reduces Flashcard Re-Lapse Rates by 48%
We analyze 34,179 lapse events across 18 months of spaced repetition study to measure whether AI chat at the moment of forgetting improves subsequent retention. Using a within-card paired design where 468 cards serve as their own controls, we find a 48% relative reduction in re-lapse probability (p < 0.001), with a dose-response relationship peaking at 3–5 message exchanges.
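The headline figure is a relative reduction, 1 − p(re-lapse | chat) / p(re-lapse | no chat), computed only over cards that were observed under both conditions. The sketch below shows one way to compute it. The `CardOutcomes` record and `relative_reduction` function are hypothetical illustrations, not the team's analysis code, and the post may use a different estimator (for example, a per-card average or a mixed-effects model).

```python
from dataclasses import dataclass


@dataclass
class CardOutcomes:
    """Hypothetical per-card tallies of lapses and the re-lapses that followed them."""
    chat_lapses: int        # lapses followed by an AI chat session
    chat_relapses: int      # of those, how many lapsed again later
    no_chat_lapses: int     # lapses with no chat session
    no_chat_relapses: int   # of those, how many lapsed again later


def relative_reduction(cards: list[CardOutcomes]) -> float:
    """Return 1 - p_chat / p_no_chat, pooled over cards seen in both conditions.

    Keeping only cards with lapses in both arms lets each card act as its
    own control (the within-card paired idea); this is an assumed estimator.
    """
    paired = [c for c in cards if c.chat_lapses > 0 and c.no_chat_lapses > 0]
    if not paired:
        raise ValueError("no card was observed under both conditions")

    chat_lapses = sum(c.chat_lapses for c in paired)
    chat_relapses = sum(c.chat_relapses for c in paired)
    no_chat_lapses = sum(c.no_chat_lapses for c in paired)
    no_chat_relapses = sum(c.no_chat_relapses for c in paired)

    p_chat = chat_relapses / chat_lapses
    p_no_chat = no_chat_relapses / no_chat_lapses
    if p_no_chat == 0:
        raise ValueError("baseline re-lapse probability is zero")
    return 1.0 - p_chat / p_no_chat


# Illustrative counts only (not data from the study); the third card has
# no chat lapses, so it is excluded from the paired comparison.
example = [
    CardOutcomes(2, 1, 2, 2),
    CardOutcomes(3, 1, 3, 2),
    CardOutcomes(0, 0, 4, 3),
]
print(relative_reduction(example))
```

Pooling counts weights each card by how often it lapsed. Averaging per-card rates instead would give every card equal weight, which can yield a different number on the same data.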
Tags: benchmark · usmle · llm-evaluation
Frontier LLM Performance on USMLE-Style Medical Questions
We evaluate 7 frontier language models on 40 expert-written USMLE Step 1 and Step 2 CK questions under zero-shot, temperature-0 conditions. Results include per-model accuracy with Wilson score confidence intervals, a topic-level breakdown, and a comparison against medical student performance.
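With only 40 questions per model, the Wilson score interval is a better choice than the simple normal (Wald) interval, because it stays inside [0, 1] and behaves sensibly near 0% or 100% accuracy. The sketch below implements the standard Wilson formula. The function name and the 34-of-40 score in the usage line are hypothetical, chosen for illustration rather than taken from the post's results.

```python
import math
from statistics import NormalDist


def wilson_interval(correct: int, total: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (e.g. accuracy on a question set)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie in [0, total]")

    # Two-sided critical value, e.g. z ~= 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = correct / total
    z2 = z * z

    denom = 1 + z2 / total
    center = (p_hat + z2 / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / total + z2 / (4 * total * total)
    )
    return center - half_width, center + half_width


# Hypothetical example: a model answering 34 of the 40 questions correctly.
low, high = wilson_interval(34, 40)
print(f"accuracy 85.0%, 95% CI [{low:.1%}, {high:.1%}]")
```

The interval is centered slightly toward 50% rather than on the raw accuracy. At n = 40 this shift is a noticeable part of why Wilson intervals are preferred for small benchmarks.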