2026-06-10
Sentence Spaced Repetition: 4-6x Faster Vocabulary
A 2024 study found that sentence-based spaced repetition helps learners acquire vocabulary 4-6x faster than single-word flashcards while keeping per-word SRS scheduling.
The short answer
Sentence-based spaced repetition can help learners acquire new vocabulary roughly 4 to 6 times faster than conventional single-word flashcards, according to a 2024 user study with 26 learners of Danish.1 The key idea: dynamically combine multiple due words into one natural sentence, grade each word independently, and keep standard spaced repetition scheduling. Learners saw 3 to 4 times more distinct words per session, retained a similar fraction of what they studied, and reported higher enjoyment when every sentence was pulled from a high-quality corpus rather than half of them being generated by a language model.
This approach sits between two common SRS habits: isolated word cards (maximum scheduling flexibility, minimal context) and fixed sentence cards (rich context, but one card per sentence). Sentence-based SRS aims for both: words schedule on their own intervals, but every review happens inside fresh, level-appropriate context.
What is sentence-based spaced repetition?
Traditional spaced repetition software (Anki, SuperMemo, Mnemosyne) usually presents vocabulary in one of three formats. The table below lists them, with the sentence-based alternative in the last row for comparison:1
| Approach | What you review | Scheduling | Context |
|---|---|---|---|
| Single-word cards | One lemma or translation pair | Independent per word | None or minimal |
| Fixed sentence + word | One target word highlighted in a static example sentence | Independent per word | Same sentence every time |
| Whole-sentence cards | An entire sentence or passage | One interval per sentence | Full, but bundled |
| Sentence-based SRS | A new sentence built from several due words | Independent per word | Fresh context each review |
Researchers at the University of Copenhagen built AllAI (Automated Language Learning with AI) to test this fourth model. The system tracks your vocabulary, finds words due for review, and assembles a short sentence containing as many of those words as possible. After you attempt recall, you mark which individual words you missed. Each word's next review date updates separately, just like on a normal flashcard deck.1
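To make that loop concrete, here is a minimal Python sketch of the two bookkeeping steps: picking the words that are due and updating each word separately after one shared sentence. The `WordState` structure, the function names, and the doubling rule are all assumptions made for illustration. They are not taken from AllAI, which uses a real spaced repetition scheduler.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class WordState:
    """Review state for one vocabulary item (hypothetical structure)."""
    lemma: str
    due: date
    interval_days: int = 1


def due_words(deck: list[WordState], today: date) -> list[WordState]:
    """Return words whose review date has arrived, most overdue first."""
    return sorted((w for w in deck if w.due <= today), key=lambda w: w.due)


def grade_sentence(words: list[WordState], missed: set[str], today: date) -> None:
    """Update every word in a reviewed sentence independently.

    Placeholder rule (an assumption, not the study's scheduler): a missed
    word resets to a 1-day interval, a recalled word doubles its interval.
    """
    for w in words:
        w.interval_days = 1 if w.lemma in missed else w.interval_days * 2
        w.due = today + timedelta(days=w.interval_days)


today = date(2024, 6, 1)
deck = [
    WordState("hund", today - timedelta(days=2), interval_days=2),
    WordState("spise", today, interval_days=4),
    WordState("hus", today + timedelta(days=3), interval_days=6),
]
batch = due_words(deck, today)  # "hund" and "spise" are due, "hus" is not
grade_sentence(batch, missed={"spise"}, today=today)
```

One sentence covered two words, yet the deck ends up with two different next-review dates. That is the property that separates this model from whole-sentence cards.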
Why context matters for vocabulary retention
Spaced repetition is one of the most evidence-backed tools in computer-assisted language learning.2 But vocabulary rarely lives in isolation. Words learned inside sentences reinforce one another, give inferential hints, and mirror how language is actually used.1
The tension is familiar to anyone who has used Anki for languages:
- Minimum information principle: Each review task should test one atomic fact so scheduling stays precise.1
- Contextual learning: Recalling a word inside a sentence is closer to real comprehension and production than staring at a bare translation.
Sentence-based SRS tries to honor both. You still schedule words independently (like single-word cards), but every exposure happens in a varied sentence (unlike fixed example sentences that repeat forever). That combination is what the 2024 study measured against a conventional baseline.
How AllAI generates sentences
Before running a live user study, the researchers simulated 20 days of study and compared several NLP pipelines. The three main candidates are described below; two of them performed well enough to test with real learners:1
1. Corpus retrieval (BM25)
The system queries a filtered Wikipedia-derived corpus (Wiki-40B) with the learner's due words. A modified BM25 ranking favors sentences that contain more query words, weighting words that are due sooner more heavily. Sentences target a length of 10 words or fewer (the simulation table below shows a small share still ran longer), use only vocabulary the learner already knows (plus a small number of new words), and are not repeated on the same day.1
Human evaluators rated retrieved sentences 100% grammatically correct in simulation. This method is also cheap to run at scale because it selects existing text rather than generating new text.
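The retrieval idea can be sketched as a standard BM25 scorer with one extra multiplier per due word. This is a simplified illustration, not the paper's exact modification: the `due_weights` values, the hard length filter, and the function name `rank_sentences` are assumptions, and the allowance for a few new words and the same-day repeat check are left out.

```python
import math
from collections import Counter


def rank_sentences(corpus, due_weights, known, max_len=10, k1=1.5, b=0.75):
    """Rank tokenized sentences by an urgency-weighted BM25 score.

    corpus      -- list of sentences, each a list of lowercase tokens
    due_weights -- dict mapping each due word to an urgency weight (assumed)
    known       -- set of words the learner already knows
    """
    n_docs = len(corpus)
    avg_len = sum(len(s) for s in corpus) / n_docs
    doc_freq = Counter(tok for s in corpus for tok in set(s))

    ranked = []
    for sent in corpus:
        # Filters: length limit, and no vocabulary outside known + due words.
        too_long = len(sent) > max_len
        has_unknown = any(t not in known and t not in due_weights for t in sent)
        if too_long or has_unknown:
            continue
        counts = Counter(sent)
        score = 0.0
        for word, weight in due_weights.items():
            f = counts[word]
            if f == 0:
                continue
            idf = math.log((n_docs - doc_freq[word] + 0.5) / (doc_freq[word] + 0.5) + 1)
            tf = f * (k1 + 1) / (f + k1 * (1 - b + b * len(sent) / avg_len))
            score += weight * idf * tf  # more urgent words contribute more
        if score > 0:
            ranked.append((score, sent))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


corpus = [
    ["hunden", "spiser", "i", "huset"],  # contains both due words
    ["hunden", "er", "i", "huset"],      # contains one due word
    ["katten", "spiser", "fisk"],        # unknown words, filtered out
]
ranked = rank_sentences(
    corpus,
    due_weights={"hunden": 2.0, "spiser": 1.0},
    known={"i", "huset", "er"},
)
```

Because the scorer only selects existing sentences, its cost is dominated by the corpus scan, which is why retrieval is cheap compared with generating text.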
2. Few-shot language model prompting (GPT-3.5)
An alternative pipeline prompts GPT-3.5-turbo with three Danish examples and asks it to write a short sentence using five due words. The best configuration used low temperature (0.2), filtered incorrect outputs by re-prompting the model, and picked the candidate with the best scheduling score among three generations.1
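The filter-then-pick step can be sketched without calling any model. In the Python below, the candidate strings stand in for three generations, and both the validity check and the penalty are toy assumptions (a word-count limit and the number of due words left uncovered). The study's own filter re-prompted the model and its own scheduling score did the ranking.

```python
from typing import Callable, Optional


def pick_best(
    candidates: list[str],
    is_valid: Callable[[str], bool],
    penalty: Callable[[str], float],
) -> Optional[str]:
    """Drop invalid candidates, then return the one with the lowest penalty."""
    valid = [c for c in candidates if is_valid(c)]
    return min(valid, key=penalty) if valid else None


due = {"hund", "spise", "hus"}


def short_enough(sentence: str) -> bool:
    # Toy validity check: at most 10 words.
    return len(sentence.split()) <= 10


def missing_due_words(sentence: str) -> float:
    # Toy penalty: how many due words the sentence fails to cover.
    return float(len(due - set(sentence.lower().split())))


# Stand-ins for three model generations (toy token strings, no API call).
generations = [
    "en hund vil spise nu",
    "hund spise i hus",
    "en meget lang tekst som har alt for mange ord til en opgave her",
]
best = pick_best(generations, short_enough, missing_due_words)
```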
Generated sentences were mostly correct but not perfect: roughly 15% were rated incorrect by human judges. A harder problem was lemma looping: the model often inflected words differently than the form stored in the learner's deck, so the "due" form never got cleared and kept reappearing.
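A small sketch shows why the loop happens. If a due word is cleared only when its stored form appears verbatim, an inflected form never counts. The lemma lookup table below is a hypothetical fix added for contrast; the article does not say the study used one.

```python
def clear_due(due_forms, sentence_tokens, lemma_of=None):
    """Return the due words that remain after reviewing one sentence.

    With lemma_of=None, a due word is cleared only on an exact string match.
    With a lemma lookup (hypothetical table), inflected forms also count.
    """
    if lemma_of is None:
        seen = set(sentence_tokens)
    else:
        seen = {lemma_of.get(tok, tok) for tok in sentence_tokens}
    return set(due_forms) - seen


due = {"hund", "spise"}
generated = ["hunden", "spiser", "nu"]          # inflected forms from the model
lemmas = {"hunden": "hund", "spiser": "spise"}  # toy lookup, assumed for illustration

still_due_exact = clear_due(due, generated)          # nothing is cleared: the loop
still_due_lemma = clear_due(due, generated, lemmas)  # both words are cleared
```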
3. Hybrid (50% retrieval, 50% generation)
A hybrid alternated between BM25 retrieval and GPT-3.5 generation. It reduced looping (retrieval breaks the cycle) but still carried some generation errors. Both the pure retrieval and hybrid pipelines advanced to the user study.
| Method | Scheduling score (lower is better) | Sentences over 10 words | Incorrect (human rating) |
|---|---|---|---|
| GPT-3.5 (best config) | 0.068 | 19.6% | 15% |
| BM25 (best-of-25) | 0.098 | 8.5% | 0% |
| Hybrid | 0.078 | 11.2% | 10% |
Scheduling score measures how much of the spaced repetition timeline is wasted by showing words before they are due or introducing unrequested new vocabulary. Scores below 0.1 mean fewer than one in ten words in a task were out of sync with the scheduler.1
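As a rough proxy for that idea, the Python below counts the share of words in one task that the scheduler did not ask for. It follows the "one in ten words" reading given above and is an assumption for illustration. The paper's actual formula is not reproduced here and may weight early and new words differently.

```python
def out_of_sync_fraction(task_words, due):
    """Share of words in one task that the scheduler did not request.

    Counts both words shown before they are due and unrequested new words.
    """
    if not task_words:
        return 0.0
    return sum(w not in due for w in task_words) / len(task_words)


due = {"w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8", "w9"}
task = ["w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8", "w9", "extra"]
score = out_of_sync_fraction(task, due)  # one of ten words is off schedule
```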
The user study: 4-6x faster vocabulary growth
Twenty-six learners studied Danish for 10 days using a progressive web app. The app used the SM-2 algorithm (the basis of Anki's classic scheduler) with a simplified two-grade scale: recalled or not recalled.1 Participants were split into three groups:
| Group | What learners saw | Distinct words seen (median) |
|---|---|---|
| Baseline (single-word) | One due word highlighted in a fixed example sentence (standard Anki-style) | 15 words seen |
| Retrieval | Dynamic sentences from BM25 corpus search | 55 words seen |
| Hybrid | Alternating retrieval and GPT-3.5 sentences | 78 words seen |
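For readers unfamiliar with SM-2, here is a minimal sketch of the algorithm driven by a two-grade scale. The interval steps (1 day, 6 days, then previous interval times the ease factor) and the ease update follow the published SM-2 description. The mapping of "recalled" to quality 4 and "not recalled" to quality 2, updating ease on every review, and rounding to the nearest day are assumptions, since the article does not say how the app mapped its two grades.

```python
from dataclasses import dataclass


@dataclass
class Sm2State:
    repetitions: int = 0   # consecutive successful reviews
    interval_days: int = 0
    ease: float = 2.5      # SM-2 starting E-Factor


def review(state: Sm2State, recalled: bool) -> Sm2State:
    """Apply one SM-2 review with a binary grade."""
    # Assumed mapping from two grades to SM-2 quality: recalled -> 4, missed -> 2.
    q = 4 if recalled else 2
    ease = max(1.3, state.ease + 0.1 - (5 - q) * (0.08 + (5 - q) * 0.02))
    if not recalled:
        # A miss restarts the interval sequence for this word only.
        return Sm2State(repetitions=0, interval_days=1, ease=ease)
    reps = state.repetitions + 1
    if reps == 1:
        interval = 1
    elif reps == 2:
        interval = 6
    else:
        interval = round(state.interval_days * ease)
    return Sm2State(reps, interval, ease)


state = Sm2State()
intervals = []
for _ in range(3):
    state = review(state, recalled=True)
    intervals.append(state.interval_days)
after_miss = review(state, recalled=False)
```

In a sentence-based setup, `review` would run once per word in the sentence, each with its own `Sm2State`.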
Learning efficiency results
The headline finding is time efficiency: vocabulary growth per minute of study. Retrieval and hybrid groups achieved roughly four-fold greater efficiency than the single-word baseline. Overall vocabulary growth was 4 to 6 times higher, driven mainly by seeing more words per session without a drop in the fraction remembered.1
| Metric | Baseline (single-word) | Retrieval | Hybrid |
|---|---|---|---|
| Time efficiency (words/min, median) | 0.10 | 0.59 | 0.38 |
| Time efficiency (words/min, mean) | 0.14 | 0.60 | 0.54 |
| Vocabulary growth (median) | 1.5 words | 10.0 words | 6.0 words |
| Word effectiveness (retention rate) | 0.05 | 0.17 | 0.12 |
| Distinct words seen (median) | 15 | 55 | 78 |
Word effectiveness (new words remembered divided by words seen) did not fall in the intervention groups and was, if anything, higher (0.17 and 0.12 versus 0.05 in the table above). Learners were not sacrificing retention to go faster. They were simply encountering more vocabulary in the same study time because each sentence packed several due words together.1
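The "4 to 6x" headline can be checked against the results table with simple division. The snippet below uses only the rounded values printed in the table, so the ratios are approximate.

```python
# (baseline, retrieval, hybrid) values copied from the results table.
table = {
    "time efficiency, median (words/min)": (0.10, 0.59, 0.38),
    "time efficiency, mean (words/min)": (0.14, 0.60, 0.54),
    "vocabulary growth, median (words)": (1.5, 10.0, 6.0),
}

# Ratio of each intervention group to the baseline, rounded to one decimal.
ratios = {
    name: (round(retrieval / baseline, 1), round(hybrid / baseline, 1))
    for name, (baseline, retrieval, hybrid) in table.items()
}

for name, (retrieval_x, hybrid_x) in ratios.items():
    print(f"{name}: retrieval {retrieval_x}x, hybrid {hybrid_x}x")
```

The ratios land between roughly 3.8x and 6.7x depending on the metric and the group, which is where the rounded "4 to 6 times" range comes from.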
Engagement and enjoyment
Self-reported enjoyment was significantly higher in the retrieval group than in both the baseline (p = 0.042) and hybrid (p = 0.028) groups. Efficiency and enjoyment correlated positively (Pearson r = 0.5), suggesting that faster progress made studying feel more rewarding.1
Beginners benefited most: vocabulary growth correlated negatively with prior Danish knowledge (r = -0.4). Dynamic sentences appear especially helpful early on, when every new word needs rich context and frequent exposure.
How this compares to conventional spaced repetition
Most language learners using SRS today fall into one of two camps:
- Word lists on flashcards (fast to create, but words are out of context).
- Fixed example sentences (add context, but the same sentence repeats until it is memorized as a chunk rather than as flexible vocabulary).
The baseline in the AllAI study mimicked the second approach: each word had one permanently assigned example sentence. Sentence-based SRS beat that baseline on nearly every learning metric, and the retrieval variant also scored higher on self-reported enjoyment.1
This aligns with broader research showing that higher-involvement, productive tasks (like writing sentences) often outperform passive fill-in-the-blank formats for vocabulary learning.3 Sentence-based SRS sits in the middle: you still recall actively, but the system supplies varied context so you are not writing from scratch every time.
Practical takeaways for language learners
1. Pack multiple due words into one review when possible
If you use Anki or another SRS app manually, consider reviewing words in short sentences you write yourself, or use add-ons that group due cards. The AllAI study suggests the efficiency gain comes from density: more target words per minute of attention.
2. Prefer real sentences over repeated static examples
A fixed example sentence on every card is better than no context, but repeating the same sentence trains sentence-level pattern matching. Varying the context forces genuine word-level recall. Corpus retrieval achieved perfect grammatical correctness in the simulation because it pulled real, attested sentences.1
3. Keep per-word scheduling
Do not sacrifice independent word intervals for the sake of context. The minimum information principle exists because bundling too much into one card makes the scheduler blind to which piece you forgot. Sentence-based SRS works because you grade each word separately after one shared sentence.
4. Be cautious with AI-generated study sentences
Large language models can produce fluent sentences, but morphology errors and lemma mismatches can break SRS scheduling in highly inflected languages. Until generation quality and form control improve, retrieval from a curated corpus (or a dictionary with attested examples) may be the safer default.1
Limitations of the study
Context matters when interpreting these results:
- Small sample: 26 participants, recruited through social networks, studying Danish for 10 days.
- Short duration: Long-term retention beyond the study window was not directly measured (though enjoyment may predict continued use).
- One target language: Danish morphology may amplify or reduce generation errors compared to Spanish, English, or Japanese.
- Multiple comparisons: With 11 metrics across three groups, some p-values would not survive strict Bonferroni correction. The efficiency gap between retrieval and baseline remained significant under that stricter threshold.1
The authors note that newer models (GPT-4 and beyond) may close the correctness gap with retrieval, but that hypothesis needs larger trials.
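To see what the Bonferroni caveat means in numbers, the sketch below divides the usual 0.05 threshold by the number of tests and compares it with the two enjoyment p-values reported earlier. Using 11 as the number of tests (one per metric) is an assumption; the article does not state the exact family of comparisons the authors corrected for.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


alpha, n_tests = 0.05, 11  # 11 assumes one test per reported metric
threshold = bonferroni_threshold(alpha, n_tests)

# p-values quoted in the enjoyment section of this article.
reported = {
    "enjoyment, retrieval vs baseline": 0.042,
    "enjoyment, retrieval vs hybrid": 0.028,
}
survives = {name: p < threshold for name, p in reported.items()}
```

Under this assumption the corrected threshold is about 0.0045, so neither enjoyment result would pass, which is consistent with the caveat that only some findings hold under the stricter test.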
How to get these results in practice
LinGoat is the only language learning app that implements sentence-based spaced repetition the way this research describes: it combines multiple due words into fresh practice sentences, lets you grade each word independently, and schedules reviews with FSRS. Anki and similar tools rely on single-word cards or fixed example sentences. They do not automatically assemble dynamic sentences around your vocabulary and review schedule.
If you want the 4 to 6x efficiency gain this study measured, use LinGoat. Open the app to start, or see how it works.
References
1. Paddags, B., Hershcovich, D., & Savage, V. (2024). Automated Sentence Generation for a Spaced Repetition Software. Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024), 351-364.
2. Hao, T., Wang, Z., & Ardasheva, Y. (2021). Technology-Assisted Vocabulary Learning for EFL Learners: A Meta-Analysis. Journal of Research on Educational Effectiveness, 14(3), 645-667.
3. Laufer, B., & Shmueli, K. (2016). Comparing Multiple Translation Tasks and Multiple Choice Tasks for Learning Words From Context. Language Teaching Research.