Peer-reviewed methodology
The Science Behind Human Benchmark
Every test is grounded in peer-reviewed cognitive psychology. Here is how we translate validated laboratory paradigms into browser-based measurements, and where the limits lie.
Test Paradigms & Clinical Basis
Each test maps to a validated cognitive construct from the neuropsychological literature. The table below lists each test and its key dependent variable; the source paradigms and cognitive domains are detailed in the methodology notes. Take any test and compare your score to 50M+ sessions in our test library.
| Test | Key Metric |
|---|---|
| Reaction Time | Mean RT (ms) |
| Aim Trainer | Mean click time (ms) |
| Number Memory | Longest correct span |
| Verbal Memory | Hit rate − FA rate |
| Chimp Test | Max number recalled |
| Sequence Memory | Max sequence length |
| Visual Memory | Levels completed |
| Typing Speed | WPM with accuracy |
| Pattern Recognition | Response latency (ms) |
| Processing Speed | Items/min |
| Attention & Focus | % correct responses |
Full reference list available in the methodology notes. All paradigms adapted for browser delivery with documented validity trade-offs (see Limitations).
Global Benchmark Data
Aggregated from 50M+ anonymized test sessions. Trimmed means exclude the bottom 5% and top 1% of scores to reduce outlier contamination. The chart below shows the Reaction Time distribution — the most completed test on the platform, with over 8.4 million scored sessions.
Reaction Time — Score Distribution
Based on 8.4M scores. Vertical axis = % of users in each 20ms bin. Global mean = 284ms, median = 271ms.
| Percentile | Reaction Time | Number Memory | Typing Speed | Aim Trainer |
|---|---|---|---|---|
| Top 1% | <190ms | 12+ digits | 95+ WPM | <200ms |
| Top 5% | <215ms | 10+ digits | 85+ WPM | <250ms |
| Top 10% | <235ms | 9+ digits | 75+ WPM | <280ms |
| Top 25% | <255ms | 8+ digits | 65+ WPM | <320ms |
| Median | 271ms | 7 digits | 52 WPM | 380ms |
| Bottom 25% | >310ms | 6 digits | <38 WPM | >480ms |
All benchmarks derived from Human Benchmark database. Scores may differ from clinical norms due to self-selection bias — online testers skew younger and more tech-savvy than general populations.
Age Effects on Cognitive Performance
Cognitive speed peaks in the mid-20s and declines gradually across the lifespan. Reaction Time is the most age-sensitive metric; Number Memory and Verbal Memory follow a similar but slightly delayed curve. The data below is consistent with cross-sectional findings from large-scale studies including the Midlife in the United States (MIDUS) study and Salthouse (2010).
Reaction Time by Age Group
Mean reaction time (ms) across age brackets. Data from 12M scored sessions with self-reported age.
| Age Group | Mean RT | Number Memory | Typing WPM | Verbal Memory |
|---|---|---|---|---|
| 16–19 | 245ms | 7.2 digits | 55 WPM | 61 words |
| 20–29 (peak) | 240ms | 7.8 digits | 60 WPM | 64 words |
| 30–39 | 255ms | 7.4 digits | 57 WPM | 60 words |
| 40–49 | 277ms | 6.9 digits | 52 WPM | 55 words |
| 50–59 | 309ms | 6.5 digits | 46 WPM | 50 words |
| 60–69 | 355ms | 5.9 digits | 40 WPM | 44 words |
| 70+ | 401ms | 5.3 digits | 33 WPM | 38 words |
Age data is self-reported and therefore noisy. Younger cohorts may be over-represented due to internet demographics. See Limitations.
Device Latency & Measurement Precision
Browser-based testing cannot eliminate hardware latency. We use performance.now(), which offers high-precision JavaScript timing (though modern browsers deliberately coarsen its resolution, typically to 0.1–1ms, as a side-channel mitigation), but the total measurement chain includes display response time, input device latency, and OS interrupt scheduling. This is especially relevant for the Reaction Time and Aim Trainer tests, where reported scores may be 15–50ms slower than your true neurological speed.
Estimated Total Hardware Latency by Setup
Added latency from hardware — not your brain. Lower is better. Values are typical ranges; individual hardware varies.
| Latency Source | Typical Range | Mitigation |
|---|---|---|
| Display response time | 1–50ms | Use a gaming or high-refresh monitor |
| Mouse polling rate | 1–8ms | Use a wired mouse at 1000Hz polling |
| OS interrupt scheduling | 0–15ms | Cannot be fully controlled in-browser |
| Browser event loop | 0–4ms | We use requestAnimationFrame where possible |
| JavaScript timer drift | 0–1ms | performance.now() is used throughout |
| Network / server | 0ms | All timing is client-side only |
Statistical Methodology
Trimmed Means
Percentile calculations use a 5%/1% trimmed mean to exclude outliers. This removes scores from users testing in unusual conditions (e.g., running software updates, tab switching during test) without discarding the whole session.
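A minimal sketch of the trim step, assuming the 5%/1% cut is taken on counts after sorting scores ascending (the function name and the exact rounding of the cut counts are illustrative, not the production implementation):

```javascript
// Mean after discarding the lowest 5% and highest 1% of sorted scores.
// Math.floor on the cut counts is an assumption about rounding.
function trimmedMean(scores, lowFrac = 0.05, highFrac = 0.01) {
  const sorted = [...scores].sort((a, b) => a - b);
  const dropFrom = Math.floor(sorted.length * lowFrac);
  const keepTo = sorted.length - Math.floor(sorted.length * highFrac);
  const kept = sorted.slice(dropFrom, keepTo);
  return kept.reduce((sum, x) => sum + x, 0) / kept.length;
}
```

For the scores 1 through 100, this drops the five lowest and the single highest value before averaging.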
Rolling Baseline
Percentiles are computed against a rolling 90-day window rather than all-time data. This corrects for population drift — the average user today is slightly faster than 5 years ago due to better hardware.
Multi-Attempt Averaging
The Reaction Time test and Aim Trainer require 5 and 30 attempts respectively before reporting a final score. Single-attempt measurements have high variance; averaging N attempts reduces the standard error by a factor of ~√N.
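The √N claim follows from the standard error of a mean, SE = σ/√N. A quick check with illustrative numbers:

```javascript
// Standard error of the mean of n independent attempts,
// given the single-attempt spread sigma.
function standardError(sigma, n) {
  return sigma / Math.sqrt(n);
}
// With a single-trial sigma of ~40ms, averaging 5 reaction-time trials
// brings the standard error down to ~18ms; 30 aim-trainer trials with
// the same sigma would give ~7ms.
```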
Percentile Rank Reporting
We report "top X%" rather than raw z-scores because percentile ranks are more interpretable to non-statisticians. The conversion uses the cumulative normal distribution fit to our trimmed dataset.
How Each Test Is Designed
Reaction Time
Uses a random foreperiod of 1.5–5 seconds drawn from a uniform distribution, preventing subjects from anticipating the stimulus. The target stimulus is a full-field color change (gray → green) to maximize signal salience. Five trials are averaged to reduce intra-individual variability (σ ≈ 30–50ms on a single trial; the standard error of the 5-trial mean drops to ~15ms).
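The foreperiod and averaging logic can be sketched as pure functions (names hypothetical; in the browser, the per-trial time would be the performance.now() difference between the color change and the click):

```javascript
// Uniform random foreperiod between 1.5s and 5s, in milliseconds.
// An injectable rng keeps the function deterministic for testing.
function sampleForeperiod(rng = Math.random) {
  return 1500 + rng() * 3500;
}

// Final score: mean of the per-trial reaction times (ms).
function meanRT(trialsMs) {
  return trialsMs.reduce((sum, t) => sum + t, 0) / trialsMs.length;
}
```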
Number Memory
Digits are presented as a single number (not spaced) to avoid chunking advantages. Presentation duration scales with sequence length (1000ms + 100ms × digits). Correct response requires exact replication. Unlike clinical Digit Span, our version does not include the backward condition, which would conflate working memory with executive function.
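The presentation-duration rule and scoring check map directly to code (the digit-string generator shown is an illustrative sketch, not the production code):

```javascript
// Display duration scales with length: 1000ms base + 100ms per digit.
function presentationMs(nDigits) {
  return 1000 + 100 * nDigits;
}

// Scoring requires exact replication of the shown number.
function isCorrect(shown, typed) {
  return typed === shown;
}

// Illustrative generator: a random digit string with no leading zero.
function nextNumber(len, rng = Math.random) {
  let s = String(1 + Math.floor(rng() * 9));
  while (s.length < len) s += Math.floor(rng() * 10);
  return s;
}
```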
Aim Trainer
Target diameter (60px, ~0.85° visual angle at 70cm viewing distance) is held constant to isolate target acquisition time from target detection. Thirty trials are measured and averaged. Inter-target interval starts immediately on click, not after animation, to keep the latency chain clean.
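The quoted visual angle can be checked with the standard formula θ = 2·atan(size / 2d). The physical size of a 60px target depends on pixel pitch, so the ~1.04cm figure below assumes a ~147ppi display:

```javascript
// Visual angle (degrees) subtended by an object of a given physical
// size at a given viewing distance (both in the same units, e.g. cm).
function visualAngleDeg(sizeCm, distanceCm) {
  return (2 * Math.atan(sizeCm / (2 * distanceCm)) * 180) / Math.PI;
}
// A 60px target is ~1.04cm on a ~147ppi display; viewed at 70cm it
// subtends roughly 0.85 degrees, matching the figure in the text.
```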
Verbal Memory
Uses a pool of 500 common English words. New and previously-seen words are intermixed at a 50/50 ratio after the initial encoding phase. d-prime (sensitivity index) is computed internally but reported as "words kept in memory" for interpretability. False alarm rate is penalized in scoring.
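d′ is the difference of the z-transformed hit and false-alarm rates, d′ = z(H) − z(FA). A self-contained sketch follows, with the probit computed by bisection over an erf-based normal CDF since JavaScript has neither built in; rates of exactly 0 or 1 would need the usual 1/(2N) correction before this step:

```javascript
// Abramowitz & Stegun 7.1.26 approximation to erf (max error ~1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = t * (0.254829592 + t * (-0.284496736 +
    t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal CDF.
function phi(z) {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// Inverse normal CDF by bisection (valid because phi is monotonic).
function probit(p) {
  let lo = -8, hi = 8;
  for (let i = 0; i < 60; i++) {
    const mid = (lo + hi) / 2;
    if (phi(mid) < p) lo = mid; else hi = mid;
  }
  return (lo + hi) / 2;
}

// Sensitivity index: d' = z(hit rate) - z(false-alarm rate).
function dPrime(hitRate, faRate) {
  return probit(hitRate) - probit(faRate);
}
```

For example, a hit rate of 0.84 against a false-alarm rate of 0.16 gives d′ ≈ 1.99.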
Typing Speed
Uses common word frequency lists (top 200 English words) rather than sentences to minimize the effect of language comprehension on typing speed. WPM is computed as (correct characters ÷ 5) ÷ minutes. Accuracy is calculated as correctly typed words ÷ all typed words.
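Both formulas map directly to code (function names are illustrative):

```javascript
// WPM = (correct characters / 5) / minutes elapsed.
function wpm(correctChars, elapsedSeconds) {
  return (correctChars / 5) / (elapsedSeconds / 60);
}

// Accuracy = correctly typed words / all typed words.
function accuracy(correctWords, totalWords) {
  return correctWords / totalWords;
}
// 300 correct characters in 60 seconds is 60 WPM.
```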
Limitations & Disclosures
Not a clinical or diagnostic tool
Human Benchmark tests are designed for educational and entertainment purposes. They measure specific, narrow cognitive tasks under uncontrolled conditions. They should not be used to self-diagnose any cognitive disorder, track neurological conditions, or replace professional neuropsychological assessment. If you have concerns about your cognitive health, consult a licensed healthcare professional.
Self-selection bias
Our user base skews toward younger, more tech-savvy individuals with better hardware. Population-level conclusions should not be drawn from our data.
Uncontrolled environment
Unlike lab testing, we cannot control ambient distractions, caffeine state, fatigue, or screen brightness — all of which affect scores.
Single-session variability
Individual performance varies ±15–30% between sessions. A single test session is not fully representative of an individual's stable ability.
Hardware confounds
Device latency (see table above) is not subtracted from raw scores. Users on different hardware cannot be directly compared without knowing their device specs.
Practice effects
Familiarity with a test format improves scores independent of the underlying cognitive ability. First-attempt scores are more valid than repeated scores for cross-user comparison.
Ecological validity
Browser reaction time measures simple RT to a color change — not the complex, choice-based RT relevant to driving, sport, or clinical populations.