Peer-reviewed methodology

The Science Behind Human Benchmark

Every test is grounded in peer-reviewed cognitive psychology. Here is how we translate validated laboratory paradigms into browser-based measurements, and where the limits lie.

50M+
Data points collected
±5ms
Browser timing precision
50+
Research papers cited
11
Validated test paradigms

Test Paradigms & Clinical Basis

Each test maps to a validated cognitive construct from the neuropsychological literature. The table below shows the source paradigm, the cognitive domain measured, and the key dependent variable. Take any test and compare your score to 50M+ sessions in our test library.

Test | Key Metric
Reaction Time | Mean RT (ms)
Aim Trainer | Mean click time (ms)
Number Memory | Longest correct span
Verbal Memory | Hit rate − FA rate
Chimp Test | Max number recalled
Sequence Memory | Max sequence length
Visual Memory | Levels completed
Typing Speed | WPM with accuracy
Pattern Recognition | Response latency (ms)
Processing Speed | Items/min
Attention & Focus | % correct responses

Full reference list available in the methodology notes. All paradigms adapted for browser delivery with documented validity trade-offs (see Limitations).

Global Benchmark Data

Aggregated from 50M+ anonymized test sessions. Trimmed means exclude the bottom 5% and top 1% of scores to reduce outlier contamination. The chart below shows the Reaction Time distribution — the most completed test on the platform, with over 8.4 million scored sessions.

Reaction Time — Score Distribution

Based on 8.4M scores. Vertical axis = % of users in each 20ms bin. Global mean = 284ms, median = 271ms.

Percentile | Reaction Time | Number Memory | Typing Speed | Aim Trainer
Top 1% | <190ms | 12+ digits | 95+ WPM | <200ms
Top 5% | <215ms | 10+ digits | 85+ WPM | <250ms
Top 10% | <235ms | 9+ digits | 75+ WPM | <280ms
Top 25% | <255ms | 8+ digits | 65+ WPM | <320ms
Median | 271ms | 7 digits | 52 WPM | 380ms
Bottom 25% | >310ms | 6 digits | <38 WPM | >480ms

All benchmarks derived from Human Benchmark database. Scores may differ from clinical norms due to self-selection bias — online testers skew younger and more tech-savvy than general populations.

Age Effects on Cognitive Performance

Cognitive speed peaks in the mid-20s and declines gradually across the lifespan. Reaction Time is the most age-sensitive metric; Number Memory and Verbal Memory follow a similar but slightly delayed curve. The data below is consistent with cross-sectional findings from large-scale studies including the Midlife in the United States (MIDUS) study and Salthouse (2010).

Reaction Time by Age Group

Mean reaction time (ms) across age brackets. Data from 12M scored sessions with self-reported age.

Age Group | Mean RT | Number Memory | Typing WPM | Verbal Memory
16–19 | 245ms | 7.2 digits | 55 WPM | 61 words
20–29 (peak) | 240ms | 7.8 digits | 60 WPM | 64 words
30–39 | 255ms | 7.4 digits | 57 WPM | 60 words
40–49 | 277ms | 6.9 digits | 52 WPM | 55 words
50–59 | 309ms | 6.5 digits | 46 WPM | 50 words
60–69 | 355ms | 5.9 digits | 40 WPM | 44 words
70+ | 401ms | 5.3 digits | 33 WPM | 38 words

Age data is self-reported and therefore noisy. Younger cohorts may be over-represented due to internet demographics. See Limitations.

Device Latency & Measurement Precision

Browser-based testing cannot eliminate hardware latency. We use performance.now(), which provides sub-millisecond precision for JavaScript timing, but the total measurement chain also includes display response time, input-device latency, and OS interrupt scheduling. This is especially relevant for the Reaction Time and Aim Trainer tests, where reported scores may be 15–50ms slower than your true neurological speed.
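To make the latency chain concrete, here is a minimal sketch (not the site's actual code) of how a trial can be timed with performance.now(), and how the hardware latency ranges from the table below bracket the underlying "neural" reaction time. The function names are our own illustrative inventions.

```javascript
// A reaction-time measurement is just the difference of two monotonic
// performance.now() readings, so it is immune to system-clock changes.
function reactionTime(stimulusShownAt, responseAt) {
  return responseAt - stimulusShownAt;
}

// Because display + input latency is additive, a measured score brackets
// the true neural RT once you know your hardware's latency range.
// Example: 284ms measured on a 60Hz desktop setup (~25-45ms added).
function trueRtRange(measuredMs, minLatencyMs, maxLatencyMs) {
  return [measuredMs - maxLatencyMs, measuredMs - minLatencyMs];
}

// In a browser, the two timestamps would be captured roughly like this:
//   let shownAt;
//   requestAnimationFrame(() => { shownAt = performance.now(); paintStimulus(); });
//   target.addEventListener("pointerdown", () =>
//     report(reactionTime(shownAt, performance.now())));
```

On the assumed 60Hz desktop setup, `trueRtRange(284, 25, 45)` gives [239, 259]: the neural component of a 284ms score likely sits in the low-to-mid 200s.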

Estimated Total Hardware Latency by Setup

Added latency from hardware — not your brain. Lower is better. Values are typical ranges; individual hardware varies.

Gaming monitor + wired mouse (144Hz, 1ms) | ~12–25ms
Desktop monitor + wired mouse (60Hz, 5ms) | ~25–45ms
Desktop monitor + wireless mouse | ~35–65ms
Laptop (60Hz IPS) | ~30–60ms
Tablet touchscreen | ~50–90ms
Smartphone touchscreen | ~55–100ms

Latency Source | Typical Range | Mitigation
Display response time | 1–50ms | Use a gaming or high-refresh monitor
Mouse polling rate | 1–8ms | Use a wired mouse at 1000Hz polling
OS interrupt scheduling | 0–15ms | Cannot be fully controlled in-browser
Browser event loop | 0–4ms | We use requestAnimationFrame where possible
JavaScript timer drift | 0–1ms | performance.now() is used throughout
Network / server | 0ms | All timing is client-side only

Statistical Methodology

Trimmed Means

Percentile calculations use a 5%/1% trimmed mean to exclude outliers. This removes scores from users testing in unusual conditions (e.g., running software updates, tab switching during test) without discarding the whole session.
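A minimal sketch of the asymmetric trim described above, dropping the bottom 5% and top 1% of sorted scores before averaging. The cut-offs come from the text; the exact rounding rule at the boundaries is our assumption.

```javascript
// Asymmetric trimmed mean: discard the lowest `lowFrac` and highest
// `highFrac` of sorted scores, then average what remains.
function trimmedMean(scores, lowFrac = 0.05, highFrac = 0.01) {
  const sorted = [...scores].sort((a, b) => a - b);
  const lo = Math.floor(sorted.length * lowFrac);        // drop bottom 5%
  const hi = sorted.length - Math.floor(sorted.length * highFrac); // drop top 1%
  const kept = sorted.slice(lo, hi);
  return kept.reduce((sum, x) => sum + x, 0) / kept.length;
}
```

For 100 scores, this keeps ranks 6 through 99, so one absurdly fast score (a double-click glitch) or one multi-second outlier (a tab switch) no longer drags the mean.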

Rolling Baseline

Percentiles are computed against a rolling 90-day window rather than all-time data. This corrects for population drift — the average user today is slightly faster than 5 years ago due to better hardware.

Multi-Attempt Averaging

The Reaction Time test and Aim Trainer require 5 and 30 attempts respectively before reporting a final score. Single-attempt measurements have high variance; averaging N attempts reduces the standard error by a factor of ~√N.
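The √N reduction is just the standard error of the mean. Taking the ~35ms single-trial σ quoted in the Reaction Time design notes as an illustrative input:

```javascript
// Standard error of the mean of n independent attempts: sigma / sqrt(n).
function standardError(sigma, n) {
  return sigma / Math.sqrt(n);
}

// With sigma ≈ 35ms, five averaged trials cut the standard error
// from 35ms down to roughly 15-16ms.
```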

Percentile Rank Reporting

We report "top X%" rather than raw z-scores because percentile ranks are more interpretable to non-statisticians. The conversion uses the cumulative normal distribution fit to our trimmed dataset.
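A sketch of that conversion, assuming a normal fit with placeholder parameters (the real fit is done on the trimmed dataset). The erf approximation is Abramowitz–Stegun 7.1.26, accurate to about 1.5e-7; all function names are ours.

```javascript
// Abramowitz-Stegun approximation of the error function.
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  x = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * x);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-x * x));
}

// Cumulative normal distribution with the fitted mean and sd.
function normalCdf(x, mean, sd) {
  return 0.5 * (1 + erf((x - mean) / (sd * Math.SQRT2)));
}

// For reaction time, lower is better: "top X%" is the share of the
// fitted population that is faster than you.
function topPercent(scoreMs, mean, sd) {
  return 100 * normalCdf(scoreMs, mean, sd);
}
```

With mean = 284ms and an assumed sd of 50ms, a 284ms score is "top 50%" by construction, and faster scores map monotonically to smaller percentiles.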

How Each Test Is Designed

Reaction Time


Uses a random foreperiod of 1.5–5 seconds drawn from a uniform distribution, preventing subjects from anticipating the stimulus. The target stimulus is a full-field color change (gray → green) to maximize signal salience. Five trials are averaged to reduce intra-individual variability (σ ≈ 30–50ms on a single trial; the standard error of the 5-trial mean drops to ~15ms).
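A one-line sketch of the uniform foreperiod (our illustration, not the site's code): because every instant in the window is equally likely, waiting longer gives the subject no timing cue.

```javascript
// Uniform random foreperiod in [1500, 5000) milliseconds.
function randomForeperiodMs(min = 1500, max = 5000) {
  return min + Math.random() * (max - min);
}
```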

Number Memory


Digits are presented as a single number (not spaced) to avoid chunking advantages. Presentation duration scales with sequence length (1000ms + 100ms × digits). Correct response requires exact replication. Unlike clinical Digit Span, our version does not include the backward condition, which would conflate working memory with executive function.
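The two rules in that paragraph reduce to a couple of lines (illustrative helper names are ours):

```javascript
// Presentation duration scales with sequence length: 1000ms base
// plus 100ms per digit, per the rule stated in the text.
function presentationMs(digitCount) {
  return 1000 + 100 * digitCount;
}

// Scoring requires exact replication of the shown number.
function isCorrect(shown, typed) {
  return shown === typed;
}
```

So a 7-digit number (the global median span) is shown for 1700ms, and "1234567" typed as "1234576" scores as a miss, with no partial credit.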

Aim Trainer


Target diameter (60px, ~0.85° visual angle at 70cm viewing distance) is held constant to isolate target acquisition time from target detection. Thirty trials are measured and averaged. Inter-target interval starts immediately on click, not after animation, to keep the latency chain clean.
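The quoted visual angle follows from the standard formula θ = 2·atan(size / 2·distance). Note that the ~0.85° figure for a 60px target at 70cm implies a target of roughly 1.04cm on screen, i.e. a display around 146 DPI; that DPI inference is our assumption.

```javascript
// Visual angle (degrees) subtended by an object of `sizeCm` viewed
// from `distanceCm`: theta = 2 * atan(size / (2 * distance)).
function visualAngleDeg(sizeCm, distanceCm) {
  return (2 * Math.atan(sizeCm / (2 * distanceCm))) * 180 / Math.PI;
}
```

A pixel-specified target therefore subtends different angles on different screens, which is one more hardware confound in cross-user comparisons.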

Verbal Memory


Uses a pool of 500 common English words. New and previously-seen words are intermixed at a 50/50 ratio after the initial encoding phase. d-prime (sensitivity index) is computed internally but reported as "words kept in memory" for interpretability. False alarm rate is penalized in scoring.
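The reported metric from the paradigm table (hit rate minus false-alarm rate) can be sketched as follows; the simpler rate difference is shown here rather than the internal d′ computation, and the function name is ours:

```javascript
// Recognition-memory score corrected for guessing: a user who answers
// "seen" to everything gets hitRate = 1 but also faRate = 1, scoring 0.
function correctedScore(hits, misses, falseAlarms, correctRejections) {
  const hitRate = hits / (hits + misses);
  const faRate = falseAlarms / (falseAlarms + correctRejections);
  return hitRate - faRate;
}
```

For example, 40 hits / 10 misses against 5 false alarms / 45 correct rejections yields 0.8 − 0.1 = 0.7.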

Typing Speed


Uses common word frequency lists (top 200 English words) rather than sentences to minimize the effect of language comprehension on typing speed. WPM is computed as (correct characters ÷ 5) ÷ minutes. Accuracy is calculated as correctly typed words ÷ all typed words.
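The two formulas above, directly transcribed:

```javascript
// WPM uses the standard 5-characters-per-word convention,
// counting only correctly typed characters.
function wpm(correctChars, minutes) {
  return (correctChars / 5) / minutes;
}

// Accuracy is the fraction of typed words that were correct.
function accuracy(correctWords, totalWords) {
  return correctWords / totalWords;
}
```

So 300 correct characters in one minute is 60 WPM, and 48 correct words out of 50 typed is 96% accuracy.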

Limitations & Disclosures

Not a clinical or diagnostic tool

Human Benchmark tests are designed for educational and entertainment purposes. They measure specific, narrow cognitive tasks under uncontrolled conditions. They should not be used to self-diagnose any cognitive disorder, track neurological conditions, or replace professional neuropsychological assessment. If you have concerns about your cognitive health, consult a licensed healthcare professional.

Self-selection bias

Our user base skews toward younger, more tech-savvy individuals with better hardware. Population-level conclusions should not be drawn from our data.

Uncontrolled environment

Unlike lab testing, we cannot control ambient distractions, caffeine state, fatigue, or screen brightness — all of which affect scores.

Single-session variability

Individual performance varies ±15–30% between sessions. A single test session is not fully representative of an individual's stable ability.

Hardware confounds

Device latency (see table above) is not subtracted from raw scores. Users on different hardware cannot be directly compared without knowing their device specs.

Practice effects

Familiarity with a test format improves scores independent of the underlying cognitive ability. First-attempt scores are more valid than repeated scores for cross-user comparison.

Ecological validity

Browser reaction time measures simple RT to a color change — not the complex, choice-based RT relevant to driving, sport, or clinical populations.

Explore the tests