AGI progress signal
Percent readiness, not a date forecast.
Public signal strength across AGI bottlenecks. The solid line is evidence through 2026; dashed lines are scenario overlays to 2032.
Current
62%
Uneven readiness
Open research
A composite estimate of AGI readiness, computed as a weighted geometric mean across seven signal categories: general reasoning, economic knowledge work, coding and tool use, multimodal/context handling, long-horizon agency, reliability and calibration, and deployment integration. The geometric mean is deliberate: weak bottlenecks (agency, reliability) pull the composite down even when benchmark scores are high. This is curated public-signal strength and editorial readiness, not measured AGI attainment or a probability forecast.
General cognitive AGI estimate
Cognitive and digital AGI: broad strong-human-level intellectual work, transfer across domains, tool use, multi-step planning, and enough reliability for real deployment. This is not superintelligence, full human replacement, or embodied robotics.
Readiness
63.4%
Rounded to 63%; uncertainty band 58–68%
Why 63%
Reasoning, coding, multimodal work, and economic knowledge tasks are already strong enough to make 55% too low.
Why not 70%+
Long-horizon agency, reliability, calibration, and robust real-world execution remain the binding constraints.
Science-first discount
A science-first AGI estimate falls closer to 53% because verification, reproducibility, scientific autonomy, and lab deployment dominate.
The geometric mean keeps weak bottlenecks visible: agency and reliability pull the composite down even when benchmark and coding scores are high.
100 * 0.72^0.18 * 0.71^0.18 * 0.76^0.14 * 0.70^0.10 * 0.55^0.18 * 0.48^0.16 * 0.55^0.06 = 63.4%
General reasoning & knowledge
Strong GPQA, AIME, and ARC-AGI-1 progress; ARC-AGI-2 and HLE still leave headroom.
w 18%
72/100
Economic knowledge work
GDPval shows frontier models approaching expert work products across many occupations.
w 18%
71/100
Coding & tool use
SWE-bench Verified and related tool-use evals are among the most mature public signals.
w 14%
76/100
Multimodal/context handling
Vision, long-context, document, and screen workflows are much stronger, but not yet universal world-modeling.
w 10%
70/100
Long-horizon agency
The main bottleneck: autonomous tasks are still short, scaffolded, or well-specified.
w 18%
55/100
Reliability/calibration/safety
Hallucination, overconfidence, brittle behavior, and hard-to-detect errors still limit delegation.
w 16%
48/100
Deployment / real-world integration
Productivity gains are already widespread, but full autonomous replacement of roles is not.
w 6%
55/100
Percent is curated public signal strength and editorial readiness, not measured AGI attainment or probability.
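The composite above can be reproduced directly from the per-category scores and weights listed in the table. A minimal sketch in Python (the dictionary keys are shorthand labels for this illustration, not identifiers from the dataset):

```python
import math

# Category scores (0-100) and weights from the readiness table above.
signals = {
    "reasoning":     (72, 0.18),
    "economic_work": (71, 0.18),
    "coding":        (76, 0.14),
    "multimodal":    (70, 0.10),
    "agency":        (55, 0.18),
    "reliability":   (48, 0.16),
    "deployment":    (55, 0.06),
}

# Weighted geometric mean: exp of the weight-averaged log scores.
# Weights sum to 1, so this matches the displayed product formula.
log_sum = sum(w * math.log(score / 100) for score, w in signals.values())
composite = 100 * math.exp(log_sum)
print(round(composite, 1))  # 63.4
```

Lowering any single category lowers the composite multiplicatively, which is why the weak agency and reliability scores dominate the headline number.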
Benchmark signals
Each readiness bar above aggregates several public benchmark signals. The records below name the evaluations actually used for scoring — what they measure, the latest tracked result, and why they matter for autonomous science.
Sources for each benchmark are linked inline. Scores update with the dataset; see the methodology section for weighting.
Apr 2026
General model intelligence
Composite score across agents, coding, science, reasoning, knowledge, and instruction following
Provides a production-oriented view of which frontier models are strong enough to act as reasoning engines inside scientific agents.
Current signal
Live leaderboard score, provider, price, speed, latency, and context-window tracking
Apr 2026
Capability trends
Benchmark results across 40+ evaluations, with internal and external result provenance
Useful for seeing whether scientific reasoning, agentic work, math, coding, and multimodal capabilities are improving fast enough to change lab workflows.
Mar 2026
Mathematics
Accuracy on extremely difficult math problems and open-problem variants
High-end mathematical reasoning is one of the cleanest proxies for whether models can contribute to formal scientific discovery.
Apr 2026
Scientific reasoning
Expert-level graduate science multiple-choice accuracy
Directly probes PhD-level physics, chemistry, and biology reasoning, though it remains a static question-answer benchmark.
Mar 2026
Scientific coding
Pass rate on scientific programming tasks
Measures whether models can turn scientific specifications into executable code, a core dependency for autonomous analysis and simulation.
Apr 2026
Agentic software engineering
Resolved real GitHub issues
Software-engineering agents are a leading indicator for whether models can operate long-horizon scientific toolchains and repair failed experiments.
Mar 2026
Long-horizon agents
Pass@1 task completion in realistic multi-application workflows
Lab automation requires agents that coordinate files, tools, state, and multi-step objectives rather than answering isolated prompts.
Mar 2026
Cross-domain expert reasoning
Accuracy on hard expert-written questions
A broad stress test for frontier models, useful only when interpreted alongside domain-specific science benchmarks and tool-use evaluations.
Mar 2025
Bioinformatics agents
Open-answer accuracy on real-world bioinformatics analysis scenarios
Measures whether agents can explore biological datasets, run multi-step analyses, and interpret results rather than only answer static science questions.
Current signal
Public benchmark with 53 analysis scenarios and 296 questions for agentic computational-biology workflows
May 2026
Biology research tasks
Performance across 1,892 practical biology-research tasks spanning literature, databases, sequences, protocols, patents, trials, and source quality
Moves biology-agent evaluation toward practical research work, retrieval, file handling, and tool use rather than short-form knowledge recall.
Current signal
Open dataset and harness with published model comparisons across 11 task families
Apr 2026
Bioinformatics agents
Accuracy and reliability on 99 expert-level bioinformatics tasks with objective ground truth from real experimental data
Tests whether frontier agents can produce reproducible scientific conclusions from messy biological data, including problems not solved by expert panels.
Current signal
Anthropic reports Claude-family and expert baselines, including separate human-solvable and human-difficult task sets
Apr 2026
Genomics and quantitative biology agents
Pass rate on 103 multi-stage scientific data analysis tasks across 10 genomics and quantitative biology domains
Probes whether agents can clean assay or clinical data, run exploratory analysis, select statistical models, and produce conclusions that inform downstream scientific decisions.
Jul 2024
Biology
Task accuracy and open-response accuracy
Measures biology research-agent skills across literature QA, table/figure reasoning, protocols, databases, sequences, and cloning scenarios.
Sep 2024
Scientific literature
Precision, accuracy, DOI recall, contradiction-detection AUC
Evaluates retrieval-grounded scientific literature QA, synthesis, and contradiction detection against human experts.
Dec 2025
Biology
DockQ AUC, DockQ success rate, LDDT, ligand success rate
Independent benchmark of all-atom biomolecular structure prediction across protein-ligand, protein-protein, antibody, and nucleic-acid tasks.
Current signal
AlphaFold 3 leads most measured all-atom structure-prediction tasks.
Nov 2023
Materials
New stable crystal structures on the updated convex hull
Measures AI-assisted inorganic materials discovery at the scale of DFT-verified stable structures.
Current signal
GNoME reports 381,000 newly stable convex-hull entries from 2.2M candidate structures.
May 2025
Algorithmic science
Objective-specific best-known construction score
Measures autonomous code-evolution workflows on verifiable mathematical and scientific optimization objectives.
Current signal
AlphaEvolve found a 48-multiplication algorithm for 4x4 complex matrix multiplication, versus the 49 multiplications of recursively applied Strassen.
Jun 2025
Genomics
Number of benchmark tasks with state-of-the-art performance
Aggregates genome-track prediction and regulatory variant-effect prediction measurements against external genomics baselines.
Current signal
AlphaGenome achieved SOTA on 22 of 24 track-prediction tasks and 25 of 26 variant-effect tasks.
Methodology
Weighting
Each signal category is scored 0–100 and weighted by its contribution to general cognitive AGI: reasoning 18%, economic knowledge work 18%, coding and tool use 14%, multimodal/context 10%, long-horizon agency 18%, reliability and calibration 16%, and deployment integration 6%. Weights sum to 100. The science-first view discounts the composite further because verification, reproducibility, and lab deployment dominate trustworthy autonomy.
Aggregation
Categories combine as a weighted geometric mean, not an arithmetic average. A weak bottleneck cannot be hidden by a strong score elsewhere: 90 in reasoning and 30 in agency is not the same as 60 in both. Geometric mean keeps the binding constraint visible.
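The 90/30 contrast in the text can be checked directly. A minimal sketch of the aggregation rule (the score pairs are the illustrative values above, not tracked category scores):

```python
import math

def weighted_geometric_mean(scores, weights):
    """Weighted geometric mean of positive scores; weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return math.exp(sum(w * math.log(s) for s, w in zip(scores, weights)))

# A strong score cannot hide a weak bottleneck:
uneven = weighted_geometric_mean([90, 30], [0.5, 0.5])  # ~52, not 60
flat = weighted_geometric_mean([60, 60], [0.5, 0.5])    # exactly 60
print(round(uneven, 1), round(flat, 1))  # 52.0 60.0
```

An arithmetic average would score both profiles at 60; the geometric mean penalizes the uneven profile by roughly eight points, keeping the binding constraint visible.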
Score derivation
Each category score is curated from public benchmark results, agent evaluation suites, deployment metrics, and qualitative evidence. The benchmarks above list the specific evaluations behind each signal. Scenario overlays (Steady, R&D acceleration, Verification-gated) are illustrative, not probabilistic.
Verification sits lowest in the composite. That gap is the one Scivity exists to close — see Verification for what we ship against it.
Source policy
Every benchmark record carries explicit source citations. Source tiers (A regulatory/peer-reviewed, B official/preprint, C industry/media, D rumor/social) and editorial policy are documented in the Landscape methodology.
What this is not
This map is not a probability that AGI arrives by a given date. It does not predict capabilities, take a position on intelligence definitions, or claim measured attainment. Treat it as a structured way to read the field, not a forecast.
Open data
License
AGI Progress Signal Map is published by Scivity Labs under CC BY 4.0. You may reuse, remix, and republish with attribution.
Cite as
Scivity Labs (2026, May). AGI Progress Signal Map [Dataset]. scivity.org/agi-progress
@misc{scivity_agi_progress_2026,
author = {{Scivity Labs}},
title = {AGI Progress Signal Map},
year = {2026},
month = may,
howpublished = {\url{https://scivity.org/agi-progress}},
note = {Dataset. CC BY 4.0}
}