Official A.I Ranking
About

An independent ratings authority for AI.

We exist to answer one question without flinching: which AI products are worth using, and which are not. We test, we rate, and we say so plainly.

Official A.I Ranking was founded in 2023 by Eleanor Hargrove, a former appliance- and electronics-testing editor. When generative AI tools began shipping, she saw the same problem she had spent her career fighting: confident marketing, no independent testing, and no plain verdict. She started this publication to bring the consumer-testing discipline of a fixed rubric, blind scoring, and a clear recommendation to a category that had none of it.

We are deliberately narrow. We pick a category, we test the field against the same battery of tasks, and we publish one definitive call: Recommended or Not Recommended, with a five-star rating and the evidence behind it. Everything on this page is in service of making that call trustworthy.

The publication at a glance

Founded
2023, by Eleanor Hargrove, Editor-in-Chief.
Advisory board
Three external members who review the rubric, not the verdicts.
Funding
Reader subscriptions and licensing of our test data. No sponsorships, no paid placement, no affiliate revenue.
Verdicts to date
Each verdict is dated and re-run on every major release of the products it covers.

Our Testing Facility

Every verdict is produced at our in-house testing facility. The point of a fixed facility is repeatability: the same hardware, the same network conditions, the same accounts, and the same fixed task batteries, so that a result reflects the product and not the day we test it. We run each tool on identical reference machines, log every prompt and response verbatim, and keep a sealed archive of inputs and outputs so a verdict can be re-checked or contested.

Tests are run on dedicated, paid accounts at the tier a normal buyer would use, never on a press or evaluation tier. Two reviewers grade every result against a written rubric without seeing which tool produced it, so a brand name cannot move a mark. The specific apparatus differs by category, but it is the same suite, run the same way, for every product in a field. The standing suites are below.

The Standard Battery

A fixed set of category-specific tasks — the same prompts, briefs, or codebases for every product in a field — run on identical reference machines and graded blind against a written rubric by two reviewers.

The Reliability Run

The hardest tasks repeated across many runs to measure how often a tool returns a correct, usable result without intervention. A tool that nails one demo but drifts on the tenth attempt is marked down for it.

The Control Bench

Directed-editing trials that count the attempts a tool takes to place or correct one element, constrain a format, or revise a single region — measuring how precisely it can be steered, not just what it produces unprompted.

The Licensing & Safety Read

A line-by-line reading of the published terms against the maker's documentation: how the model was trained, what commercial rights and indemnification the paid plan grants, and how the tool handles unsafe or restricted requests.

The Cost Ledger

A month of real, observed usage priced at the tier we tested, then divided by the number of results we judged usable — so a cheap tool that needs five retries is never allowed to look like a bargain.
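The Cost Ledger's arithmetic is easiest to see with numbers. Below is a minimal sketch in Python. The tool names, prices, and result counts are invented for illustration, and the `LedgerEntry` fields are hypothetical rather than our internal data format.

```python
from dataclasses import dataclass


@dataclass
class LedgerEntry:
    """One tool's month of observed usage (hypothetical fields, for illustration)."""
    tool: str
    monthly_cost: float   # price paid at the tier we tested, in dollars
    total_results: int    # every output generated during the month
    usable_results: int   # outputs the reviewers judged usable


def cost_per_usable_result(entry: LedgerEntry) -> float:
    """Divide a month's spend by the number of results judged usable."""
    if entry.usable_results == 0:
        # Nothing usable: no subscription price can make this a bargain.
        return float("inf")
    return entry.monthly_cost / entry.usable_results


# Invented figures: Tool A is cheaper per month but needs five attempts per usable result.
cheap = LedgerEntry("Tool A", monthly_cost=10.0, total_results=500, usable_results=100)
pricier = LedgerEntry("Tool B", monthly_cost=30.0, total_results=500, usable_results=400)

for entry in (cheap, pricier):
    print(f"{entry.tool}: ${cost_per_usable_result(entry):.3f} per usable result")
```

With these made-up figures, Tool A costs a third as much per month but works out at $0.100 per usable result, against $0.075 for Tool B. Dividing by usable results rather than by attempts is what keeps the retries from disappearing into a low sticker price.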

The team and its credentials

Verdicts are never anonymous. Each ranking and head-to-head carries the name of the reviewer who ran it, drawn from a standing Testing Desk in which every member owns a category. The desk is led on method by Lionel Sackville, our Head of Test Methodology, who owns the rubric and is responsible for keeping the criteria comparable from one verdict to the next. The full roster, with each reviewer's desk, sits at the foot of this page and on the Criteria page.

The advisory board reviews the rubric and our test design once a year, and whenever a major change is proposed.

How a verdict is decided

A verdict starts with a category and a written test plan. The reviewer assembles the field, runs every product through the standing suites at our facility, and records a per-criterion result for each tool. Two reviewers grade the same outputs blind; where their marks disagree by more than a half-star, the result is re-graded and the disagreement is logged. The criterion marks are weighted toward what matters most for the category and combined into an overall five-star rating, rounded to the nearest half-star.

We recommend products rated four stars and above. A product that clears the threshold carries the solid Recommended stamp; anything below it is marked Not Recommended. Before a verdict is published, the Head of Test Methodology checks that every criterion was tested as the plan specified and that every claim is traceable to a primary source. Nothing is final: each verdict is dated and re-run on every major release, and a recommendation is withdrawn if the product regresses.
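The steps from graded marks to a stamp can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our scoring code: the criterion names and weights are hypothetical (real weights are set per category), averaging the two graders' marks is an assumption, and so is rounding ties upward to the next half-star. The half-star disagreement limit and the four-star threshold come from the process described above.

```python
import math

# Hypothetical weights for one category; integers keep the arithmetic exact.
WEIGHTS = {"accuracy": 4, "reliability": 3, "control": 2, "cost": 1}
RECOMMEND_THRESHOLD = 4.0  # four stars and above is Recommended
MAX_DISAGREEMENT = 0.5     # a gap larger than a half-star triggers a re-grade


def needs_regrade(marks_a: dict[str, float], marks_b: dict[str, float]) -> list[str]:
    """Return the criteria where the two blind graders differ by more than a half-star."""
    return [c for c in marks_a if abs(marks_a[c] - marks_b[c]) > MAX_DISAGREEMENT]


def round_to_half(stars: float) -> float:
    """Round to the nearest half-star; ties go up (an assumption)."""
    return math.floor(stars * 2 + 0.5) / 2


def verdict(marks_a: dict[str, float], marks_b: dict[str, float],
            weights: dict[str, int] = WEIGHTS) -> tuple[float, str]:
    disputed = needs_regrade(marks_a, marks_b)
    if disputed:
        # In practice the result is re-graded and the disagreement logged.
        raise ValueError(f"re-grade and log disagreement on: {disputed}")
    # Assumption: average the two graders per criterion, then take the weighted mean.
    combined = {c: (marks_a[c] + marks_b[c]) / 2 for c in weights}
    total = sum(weights[c] * combined[c] for c in weights) / sum(weights.values())
    rating = round_to_half(total)
    label = "Recommended" if rating >= RECOMMEND_THRESHOLD else "Not Recommended"
    return rating, label


grader_a = {"accuracy": 4.5, "reliability": 4.0, "control": 3.5, "cost": 4.0}
grader_b = {"accuracy": 4.5, "reliability": 4.0, "control": 4.0, "cost": 3.5}
print(verdict(grader_a, grader_b))
```

With these invented marks the weighted score is 4.125, which rounds to four stars and clears the threshold, so the call prints `(4.0, 'Recommended')`. Raising an error on a disputed criterion, rather than silently averaging it away, mirrors the rule that a wide disagreement is re-graded before any rating is issued.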

Conflicts of interest

Reviewers may not hold equity in, consult for, or accept anything of value from a company whose product they test, and they recuse themselves from any category where a past relationship could be questioned. Any relationship that cannot be removed is disclosed on the verdict itself.

Independence and how we are funded

We take no sponsorships and no payment for placement. We are funded by reader subscriptions and by licensing our anonymized test data to libraries and institutional buyers. That model is the whole point: the only thing we have to sell is a verdict you can trust, so the verdict is the only thing we will not compromise.

The Testing Desk
Margaret Ashworth
Senior Reviewer, Image & Video

Margaret Ashworth leads testing of image and video generators and the design tools built on top of them. She grades on prompt fidelity, artifact rates, licensing clarity, and the cost of an acceptable final frame.

Theodore Pruitt
Senior Reviewer, Assistants & Code

Theodore Pruitt evaluates general assistants, reasoning models, and coding tools against fixed task batteries. He weighs accuracy and refusal behavior over benchmark scores, and reports what a tool gets wrong before what it gets right.

Constance Whitfield
Reviewer, Productivity & Knowledge

Constance Whitfield covers search, productivity suites, and knowledge and data tools. Her tests favor citations a reader can verify, exportable output, and pricing that holds up past the free tier.

Lionel Sackville
Head of Test Methodology

Lionel Sackville designs the scoring rubric the Testing Desk uses and runs voice, audio, and cross-category tests. He is responsible for keeping the criteria comparable from one verdict to the next.