The Verdict · Translation

The AI Translation Services We Recommend

We tested five translation services on the same passages and graded them on raw accuracy, language coverage, document handling, privacy posture, and what a paid plan actually costs.

By Constance Whitfield, Reviewer, Productivity & Knowledge · June 13, 2026 · 5 products tested

The Bottom Line

DeepL Pro earns our top recommendation for business and document translation on European language pairs. Its output reads more naturally than any rival we tested, and the glossary and formality controls give teams real grip on tone. Google Translate is the pick when coverage is the point, with reach no competitor matches. ChatGPT now belongs in the conversation for tone-sensitive prose. Amazon Translate falls short of a recommendation for anyone outside an AWS pipeline.

AI translation has converged. For the top twenty world languages, the gap between a competent dedicated engine and a frontier large language model has narrowed to the point where the right answer depends less on which tool is "best" and more on what you're translating, into which language, and under what privacy and budget constraints.

We evaluated five services that a working team is likely to put a credit card against in 2026: DeepL Pro, Google Translate (Cloud Translation API and the consumer app), Microsoft Azure Translator, ChatGPT (OpenAI's GPT-4.1 used as a translator), and Amazon Translate. Every tool ran on the same source passages, the same languages, and the same documents, between May 18 and June 8, 2026. Pricing and feature claims reflect the published tiers in that window. The criteria, procedures, and per-tool marks are below.

How we tested

All five services were tested between May 18 and June 8, 2026, on their current paid tiers (or free tier where that is the headline product). Criteria are weighted toward output quality and language coverage, with privacy weighted heavily for business use and cost weighted at high volume.

Output Quality

Each tool translated the same 60 passages (20 business emails, 20 technical product descriptions, 20 marketing paragraphs) across six target languages (German, French, Spanish, Japanese, Chinese, Arabic), and two reviewers independently scored every output blind against a human-translated reference on accuracy, terminology, and naturalness. Scores were averaged per pair, and we cross-checked our rankings against the public Intento State of Translation Automation 2025 and WMT24 General Machine Translation findings.
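A minimal sketch of how that per-pair averaging could work is shown here. It is an illustration only, not the scoring harness used for this review: the record layout, the 0 to 10 scale, the tool names, and the function name `average_by_pair` are all assumptions made for the example, and the numbers are placeholders rather than test results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (tool, target_language, reviewer, score).
# Scale and values are illustrative placeholders, not results from this review.
SCORES = [
    ("tool_a", "de", "reviewer_1", 8.0),
    ("tool_a", "de", "reviewer_2", 9.0),
    ("tool_a", "ja", "reviewer_1", 6.0),
    ("tool_a", "ja", "reviewer_2", 7.0),
    ("tool_b", "de", "reviewer_1", 7.0),
    ("tool_b", "de", "reviewer_2", 7.0),
]


def average_by_pair(records):
    """Average all reviewer scores for each (tool, target language) pair."""
    buckets = defaultdict(list)
    for tool, language, _reviewer, score in records:
        buckets[(tool, language)].append(score)
    return {key: mean(values) for key, values in buckets.items()}


averages = average_by_pair(SCORES)
for (tool, language), value in sorted(averages.items()):
    print(f"{tool} -> {language}: {value:.2f}")
```

Keeping the reviewer in each record, even though the average ignores it, makes it possible to check later how far the two blind reviewers diverged on a given pair.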

Language Coverage

We counted the distinct base languages each service translates between (excluding regional dialect variants like en-US vs en-GB) and tested ten low-resource pairs (including Swahili, Khmer, Amharic, Welsh, and Quechua) on each tool to confirm the published count matched real availability.
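The base-language count can be approximated by collapsing each locale tag to its primary language subtag before counting. The sketch below assumes the vendor publishes BCP 47 style tags such as `en-US`; the function names and the sample list are hypothetical, and real vendor lists are far longer.

```python
def base_language(tag: str) -> str:
    """Return the primary language subtag of a locale tag, lowercased."""
    return tag.replace("_", "-").split("-")[0].lower()


def count_base_languages(tags):
    """Count distinct base languages, ignoring regional variants."""
    return len({base_language(tag) for tag in tags})


# Hypothetical vendor listing used only to exercise the function.
listed = ["en-US", "en-GB", "pt-BR", "pt-PT", "de", "sw", "km", "zh-Hans", "zh-Hant"]
print(count_base_languages(listed))  # 6
```

Note that this simple rule also merges script variants such as `zh-Hans` and `zh-Hant` into one entry, which may or may not match how a given vendor counts them.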

Document & Format Handling

We uploaded the same five business documents (a .docx contract, a .pptx pitch deck, a 12-page PDF whitepaper, an .xlsx product catalogue, and a scanned PDF) to each tool and graded the result on layout preservation, image-text handling, and the file size and format limits each tool enforced.

Privacy & Data Handling

We read each vendor's published data-processing terms and pricing page, and recorded whether translated text is retained, whether customer content is used to train models, and which compliance frameworks (SOC 2, HIPAA, GDPR, ISO 27001, FedRAMP) the vendor lists for the relevant tier.

Cost at Volume

We priced one million characters per month of API translation on each service's standard pay-as-you-go tier, and separately priced the cheapest paid seat for a single business user, recording the published cap on characters, files, or glossary entries that defines the real ceiling of the tier.
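The arithmetic behind the one-million-character comparison can be sketched as follows, using only the per-million-character API prices and free allowances quoted elsewhere in this review. The class and field names are assumptions made for the example, the calculation ignores any base subscription fee, tax, or volume discount, and Azure Translator and ChatGPT are left out because this review does not quote a per-character price for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiTier:
    name: str
    usd_per_million_chars: float
    free_chars_per_month: int = 0

    def monthly_cost(self, chars: int) -> float:
        """Cost in USD for `chars` characters in one month, after the free allowance."""
        billable = max(0, chars - self.free_chars_per_month)
        return billable / 1_000_000 * self.usd_per_million_chars


# Prices as quoted in this review; base subscription fees are not modelled.
TIERS = [
    ApiTier("DeepL API", 25.0),
    ApiTier("Google Cloud Translation Basic", 20.0, free_chars_per_month=500_000),
    ApiTier("Amazon Translate (first 12 months)", 15.0, free_chars_per_month=2_000_000),
    ApiTier("Amazon Translate (after free tier)", 15.0),
]

for tier in TIERS:
    print(f"{tier.name}: ${tier.monthly_cost(1_000_000):.2f}")
```

At exactly one million characters the free allowances dominate the result, so it is worth rerunning the same calculation at the volume you actually expect.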

1st place

DeepL Pro

DeepL

The most natural-sounding output in the category for European business content, with the glossary controls that make it usable as a team product.

✓ Recommended

DeepL is a Cologne-based translation service that supports a focused set of roughly 33 base languages and prioritises depth over breadth. In independent testing, it consistently outperforms broader engines on European pairs: an Intento benchmark ranked DeepL as the top-performing engine in 65% of language pairs tested, with particular strength in European combinations, and professional translators repeatedly describe its output as closer to a human draft than any other general-purpose engine. The trade-offs are real: language coverage is roughly a quarter of Google's, the free tier reserves the right to train on submitted text, and at $25 per million characters the API costs 25% more than Google for fewer languages.

Source: DeepL

What we liked

Highest-rated output quality for European language pairs in independent benchmarks
Glossary, formality toggle, and Write Pro give teams real control over tone and terminology
On paid tiers, text is deleted immediately after translation and is not used for training
Native document translation preserves .docx, .pptx, and .pdf formatting cleanly

Where it falls short

Roughly 33 base languages, a fraction of Google's coverage
Free tier explicitly reserves the right to train on submitted content
API pricing has no volume discount below Enterprise

How it rated, criterion by criterion

Output Quality

Language Coverage

Document & Format Handling

Privacy & Data Handling

Cost at Volume

Best for: Businesses and freelancers translating European-language documents where polish matters.

2nd place

Google Translate

Google

The only sensible answer when coverage is the point, with reach no competitor in this test comes close to matching.

✓ Recommended

Google Translate is the most widely used translation service in the world and the only tool we tested that meaningfully covers low-resource languages. It supports text, documents, images, speech, and real-time conversation across more than 130 distinct languages, including languages with limited digital resources like Quechua, Lingala, and Tigrinya that no other major engine covers. The Cloud Translation API charges $20 per million characters after the first 500,000 per month free, and Google offers custom pricing above one billion characters. The catch is that on common European pairs its output is noticeably less polished than DeepL's, and the consumer app's privacy posture is weaker than the enterprise tier.

Source: Google

What we liked

Supports more than 130 distinct languages, including dozens no rival covers
Free for most consumer use; first 500,000 characters per month free on the Basic API
Camera, voice, and offline translation make the mobile app unmatched for travel and fieldwork
Custom enterprise pricing available above one billion characters

Where it falls short

Output on European pairs is noticeably less natural than DeepL's
Consumer-app privacy posture is weaker than the paid Cloud tier
No formality toggle or comparable per-language register control

How it rated, criterion by criterion

Output Quality

Language Coverage

Document & Format Handling

Privacy & Data Handling

Cost at Volume

Best for: Global teams, travellers, and any workload where breadth of language coverage matters more than perfect polish.

3rd place

ChatGPT (GPT-4.1)

OpenAI

The strongest single-model option for tone-sensitive prose, undermined for production translation by hallucination risk and weaker document handling.

✓ Recommended

Used as a translator, ChatGPT (GPT-4.1) is now competitive with dedicated engines on the languages that matter most. In Intento's State of Translation Automation 2025, GPT-4.1 ranked first among single-agent solutions with 7 'best' performances across 11 language pairs in human LQA scoring, more than any other single model tested. The catch is reliability: independent benchmarks place individual large language model hallucination rates at 10-18% on translation tasks, and in one internal test ChatGPT scored 89.8% accuracy across 5,000 words of mixed technical and marketing content but produced hallucinated content in two specific sentences. It also has no native document workflow comparable to DeepL's.

Source: OpenAI

What we liked

Top-ranked single model in Intento's 2025 LQA evaluation across 11 language pairs
Handles tone, formality, and iterative refinement better than any dedicated engine
Same subscription does many other tasks alongside translation
Strong on technical terminology consistency in our test

Where it falls short

10-18% hallucination rate on translation tasks for single-model LLMs
No native document translation that preserves complex .docx and .pptx layout
Roughly 45 minutes of reformatting per 10-page document in one independent test of marketing PDFs, despite 98% contextual accuracy
No formal compliance posture comparable to Azure Translator at the consumer tier

How it rated, criterion by criterion

Output Quality

Language Coverage

Document & Format Handling

Privacy & Data Handling

Cost at Volume

Best for: Refining marketing copy, literary content, and short tone-sensitive text where a human will check the output.

4th place

Microsoft Azure Translator

Microsoft

The right call for organisations already inside Microsoft 365 and Azure, with the strongest documented compliance posture of the tools we tested.

✓ Recommended

Azure Translator is Microsoft's cloud-based neural machine translation service, and as of June 2026 it supports 181 languages and language varieties, the broadest coverage of any tool in this test once dialect variants are counted. It integrates natively across Microsoft 365 (Word, Excel, Outlook, PowerPoint, Teams) and Edge, and the new Text translation API 2026-06-06 lets developers choose between neural machine translation and generative AI language models for each request. The compliance posture is the strongest in the test (SSL/TLS encryption in transit, with HIPAA, ISO 27001, SOC, and FedRAMP all documented), and Custom Translator lets enterprises train models on their own terminology. Out-of-the-box output quality is competitive but trails DeepL on European pairs.

Source: Microsoft

What we liked

181 languages and language varieties supported as of June 2026
Native translation inside Word, Excel, Outlook, PowerPoint, and Edge
HIPAA, ISO 27001, SOC, and FedRAMP all documented on Azure
Custom Translator trains a model on your own terminology

Where it falls short

Out-of-the-box output trails DeepL on European business content
Setup outside the Microsoft ecosystem is meaningfully heavier than DeepL or Google
Document Translation API features are spread across multiple API versions and resource types

How it rated, criterion by criterion

Output Quality

Language Coverage

Document & Format Handling

Privacy & Data Handling

Cost at Volume

Best for: Enterprises already standardised on Microsoft 365 and Azure, especially in regulated industries.

5th place

Amazon Translate

Amazon Web Services

An infrastructure component for AWS-native applications, not a finished translation product, and it falls short of a recommendation for anyone outside that pipeline.

✗ Not Recommended

Amazon Translate is an API service within AWS designed for developers building multilingual applications: e-commerce platforms, customer support systems, content management pipelines, and similar infrastructure. Pricing is straightforward at $15 per million characters with a free tier of 2 million characters per month for the first 12 months. The problems are the rest of the experience. There's no consumer-facing interface and you need developer skills to use it, translation quality is average compared to DeepL or LLM-based tools, and the service offers no document formatting preservation, no OCR, and no bilingual output. As a translation product for a person or team, it isn't one. We mark it Not Recommended outside its narrow use case.

Source: Amazon Web Services

What we liked

Native integration with the rest of AWS (S3, Lambda, Comprehend, batch pipelines)
$15 per million characters is competitive on the API
Free tier of 2 million characters per month for the first 12 months

Where it falls short

No consumer-facing interface; developer skills are required to use it at all
Translation quality is average compared to DeepL or LLM-based tools
No document formatting preservation, no OCR, no bilingual output

How it rated, criterion by criterion

Output Quality

Language Coverage

Document & Format Handling

Privacy & Data Handling

Cost at Volume

Best for: Development teams building translation into AWS-hosted applications and nothing else.

We ran every tool through the same passages and the same documents, so the gaps below come down to the products rather than the briefs. The full battery and the per-criterion marks are above; the notes here cover where the ranking turned.

Why DeepL leads

DeepL wins this category on the criterion that matters most to its likely buyer: output that reads like a human wrote it. It produces the most natural-sounding output of any general-purpose translation tool available in 2026; if you’ve ever read a Google Translate output and immediately spotted the robotic phrasing, DeepL is the antidote. That impression is borne out in independent benchmarks: an Intento benchmark ranked DeepL as the top-performing engine in 65% of language pairs tested, with particular strength in European combinations.

The product around the engine is also genuinely useful. DeepL’s AI-powered glossary generator can analyse past translations and suggest terms to include in your glossary, which suits ongoing projects that need consistent terminology across multiple translations. For select languages, users can also choose between formal and informal translations to suit their intended audience. On paid tiers, the privacy posture is the strongest among the dedicated engines: the key upgrade from Free is that your data is never used for AI training, and texts are deleted immediately after translation.

The trade-offs are real but narrow. DeepL supports 135 source languages and 143 target languages when regional variants are counted, but the core language set is around 33 base languages, significantly fewer than Google Translate’s 130+ distinct languages. If you need translation for less common languages like Swahili, Khmer, Amharic, or Urdu, DeepL likely doesn’t cover them. The free tier is also not a workplace plan. Beyond the tight 50,000-character monthly limit and one-file-per-month cap, its data terms rule out most business text: DeepL explicitly reserves the right to use Free tier content for AI model training, and the Free Services Terms of Use prohibit submission of any content containing personal data of any kind. For a serious workload, the answer is the paid tier or another tool.

When Google Translate is the right call

For anyone whose work touches the long tail of the world’s languages, Google is still the only credible answer. Google Translate supports 133 languages as of early 2026, more than any other translation service, including languages with limited digital resources like Quechua, Lingala, and Tigrinya that no other major MT engine covers. The mobile experience compounds that reach: the mobile app adds camera translation (point your phone at text to see translations overlaid), conversation mode for bilingual dialogue, and offline translation packs for use without internet. For field workers, aid organisations, and travellers, this combination of breadth and accessibility is unmatched.

The API pricing is also genuinely competitive at scale. The Cloud Translation Basic API is free for the first 500,000 characters per month and costs $20 per million characters after that on the Basic NMT plan; Advanced AutoML model inference costs $80 per million characters. The reason Google doesn’t win the ranking is output quality on the languages most readers will actually need: on common European languages it won’t match DeepL. Its breadth and infrastructure maturity are unmatched, though, and if you’re building a multilingual chatbot or a global e-commerce site that needs to cover 50+ locales, Google is the practical choice.

Where the large language models now belong

The third pick is where this ranking has moved most in the last year. GPT-4.1 ranked first among single-agent solutions in Intento’s State of Translation Automation 2025, leading with 7 “best” performances across 11 language pairs in human LQA. Claude is similarly close: Claude 3.5 ranked first in 9 of 11 language pairs at WMT24, and professional translators in Lokalise’s 2025 blind study preferred its output at a 78% “good” rate, the highest of any LLM tested. On tone-sensitive content, the LLMs now beat the dedicated engines for most readers.

What still keeps them out of the top spot is reliability. Running translations through SMART-style consensus cuts critical errors to under 2%, compared with the 10-18% hallucination rate for single-model LLMs, and we saw the same pattern in our own runs. The other limit is document workflow: in a 2026 test of marketing PDFs, ChatGPT achieved 98% contextual accuracy but required an average of 45 minutes of reformatting per 10-page document. For a single tone-sensitive paragraph, GPT-4.1 is the best tool in this list. For an unattended pipeline that has to be right, it isn’t.
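One way to picture the consensus idea is to collect candidate translations from several engines and keep the one that agrees most with the rest, on the theory that a hallucinated sentence tends to be the outlier. The sketch below is an illustrative assumption, not the SMART algorithm or anything a vendor ships: the function names are hypothetical, the character-level similarity measure (Python's standard `difflib`) is a stand-in chosen for simplicity, and the candidate strings are invented.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def pick_consensus(candidates):
    """Return the candidate with the highest mean similarity to the others."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a consensus")

    def mean_agreement(index):
        others = [c for i, c in enumerate(candidates) if i != index]
        return sum(similarity(candidates[index], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=mean_agreement)
    return candidates[best]


# Invented outputs from three hypothetical engines; the third adds unsupported detail.
candidates = [
    "The invoice is due within 30 days.",
    "The invoice is due within 30 days.",
    "The invoice is payable within 30 days of receipt by our Berlin office.",
]
print(pick_consensus(candidates))
```

A production system would more plausibly compare candidates with a semantic or translation-quality metric rather than raw character overlap, and would still want a human check on anything that matters.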

When Microsoft is the right pick

Azure Translator is the answer for any organisation whose translation needs sit alongside Microsoft 365, SharePoint, and Teams. As of June 2026, Microsoft Translator supports 181 languages and language varieties, and the Text translation API 2026-06-06 lets developers choose between neural machine translation and generative AI language models for each request, with both approaches available at production scale. The compliance story is the strongest in the test: all communication is encrypted in transit (SSL/TLS), Microsoft complies with regulations like HIPAA, ISO 27001, SOC, and FedRAMP, and it offers multi-factor authentication and granular access control via Azure Active Directory (since renamed Microsoft Entra ID). For a regulated buyer already on Azure, the integration and governance story is worth the slightly less polished out-of-the-box output.

What did not make the cut

Amazon Translate is the one tool in our test that we mark Not Recommended at its current value for the readers most likely to find this report. It’s an API service within AWS designed for developers building multilingual applications: e-commerce platforms, customer support systems, content management pipelines, and similar infrastructure. The pricing is straightforward enough (pay-per-use at $15 per million characters, with a free tier of 2 million characters per month for the first 12 months), but the rest of the experience simply isn’t that of a translation product. There’s no consumer-facing interface (you need developer skills to use it), translation quality is average compared to DeepL or LLM-based tools, and there’s no document formatting preservation, no OCR, and no bilingual output. This is a building block, not a finished product. For its narrow intended use, it works. For anyone else, the other four tools in this report are better answers.

Questions Readers Ask

Which AI translation service do you recommend?

We recommend DeepL Pro for European-language business and document translation, on the strength of independent benchmarks (Intento ranked DeepL the top-performing engine in 65% of language pairs tested, with particular strength in European combinations) and a paid tier where translated text is deleted immediately and not used for training. For breadth, Google Translate is the only sensible answer; it covers more than 130 distinct languages, including low-resource languages that no other major engine supports.

Is DeepL really better than Google Translate?

For European language pairs and for nuance-sensitive business content, yes. DeepL consistently outperforms Google in blind tests for German, French, Spanish, and Dutch, and was the top-performing engine in 65% of language pairs in Intento's benchmark. For low-resource languages, languages outside DeepL's roughly 33-language list, and for mobile, camera, and conversational use, Google Translate is the better tool. There is no single 'most accurate' tool across all languages.

Should I use ChatGPT or Claude for translation instead of a dedicated engine?

For tone-sensitive prose and short content where a human will check the output, a frontier large language model is now competitive with dedicated engines. GPT-4.1 ranked first among single-agent solutions in Intento's 2025 LQA evaluation, with 7 'best' performances across 11 language pairs, and in Lokalise's 2025 blind study by professional translators, Claude 3.5 received the highest 'good' rating of any LLM tested at 78%. For high-volume production translation or any unattended workflow, a single LLM isn't a safe primary engine: individual LLM hallucination rates on translation sit at 10-18%.

Which translation service is safest for regulated or sensitive content?

Microsoft Azure Translator has the strongest documented compliance posture of the tools we tested. Microsoft lists HIPAA, ISO 27001, SOC, and FedRAMP coverage on Azure, with SSL/TLS-encrypted communication and granular access control via Azure Active Directory. DeepL's paid tiers also use end-to-end encryption, delete text immediately after translation, and explicitly do not use it for training. We would not put sensitive content through DeepL's Free tier, whose terms reserve the right to use submitted content for AI model training.

Why did Amazon Translate fall short of a recommendation?

Amazon Translate isn't really a translation product for a person or a team. It's an API service for developers building multilingual applications inside AWS. It has no consumer-facing interface, no document formatting preservation, no OCR, and no bilingual output, and its raw translation quality is average compared to DeepL or LLM-based tools. For its narrow intended use (translation as infrastructure inside an AWS pipeline) it works. For anyone else, the other four tools in this test are better answers.