AI Token Counter

Paste your text or prompt to count tokens and estimate API costs across GPT-3.5, GPT-4o, and Claude. Token counts update in real time as you type.

Cost comparison across all models (cost columns reflect the sample input)

Provider	Model	Input / 1K	Input cost	Output / 1K	Output cost
OpenAI	o3 (reasoning)	$0.0100	$0.00063	$0.0400	$0.0025
OpenAI	o3-mini (reasoning)	$0.0011	$0.00007	$0.0044	$0.00028
OpenAI	o1 (reasoning)	$0.0150	$0.00094	$0.0600	$0.0038
OpenAI	o1-mini (reasoning)	$0.0030	$0.00019	$0.0120	$0.00076
OpenAI	GPT-4o	$0.0025	$0.00016	$0.0100	$0.00063
OpenAI	GPT-4o mini	$0.00015	<$0.0001	$0.00060	<$0.0001
OpenAI	GPT-4 Turbo	$0.0100	$0.00063	$0.0300	$0.0019
OpenAI	GPT-3.5 Turbo	$0.00050	<$0.0001	$0.0015	$0.00009
Anthropic	Claude 3.7 Sonnet~	$0.0030	$0.00027	$0.0150	$0.0014
Anthropic	Claude 3.5 Sonnet~	$0.0030	$0.00027	$0.0150	$0.0014
Anthropic	Claude 3.5 Haiku~	$0.00080	$0.00007	$0.0040	$0.00036
Anthropic	Claude 3 Opus~	$0.0150	$0.0014	$0.0750	$0.0068
Anthropic	Claude 3 Haiku~	$0.00025	<$0.0001	$0.0013	$0.00011
Google	Gemini 2.5 Pro~	$0.0013	$0.00011	$0.0100	$0.00091
Google	Gemini 2.0 Flash~	$0.00010	<$0.0001	$0.00040	<$0.0001
Google	Gemini 1.5 Pro~	$0.0013	$0.00011	$0.0050	$0.00046
Google	Gemini 1.5 Flash~	$0.00007	<$0.0001	$0.00030	<$0.0001
Google	Gemini 1.5 Flash-8B~	$0.00004	<$0.0001	$0.00015	<$0.0001
Meta	Llama 3.3 70B (via Together)~	$0.00054	<$0.0001	$0.00088	$0.00008
Meta	Llama 3.1 405B (via Together)~	$0.0035	$0.00032	$0.0035	$0.00032
Meta	Llama 3.1 8B (via Together)~	$0.00006	<$0.0001	$0.00006	<$0.0001
Mistral	Mistral Large 2~	$0.0030	$0.00027	$0.0090	$0.00082
Mistral	Mistral Small~	$0.0020	$0.00018	$0.0060	$0.00055
Mistral	Mistral 7B~	$0.00025	<$0.0001	$0.00025	<$0.0001

~ Token count is estimated for non-OpenAI models (±10%).

Prices follow each provider's published rates as of mid-2025.

What is a token and why does it cost money?

Language models don't process text character by character or word by word — they split input into tokens, which are chunks of text that can be whole words, parts of words, or individual characters. The word "tokenization" might become two tokens: "token" and "ization". A space is sometimes merged with the following word. Common short words are usually one token; rare or long words may be several.

API pricing is based on tokens because they directly determine computational work: every additional token is another position that each transformer layer and attention head has to process. As a rough rule, 1,000 tokens ≈ 750 words in English. Code tends to tokenize less efficiently than prose because identifiers, operators, and whitespace all take tokens.
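The rule of thumb above can be turned into a quick estimate. This is a minimal sketch, assuming whitespace-separated English prose; the helper name `estimateTokensFromWords` is hypothetical, and the result is an approximation rather than a tokenizer count.

```typescript
// Rule of thumb from the text: 1,000 tokens ≈ 750 words of English prose.
const RULE_TOKENS = 1000;
const RULE_WORDS = 750;

// Hypothetical helper: approximates a token count from a word count.
// This is not a real tokenizer; code and non-English text will differ.
function estimateTokensFromWords(text: string): number {
  const trimmed = text.trim();
  if (trimmed === "") return 0;
  const wordCount = trimmed.split(/\s+/).length;
  // Multiply before dividing so round numbers stay exact.
  return Math.ceil((wordCount * RULE_TOKENS) / RULE_WORDS);
}

console.log(estimateTokensFromWords("one two three")); // 4
```

Rounding up is a deliberate choice here: for budgeting, a slight overestimate is usually safer than an underestimate.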

How tokenization differs between GPT and Claude

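GPT-3.5 and GPT-4 models (including GPT-4 Turbo) use the cl100k_base encoding, a byte-pair encoding (BPE) vocabulary of ~100,000 tokens. GPT-4o and GPT-4o mini use the newer o200k_base encoding, whose vocabulary is roughly twice as large, so counts for the same text can differ slightly between the two. This tool uses the gpt-tokenizer library for an exact count on GPT models.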

Claude uses Anthropic's own tokenizer, which is not publicly documented. Anthropic states it tokenizes similarly to other modern LLMs at roughly 3–4 characters per token on average for English text. This tool estimates Claude tokens as text.length / 3.5 — accurate to within ~10% for typical English prose. For precise counts on Claude, use the anthropic.messages.countTokens() API method.
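The character-based estimate described above could look like the following. This is a sketch of the `text.length / 3.5` heuristic only; the function name `estimateClaudeTokens` and the choice to round up are assumptions, not details taken from the tool.

```typescript
// Average characters per token for English text, per the heuristic in the text.
const CLAUDE_CHARS_PER_TOKEN = 3.5;

// Hypothetical helper: estimates Claude tokens from character count.
// text.length counts UTF-16 code units, which matches the character
// count for plain English text but not for emoji or some other scripts.
function estimateClaudeTokens(text: string): number {
  return Math.ceil(text.length / CLAUDE_CHARS_PER_TOKEN);
}

const claudeSample = "x".repeat(350);
console.log(estimateClaudeTokens(claudeSample)); // 100
```

Because the estimate depends only on length, it will drift further from the real count for code, non-English text, or unusual formatting.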

Token pricing across major models

Prices are per 1,000 tokens, taken from each provider's published rates (mid-2025). Input and output tokens are priced separately. Output costs more because generation produces tokens one at a time, while the input is processed in a single parallel prefill pass. The table above compares all models.
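The per-1K pricing described above reduces to simple arithmetic. The following is a minimal sketch using the GPT-4o rates from the table; the `ModelPrice` type and `estimateCostUSD` function are hypothetical names introduced for illustration.

```typescript
// Prices in USD per 1,000 tokens.
interface ModelPrice {
  inputPer1K: number;
  outputPer1K: number;
}

// GPT-4o rates as listed in the table above.
const GPT_4O_PRICE: ModelPrice = { inputPer1K: 0.0025, outputPer1K: 0.01 };

// Input and output tokens are billed separately, then summed.
function estimateCostUSD(
  inputTokens: number,
  outputTokens: number,
  price: ModelPrice
): number {
  const inputCost = (inputTokens / 1000) * price.inputPer1K;
  const outputCost = (outputTokens / 1000) * price.outputPer1K;
  return inputCost + outputCost;
}

// 2,000 input tokens and 500 output tokens on GPT-4o:
// input 2 × $0.0025 = $0.005, output 0.5 × $0.01 = $0.005.
console.log(estimateCostUSD(2000, 500, GPT_4O_PRICE).toFixed(4)); // "0.0100"
```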

Key pricing tiers to know (input price per 1K tokens):

Budget frontier: GPT-4o mini ($0.00015/1K), Gemini 1.5 Flash ($0.000075/1K), Claude 3.5 Haiku ($0.0008/1K), Llama 3.1 8B (~$0.00006/1K) — sub-cent costs even for long documents
Mainstream: GPT-4o ($0.0025/1K), Claude 3.5 Sonnet ($0.003/1K), Gemini 2.0 Flash ($0.0001/1K) — best capability-per-dollar for most production use cases
Flagship: o1 ($0.015/1K), Claude 3 Opus ($0.015/1K), GPT-4 Turbo ($0.01/1K) — maximum capability, meaningful cost at scale
Reasoning: o3 ($0.01/1K input), o3-mini ($0.0011/1K input). Headline prices can be deceptive because these models generate many internal reasoning tokens, which are billed as output tokens even though they never appear in the response
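To see why reasoning-model pricing can mislead, it helps to count the hidden tokens explicitly. This sketch uses the o3 rates from the table; the function name and the figure of 4,000 reasoning tokens are hypothetical, chosen only to illustrate the effect.

```typescript
// o3 prices from the table above, in USD per 1,000 tokens.
const O3_INPUT_PER_1K = 0.01;
const O3_OUTPUT_PER_1K = 0.04;

// Hypothetical helper: hidden reasoning tokens are billed at the output rate,
// so they are added to the visible output before pricing.
function reasoningCallCostUSD(
  inputTokens: number,
  visibleOutputTokens: number,
  hiddenReasoningTokens: number
): number {
  const billedOutputTokens = visibleOutputTokens + hiddenReasoningTokens;
  return (
    (inputTokens / 1000) * O3_INPUT_PER_1K +
    (billedOutputTokens / 1000) * O3_OUTPUT_PER_1K
  );
}

// Same request, with and without an assumed 4,000 reasoning tokens.
const visibleOnlyCost = reasoningCallCostUSD(1000, 500, 0); // about $0.03
const withReasoningCost = reasoningCallCostUSD(1000, 500, 4000); // about $0.19
console.log(visibleOnlyCost.toFixed(2), withReasoningCost.toFixed(2));
```

Under these assumed numbers, the hidden tokens account for most of the bill, which is why estimates based only on visible output tend to run low.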

Prompt caching cuts input costs dramatically for repeated prefixes: Anthropic charges 10% of the normal input rate on cache hits, and OpenAI caches automatically for prompts of 1,024 tokens or more. For RAG and agent systems with a large fixed system prompt, caching is often the single highest-leverage cost optimization.
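The effect of a cache hit on a request with a large fixed prefix can be estimated as follows. This is a simplified sketch, assuming the 10% cache-hit rate quoted above and ignoring any extra charge a provider may apply when the cache is first written; the function name and token counts are hypothetical.

```typescript
// Cache hits are billed at 10% of the normal input rate (figure from the text).
const CACHE_HIT_MULTIPLIER = 0.1;

// Hypothetical helper: input cost for one request whose prompt is a fixed
// prefix (system prompt, documents) plus a per-request dynamic part.
function requestInputCostUSD(
  prefixTokens: number,
  dynamicTokens: number,
  inputPer1K: number,
  prefixCached: boolean
): number {
  const prefixRate = prefixCached ? inputPer1K * CACHE_HIT_MULTIPLIER : inputPer1K;
  return (prefixTokens / 1000) * prefixRate + (dynamicTokens / 1000) * inputPer1K;
}

// 10,000-token prefix plus a 500-token user message at $0.003 per 1K input.
const uncachedCost = requestInputCostUSD(10000, 500, 0.003, false); // about $0.0315
const cachedCost = requestInputCostUSD(10000, 500, 0.003, true); // about $0.0045
console.log(uncachedCost.toFixed(4), cachedCost.toFixed(4));
```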

Strategies to reduce token usage in production

For applications with significant LLM traffic, token efficiency has a direct impact on cost:

Compress system prompts: remove redundant instructions, use bullet points instead of paragraphs, and trim whitespace. A tightly written system prompt can often be cut by 30–50% without changing the model's behavior.
Use prompt caching: for long static prefixes (instructions, documents, examples), cache them with Anthropic's cache_control breakpoints or OpenAI's automatic prefix caching. On Anthropic, cache hits cost ~10% of normal input pricing; OpenAI applies its own discount to cached tokens, which varies by model.
Right-size your context: only include the context that's actually needed for the current request. A RAG pipeline that fetches 20 chunks when 5 would do is spending 4× as much on retrieved-context tokens.
Choose the right model tier: use cheaper models (such as GPT-4o mini or Claude 3 Haiku) for classification, routing, or extraction tasks. Reserve expensive models for generation and reasoning.
Batch where possible: OpenAI's Batch API offers a 50% discount for asynchronous workloads, and Anthropic's Message Batches API offers a comparable discount for non-realtime use cases.
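Two of the strategies above, right-sizing context and batching, are easy to quantify. The sketch below assumes a hypothetical workload (100,000 requests per month, 300-token chunks) priced at the GPT-4o input rate from the table, and counts only the retrieved-context tokens; all names and workload figures are illustrative.

```typescript
// Hypothetical helper: monthly input spend, with an optional fractional
// discount (for example 0.5 for a 50% batch discount).
function monthlyInputCostUSD(
  requestsPerMonth: number,
  tokensPerRequest: number,
  inputPer1K: number,
  discount: number = 0
): number {
  return requestsPerMonth * (tokensPerRequest / 1000) * inputPer1K * (1 - discount);
}

const CHUNK_TOKENS = 300; // assumed average chunk size
const MONTHLY_REQUESTS = 100000; // assumed traffic
const INPUT_PER_1K = 0.0025; // GPT-4o input rate from the table

const twentyChunks = monthlyInputCostUSD(MONTHLY_REQUESTS, 20 * CHUNK_TOKENS, INPUT_PER_1K);
const fiveChunks = monthlyInputCostUSD(MONTHLY_REQUESTS, 5 * CHUNK_TOKENS, INPUT_PER_1K);
const fiveChunksBatched = monthlyInputCostUSD(MONTHLY_REQUESTS, 5 * CHUNK_TOKENS, INPUT_PER_1K, 0.5);

// Roughly $1,500 vs $375 vs $187.50 per month under these assumptions.
console.log(twentyChunks.toFixed(2), fiveChunks.toFixed(2), fiveChunksBatched.toFixed(2));
```

The savings compound: trimming retrieval from 20 chunks to 5 cuts the context spend to a quarter, and batching halves what remains.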
