AI Token Counter
Paste your text or prompt to count tokens and estimate API costs across GPT-3.5, GPT-4o, and Claude. Token counts update in real time as you type.
0
Tokens
52
Words
318
Characters
$0
Input cost
| Model | Input / 1K | Input cost | Output / 1K | Output cost |
|---|---|---|---|---|
| OpenAIo3reasoning | $0.0100 | $0.00063 | $0.0400 | $0.0025 |
| o3-minireasoning | $0.0011 | $0.00007 | $0.0044 | $0.00028 |
| o1reasoning | $0.0150 | $0.00094 | $0.0600 | $0.0038 |
| o1-minireasoning | $0.0030 | $0.00019 | $0.0120 | $0.00076 |
| GPT-4o | $0.0025 | $0.00016 | $0.0100 | $0.00063 |
| GPT-4o mini | $0.00015 | <$0.0001 | $0.00060 | <$0.0001 |
| GPT-4 Turbo | $0.0100 | $0.00063 | $0.0300 | $0.0019 |
| GPT-3.5 Turbo | $0.00050 | <$0.0001 | $0.0015 | $0.00009 |
| AnthropicClaude 3.7 Sonnet~ | $0.0030 | $0.00027 | $0.0150 | $0.0014 |
| Claude 3.5 Sonnet~ | $0.0030 | $0.00027 | $0.0150 | $0.0014 |
| Claude 3.5 Haiku~ | $0.00080 | $0.00007 | $0.0040 | $0.00036 |
| Claude 3 Opus~ | $0.0150 | $0.0014 | $0.0750 | $0.0068 |
| Claude 3 Haiku~ | $0.00025 | <$0.0001 | $0.0013 | $0.00011 |
| GoogleGemini 2.5 Pro~ | $0.0013 | $0.00011 | $0.0100 | $0.00091 |
| Gemini 2.0 Flash~ | $0.00010 | <$0.0001 | $0.00040 | <$0.0001 |
| Gemini 1.5 Pro~ | $0.0013 | $0.00011 | $0.0050 | $0.00046 |
| Gemini 1.5 Flash~ | $0.00007 | <$0.0001 | $0.00030 | <$0.0001 |
| Gemini 1.5 Flash-8B~ | $0.00004 | <$0.0001 | $0.00015 | <$0.0001 |
| MetaLlama 3.3 70Bvia Together~ | $0.00054 | <$0.0001 | $0.00088 | $0.00008 |
| Llama 3.1 405Bvia Together~ | $0.0035 | $0.00032 | $0.0035 | $0.00032 |
| Llama 3.1 8Bvia Together~ | $0.00006 | <$0.0001 | $0.00006 | <$0.0001 |
| MistralMistral Large 2~ | $0.0030 | $0.00027 | $0.0090 | $0.00082 |
| Mistral Small~ | $0.0020 | $0.00018 | $0.0060 | $0.00055 |
| Mistral 7B~ | $0.00025 | <$0.0001 | $0.00025 | <$0.0001 |
~ Token count is estimated for non-OpenAI models (±10%).
Click any row to select that model. Prices per provider published rates, mid-2025.
What is a token and why does it cost money?
Language models don't process text character by character or word by word — they split input into tokens, which are chunks of text that can be whole words, parts of words, or individual characters. The word "tokenization" might become two tokens: "token" and "ization". A space is sometimes merged with the following word. Common short words are usually one token; rare or long words may be several.
API pricing is based on tokens because they directly determine computational work — more tokens means more attention heads processing more positions in more transformer layers. As a rough rule: 1,000 tokens ≈ 750 words in English. Code tends to tokenize less efficiently than prose because identifiers, operators, and whitespace all take tokens.
How tokenization differs between GPT and Claude
GPT-3.5 and GPT-4 models use the cl100k_base encoding (also used by GPT-4o). This is a byte-pair encoding (BPE) vocabulary of ~100,000 tokens. This tool uses the gpt-tokenizer library for an exact count on GPT models.
Claude uses Anthropic's own tokenizer, which is not publicly documented. Anthropic states it tokenizes similarly to other modern LLMs at roughly 3–4 characters per token on average for English text. This tool estimates Claude tokens as text.length / 3.5 — accurate to within ~10% for typical English prose. For precise counts on Claude, use the anthropic.messages.countTokens() API method.
Token pricing across major models
Prices per 1,000 tokens, per provider published rates (mid-2025). Input and output tokens are priced separately — output costs more because generation is more compute-intensive than prefill. The interactive table above lets you compare all models live.
Key pricing tiers to know:
- Budget frontier: GPT-4o mini ($0.00015/1K), Gemini 1.5 Flash ($0.000075/1K), Claude 3.5 Haiku ($0.0008/1K), Llama 3.1 8B (~$0.00006/1K) — sub-cent costs even for long documents
- Mainstream: GPT-4o ($0.0025/1K), Claude 3.5 Sonnet ($0.003/1K), Gemini 2.0 Flash ($0.0001/1K) — best capability-per-dollar for most production use cases
- Flagship: o1 ($0.015/1K), Claude 3 Opus ($0.015/1K), GPT-4 Turbo ($0.01/1K) — maximum capability, meaningful cost at scale
- Reasoning: o3 ($0.01/1K input but generates many internal thinking tokens), o3-mini ($0.0011/1K) — pricing can be deceptive since reasoning traces consume output tokens you don't see
Prompt caching cuts input costs dramatically for repeated prefixes — Anthropic charges 10% of the normal input rate on cache hits; OpenAI caches automatically for prompts over 1,024 tokens. For RAG and agent systems with a large fixed system prompt, caching is often the single highest-leverage cost optimization.
Strategies to reduce token usage in production
For applications with significant LLM traffic, token efficiency has a direct impact on cost:
- Compress system prompts — remove redundant instructions, use bullet points instead of paragraphs, and trim whitespace. A tightly written system prompt can often be cut by 30–50% without losing behavior.
- Use prompt caching — for long static prefixes (instructions, documents, examples), cache them with Anthropic's cache_control breakpoints or OpenAI's automatic prefix caching. Cache hits cost ~10% of normal input pricing.
- Right-size your context — only include the context that's actually needed for the current request. A RAG pipeline that fetches 20 chunks when 5 would do is spending 4× on input tokens.
- Choose the right model tier — use cheaper models (GPT-3.5 Turbo, Claude Haiku) for classification, routing, or extraction tasks. Reserve expensive models for generation and reasoning.
- Batch where possible — OpenAI's Batch API offers 50% discounts for asynchronous workloads. Anthropic's Message Batches API offers similar pricing for non-realtime use cases.