ai workflows

AI Token

A token is the basic unit that a language model reads, thinks in, and writes back. Not words. Not characters. Tokens. An OpenAI tokenizer might split "unbelievable" into two tokens, such as "unbel" and "ievable"; the exact split depends on the model's vocabulary. "Hello" is one token. "Hello" with a leading space is technically a different one. The model doesn't see language the way you do. It sees a stream of numeric IDs, each mapped to a fragment of text, and it processes those IDs in sequence to predict what comes next. The concept exists because models compute on numbers, not prose. Tokenization is the translation layer between human text and the arithmetic that actually runs inference.
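To make the text-to-IDs step concrete, here is a minimal sketch of a toy tokenizer. It uses greedy longest-match against a tiny hand-written vocabulary, which is a simplification and not the byte-pair encoding that production tokenizers use. The vocabulary, the IDs, and the function names are all hypothetical.

```python
# Toy vocabulary: text fragment -> numeric ID. Entirely hypothetical;
# real vocabularies are learned from data and are far larger.
TOY_VOCAB = {
    "Hello": 0,
    " Hello": 1,   # the leading space makes this a different token
    "unbel": 2,
    "ievable": 3,
    " world": 4,
}


def encode(text: str) -> list[int]:
    """Turn text into token IDs by greedy longest-match (not real BPE)."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest fragment starting at pos, then shrink it.
        for end in range(len(text), pos, -1):
            fragment = text[pos:end]
            if fragment in TOY_VOCAB:
                ids.append(TOY_VOCAB[fragment])
                pos = end
                break
        else:
            raise ValueError(f"no token for {text[pos]!r} at position {pos}")
    return ids


def decode(ids: list[int]) -> str:
    """Map token IDs back to their text fragments and join them."""
    id_to_fragment = {token_id: frag for frag, token_id in TOY_VOCAB.items()}
    return "".join(id_to_fragment[token_id] for token_id in ids)


print(encode("unbelievable"))   # [2, 3]
print(encode("Hello"))          # [0]
print(encode(" Hello"))         # [1]
print(decode(encode("Hello world")))  # Hello world
```

The model only ever receives the lists of integers. The fragments exist so that IDs can be turned back into text on the way out.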

The most common mistake is treating tokens as words. They are not. English averages around 0.75 words per token, which means 100 tokens is roughly 75 words. That ratio falls apart fast outside standard English. Code is token-heavy because variable names like `handleUserAuthenticationCallback` get sliced into many fragments. Languages like Japanese or Arabic often require more tokens than English to express the same concept. Tokenizers are model-specific, and how efficiently one handles a given language depends on its vocabulary. GPT-4's tokenizer uses an encoding called cl100k_base, which splits text differently than Claude's tokenizer, which in turn differs from Llama's.
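The 0.75 ratio can be turned into a quick estimator. This is a rough sketch for English prose only; as noted above, it will undercount for code and for many other languages. The function names are hypothetical.

```python
import math

# Rule of thumb for English prose quoted in this entry.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate token count from a whitespace word count, rounded up."""
    word_count = len(text.split())
    return math.ceil(word_count / words_per_token)


def estimate_words(token_count: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(token_count * words_per_token)


print(estimate_words(100))       # 75
print(estimate_words(128_000))   # 96000
print(estimate_tokens(" ".join(["word"] * 500)))  # 667
```

For anything that affects a budget, count with the model's actual tokenizer instead of relying on the ratio.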

Tokens are also not the same as context. Context is the bucket. Tokens fill the bucket. Confusing the two is like confusing gallons with the tank. A GPT-4 Turbo context window holds 128,000 tokens. That is roughly 96,000 words, or a short novel. Whether you use that space efficiently depends on how token-dense your inputs are, not on some abstract "context" setting.

In 2023, OpenAI launched GPT-4 with an 8,192-token context window. That felt enormous at the time. About eight months later they shipped GPT-4 Turbo at 128,000 tokens, and the design community largely didn't register the shift, even though it changed what was architecturally possible in a single request. A 128K window can hold an entire brand guidelines PDF, 200 pages of interview transcripts, or 50 rounds of back-and-forth conversation. The token count determines whether your project brief fits in one shot or gets chopped into pieces.

Claude 3.5 Sonnet, released in mid-2024, has a 200,000-token context window. A glossary entry like this one costs roughly 800 to 1,200 tokens to generate, including system prompt overhead. Paste in a full website sitemap, 10 article drafts, and a style guide and you're sitting around 40,000 tokens. You still have 160,000 tokens of headroom. That is not a detail. That is a workflow.
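The headroom arithmetic is simple enough to script. The sketch below uses the window sizes quoted in this entry. The dictionary keys are informal labels chosen for illustration, not official API model identifiers, and the function names are hypothetical.

```python
# Context window sizes quoted in this entry, in tokens.
# Keys are informal labels, not official API model identifiers.
CONTEXT_WINDOWS = {
    "gpt-4-2023-launch": 8_192,
    "gpt-4-turbo": 128_000,
    "claude-3.5-sonnet": 200_000,
}


def headroom(window_tokens: int, used_tokens: int) -> int:
    """Tokens still available; a negative result means the input does not fit."""
    return window_tokens - used_tokens


def fits(window_tokens: int, used_tokens: int) -> bool:
    """True if the input fits inside the window."""
    return headroom(window_tokens, used_tokens) >= 0


bundle = 40_000  # sitemap + 10 drafts + style guide, per the estimate above

print(headroom(CONTEXT_WINDOWS["claude-3.5-sonnet"], bundle))  # 160000
print(fits(CONTEXT_WINDOWS["gpt-4-turbo"], bundle))            # True
print(fits(CONTEXT_WINDOWS["gpt-4-2023-launch"], bundle))      # False
```

The same 40,000-token bundle that leaves plenty of room in the two larger windows would have had to be split into pieces for the original 8,192-token window.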

Understanding tokens earns its keep any time you're building prompts at scale or paying for API access by the token. The OpenAI API and Anthropic API both price input tokens and output tokens separately. Knowing that a 500-word article draft costs roughly 670 input tokens helps you budget, batch, and avoid burning money on unnecessary verbosity. It also explains the model's apparent memory problem. When a long conversation seems to make the model "forget" earlier instructions, the model didn't forget. The tokens filled the window and earlier content dropped out of scope.
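The "forgetting" effect can be sketched as a trimming step. The code below assumes one simple strategy: keep the most recent messages that fit a token budget and drop everything older. Real products may behave differently, for example by pinning instructions or summarizing old turns. The function name, the messages, and the token counts are all made up for illustration.

```python
def trim_history(messages: list[tuple[str, int]], budget: int) -> list[tuple[str, int]]:
    """Keep the newest messages whose combined token counts fit the budget.

    Each message is (text, token_count). Older messages are dropped first.
    """
    kept = []
    used = 0
    for text, tokens in reversed(messages):  # walk from newest to oldest
        if used + tokens > budget:
            break  # this message and everything older falls out of scope
        kept.append((text, tokens))
        used += tokens
    kept.reverse()  # restore chronological order
    return kept


history = [
    ("instruction: always answer in French", 10),
    ("user: first question", 40),
    ("assistant: first answer", 60),
    ("user: second question", 30),
]

for text, tokens in trim_history(history, budget=100):
    print(tokens, text)
# 60 assistant: first answer
# 30 user: second question
```

With a 100-token budget, the original instruction is the first thing to go. The model never sees it again, which looks like forgetting from the outside.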

Where token math doesn't matter is casual use inside a consumer chat product like Claude.ai or ChatGPT. The UI handles all of it. You don't count tokens when you write a Slack message. You don't need to count them in a chat window either, unless you're actively bumping against limits and wondering why the model keeps losing the thread.

One thing worth knowing: token count is auditable before you send. OpenAI's Tokenizer playground on the developer platform lets you paste any text and watch it chunk in real time. Third-party tools like Tiktokenizer do the same for multiple models side by side. If you're designing prompt templates that will run thousands of times, spending ten minutes with a tokenizer before committing to a structure will save a disproportionate amount of money later.
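A rough sketch of why template size matters at volume. Because input and output tokens are billed separately, the cost of a run is the sum of two products. The prices, token counts, and run count below are placeholders chosen for illustration, not current rates from any provider, and the function name is hypothetical.

```python
def total_cost(
    input_tokens: int,
    output_tokens: int,
    runs: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost in dollars of running the same request shape `runs` times."""
    per_run_micro = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return per_run_micro * runs / 1_000_000


# Placeholder prices in dollars per million tokens (not real rates).
INPUT_PRICE = 3.0
OUTPUT_PRICE = 15.0
RUNS = 10_000

verbose = total_cost(1_200, 1_000, RUNS, INPUT_PRICE, OUTPUT_PRICE)
trimmed = total_cost(700, 1_000, RUNS, INPUT_PRICE, OUTPUT_PRICE)

print(f"verbose template: ${verbose:.2f}")
print(f"trimmed template: ${trimmed:.2f}")
print(f"saved:            ${verbose - trimmed:.2f}")
```

Under these assumed numbers, cutting 500 tokens from the template saves $15 over 10,000 runs. The saving scales linearly with both run count and price, so the tokenizer check pays for itself quickly on high-volume templates.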

A token is not a unit of language. It is a unit of attention, and every model only has so much of it.

Related terms

Context Window

The total amount of text, code, and conversation history an AI model can hold in active memory during a single session. Measured in tokens, not words.

Token Reuse

The compounding effect where each new AI response requires reprocessing all previous conversation tokens, increasing latency and cost with every turn.
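A minimal sketch of that compounding, assuming every turn resends the full history and that each turn adds a fixed number of tokens (message plus reply). Both are simplifications, and the function name is hypothetical.

```python
from itertools import accumulate


def tokens_processed_per_turn(tokens_added_per_turn: list[int]) -> list[int]:
    """Tokens the model must read at each turn: all earlier turns plus this one."""
    return list(accumulate(tokens_added_per_turn))


turns = [200] * 5  # five turns, each adding 200 tokens to the conversation

per_turn = tokens_processed_per_turn(turns)
print(per_turn)        # [200, 400, 600, 800, 1000]
print(sum(turns))      # 1000 tokens of actual conversation
print(sum(per_turn))   # 3000 tokens processed in total
```

The conversation itself is only 1,000 tokens long, but the model reads 3,000 tokens across the five turns, and the gap widens with every additional turn.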