Every time an engineer opens Cursor or Claude Code and starts building against your design system, the AI is already reading. Before anyone types a word, it has your token names, your component names, your prop signatures, your documentation. That's the starting context. Everything it generates flows from what it finds.
Your design system has been writing prompts for AI tools this whole time. I hadn't thought about it that way until recently.
I've written about naming, and I've written about agents consuming design systems. But what I hadn't articulated until recently is that vocabulary and AI context have collapsed into the same thing. The names you chose for your tokens three years ago, the prop signatures you debated in a pull request nobody remembers, the component descriptions you wrote (or skipped) in your documentation. All of that is active context in an AI session right now. It's influencing generated code across your team, and most of those naming decisions were made long before AI-assisted development was on anyone's radar.
How vocabulary travels
When an AI coding tool encounters your design system, it builds understanding from whatever it can parse. Token names are the most immediate layer. color-background-error tells a tool what that value is for and where it belongs. red-700 tells it a shade, and nothing about when or why to use it. When an engineer prompts the tool to build something that handles error states, those two names produce very different starting points for the generated code.
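The contrast is easy to see in token definitions. A minimal sketch (the names and hex values here are illustrative, not from any particular system):

```typescript
// Raw tokens name a value's appearance, not its role.
const raw = {
  "red-700": "#b91c1c",
  "red-50": "#fef2f2",
};

// Semantic tokens name the role, so a tool (or a new engineer)
// can pick the right one from the name alone.
const semantic = {
  "color-background-error": raw["red-50"],
  "color-text-error": raw["red-700"],
};

// Given a prompt like "build an error banner", the semantic layer
// turns the lookup into a string match rather than a guess.
function tokenForErrorBackground(): string {
  return semantic["color-background-error"];
}
```

Nothing about the raw layer is wrong, but only the semantic layer tells the tool when a value applies.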
Component names carry similar weight. Shopify's Polaris has components called EmptyState, InlineNotification, and ResourceItem. You can roughly guess what each one does before reading the docs. Compare that to AlertWrapper, CardBase, or BlankCard, which tell you almost nothing about purpose or behaviour. An experienced engineer on the team might know what those do from memory. An AI tool working from the codebase alone has to guess, and guessing at the component level means the entire structure of the generated output is built on an assumption.
Props work the same way. A component with a variant prop accepting "primary" | "destructive" | "ghost" signals meaning through its API. type: 1 | 2 | 3 signals structure with no meaning attached. When an AI tool has to decide which variant to use, the first set gives it something to reason about. The second is a coin flip, and the engineer won't always catch a bad flip in review.
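In TypeScript terms, the gap between the two APIs looks something like this (both component prop types are hypothetical):

```typescript
// Self-describing: each allowed value carries intent.
type SemanticButtonProps = {
  variant: "primary" | "destructive" | "ghost";
};

// Opaque: valid values, but nothing to reason about.
type OpaqueButtonProps = {
  type: 1 | 2 | 3;
};

// A tool wiring up a "Delete account" action can match the word
// "destructive" lexically; with 1 | 2 | 3 it can only guess.
function variantForDelete(): SemanticButtonProps["variant"] {
  return "destructive";
}
```

Both types constrain the input equally well. Only one of them explains it.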
And then there's documentation. When your docs explain why a component works the way it does, that reasoning becomes part of the context window. When docs describe only visual appearance, the tool has to infer intent on its own. Teams often underestimate how much weight AI tools give to structured documentation. For many tools, it's where understanding starts. A well-written usage guideline can do more work than a dozen well-named props if it captures the intent behind the component's design decisions.
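One place that intent can live is directly in the code, where most tools read first. A hypothetical doc comment in the Polaris EmptyState style:

```typescript
/**
 * EmptyState (hypothetical example, not the Polaris implementation)
 *
 * Use when a view has no data because the user hasn't created any
 * yet. Don't use it for error or loading states; those have their
 * own components. Guidance like this sits in the context window
 * alongside the code itself.
 */
function emptyStateCopy(resource: string): string {
  // Default headline copy for an empty collection of `resource`.
  return `You haven't created any ${resource} yet.`;
}
```

The comment is doing the same job as a usage guideline page, just closer to where generation happens.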
All of these layers feed into the same context window. When they agree, the tool gets a clear signal. When they don't, things get unreliable fast.
The token that made sense to everyone except the tool
I've been running older projects through AI coding tools as part of my AI-readiness work, partly to see where they break. One of them had a spacing scale that I remembered being proud of at the time. space-sm, space-md, space-lg, space-section. The first three were obvious. The fourth was the gap between major content blocks. The team had discussed it, decided on it, and used it consistently.
When I fed the project into Cursor to test layout generation, the tool picked up space-sm, space-md, and space-lg without any trouble. But it kept using space-lg where space-section should have gone. It never reached for space-section on its own, because nothing in the token name or the documentation explained the relationship between that token and the others. To the tool, space-section looked like it belonged to a different system entirely. We'd named three tokens by size and one by purpose. Two naming conventions, one scale, and the tool had no way to reconcile them.
The fix would have been simple. Rename it space-xl, add a usage note explaining it was intended for section-level spacing, and the AI would have a clear path through the scale.
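Sketched as token maps, the before and after look like this (the pixel values are invented for illustration):

```typescript
// Before: three tokens named by size, one by purpose.
// Nothing relates space-section to the rest of the scale.
const before = {
  "space-sm": "8px",
  "space-md": "16px",
  "space-lg": "24px",
  "space-section": "64px",
};

// After: one convention. The usage note carries the purpose
// the old name was trying to encode.
const after = {
  "space-sm": "8px",
  "space-md": "16px",
  "space-lg": "24px",
  "space-xl": "64px", // usage: spacing between major content sections
};
```

The value never changes. Only the name moves into the same convention as its neighbours.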
But the experience changed how I think about naming. I used to ask "would a person understand this?" Now I ask a harder version of the same question: "would someone with no ability to ask a clarifying question understand this?" The gap between those two answers is bigger than I expected, and it shows up in AI output almost immediately.
The new hire test
That experience turned into a mental model I've been using in audit work since.
Imagine a capable engineer on their first day. They have no prior context, no Slack history to search, nothing beyond what the system itself communicates. If they read your token names, would they understand what those tokens are for? If they scanned your component names and prop structures, would they have a reasonable sense of what to reach for and how to use it?
If the answer is mostly no, an AI tool is having the same experience.
A new hire asks questions. They message a colleague, they jump into a thread, they find someone who can explain the gap between what the system says and what it means. They fill in missing context through conversation. An AI tool pattern-matches on what's available. The gaps don't generate questions. They get filled through inference, and inference based on ambiguous signals produces ambiguous output.
I ran into a good example of this during a recent audit. The design team called a colour color-background-error. The codebase had $errorColor. The documentation said "use red for errors". One concept, three vocabularies. The team navigated this without effort because they'd internalised the mapping years ago. It was tribal knowledge, the kind of thing you absorb after a few weeks on the job without anyone formally teaching it to you.
But when an AI tool hit those three signals, it treated them as potentially unrelated. It generated code that referenced $errorColor in one place and created a new error-bg variable in another, because nothing in the system confirmed they described the same thing.
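One low-effort mitigation is to make the mapping explicit somewhere a tool can read it. A sketch of a small alias map, using the names from that audit (the file and the hex value are assumptions):

```typescript
// Canonical token: one name for one concept.
const tokens = {
  "color-background-error": "#fde8e8", // value invented for illustration
};

// Explicit alias map: legacy names resolve to the canonical token,
// so the three vocabularies agree instead of competing.
const aliases: Record<string, keyof typeof tokens> = {
  "$errorColor": "color-background-error",
  "error-bg": "color-background-error",
};

function resolve(name: string): string {
  const canonical = name in tokens ? name : aliases[name];
  return tokens[canonical as keyof typeof tokens];
}
```

A file like this doesn't fix the fragmentation, but it documents it, and documented fragmentation is something a tool can navigate.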
It took ten minutes to fix. But that correction reset the AI's context for the next generation pass, so the engineer had to re-establish intent before the tool could try again. And this was one token, in one component, for one engineer. Multiply that by every ambiguous signal across the system, across every engineer on the team using AI tools, and you start to see how fragmented vocabulary creates a kind of drag that's hard to measure but impossible to ignore once you notice it.
How this scales across a team
One engineer hitting one ambiguous token is a minor annoyance. But a team of fifteen engineers, all using AI-assisted workflows, all hitting the same ambiguities in different contexts, produces a pattern that's hard to trace back to its source.
The output looks inconsistent. Engineers are generating code that references different variable names for the same concept, different component patterns for the same use case. Code reviews catch some of it. Linting catches some more. But some of it ships, and once it's in production, the codebase itself becomes a new source of ambiguous signal for the next AI session. The tool reads the existing code as context, sees the inconsistencies you've already shipped, and treats them as valid patterns.
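This is also the kind of drift a cheap check can catch before it ships. A minimal sketch of a lint pass that flags custom-property names outside the canonical vocabulary (the vocabulary, the regex, and the CSS convention are all assumptions):

```typescript
// The canonical vocabulary, ideally generated from the token source.
const canonical = new Set(["color-background-error", "color-text-error"]);

// Flag CSS-custom-property-style names that aren't in the vocabulary.
function findUnknownTokens(source: string): string[] {
  const used = source.match(/--[a-z][a-z0-9-]*/g) ?? [];
  return used
    .map((name) => name.slice(2)) // drop the leading "--"
    .filter((name) => !canonical.has(name));
}
```

A real setup would hang this off an existing linter, but even a script this small turns "the codebase quietly diverged" into a failing check.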
I've seen teams blame the AI tooling for this. "The tool keeps generating the wrong thing". But when you trace it back, the tool was reading your system accurately. It just couldn't tell which of three conflicting signals to trust, and it had nobody to ask.
This accumulates in a way that's hard to see from inside a single sprint. You only notice it when you zoom out and look at the codebase after a few months of AI-assisted development. By then the vocabulary has diverged further, and the tool is working from a noisier signal than when it started.
What coherence buys you
When your token names, component names, documentation, and code variables all describe the same concept in the same language, an AI tool working across those layers gets a consistent signal. It can make accurate decisions because the vocabulary lines up. Engineers spend less time correcting output because the tool had less to guess about in the first place.
I didn't expect this when I started paying attention to AI tools in design system contexts. The prompt matters far less than what was already there. Teams that have maintained consistent vocabulary across their system, often for reasons that had nothing to do with AI, are now getting noticeably better results. The consistency was paying for itself in onboarding speed and code review efficiency for years, and now it's paying dividends in a completely different context.
Consistent vocabulary won't fix a poorly structured component or a token architecture with no semantic layer. Plenty of other factors influence output quality. But names are what a tool encounters first in every layer of your system, and ambiguity in names cascades further than ambiguity in almost anything else.
Who reviews this?
Most design system teams have a process for reviewing component APIs before they ship. Some have processes for reviewing token names. Very few have a process for reviewing how the system reads to a tool that's encountering it for the first time, with no context beyond what's written down.
That gap matters more now than it did two years ago, and it will matter even more next year. As engineering workflows run increasingly through AI-assisted sessions, the vocabulary of your design system is functioning as live context in those sessions.
I don't think the answer is a new role or a formal review process. I think the answer is the new hire test, applied deliberately and periodically. Walk through your system the way a tool would, reading only what's there, with no ability to ask a follow-up question. Notice what's clear. Notice where you'd need someone to explain something that the system doesn't explain itself.
Those gaps are where your system is writing prompts on your behalf, in every AI-assisted session, across every engineer on the team.
Thanks for reading! If you enjoyed this article, subscribing is the best way to keep up with new posts. And if it was useful, passing it on to someone who'd find it relevant is always appreciated.
You can find me on LinkedIn, X, and Bluesky.