I've been watching the AI-powered design-to-code revolution unfold with equal parts excitement and dread. The excitement is obvious: AI tools can now read Figma files as structured data, not just screenshots. It feels like magic.
The dread? That comes from years of working with design systems that look polished on the surface, but are quietly rotting underneath. 😬
And now, that dread is getting exposed, not by your team, but by your tools.
Model Context Protocol (MCP) servers give AI access to structured, contextual data, not just surface-level cues. In development, they expose relationships in codebases. In infrastructure, they reflect architecture. And now, in design, they offer a direct line into the structure beneath your files.
That's what makes Figma's Dev Mode MCP server so significant. It feeds real design system context (components, tokens, constraints, naming) into tools like Cursor, Claude, and Windsurf. AI can interpret your file the way a developer would: by reading what it's actually made of.
This isn't just a productivity layer. It turns your design system into a live, parseable interface. And if that system is messy, inconsistent, or drifting? The AI sees it, and builds accordingly.
The entropy hiding in plain sight
Even the most meticulously maintained design systems accumulate what I call design system entropy: small inconsistencies that compound over time. A duplicated button here, a slightly different spacing value there, a component that's been detached and modified just enough to "make it work".
Humans are brilliant at working around this chaos. When we see btn_primary in one file and button[variant=primary] in another, our brains connect the dots automatically. When we see spacing.large in Figma and space-lg in code, we know they're supposed to be the same thing; we've become expert translators of our own inconsistencies.
But AI doesn't adapt quite the same way. It can assume, but it doesn't forgive. It will reflect exactly what you've built, mess and all.
How AI actually sees your system
With tools like Figma's MCP server, AI can access component metadata, auto-layout constraints, layer names and hierarchy, and design tokens. This is incredibly valuable: instead of hallucinating code from flat .png files, AI can understand the tokenised values assigned to a component. It can see relationships between elements, understand layout logic, and preserve semantic meaning.
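To make that concrete, here's a rough sketch of the kind of structured context a design file can expose. The shape below is purely illustrative (it is not Figma's actual MCP schema); the point is that the model receives names, tokens, and constraints rather than pixels.

```typescript
// Hypothetical shape, for illustration only; not Figma's actual MCP payload.
interface DesignNodeContext {
  componentName: string;                 // e.g. "Button/Primary"
  variantProps: Record<string, string>;  // e.g. { variant: "primary", size: "md" }
  tokens: Record<string, string>;        // CSS property -> token name, not raw values
  layout: {
    mode: "horizontal" | "vertical";
    gap: string;                         // a spacing token, e.g. "space.sm"
    padding: string;
  };
  children: DesignNodeContext[];
}

// What a well-structured button instance might look like to the model:
const submitButton: DesignNodeContext = {
  componentName: "Button/Primary",
  variantProps: { variant: "primary", size: "md" },
  tokens: { background: "color.button.primary", color: "color.text.onPrimary" },
  layout: { mode: "horizontal", gap: "space.sm", padding: "space.md" },
  children: [],
};
```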
But here's the thing: if your design file is inconsistent (detached components, raw hex values, missing tokens), that's exactly what AI sees. It doesn't pause when things get messy; it tries to fill in the blanks. Sometimes the guesses are close enough. Other times, they drift just far enough to break your intended logic.
Every subtle drift becomes another deviation to maintain, another fork in the system that chips away at consistency. The trouble doesn't come from malice or malfunction, but from a machine doing its best with incomplete structure.
Guesswork is not reuse
AI systems like Cursor or Claude don't just regurgitate your design files. They parse them, analyse them, and often make surprisingly competent assumptions, especially when your system gives them clear patterns to work with.
But even the best AI can't guess your intent when your design system contradicts itself, hides meaning behind visual styling, or drops semantic structure altogether. That's when things start to slip. Not because the model is unintelligent, but because the signal you're giving it is noisy.
Below, I've detailed four real-world patterns where AI has a strong tendency to "fill in the blanks".
Visual twins, semantic strangers
Imagine there are two buttons, both blue, both labelled "Submit". One uses the official Button/Primary component with --color-button-primary. The other is a one-off, hand-drawn rectangle with a raw hex value of #3B82F6 applied to its background.
To a human, these buttons are functionally identical. But to a machine, one is a known system component; the other is just a styled frame. If the AI recognises the structure from elsewhere in your codebase, it might reuse the component. But if it can't identify the similarities and infer the context, it will often rebuild it from scratch, and that's when hardcoded styles and one-off logic can begin to creep in.
Why it matters: Visual similarity doesn't equal structural parity. AI doesn't "see" that these two buttons are meant to be the same; it sees a known component and a mystery box. Without a structural link to your system (like a component instance or importable reference), the AI has no safe way to reuse logic. That means more duplicate code, more inconsistent behaviours, and a growing maintenance burden every time someone tweaks a one-off instead of the source.
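Here's roughly how that difference tends to surface in generated code. The package name and component API below are assumptions made for illustration, not output from any specific tool.

```tsx
import * as React from "react";
import { Button } from "@acme/design-system"; // hypothetical system package

// When the AI recognises a real component instance, it can reuse the system:
export const SubmitKnown = () => (
  <Button variant="primary" type="submit">
    Submit
  </Button>
);

// When it only sees a styled frame, it tends to rebuild from scratch:
export const SubmitMystery = () => (
  <button
    type="submit"
    style={{
      backgroundColor: "#3B82F6", // raw hex: no token, no theming, no single source of truth
      color: "#FFFFFF",
      borderRadius: 6,
      padding: "8px 16px",
    }}
  >
    Submit
  </button>
);
```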
Token ambiguity
When one frame uses color.brand.primary and another applies #3B82F6 directly, the AI might assume they're equivalent. Sometimes it's correct. Other times, it might assign the wrong token, or create a redundant one.
When multiple tokens point to the same raw value (say, both color.brand.primary and color.interactive.hover reference #3B82F6), skipping the token removes semantic clarity. The AI sees the hex value, but not the intent. That opens the door to wrong guesses, misplaced reuse, or redundant logic.
Why it matters: Tokens are how your system encodes intent. When they're skipped, the AI has to guess whether #3B82F6 is meant to signal brand identity, interactivity, or something else entirely. This leads to incorrect mapping, duplicate tokens in code, or subtle inconsistencies that break down theming, dark mode, accessibility, and localisation efforts. Consistent token usage ensures every colour, spacing, and motion choice is deliberate, and machine-readable.
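A minimal sketch of that ambiguity, using the token names from the example above (the file layout is assumed):

```typescript
// tokens.ts: two tokens legitimately resolve to the same raw value.
export const tokens = {
  "color.brand.primary": "#3B82F6",     // brand identity
  "color.interactive.hover": "#3B82F6", // interaction state
};

// Given only a raw hex from the design file, the reverse lookup is ambiguous:
function tokensForHex(hex: string): string[] {
  return Object.entries(tokens)
    .filter(([, value]) => value.toLowerCase() === hex.toLowerCase())
    .map(([name]) => name);
}

console.log(tokensForHex("#3B82F6"));
// -> ["color.brand.primary", "color.interactive.hover"]
// The raw value alone can't tell the AI which intent applies; only the token can.
```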
Component ghosts
In a fast-paced environment, where speed and exploration go hand-in-hand, it's common to duplicate and detach components, flatten layouts, nudge labels, and change colours in the name of exploration. But as "finalised" layouts are pieced together and that detached Modal or Checkbox creeps into your interface, it's no longer a component. It's just grouped rectangles and text.
While some models can detect visual similarity, they won't assume intent unless you've preserved structure. That means: no import statements, no prop mapping, no reuse. Just a fresh blob of bespoke code.
Why it matters: Detached components lose the metadata AI relies on to understand meaning, like slots, props, and variant logic. To the machine, that detached Modal is just rectangles and text. So it generates fresh code, hardcodes styles, and severs the connection to your system source. That's how system drift starts: not with an intent to deviate, but with structure quietly lost in the name of speed. Eventually, you end up managing five versions of the same thing, and none of them stay in sync.
The drift multiplier
Imagine tweaking a component instance in Figma; then a coworker needs to reference your design for a new feature they're working on. Suddenly, you have three visually similar components, with slight variations in spacing, radius, or behaviour. The AI doesn't know which one is canonical, so it generates new logic for each.
It's not a mistake; the AI is just building what it sees, and assuming every variation is intentional.
Why it matters: Drift is exponential. What starts as a one-off tweak quickly spreads into parallel versions of the same component, each slightly more "off" than the last. AI tools treat these as distinct inputs, generating new logic for each deviation. That fragments your codebase, increases test surface, and forces your team to debug differences that were never intentional. The longer the drift is left unchecked, the harder it becomes to re-align, especially when AI is silently codifying every variant into your stack.
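As a made-up illustration of where this ends up, three near-identical frames tend to become three separate components rather than one canonical source (names and values here are invented):

```tsx
import * as React from "react";

// Three "Submit" frames, each tweaked slightly, each codified separately.
// None of these values trace back to a token, so none of them stay in sync.
export const SubmitButtonA = () => (
  <button style={{ padding: "8px 16px", borderRadius: 6, background: "#3B82F6" }}>Submit</button>
);

export const SubmitButtonB = () => (
  <button style={{ padding: "8px 14px", borderRadius: 4, background: "#3B82F6" }}>Submit</button>
);

export const SubmitButtonC = () => (
  <button style={{ padding: "10px 16px", borderRadius: 6, background: "#2F74E0" }}>Submit</button>
);
```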
How to reduce friction: What to fix first
Instead of blaming AI for not being "smart enough," or assuming the tools aren't "quite there yet," the better question is: are we giving it the right signals?
Those signals aren't magic; they're the fundamentals: clear naming, consistent use of tokens, proper component structure, and alignment between design and code. When those things are in place, AI tools don't have to guess; they can infer. And the quality of the code output improves dramatically.
If your button component uses meaningful variants, exposes semantic slots, and references tokens instead of raw values, the AI can confidently generate a match. If your design file is a scattered mess of detached layers, duplicated logic, and hardcoded styles, it will reflect that back in your code, not because the model is broken, but because the signal is noisy.
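As a sketch of what those signals can look like on the code side (the component shape and token names are assumptions, not a prescribed implementation):

```tsx
import * as React from "react";

// Illustrative token map; in practice this would come from your token pipeline.
const tokens: Record<string, string> = {
  "color.button.primary": "#3B82F6",
  "color.button.secondary": "#64748B",
  "color.button.ghost": "transparent",
};

type ButtonProps = {
  variant: "primary" | "secondary" | "ghost"; // mirrors the Figma variant property
  size?: "sm" | "md" | "lg";
  children: React.ReactNode;                  // a semantic slot, not a baked-in label
  onClick?: () => void;
};

// Variants resolve to tokens, never raw values, so design and code stay aligned.
export const Button = ({ variant, size = "md", children, onClick }: ButtonProps) => (
  <button
    className={`btn btn--${variant} btn--${size}`}
    style={{ backgroundColor: tokens[`color.button.${variant}`] }}
    onClick={onClick}
  >
    {children}
  </button>
);

// Usage: <Button variant="primary">Submit</Button>
```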
Quality follows structure
Throughout testing, well-structured Figma files with consistent naming, layout conventions and token usage produced decent AI-generated code: proper imports, token references and clean component structure.
But as soon as structure disappeared (detached instances, missing tokens, ambiguous layer names), the output quality collapsed. It became difficult to determine whether the AI was making correct assumptions, justifiably creating new token values, or assigning incorrect tokens because they mapped to the same raw value.
The AI didn't fail. It was just working with what it was given.
The real test: is your system AI-ready?
Here's a quick audit you can run right now:
Component integrity: Are your components actually being used, or are teams constantly detaching and duplicating them?
Token consistency: Are you using design tokens systematically, or falling back to raw values when things get tricky? (A rough self-check for this one follows the list.)
Structural alignment: Do your component variants make sense as API props, or are they buried in confusing layer hierarchies?
Design-code parity: Do your Figma components actually match what developers are building?
Future-proofing: Have you tested your system with tools like Windsurf, Claude, or Cursor's MCP integration?
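For the token-consistency check, here's a minimal sketch of something you could script yourself. The tokens.json and styles.css paths, and the flat token format, are assumptions about your setup:

```typescript
// audit-tokens.ts: flag hardcoded hex values that an existing token already covers.
import { readFileSync } from "node:fs";

const tokens: Record<string, string> = JSON.parse(readFileSync("tokens.json", "utf8"));
const source = readFileSync("src/styles.css", "utf8");

// Build a reverse index: raw value -> token names that already encode it.
const byValue = new Map<string, string[]>();
for (const [name, value] of Object.entries(tokens)) {
  const key = value.toLowerCase();
  byValue.set(key, [...(byValue.get(key) ?? []), name]);
}

// Any raw hex in the source that duplicates a token is a candidate for cleanup.
for (const match of source.matchAll(/#[0-9a-fA-F]{6}\b/g)) {
  const existing = byValue.get(match[0].toLowerCase());
  if (existing) {
    console.warn(`Raw value ${match[0]} duplicates token(s): ${existing.join(", ")}`);
  }
}
```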
The interface is structural, not visual
As AI becomes the bridge between design and development, structure becomes the new interface. Every missing token, every detached component, every inconsistent naming pattern becomes a break in translation. AI tools surface different types of context (pattern metadata, interactivity models, and content relationships), but they can only work with the structure you provide.
This isn't a limitation of AI tooling; it's a feature. The tools are showing you exactly where your system needs work, forcing a level of discipline that's been optional until now.
I've seen teams get frustrated when AI generates suboptimal code from their "perfectly good" designs. But when we dig deeper, we always find the same culprits: inconsistent token usage, detached components, or naming conventions that made sense to humans but confuse machines.
Discipline over prompt engineering
The most impressive AI-generated code doesn't come from the teams writing the most groundbreaking prompts or the most detailed documentation. It comes from the teams with the most disciplined design systems: the ones who've been doing the unglamorous work of maintaining consistency, updating tokens, and keeping components aligned.
AI doesn't fix your design system; it amplifies it. If your system is solid, AI makes it more powerful. If it's inconsistent, AI makes that inconsistency impossible to ignore.
That reflection, uncomfortable as it might be, is exactly what we need. Because the teams that take this feedback seriously, the ones that use AI as a diagnostic tool to identify and fix systemic inconsistencies, are the teams that will build the next generation of truly scalable design systems.
System discipline isn't just good practice anymore; it's the price of entry. Build for AI, or get outbuilt by those who do. If we treat AI as a system partner, not a silver bullet, we'll build systems that not only scale, but adapt.
The promise of AI is real, but only if your system is ready for it. Build with structure, or spend your time cleaning up after your own ambiguity.
Thanks for reading! This article is also available on Medium, where I share more posts like this. If you're active there, feel free to follow me for updates.
I'd love to stay connected β join the conversation on X, Bluesky, or connect with me on LinkedIn to talk design, digital products, and everything in between.