I've been watching the AI-powered design-to-code revolution unfold with equal parts excitement and dread. The excitement is obvious: AI tools can now read Figma files as structured data, not just screenshots. It feels like magic.
The dread? That comes from years of working with design systems that look polished on the surface, but are quietly rotting underneath. 😬
And now, that dread is getting exposed, not by your team, but by your tools.
Model Context Protocol (MCP) servers give AI access to structured, contextual data, not just surface-level cues. In development, they expose relationships in codebases. In infrastructure, they reflect architecture. And now, in design, they offer a direct line into the structure beneath your files.
That's what makes Figma's Dev Mode MCP server so significant. It feeds real design system context (components, tokens, constraints, naming) into tools like Cursor, Claude, and Windsurf. AI can interpret your file the way a developer would: by reading what it's actually made of.
This isn't just a productivity layer. It turns your design system into a live, parseable interface. And if that system is messy, inconsistent, or drifting? The AI sees it, and builds accordingly.
The entropy hiding in plain sight
Even the most meticulously maintained design systems accumulate what I call design system entropy: small inconsistencies that compound over time. A duplicated button here, a slightly different spacing value there, a component that's been detached and modified just enough to "make it work".
Humans are brilliant at working around this chaos. When we see btn_primary in one file and button[variant=primary] in another, our brains connect the dots automatically. When we see spacing.large in Figma and space-lg in code, we know they're supposed to be the same thing; we've become expert translators of our own inconsistencies.
But AI doesn't adapt quite the same way. It can assume, but it doesn't forgive. It will reflect exactly what you've built, mess and all.
How AI actually sees your system
With tools like Figma's MCP server, AI can access component metadata, auto-layout constraints, layer names and hierarchy, and design tokens. This is incredibly valuable: instead of hallucinating code from flat .png files, AI can understand the tokenised values assigned to a component. It can see relationships between elements, understand layout logic, and preserve semantic meaning.
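To make that concrete, here's a rough sketch of the kind of structured context a design file can expose. The shape below is purely illustrative (it is not Figma's actual MCP schema); the point is that the model receives names, tokens, and constraints rather than pixels.

```typescript
// Hypothetical shape, for illustration only; not Figma's actual MCP payload.
interface DesignNodeContext {
  componentName: string;                 // e.g. "Button/Primary"
  variantProps: Record<string, string>;  // e.g. { variant: "primary", size: "md" }
  tokens: Record<string, string>;        // CSS property -> token name, not raw values
  layout: {
    mode: "horizontal" | "vertical";
    gap: string;                         // a spacing token, e.g. "space.sm"
    padding: string;
  };
  children: DesignNodeContext[];
}

// What a well-structured button instance might look like to the model:
const submitButton: DesignNodeContext = {
  componentName: "Button/Primary",
  variantProps: { variant: "primary", size: "md" },
  tokens: { background: "color.button.primary", color: "color.text.onPrimary" },
  layout: { mode: "horizontal", gap: "space.sm", padding: "space.md" },
  children: [],
};
```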
But here's the thing: if your design file is inconsistent (detached components, raw hex values, missing tokens), that's exactly what AI sees. It doesn't pause when things get messy; it tries to fill in the blanks. Sometimes the guesses are close enough. Other times, they drift just far enough to break your intended logic.
Every subtle drift becomes another deviation to maintain, another fork in the system that chips away at consistency. The trouble doesn't come from malice or malfunction, but from a machine doing its best with incomplete structure.
Guesswork is not reuse
AI systems like Cursor or Claude don't just regurgitate your design files. They parse them, analyse them, and often make surprisingly competent assumptions, especially when your system gives them clear patterns to work with.
But even the best AI can't guess your intent when your design system contradicts itself, hides meaning behind visual styling, or drops semantic structure altogether. That's when things start to slip. Not because the model is unintelligent, but because the signal you're giving it is noisy.
Below, I've detailed four real-world patterns where AI has a strong tendency to "fill in the blanks".
Visual twins, semantic strangers
Imagine there are two buttons, both blue, both labelled "Submit". One uses the official Button/Primary component with --color-button-primary. The other is a one-off, hand-drawn rectangle with a raw hex value of #3B82F6 applied to its background.
To a human, these buttons are functionally identical. But to a machine, one is a known system component; the other is just a styled frame. If the AI recognises the structure from elsewhere in your codebase, it might reuse the component. But if it can't identify the similarities and infer the context, it will often rebuild it from scratch, and that's when hardcoded styles and one-off logic can begin to creep in.
Why it matters: Visual similarity doesn't equal structural parity. AI doesn't "see" that these two buttons are meant to be the same; it sees a known component and a mystery box. Without a structural link to your system (like a component instance or importable reference), the AI has no safe way to reuse logic. That means more duplicate code, more inconsistent behaviours, and a growing maintenance burden every time someone tweaks a one-off instead of the source.
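Here's roughly how that difference tends to surface in generated code. The package name and component API below are assumptions made for illustration, not output from any specific tool.

```tsx
import * as React from "react";
import { Button } from "@acme/design-system"; // hypothetical system package

// When the AI recognises a real component instance, it can reuse the system:
export const SubmitKnown = () => (
  <Button variant="primary" type="submit">
    Submit
  </Button>
);

// When it only sees a styled frame, it tends to rebuild from scratch:
export const SubmitMystery = () => (
  <button
    type="submit"
    style={{
      backgroundColor: "#3B82F6", // raw hex: no token, no theming, no single source of truth
      color: "#FFFFFF",
      borderRadius: 6,
      padding: "8px 16px",
    }}
  >
    Submit
  </button>
);
```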
Token ambiguity
When one frame uses color.brand.primary and another applies #3B82F6 directly, the AI might assume they're equivalent. Sometimes it's correct. Other times, it might assign the wrong token, or create a redundant one.
When multiple tokens point to the same raw value (say, both color.brand.primary and color.interactive.hover reference #3B82F6), skipping the token removes semantic clarity. The AI sees the hex value, but not the intent. That opens the door to wrong guesses, misplaced reuse, or redundant logic.
Why it matters: Tokens are how your system encodes intent. When they're skipped, the AI has to guess whether #3B82F6 is meant to signal brand identity, interactivity, or something else entirely. This leads to incorrect mapping, duplicate tokens in code, or subtle inconsistencies that break down theming, dark mode, accessibility, and localisation efforts. Consistent token usage ensures every colour, spacing, and motion choice is deliberate, and machine-readable.
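A minimal sketch of that ambiguity, using the token names from the example above (the file layout is assumed):

```typescript
// tokens.ts: two tokens legitimately resolve to the same raw value.
export const tokens = {
  "color.brand.primary": "#3B82F6",     // brand identity
  "color.interactive.hover": "#3B82F6", // interaction state
};

// Given only a raw hex from the design file, the reverse lookup is ambiguous:
function tokensForHex(hex: string): string[] {
  return Object.entries(tokens)
    .filter(([, value]) => value.toLowerCase() === hex.toLowerCase())
    .map(([name]) => name);
}

console.log(tokensForHex("#3B82F6"));
// -> ["color.brand.primary", "color.interactive.hover"]
// The raw value alone can't tell the AI which intent applies; only the token can.
```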
Component ghosts
In a fast-paced environment, where speed and exploration go hand-in-hand, it's common to duplicate and detach components, flatten layouts, nudge labels, and change colours in the name of exploration. But as "finalised" layouts are pieced together and that detached Modal or Checkbox creeps into your interface, it's no longer a component. It's just grouped rectangles and text.
While some models can detect visual similarity, they won't assume intent unless you've preserved structure. That means: no import statements, no prop mapping, no reuse. Just a fresh blob of bespoke code.
Why it matters: Detached components lose the metadata AI relies on to understand meaning, like slots, props, and variant logic. To the machine, that detached Modal is just rectangles and text. So it generates fresh code, hardcodes styles, and severs the connection to your system source. That's how system drift starts: not with an intent to deviate, but with structure quietly lost in the name of speed. Eventually, you end up managing five versions of the same thing, and none of them stay in sync.
The drift multiplier
Imagine tweaking a component instance in Figma; then a coworker needs to reference your design for a new feature they're working on. Suddenly, you have three visually similar components, with slight variations in spacing, radius, or behaviour. The AI doesn't know which one is canonical, so it generates new logic for each.
It's not a mistake; the AI is just building what it sees, and assuming every variation is intentional.
Why it matters: Drift is exponential. What starts as a one-off tweak quickly spreads into parallel versions of the same component, each slightly more "off" than the last. AI tools treat these as distinct inputs, generating new logic for each deviation. That fragments your codebase, increases test surface, and forces your team to debug differences that were never intentional. The longer the drift is left unchecked, the harder it becomes to re-align, especially when AI is silently codifying every variant into your stack.
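As a made-up illustration of where this ends up, three near-identical frames tend to become three separate components rather than one canonical source (names and values here are invented):

```tsx
import * as React from "react";

// Three "Submit" frames, each tweaked slightly, each codified separately.
// None of these values trace back to a token, so none of them stay in sync.
export const SubmitButtonA = () => (
  <button style={{ padding: "8px 16px", borderRadius: 6, background: "#3B82F6" }}>Submit</button>
);

export const SubmitButtonB = () => (
  <button style={{ padding: "8px 14px", borderRadius: 4, background: "#3B82F6" }}>Submit</button>
);

export const SubmitButtonC = () => (
  <button style={{ padding: "10px 16px", borderRadius: 6, background: "#2F74E0" }}>Submit</button>
);
```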
How to reduce friction: What to fix first
Instead of blaming AI for not being "smart enough," or assuming the tools aren't "quite there yet," the better question is: are we giving it the right signals?
Those signals aren't magic; they're the fundamentals: clear naming, consistent use of tokens, proper component structure, and alignment between design and code. When those things are in place, AI tools don't have to guess; they can infer. And the quality of the code output improves dramatically.
If your button component uses meaningful variants, exposes semantic slots, and references tokens instead of raw values, the AI can confidently generate a match. If your design file is a scattered mess of detached layers, duplicated logic, and hardcoded styles, it will reflect that back in your code, not because the model is broken, but because the signal is noisy.
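As a sketch of what those signals can look like on the code side (the component shape and token names are assumptions, not a prescribed implementation):

```tsx
import * as React from "react";

// Illustrative token map; in practice this would come from your token pipeline.
const tokens: Record<string, string> = {
  "color.button.primary": "#3B82F6",
  "color.button.secondary": "#64748B",
  "color.button.ghost": "transparent",
};

type ButtonProps = {
  variant: "primary" | "secondary" | "ghost"; // mirrors the Figma variant property
  size?: "sm" | "md" | "lg";
  children: React.ReactNode;                  // a semantic slot, not a baked-in label
  onClick?: () => void;
};

// Variants resolve to tokens, never raw values, so design and code stay aligned.
export const Button = ({ variant, size = "md", children, onClick }: ButtonProps) => (
  <button
    className={`btn btn--${variant} btn--${size}`}
    style={{ backgroundColor: tokens[`color.button.${variant}`] }}
    onClick={onClick}
  >
    {children}
  </button>
);

// Usage: <Button variant="primary">Submit</Button>
```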
Quality follows structure
Throughout testing, well-structured Figma files with consistent naming, layout conventions and token usage produced decent AI-generated code: proper imports, token references and clean component structure.
But as soon as structure disappeared (detached instances, missing tokens, ambiguous layer names), the output quality collapsed. It became difficult to determine whether the AI was making correct assumptions, justifiably creating new token values, or assigning incorrect tokens because they mapped to the same raw value.
The AI didn't fail. It was just working with what it was given.
The real test: is your system AI-ready?
Here's a quick audit you can run right now:
Component integrity: Are your components actually being used, or are teams constantly detaching and duplicating them?
Token consistency: Are you using design tokens systematically, or falling back to raw values when things get tricky? (A rough self-check for this one follows the list.)
Structural alignment: Do your component variants make sense as API props, or are they buried in confusing layer hierarchies?
Design-code parity: Do your Figma components actually match what developers are building?
Future-proofing: Have you tested your system with tools like Windsurf, Claude, or Cursor's MCP integration?
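For the token-consistency check, here's a minimal sketch of something you could script yourself. The tokens.json and styles.css paths, and the flat token format, are assumptions about your setup:

```typescript
// audit-tokens.ts: flag hardcoded hex values that an existing token already covers.
import { readFileSync } from "node:fs";

const tokens: Record<string, string> = JSON.parse(readFileSync("tokens.json", "utf8"));
const source = readFileSync("src/styles.css", "utf8");

// Build a reverse index: raw value -> token names that already encode it.
const byValue = new Map<string, string[]>();
for (const [name, value] of Object.entries(tokens)) {
  const key = value.toLowerCase();
  byValue.set(key, [...(byValue.get(key) ?? []), name]);
}

// Any raw hex in the source that duplicates a token is a candidate for cleanup.
for (const match of source.matchAll(/#[0-9a-fA-F]{6}\b/g)) {
  const existing = byValue.get(match[0].toLowerCase());
  if (existing) {
    console.warn(`Raw value ${match[0]} duplicates token(s): ${existing.join(", ")}`);
  }
}
```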
The interface is structural, not visual
As AI becomes the bridge between design and development, structure becomes the new interface. Every missing token, every detached component, every inconsistent naming pattern becomes a break in translation. AI tools surface different types of context (pattern metadata, interactivity models, and content relationships), but they can only work with the structure you provide.
This isn't a limitation of AI tooling; it's a feature. The tools are showing you exactly where your system needs work, forcing a level of discipline that's been optional until now.
I've seen teams get frustrated when AI generates suboptimal code from their "perfectly good" designs. But when we dig deeper, we always find the same culprits: inconsistent token usage, detached components, or naming conventions that made sense to humans but confuse machines.
Discipline over prompt engineering
The most impressive AI-generated code doesn't come from the teams writing the most groundbreaking prompts or the most detailed documentation. It comes from the teams with the most disciplined design systems: the ones who've been doing the unglamorous work of maintaining consistency, updating tokens, and keeping components aligned.
AI doesn't fix your design system; it amplifies it. If your system is solid, AI makes it more powerful. If it's inconsistent, AI makes that inconsistency impossible to ignore.
That reflection, uncomfortable as it might be, is exactly what we need. Because the teams that take this feedback seriously, the ones that use AI as a diagnostic tool to identify and fix systemic inconsistencies, are the teams that will build the next generation of truly scalable design systems.
System discipline isn't just good practice anymore; it's the price of entry. Build for AI, or get outbuilt by those who do. If we treat AI as a system partner, not a silver bullet, we'll build systems that not only scale, but adapt.
The promise of AI is real, but only if your system is ready for it. Build with structure, or spend your time cleaning up after your own ambiguity.
Thanks for reading! This article is also available on Medium, where I share more posts like this. If you're active there, feel free to follow me for updates.
I'd love to stay connected β join the conversation on X, Bluesky, or connect with me on LinkedIn to talk design, digital products, and everything in between.