Design system teams are governing AI output. Nobody asked them to.

Design system teams are absorbing governance responsibility for AI-generated output. Nobody decided they should. There was no conversation about scope or resourcing. AI coding tools started producing UI at scale, nobody else in the organisation had the context to govern it, and the work landed on the team closest to the system.

The Zeroheight 2026 Design Systems Report captures the pressure building underneath that. Buy-in satisfaction dropped from 42% to 32% in a single year. Resource constraints remain the top complaint. Teams are being asked to do more with less, and the "more" keeps expanding without anyone formally deciding it should.

Grant Blakeman, staff design engineer at LinkedIn, described the scope change plainly in Figma's recent piece on what's redefining design systems teams: "Our scope now includes any tools that builders without traditional product titles are using to contribute to the product". That's a significant expansion of what a design systems remit looks like. It also arrived without a headcount conversation.

This is the context most governance writing skips past. The articles about MCP servers and token semantics and cursor rules assume a team with capacity to implement them. The practical reality for most design system practitioners is that the governance problem landed before anyone asked whether they had the capacity to handle it, and the tooling advice sits on top of an unresolved question about who this work actually belongs to.

What the job description said, and what it says now

Design system governance has always been a coordination problem. The traditional models (centralised, federated, community-driven) were all solving for a specific failure mode: inconsistency created by humans making independent decisions without enough context. Someone builds a custom dropdown because they didn't know the one in the system existed. A designer creates a one-off component because the existing one didn't flex the way they needed. Governance was the mechanism for surfacing those decisions, routing them through people with system-level context, and integrating the good ones or redirecting the bad ones.

That model assumed a few things: that the volume of out-of-system decisions was manageable, that those decisions were being made by people who could in principle be reached, and that the decisions would surface through normal channels like PRs, design reviews, and Figma comments.

Vibe coding breaks all three assumptions. When a developer uses Cursor to generate a new onboarding flow at speed, they're not asking who owns the card component. They're not checking your contribution guidelines. The component they generate might not reference your tokens, might handle states your design team never specified, and might never surface as a decision at all. It just exists in the codebase now. The code is plausible enough to pass review, and the reviewer often doesn't have the design system context to catch what's wrong.

Teams using AI tools report 41% higher code churn and decreased delivery stability. For design system teams, that figure is a governance signal: the work being created and recreated at that rate includes UI decisions that belong to the system, and most of those decisions are not being made by anyone who knows that.

The specific hardship nobody is talking about

Romina Kavcic wrote recently about what a design system team's actual scope looks like in 2026: expected to run the component library, documentation site, design ops, and Figma governance while supporting five or more product teams with conflicting priorities. "There's no executive sponsor who shields you from scope creep", she noted. That was before accounting for AI output governance as an additional layer.

The hardship runs deeper than workload. The mandate expansion is invisible to the people who control resourcing decisions. A team can explain to leadership that they maintain 200 components. They can show adoption metrics, document contribution rates, point to the time it takes to run a design system review. The work of governing AI output is much harder to make legible: monitoring what tools product teams are using, watching for component drift in PRs that move fast, trying to constrain generation at source rather than catch violations downstream. It doesn't map cleanly onto existing job descriptions, so it doesn't map cleanly onto budget conversations.

The Zeroheight 2026 report notes that while 69% of teams now encourage open contribution, more than two in five are still operating as gatekeepers, managing a review queue that was never designed for the volume AI creates. The teams in that second group aren't failing at governance. They're applying a governance model to a problem it wasn't built for, and the gap between what the process can handle and what's being generated is widening every quarter.

The Telerik Workflows in the Age of AI report captured the underlying tension: AI delivers its strongest value in speed-oriented activities, but needs to be embedded into systems that enforce quality and consistency, not layered on top of them. The design system team is often the only group capable of doing that embedding. Nobody formally handed them the responsibility, but the work exists, and if they don't do it, it doesn't get done.

There's also something worth naming about what it feels like to have your existing work systematically approximated at scale. A design system represents accumulated decisions: the loading state that took three weeks to get right, the token architecture that makes dark mode possible, the component API that handles every edge case your products actually encounter. When AI generates a version of that component that looks right but isn't, those decisions get overwritten without anyone noticing. The careful work becomes invisible even faster than it used to.

Governing something that doesn't come through the front door

The governance mechanisms most design system teams have built assume a human is asking for help or submitting a contribution. Office hours, RFC processes, Slack channels, contribution guidelines with tiered approval. These are all front-door mechanisms: someone brings something to the system, and the system routes it.

AI-generated UI comes in through the wall. It doesn't submit a PR through your contribution process. It doesn't attend office hours. The only way to govern it is to get upstream of the generation, into the tools and context that shape what gets produced, or to build the kind of automated enforcement that catches violations before they ship.

The upstream approach is what Figma's MCP server, Storybook's @storybook/addon-mcp, and IDE instruction files like Copilot's .github/copilot-instructions.md or Cursor's rules system are all attempting. The underlying logic is sound: if you can expose your design system's components, tokens, and usage rules to the tools doing the generating, you can influence what gets produced before it exists rather than auditing it after.
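To make the upstream idea concrete, here is a sketch of what such an instruction file might contain. The package, token, and hook names are invented for illustration; the point is that the file encodes the system's intent in terms the generating tool can follow:

```markdown
# Design system instructions (hypothetical example)

- Import all UI components from `@acme/design-system`; never hand-roll
  buttons, dropdowns, or cards.
- Use spacing tokens (`--space-050` through `--space-400`) instead of
  hardcoded pixel values.
- Interactive components must use the system's focus and state utilities
  rather than custom focus styles.
- If no suitable component exists, stop and flag it for the design system
  team instead of generating a one-off.
```

Writing these rules well requires exactly the system-level context the design system team holds: you can only tell a tool what not to generate if you know what the system already provides.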

Spotify rebuilt their Encore design system with exactly this in mind. Developers were going to Cursor first, getting non-Encore output, and shipping it, because the design system had become too complex for AI to reason about efficiently. Their response was to create an MCP server for Encore and redesign their component architecture into three independent layers, separating foundations, styles, and behaviours so AI could reason about each independently rather than parsing a monolithic component file. The result so far: 220,000+ shared style uses, 50% year-over-year growth, and 93% developer satisfaction. The governance investment is what made that possible.

The downstream approach is what linting and CI enforcement have always been, applied more deliberately to AI-specific failure modes. Atlassian's ESLint and Stylelint plugins for their design system enforce token usage at the code level, with auto-fixers that can update a codebase automatically. Salesforce built Prizm to re-architect code review as an automated system specifically because AI output volume had exceeded what human reviewers could meaningfully assess. When a PR contains 400 lines of AI-generated UI, reviewers check whether the component looks right. They're not checking whether it's referencing your spacing token or hardcoding 12px. The review passes because the surface area of the change has outgrown the bandwidth of a person. The token is invisible. The plumbing is invisible. The PR gets approved.
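The shape of that automated enforcement is simple to sketch. The following is a minimal, hypothetical version of a token-usage check, not Atlassian's actual plugin; real plugins walk the CSS or JS syntax tree, but the core logic is a lookup from hardcoded value to canonical token:

```javascript
// Minimal sketch of downstream token enforcement: flag hardcoded spacing
// values that should reference design tokens. Token names and values here
// are hypothetical.
const tokens = {
  "4px": "var(--space-050)",
  "8px": "var(--space-100)",
  "12px": "var(--space-150)",
};

function lintDeclaration(property, value) {
  // Only spacing-like properties are checked in this sketch.
  if (!/^(margin|padding|gap)/.test(property)) return null;
  const replacement = tokens[value];
  if (replacement) {
    // An auto-fixer would rewrite the declaration instead of reporting it.
    return `${property}: ${value} → use ${replacement}`;
  }
  return null;
}

console.log(lintDeclaration("padding", "12px"));
// → "padding: 12px → use var(--space-150)"
```

A check like this never gets tired and never loses context at line 400 of a PR, which is exactly why it scales where human review doesn't.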

Neither approach eliminates the governance work; it changes where it sits. The investment moves from downstream reviewing to upstream configuration, and that configuration work requires the same deep system knowledge design system teams already have. That's one of the few legitimate arguments for why this work belongs to design systems rather than general platform engineering: you need to know what the system intends, not just what it contains.

Making the mandate visible

The most useful thing a design system team can do with this situation, beyond the technical work of implementing guardrails, is make the expanded mandate visible to the people who make resourcing decisions. That means naming what you're now responsible for in terms leadership can evaluate: the surface area of AI-generated UI in your products, the rate at which it's diverging from canonical components, the cost of that divergence when it surfaces as accessibility debt, update overhead, or production inconsistency.

This is easier to argue if you're tracking it. A simple audit comparing canonical component usage against AI-generated approximations in a sample of recent PRs gives you a number: here is how much UI was generated outside the system last quarter. Here is what that means for our maintenance overhead. Here is what it costs when a brand token changes and 30 components don't update because they were never wired to the system.
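Even a crude version of that audit produces a defensible number. The sketch below uses regex heuristics over file contents; the package name is hypothetical, and a real audit would parse the AST and run across a sample of merged PRs rather than a single string:

```javascript
// Rough audit sketch: given the source of a changed file, count references
// to the design system versus ad-hoc hardcoded values. "@acme/design-system"
// and the regexes are illustrative placeholders, not a real heuristic set.
function auditSource(source) {
  const systemImports = (source.match(/from ['"]@acme\/design-system/g) || []).length;
  const hardcodedColors = (source.match(/#[0-9a-fA-F]{3,6}\b/g) || []).length;
  const hardcodedPx = (source.match(/\b\d+px\b/g) || []).length;
  return {
    systemImports,
    driftSignals: hardcodedColors + hardcodedPx,
  };
}

const report = auditSource(`
  import { Card } from '@acme/design-system';
  const style = { padding: '12px', color: '#ff0000' };
`);
console.log(report); // → { systemImports: 1, driftSignals: 2 }
```

Aggregated over a quarter's PRs, the drift-to-import ratio is the kind of single number a resourcing conversation can actually be built around.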

The conversation with leadership isn't "AI is making our job harder", which will receive limited sympathy in most organisations right now. The conversation is "AI is generating product decisions that belong to design infrastructure, and we're the team with the context to govern them. Here's what that governance costs, and here's what it prevents". The framing makes the work legible without making it sound like a complaint.

Figma's piece quoted LinkedIn's Grant Blakeman on the opportunity in this: "We're actually able to guide teams, and now also their tools, through each step of the product process proactively". That framing is particularly useful because it positions the expanded scope as influence rather than overhead. The design system team's understanding of what the system means to own, what it intends rather than just what it contains, is exactly what's needed to make AI output coherent at scale. That's a capability argument.

The teams managing this aren't the ones who've solved it. Nobody has. They're the ones who've stopped waiting for the mandate to be formalised and started doing the work anyway, which is, if you think about it, exactly what design system teams have always done.


Thanks for reading! If you enjoyed this article, subscribing is the best way to keep up with new posts. And if it was useful, passing it on to someone who'd find it relevant is always appreciated.

You can find me on LinkedIn, X, and Bluesky.

If there's something you'd like to see me write about, reply and let me know, or reach out directly via social media.

Free tool · Murphy Trueman

Is your design system ready for AI? AI agents are already consuming design systems. Find out if yours is structured to be understood by them.

Take the free assessment →