Designing with AI

I made AI prototypes more consistent. Stakeholders assumed they were final. Two years of working with AI taught me that the obvious move isn't always the right one.


The context

Most conversations about AI in design focus on making things better. Better prototypes, better documentation, better output. I've spent the past two years asking a different question: what does AI actually break, and is that always a problem worth solving?

One piece of work produced an unexpected finding. Making AI prototypes more consistent with our design system made stakeholders assume the designs were final. The polished output was undermining the design process. The right answer turned out to be keeping the prototypes deliberately rough. Sometimes inconsistency is a feature.

That kind of counterintuitive finding is what I'm interested in. Here are three pieces of work that shaped how I think about AI as both a design material and a design tool.


Designing the AI experience for Morpheus

OutSystems built an AI-powered app generator called Mentor, running on a platform called Morpheus. The product took a natural language prompt and generated a working low-code application.

My role was designing the content for the onboarding experience and the prompt-writing guidance. The core design challenge came from a surprising research finding. Users assumed that more detailed, thorough prompts would produce better results. Many were uploading full product requirements documents, spending hours preparing them before generating an app.

But research showed the opposite was true.

More content with more specific implementation detail didn't increase output quality. It actually increased disappointment, because the generated app diverged from the specific requirements users had painstakingly written. This expectation mismatch was the biggest detractor in satisfaction scores.

My job was to help users understand that before they ever wrote a prompt. I did three things.

First, I designed a three-screen onboarding flow that walked users through what Mentor could do, how to make changes after generation, and how to write a prompt that worked. PMs wanted a single tooltip-style callout for this critical information. I pushed back. Tooltips are a pattern for supplemental information, not foundational context users need before taking action. Testing validated the concern: users skimmed the tooltip and still wrote prompts that were too implementation-heavy. The three-screen flow slowed them down intentionally, with each screen addressing a distinct mental model failure we'd seen in testing.

Second, I designed and wrote example prompt chips that sat directly below the input field; clicking one injected a prompt that followed our established best practices into the input.

The goal was to demonstrate the right level of abstraction before users had written a single word: don't tell them, show them.
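
For a sense of that abstraction level, a hypothetical chip prompt (illustrative, not one of the actual Mentor examples) might read:

```
An app for tracking employee training. Employees enroll in
courses, each course has multiple sessions, and managers
approve enrollments. Keep the data model simple; I'll refine
the details after generation.
```

It names the concepts, sketches the data model, and sets expectations, without dictating implementation.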

Third, I created the prompt best practices modal with three sections:

  1. Be concise and focus on concepts

  2. Outline your data model

  3. Set your expectations

The last section was the most important. Explicitly telling users what the AI couldn't do (mobile apps, complex logic, edge cases) made the experience more trustworthy, not less. Honesty about limitations built more confidence than overselling capability ever would.

We tested again and immediately saw satisfaction rise, driven by the education and expectation setting. Users still felt their requirements weren’t always translated correctly, but their expectations were now calibrated to match. One user said, “It didn’t correctly parse my data model, but it did a pretty decent job… despite its limitations.”

Better-educated, context-aware users shifted requirement-translation satisfaction scores from 2.8 to 3.6.


Can design system guidelines make AI prototyping more consistent?

OutSystems designers were already generating prototypes with AI in Cursor. They hadn’t connected it to the design system yet, and I wanted to answer a specific question: if we embedded our content guidelines, design system docs, token JSON files, and front-end CSS files directly into Cursor as prompt context, would the output be more consistent with the design system?

The answer was yes. But it created a problem nobody anticipated.

To start the experiment, I captured a baseline: a hi-fi screen in Figma that fully followed our design system standards, with the correct tokens applied.

Then, in Cursor, I cloned our design system repo, which included our token JSON files, set up Figma MCP, selected the baseline screen, and asked Cursor to recreate it.
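
For anyone recreating this, registering a Figma MCP server in Cursor's MCP config looks roughly like the sketch below. The figma-developer-mcp package here is illustrative (any Figma MCP server is wired up the same way), and the token placeholder is a Figma personal access token:

```json
{
  "mcpServers": {
    "figma": {
      "command": "npx",
      "args": ["-y", "figma-developer-mcp", "--stdio"],
      "env": { "FIGMA_API_KEY": "your-personal-access-token" }
    }
  }
}
```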

I ran a code diff comparing the values in the recreated screen against our tokens in GitHub.
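
The check itself doesn't need anything elaborate. A minimal sketch of the idea in Python, with hypothetical file paths and a flat token JSON shape (our actual setup differed):

```python
import json
import re

# Hypothetical paths; the real files lived in the cloned design system repo.
TOKENS_PATH = "tokens/colors.json"
GENERATED_CSS = "prototype/styles.css"

# Flatten a nested token tree like {"color": {"primary": {"value": "#0B5FFF"}}}
# into a lookup of value -> token name, e.g. {"#0b5fff": "color.primary"}.
def flatten(node, path=()):
    if isinstance(node, dict):
        if "value" in node:
            yield str(node["value"]).lower(), ".".join(path)
        else:
            for key, child in node.items():
                yield from flatten(child, path + (key,))

with open(TOKENS_PATH) as f:
    token_values = dict(flatten(json.load(f)))

with open(GENERATED_CSS) as f:
    css = f.read()

# Pull every hex color out of the generated CSS and flag values
# that don't correspond to any design token.
for value in sorted(set(re.findall(r"#[0-9a-fA-F]{3,8}\b", css))):
    token = token_values.get(value.lower())
    print(f"{value}: {'token ' + token if token else 'NO MATCHING TOKEN'}")
```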

For the second iteration, I converted all of our design system documentation, including the content layer, into a few .md files and added them to Cursor, alongside guardrails in the Cursor rules file that explicitly told the model to respect the existing design tokens in our JSON files.
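
The guardrails themselves were plain language. A paraphrased sketch of the kind of rules I mean, not our actual file:

```
# Cursor rules (excerpt, paraphrased)
- Always use the design tokens defined in tokens/*.json.
  Never hard-code colors, spacing, or type sizes.
- If a needed value has no token, flag it in a comment
  instead of inventing one.
- Follow the component and content guidelines in docs/*.md.
```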

I then selected the baseline Figma node again and asked it to recreate the screen.

With some wonky exceptions, better documentation did produce more consistent, guideline-aligned prototypes, though plenty of inconsistencies with the design system remained. After this iteration, a few designers started using the repo to build their prototypes with Cursor.

While designers were testing a Cursor-driven design cycle, I got our component CSS file from our front-end team and added it to the repo, added a few extra rules, then tried again.

However, on a sync with the designers using Cursor, I learned that the more “finished” the AI output looked, the more stakeholders assumed it was a final design. Designs meant to be exploratory were being treated as commitments. Designers were spending more time managing stakeholder expectations than doing design work.

That led to a harder question: why were we trying to make AI prototypes better in the first place?

For our org, there was no good way to push that prototype to the next level of the design process. Hi-fis still needed to be built in Figma, so the value of a prototype was that it could be iterated on fast while still looking unfinished. Deliberate inconsistency was actually protecting the design process, keeping stakeholders from locking in before the real design work had started. The work ended not with a solution but with a reframe: the goal isn't AI prototypes that look more like final designs. It's AI prototypes that are clearly and intentionally rough.

That reframe led to a different question: if polished output wasn't the goal, could we give every designer the same starting point instead? I began working with a small group of designers to build an AI-tool-agnostic starter kit: a set of markdown files connected to our design system styles, tokens, and guidelines that any designer could pull into their AI tool of choice, specifically for prototyping. The idea was that consistency didn't need to come from polished output. It could come from a shared foundation everyone started from.
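
A hypothetical layout for that kind of kit (names illustrative, not our real repo):

```
prototyping-starter-kit/
├── rules.md           # guardrails for any AI tool (Cursor, Copilot, etc.)
├── tokens/            # design token JSON, synced from the design system repo
├── guidelines/
│   ├── content.md     # voice, tone, terminology
│   └── components.md  # usage rules per component
└── styles.css         # front-end component CSS for visual grounding
```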

The project was in its early stages when my role was eliminated, but the lesson behind it was striking. Enterprises can’t just say “go faster by designing with AI”; they also have to rethink their design and development processes to support AI-driven design.


Content Desk: an AI content review agent

The hardest content design problem on a large product isn't writing the content. It's maintaining consistency as the product grows and teams multiply. Even with a solid content design system layer, every new designer makes micro-decisions about copy. Over time those decisions compound into inconsistency.

I co-designed and led the model training for Content Desk, a proof-of-concept AI agent trained on OutSystems' design system documentation and content guidelines.

The idea was to make design system guidance conversational. Instead of searching through docs, a designer could ask a question and get a specific, guidelines-grounded answer.

The first version of the agent was trained on raw documentation and returned answers that were technically correct but too long and too general to be useful in the middle of a design decision. I refined the training to gather context first, then deliver short, actionable answers with direct references to specific guidelines. That iteration made the difference between a tool designers opened once and one that felt like a partner in the design process.
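
To make that concrete, here is a paraphrased sketch of the behavior the refined instructions pushed toward, not the actual training material:

```
1. Before answering, identify the component, surface, and user
   action the designer is working on. Ask one clarifying
   question if any of these are unclear.
2. Answer in three sentences or fewer, with a concrete
   recommendation, not a summary of the docs.
3. Cite the specific guideline section the answer comes from.
4. If the guidelines don't cover the case, say so explicitly.
```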

The interface was a chat UI with a project sidebar and configurable assistant settings. We ran real queries through it during development. One example from the session logs: a designer asking about a toast message error for duplicate entity names during import. Content Desk responded with a specific answer about the constraint and asked clarifying questions to give better guidance.

We were also proving out an analytics dashboard that would show the design system team where designers were most confused, giving us the data we needed to identify gaps and prioritize work within the design system.


Content Desk was the first step toward a bigger design system review agent that would ultimately give feedback on all parts of a design (behavior, design tokens, states, etc.). That project was paused when my role was eliminated. The thinking behind it, that design systems need a conversational interface to be truly usable at scale, is something I'd continue in any role.

The through-line

Each piece of work pushed against a simple assumption about AI.

Morpheus pushed against the idea that AI will figure it out. Users need to understand the model to work with it well.

The Cursor work pushed against the idea that better AI output is always better. Sometimes roughness is a feature. It also pushed against the idea that just injecting AI tools into a process will automatically make it better and faster.

Content Desk pushed against the idea that documentation is enough. At scale, guidance needs to be findable in the moment, not filed away somewhere.

The pattern across all three: AI changes what's possible, but it doesn't remove the need for clear thinking about what you're actually trying to do.