How I Built an AI-Powered UX Research Tool With Claude Code

After 20 years of designing products, I've watched user research happen the slow way more times than I can count. You recruit participants. You schedule sessions. You run interviews. You tag highlights. You synthesize findings. Then you do it all over again for the next feature. It works, but it eats weeks. Sometimes months.
So I built something to fix that.
It's an agentic web app that automates the entire research interview process and handles the synthesis at the end. You set up a project, the AI personas respond to your questions like real users would, and the system turns everything into structured insights you can share with your team. No recruiting. No scheduling. No manual tagging.
I built the whole thing using Claude Code as my pair programmer inside VS Code. This post is the story of how it came together, what's under the hood, and what I'd do differently.
The Aha Moment
The idea didn't actually start with me. It started with my manager.
We were chatting one day about how slow our research cycles had become, and he asked a pretty direct question: "Do you think it's possible to build an AI agent that can run user interviews and synthesize the findings for us, so we can move faster?"
That question stuck with me for days. The more I thought about it, the more I realized large language models are basically very good at role playing. They've read more about software engineers, product managers, and IT admins than any researcher could ever interview. So in theory, you could spin up a few "fake" users, point them at a Figma prototype, and have them tell you where it falls apart.
That weekend I opened VS Code, fired up Claude Code, and started building. I picked Claude Code over the alternatives because it actually understands what I'm trying to do across multiple files at once. It doesn't just autocomplete. It reasons about my codebase. Honestly, having a thoughtful pair programmer that never gets tired changed how fast I could move.
A few months of nights and weekends later, the prototype was working.
What The App Actually Does (In Plain English)
Here's the simplest version: you create a research project, add some questions, attach your designs, and assign a few AI personas. The system then "interviews" each persona, collects their responses, and turns it all into structured insights you can share with your team.
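Here's the high-level flow: create a project → add questions and designs → assign personas → run the session → generate insights.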
That's it. You go from a blank project to a full research report without ever scheduling a single Zoom call.
But the interesting stuff is in the details, so let me walk you through each piece.
The Two-Layer Idea: Personas and Agents
This is the part that took me the longest to wrap my head around, and it's worth understanding because it's the heart of how the whole thing works.
There are two layers here: Personas and Agents.
A persona is the character. Think of it like a casting sheet for an actor: name, role, experience level, job responsibilities, domain knowledge, expertise areas, and behavior traits. "Sarah, a 35-year-old IT admin with 10 years of experience who hates cluttered dashboards" is a persona.
An agent is the brain that plays the character. Behind the scenes, every persona is powered by an AI agent, which is a Claude instance loaded with a carefully crafted system prompt that basically says: "You are this person. With this background. With these traits. Respond to questions as they would."
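To make the persona/agent split concrete, here is a minimal sketch of turning a casting sheet into a system prompt. It is an illustration, not the app's actual code: the `Persona` field names, the `buildPersonaSystemPrompt` helper, and Sarah's responsibilities and domain knowledge are all assumptions made for the example.

```typescript
// Hypothetical shape of a persona "casting sheet"; field names are assumptions.
interface Persona {
  name: string;
  role: string;
  yearsOfExperience: number;
  responsibilities: string[];
  domainKnowledge: string[];
  behaviorTraits: string[];
}

// Render the persona profile as the system prompt an agent would run with.
function buildPersonaSystemPrompt(persona: Persona): string {
  return [
    `You are ${persona.name}. Your role: ${persona.role} (${persona.yearsOfExperience} years of experience).`,
    `Responsibilities: ${persona.responsibilities.join("; ")}.`,
    `Domain knowledge: ${persona.domainKnowledge.join("; ")}.`,
    `Behavior traits: ${persona.behaviorTraits.join("; ")}.`,
    "Stay in character and respond to every question as this person would.",
  ].join("\n");
}

// Illustrative persona based on the "Sarah" example; the list values are made up.
const sarah: Persona = {
  name: "Sarah",
  role: "IT admin",
  yearsOfExperience: 10,
  responsibilities: ["managing user access", "monitoring system health"],
  domainKnowledge: ["identity management"],
  behaviorTraits: ["hates cluttered dashboards"],
};

const sarahPrompt = buildPersonaSystemPrompt(sarah);
```

The persona stays plain data that can be edited in a UI, while the prompt is derived from it at call time, so changing a trait changes how the agent behaves without touching any prompt text by hand.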
So when you run a session with 5 personas and 10 questions, the system is actually orchestrating 50 separate agent calls. One for each persona answering each question. Each call gets the persona's full profile, the specific question, and any design content (Figma screenshots, scraped pages, uploaded images). The agent then responds in character, with a structured answer, the reasoning behind it, and a confidence score.
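The fan-out described above can be sketched as a plain nested loop that expands a session into one unit of work per persona and question. The type and function names here (`AgentCall`, `ResearchCategory`, `planAgentCalls`) are hypothetical, chosen only for this example.

```typescript
// One unit of work: a single persona answering a single question.
interface AgentCall {
  personaId: string;
  category: string;
  question: string;
}

interface ResearchCategory {
  name: string;
  questions: string[];
}

// Expand a session into the full list of persona x question calls.
function planAgentCalls(personaIds: string[], categories: ResearchCategory[]): AgentCall[] {
  const calls: AgentCall[] = [];
  for (const personaId of personaIds) {
    for (const category of categories) {
      for (const question of category.questions) {
        calls.push({ personaId, category: category.name, question });
      }
    }
  }
  return calls;
}

// 5 personas, 10 questions spread over two categories.
const personaIds = ["p1", "p2", "p3", "p4", "p5"];
const categories: ResearchCategory[] = [
  { name: "Onboarding", questions: Array.from({ length: 5 }, (_, i) => `Onboarding question ${i + 1}`) },
  { name: "Navigation", questions: Array.from({ length: 5 }, (_, i) => `Navigation question ${i + 1}`) },
];

// plannedCalls.length is 5 x 10 = 50
const plannedCalls = planAgentCalls(personaIds, categories);
```

Planning the calls up front, before making any of them, also gives you a number you can check against a limit before anything is sent to the model.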
But here's the clever bit. There's also a separate synthesis agent, and this one doesn't play a character at all. It acts as a senior UX researcher. After all the persona agents have finished responding, the synthesis agent receives every answer and turns it into structured insights. Different prompt, different role, but powered by the same underlying AI.
This separation matters a lot. Just like in real research, the interviewer and the analyst are different roles. The persona agents generate raw data. The synthesis agent interprets it. Mixing the two leads to muddled output. Keeping them apart leads to insights that actually feel like they came out of a proper research process.
On the frontend, you (the admin) just see a friendly persona manager. You can create them, edit them, assign them to projects, and view their avatars and profiles. The agent orchestration is completely invisible. To you, they're just personas. Behind the curtain, they're a small army of AI agents working in parallel.
How You Actually Use It (The Project Wizard)
Creating a project is a step-by-step wizard. You give it a name, a description, and your research goals, then pick a research type like usability testing, exploratory research, validation, or A/B testing.
Then you choose your starting point. There are two paths:
Path 1: Start From Scratch
This is the main flow. You assign personas, organize your research into categories (like "Onboarding", "Navigation", or "Checkout"), add questions for each category, and attach your designs. Designs can be:
- Figma prototype links, where the tool grabs the structure and a screenshot
- Regular URLs, where the tool visits the page in a headless browser and captures it
- Uploaded screenshots, just straight image files
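One way to model those three attachment kinds is a discriminated union plus a small classifier. This is a sketch under assumptions: the `DesignSource` type, both function names, and the rule "any figma.com host counts as a Figma link" are illustrative, not taken from the app.

```typescript
// The three attachment kinds, as a discriminated union.
type DesignSource =
  | { kind: "figma"; url: string }
  | { kind: "url"; url: string }
  | { kind: "upload"; fileName: string };

// Classify raw user input. Anything that is not an http(s) URL is treated
// as an uploaded file name (an assumption made for this sketch).
function classifyDesignInput(input: string): DesignSource {
  let parsed: URL;
  try {
    parsed = new URL(input);
  } catch {
    return { kind: "upload", fileName: input };
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { kind: "upload", fileName: input };
  }
  const host = parsed.hostname.toLowerCase();
  const isFigma = host === "figma.com" || host.endsWith(".figma.com");
  return isFigma ? { kind: "figma", url: input } : { kind: "url", url: input };
}

// Each kind maps to a different way of fetching the design content.
function describeFetchStrategy(source: DesignSource): string {
  switch (source.kind) {
    case "figma":
      return "Figma API: structure plus screenshot";
    case "url":
      return "headless browser: page capture";
    case "upload":
      return "read the image file directly";
  }
}
```

Because the union is closed, the compiler flags any `switch` that forgets a kind if a fourth attachment type is added later.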
Hit "Create & Start Session" and the AI runs everything in the background. You can close the tab, grab a coffee, come back, and find your results waiting.
Path 2: Import From Dovetail
If you already have real research data sitting in Dovetail (a popular research repo tool), you can paste a project or folder URL. The tool fetches all your notes, highlights, and tags through the Dovetail API, hands them to Claude, and generates fresh synthesized insights from your existing data. No personas needed for this path. It's pure analysis.
This is great when you've done a bunch of interviews already but never had time to properly synthesize them.
What Happens Under the Hood (The Architecture)
Okay, here's where it gets interesting for the developer crowd. Let me walk through the full architecture.
When you click "Start Session", a chain of events kicks off:
1. The backend orchestrator wakes up. It pulls your project config (personas, categories, questions, attached designs) and starts looping. For every persona, it goes through every category, and for every category it goes through every question.
2. It fetches the design content. What happens here depends on what you attached:
- Figma link? It calls the Figma API to extract the component structure and exports a PNG screenshot.
- Regular URL? It uses Playwright (a browser automation library, run headless here) to visit the page, grab the rendered content, and take a screenshot.
- Uploaded image? It just reads the file and converts it to base64.
3. It bundles everything for Claude. The persona profile, the specific question, the design content, and the screenshot all get packaged into a single agent call. Claude is multi-modal, so it can look at the image and read the text in the same prompt. That's a huge win for richer responses.
4. The agent responds in character. Claude returns a structured JSON object: the answer (in the persona's voice), the reasoning behind it, and a confidence score. Every response gets saved to the database.
5. Once everyone has answered everything, the session is complete. Status flips. You get a notification. Time for insights.
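Step 3 above, bundling text and a screenshot into one call, might look roughly like this. The request body follows the Anthropic Messages format that Bedrock accepts for Claude models. Everything else is an assumption for illustration: the `AgentCallInput` shape, the `buildRequestBody` helper, the prompt wording, and the `max_tokens` default. Actually sending the body (for example through the Bedrock Runtime client) is left out.

```typescript
// Content blocks in the Anthropic Messages format: text, or a base64 image.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; source: { type: "base64"; media_type: "image/png"; data: string } };

// Hypothetical input for one persona/question call.
interface AgentCallInput {
  systemPrompt: string; // persona profile rendered as a system prompt
  question: string;
  designNotes?: string; // e.g. extracted Figma structure or scraped page text
  screenshotBase64?: string; // PNG bytes, already base64-encoded
}

// Package everything into a single multimodal request body.
function buildRequestBody(input: AgentCallInput, maxTokens = 1024) {
  const content: ContentBlock[] = [];
  if (input.screenshotBase64) {
    content.push({
      type: "image",
      source: { type: "base64", media_type: "image/png", data: input.screenshotBase64 },
    });
  }
  if (input.designNotes) {
    content.push({ type: "text", text: `Design context:\n${input.designNotes}` });
  }
  // The schema hint below is illustrative wording, not the app's real prompt.
  content.push({
    type: "text",
    text: `${input.question}\n\nRespond with JSON only: {"answer": string, "reasoning": string, "confidence": number}`,
  });
  return {
    anthropic_version: "bedrock-2023-05-31",
    max_tokens: maxTokens,
    system: input.systemPrompt,
    messages: [{ role: "user" as const, content }],
  };
}

const exampleBody = buildRequestBody({
  systemPrompt: "You are Sarah. Your role: IT admin.",
  question: "Where would you click first on this dashboard?",
  designNotes: "Dashboard with 12 widgets",
  screenshotBase64: "iVBORw0KGgo=", // placeholder bytes, not a real screenshot
});
```

Putting the image block before the question means the model sees the design first, then the text that asks about it.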
Insight Generation
This is where the synthesis agent takes over. The tool collects every response from every persona, hands them to a fresh agent prompted as a senior UX researcher (not a persona), and asks for structured insights.
Claude categorizes findings into six types:
- Pain Points, what frustrates users
- Needs, what users actually want
- Behaviors, patterns in how they act
- Patterns, recurring themes across personas
- Recommendations, concrete next steps
- Themes, bigger ideas worth exploring
Each insight comes with a title, a description, a severity rating, impact and effort scores, and the personas it affects. It also carries evidence: actual quotes from the persona responses.
On the frontend, you get filterable insight cards, an impact-effort matrix, theme distribution charts, and an executive summary. You can also export everything as a PDF for your stakeholders.
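As a rough picture of the data behind those cards and the matrix, here is a sketch of an insight record and a quadrant classifier. The field names, the 1 to 5 score scale, the severity levels, and the quadrant labels are all assumptions made for the example; the post does not specify them.

```typescript
// The six insight types listed above.
type InsightType = "pain_point" | "need" | "behavior" | "pattern" | "recommendation" | "theme";

// Hypothetical insight record; scales and field names are assumed.
interface Insight {
  type: InsightType;
  title: string;
  description: string;
  severity: "low" | "medium" | "high";
  evidence: string[]; // direct quotes from persona responses
  impact: number; // assumed 1 (low) to 5 (high)
  effort: number; // assumed 1 (low) to 5 (high)
  affectedPersonas: string[];
}

type Quadrant = "quick win" | "major project" | "fill-in" | "reconsider";

// Place an insight on a 2x2 impact-effort matrix around a midpoint score.
function quadrantFor(insight: Pick<Insight, "impact" | "effort">, midpoint = 3): Quadrant {
  const highImpact = insight.impact >= midpoint;
  const lowEffort = insight.effort < midpoint;
  if (highImpact && lowEffort) return "quick win";
  if (highImpact) return "major project";
  if (lowEffort) return "fill-in";
  return "reconsider";
}

// Illustrative insight, not real output from the tool.
const exampleInsight: Insight = {
  type: "pain_point",
  title: "Dashboard feels cluttered",
  description: "Too many widgets compete for attention on first load.",
  severity: "high",
  evidence: ["I can't tell what I'm supposed to look at first."],
  impact: 5,
  effort: 2,
  affectedPersonas: ["Sarah"],
};

const exampleQuadrant = quadrantFor(exampleInsight);
```

With these assumptions, the example lands in the "quick win" quadrant: high impact, low effort.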
The Tech Stack (And Why I Picked Each Piece)
Here's everything I used:
- Next.js for the full-stack React framework. One codebase for both the frontend and the backend API routes. Less context switching, faster iteration.
- Claude (via AWS Bedrock) for the AI backbone. I went with Bedrock because it gives me enterprise-grade access using my existing AWS credentials. No separate API keys to juggle.
- Prisma + SQLite for a simple database setup. Prisma gives you type-safe queries and easy migrations. SQLite keeps things lightweight during development. Easy to swap for Postgres later.
- Tailwind CSS + shadcn/ui for fast, consistent UI without writing custom CSS. Looks polished out of the box.
- Playwright for scraping web pages and taking screenshots when designs come in as URLs.
- Figma API to extract design structure and export images straight from Figma files.
- Dovetail API to pull in real research data for the import flow.
- Sonner for toast notifications that persist across page navigation, especially during long-running insight generation.
The Late Nights (What Actually Broke)
I want to be honest. This wasn't all smooth sailing. A few things slowed me down significantly:
1. AWS credentials kept expiring. I started with temporary access keys, and they'd expire mid-session. Every. Single. Time. The fix was setting up AWS SSO profiles so the SDK refreshes tokens automatically. If you're going down this path, do that first.
2. Stuffing too much data into Claude. The first time I tried importing from Dovetail, I sent every note in full. Hit the context window limit. Crashed. I had to learn to truncate intelligently by calculating per-item character budgets based on the total item count, instead of just chopping everything off at a fixed length.
3. Long-running operations breaking the UI. Insight generation can take 30 to 60 seconds. Users (i.e. me, testing) would navigate away from the page and lose the status. The fix was a "fire and forget" pattern with a React Context tracking generation status across pages, plus persistent toasts so you always know what's happening.
4. Claude going off script. Without strict prompting, agent responses came back as inconsistent prose. I learned to always ask for structured JSON output in a specific schema. Made parsing reliable and the frontend predictable.
5. The infinite loop incident. At one point I had an orchestration bug that kicked off agent calls in a loop. I burned through a not insignificant amount of Bedrock credits in about 20 minutes before I caught it. Add safety limits early. Trust me.
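A safety limit for that last incident can be as small as a counter that refuses to hand out more calls than a session should ever need. The sketch below is one possible shape, not what the app does; `CallBudget` and `runWithBudget` are hypothetical names.

```typescript
// A hard cap on how many agent calls a session may make.
class CallBudget {
  private used = 0;
  private readonly maxCalls: number;

  constructor(maxCalls: number) {
    this.maxCalls = maxCalls;
  }

  // Reserve one call, or throw before any money is spent on it.
  spend(): void {
    if (this.used >= this.maxCalls) {
      throw new Error(`Call budget of ${this.maxCalls} exhausted; aborting session`);
    }
    this.used += 1;
  }

  get remaining(): number {
    return this.maxCalls - this.used;
  }
}

// Run tasks one by one, checking the budget before each call is made.
async function runWithBudget<T>(tasks: Array<() => Promise<T>>, budget: CallBudget): Promise<T[]> {
  const results: T[] = [];
  for (const task of tasks) {
    budget.spend(); // throws here if a bug keeps queuing calls
    results.push(await task());
  }
  return results;
}
```

For a session with 5 personas and 10 questions, a budget of 50 plus a small retry margin would stop a runaway loop after a bounded number of calls instead of after 20 minutes.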
But Can You Trust AI-Generated Research?
This is the question I get asked the most, and honestly, it's the right one. Let me be straight with you.
This tool is not a replacement for talking to real users. Full stop. It's a simulation. The agents don't have real experiences, real frustrations, or real "I'd never use this" moments. What they do have is the ability to role play personas based on detailed profiles and reason through problems with the context you give them.
The underlying logic is this: each persona agent gets a system prompt that constrains its worldview to the persona's knowledge, experience, and behavior traits. When it encounters a design or question, it reasons through that lens. The synthesis agent then looks for patterns across all the responses, much like a human researcher would look for themes across interview transcripts.
Every response includes a confidence score and explicit reasoning. Every insight is backed by direct quotes from the persona agents. The synthesis agent is prompted to ground every finding in those responses. No speculation allowed.
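Those per-response guarantees only hold if each response is checked before it is stored. Here is a minimal validation sketch. The `PersonaAnswer` shape, the 0 to 1 confidence range, and the `parsePersonaAnswer` helper are assumptions for illustration.

```typescript
// Hypothetical shape of one persona response.
interface PersonaAnswer {
  answer: string;
  reasoning: string;
  confidence: number; // assumed to be in the range 0..1
}

// Parse model output into a PersonaAnswer, or throw with a clear reason.
function parsePersonaAnswer(raw: string): PersonaAnswer {
  // Models sometimes wrap JSON in extra prose; take the outermost {...} span.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  const data: unknown = JSON.parse(raw.slice(start, end + 1));
  if (typeof data !== "object" || data === null) {
    throw new Error("Model output is not a JSON object");
  }
  const { answer, reasoning, confidence } = data as Record<string, unknown>;
  if (typeof answer !== "string" || typeof reasoning !== "string") {
    throw new Error("answer and reasoning must be strings");
  }
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    throw new Error("confidence must be a number between 0 and 1");
  }
  return { answer, reasoning, confidence };
}
```

Rejecting a malformed response at this point (and retrying or flagging it) keeps gaps out of the data the synthesis agent later quotes from.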
Think of it as a structured brainstorming tool, not an oracle. It helps you:
- Spot potential pain points you might miss
- Generate hypotheses before investing in real research
- Pressure test ideas in early stages when recruiting users isn't worth it
- Prioritize which questions to actually ask real users later
The real value is speed and cost. You can run a simulated study in minutes instead of weeks. Use it to figure out what to test with real users, not to skip testing altogether.
Used that way, it's genuinely useful. Used as a replacement for real research, it'll mislead you. Be honest about what it is.
What I'd Tell Anyone Building Something Similar
A few hard-won tips:
- Use AWS SSO from day one. Temporary keys are a productivity killer.
- Truncate intelligently. Calculate per-item budgets, don't just chop at a fixed length.
- Run long operations in the background. Use a Context for status tracking, persistent toasts for visibility.
- Send images alongside text. Claude is multi-modal, so design screenshots in base64 give you much richer feedback.
- Always ask for structured JSON. Your parsing logic will thank you.
- Add cost safety limits. Especially when orchestrating loops of AI calls.
- Treat Claude Code like a thoughtful colleague. Tell it why, not just what. Give it context, point it at the right files, and review its suggestions like you would a junior teammate's PR.
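For the "truncate intelligently" tip, the per-item budget idea can be sketched in a few lines. This version is an assumption-laden simplification: it splits a total character budget evenly across items, counts characters rather than tokens, and uses a hypothetical `truncateToBudget` helper.

```typescript
// Split a total character budget evenly across items and trim only the
// items that exceed their share. Counts characters, not tokens.
function truncateToBudget(items: string[], totalBudget: number): string[] {
  if (items.length === 0) return [];
  const perItem = Math.max(0, Math.floor(totalBudget / items.length));
  return items.map((item) => {
    if (item.length <= perItem) return item; // short items pass through untouched
    if (perItem === 0) return "";
    return item.slice(0, perItem - 1) + "…"; // mark the cut with an ellipsis
  });
}

// Three notes of very different lengths sharing a 90-character budget.
const exampleNotes = ["a".repeat(100), "b".repeat(10), "c".repeat(50)];
const trimmedNotes = truncateToBudget(exampleNotes, 90);
```

The budget per item shrinks as the number of items grows, so 10 notes and 1,000 notes both fit the same overall limit. A natural refinement, not shown here, is to hand the unused budget from short items to the long ones.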
What's Next
This app is a working prototype that I built to explore how AI can speed up the research process. It's not about replacing researchers. It's about giving teams a way to think through problems faster, especially in the messy early stages where recruiting real users is overkill.
I'm still iterating on it: better persona training flows, smarter handling of multi step user journeys, and richer integrations with research tools. There's a lot more to dig into.
If you want a deeper look at how teams are using this (concrete use cases, before-and-after stories, the actual numbers on time and cost saved), head over to my Case Study page for the in-depth version.