Figma challengers
Figma is the dominant software for UI/UX design today. It's now being challenged by agent-first products like Pencil, Stitch and others — tools built to be used principally by and through agents, where the designer's role is to steer, not to operate the underlying tool. Understanding how these new kinds of products work under the hood — versus something built for humans, like Figma — gives some interesting insight into the defensibility of both incumbents and challengers.
Pencil is one Figma challenger that's been gaining traction lately. You interact with it primarily via Claude Code: you tell Claude what you want to design (e.g. screens for a new user flow), it creates and mutates pen files via the Pencil MCP, and renders screens from those files on a canvas. It takes screenshots to validate its work and iterates until it's satisfied with the output or the user gives more input.
The file format is one innovation here. pen files are JSON-like and easy for LLMs to generate; they can be version-controlled and rendered in IDEs. The MCP tells Claude how those files work and provides methods to read and mutate them. The main method, batch_design, applies atomic updates across multiple nodes in the design, which can be rolled back in one go.
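To make the mechanics concrete, here is a minimal Python sketch of a batch_design-style atomic update over a JSON-like node tree. The schema, field names and function signatures are invented for illustration; the actual pen format and MCP methods will differ.

```python
import copy

# Hypothetical pen-style document: a JSON-like tree of design nodes.
# The field names are illustrative, not the actual pen schema.
doc = {
    "id": "root",
    "type": "frame",
    "children": [
        {"id": "title", "type": "text", "content": "Sign in", "x": 24, "y": 24},
        {"id": "cta", "type": "button", "content": "Continue", "x": 24, "y": 96},
    ],
}

def find(node, node_id):
    """Depth-first lookup of a node by id."""
    if node["id"] == node_id:
        return node
    for child in node.get("children", []):
        found = find(child, node_id)
        if found:
            return found
    return None

def batch_apply(document, ops):
    """Apply a batch of update ops atomically: if any op fails,
    the whole batch is rolled back, mimicking batch_design."""
    backup = copy.deepcopy(document)
    try:
        for op in ops:
            target = find(document, op["id"])
            if target is None:
                raise KeyError(f"no such node: {op['id']}")
            target.update(op["set"])
        return document
    except Exception:
        document.clear()       # restore the pre-batch state in place
        document.update(backup)
        raise
```

A batch like `batch_apply(doc, [{"id": "cta", "set": {"y": 120}}])` either lands in full or not at all: if a later op in the same batch references a missing node, the earlier ops are undone too.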
The second innovation is the built-in feedback loop via screenshots. The MCP provides two methods: get_screenshot (a visual snapshot of a node or the whole canvas) and snapshot_layout (the bounding boxes of nodes on the canvas). The latter lets the agent understand where things are positioned — information it can't easily infer from a PNG image.
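As a rough illustration of why layout data matters, here is the kind of check an agent can run over snapshot_layout-style bounding boxes. The return shape is an assumption for illustration, not the actual API.

```python
# Hypothetical snapshot_layout-style output: node id -> bounding box.
# The real MCP's return format may differ; this shape is illustrative.
layout = {
    "title": {"x": 24, "y": 24, "w": 200, "h": 32},
    "cta":   {"x": 24, "y": 40, "w": 160, "h": 48},  # overlaps title
}

def overlaps(a, b):
    """Axis-aligned bounding-box intersection test."""
    return (a["x"] < b["x"] + b["w"] and b["x"] < a["x"] + a["w"]
            and a["y"] < b["y"] + b["h"] and b["y"] < a["y"] + a["h"])

def find_overlaps(boxes):
    """Return pairs of node ids whose boxes intersect."""
    ids = sorted(boxes)
    return [(p, q) for i, p in enumerate(ids) for q in ids[i + 1:]
            if overlaps(boxes[p], boxes[q])]
```

A check like this flags overlapping or clipped elements exactly — the class of issue a vision model can easily miss when eyeballing a screenshot.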
Designing the MCP around batch operations turns out to be key to making the feedback loop work well. Multi-modal models are still fairly limited in visual intelligence: they often struggle to infer from visual output alone which specific operation caused the result to deviate from expectations, so backtracking a whole batch of operations is often necessary to make progress [1]. The agent could in theory take smaller steps (down to single operations), but taking screenshots and feeding them into an image model at each step is inefficient and expensive; the feedback loop would be agonizingly slow as a result.
Taken together, Claude + Pencil produces quality output overall. It still misses some obvious issues here and there (e.g. text overflow, inconsistent styling across screens), but nudging it to fix those is still faster than doing the designs yourself from scratch.
Claude knows a lot about UI design patterns, and there's often no need to be super specific about what you want — it will often suggest layouts better than what I could've come up with myself. Still, design involves judgement, and having a human in the loop turns out to be important. Auto-generating/one-shotting designs sort of works, but for great, polished UX you need to bring taste, product intuition and a deep understanding of your customer base — all things the models don't have (yet).
Figma also has a bi-directional MCP now. The main method is use_figma, which takes a code parameter and executes arbitrary JavaScript against the Figma Plugin API; the code runs remotely in a Figma plugin sandbox. The approach is different from Pencil's batch_design: there are no explicit update/move/delete/etc. primitives that operate on a structured node tree. Instead, Claude writes and debugs JavaScript to implement a design. The files it operates on are hosted on Figma's servers, so every read and write is a network call, whereas pen files are local. Taken together, this tends to make for slower iterations.
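To illustrate the difference in granularity, here are two hypothetical tool-call payloads side by side. Neither dict is the actual wire format of either MCP — the shapes are assumptions based on the descriptions above.

```python
# Structured ops (Pencil-style): each mutation is declared data,
# so it can be validated and rolled back before touching the canvas.
pencil_call = {
    "tool": "batch_design",
    "operations": [
        {"op": "update", "id": "cta", "set": {"y": 120}},
        {"op": "delete", "id": "old_banner"},
    ],
}

# Code string (Figma-style): an opaque program that has to be
# executed remotely in the plugin sandbox to find out what it does.
figma_call = {
    "tool": "use_figma",
    "code": """
        // Arbitrary JavaScript against the Figma Plugin API.
        const node = figma.currentPage.findOne(n => n.name === 'cta');
        node.y = 120;
    """,
}
```

Structured operations give the server something it can check, batch and undo; a code string moves all of that burden onto the agent's ability to write and debug correct JavaScript.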
I've used Claude + Figma and Claude + Pencil side by side on non-trivial design problems — user flows with multiple levels of nesting, mental-model ambiguity and other intricacies. Pencil was faster and more accurate, and required significantly less re-prompting. I got Figma to work as well eventually, but the experience was more tedious. Claude obviously knows how to write JavaScript well, but like a human designer, it benefits from a more predictable and targeted tool — and it told me as much afterwards when I asked it to do a post-mortem of the parallel sessions [2].
Why didn't Figma release a better MCP? The reason, I think, is simple: Figma was built for humans, and it was built exceptionally well for that demographic. But that dominance becomes a liability as UI work shifts both upstream (to product managers) and downstream (to engineers).
Figma has a large installed base and an apparent distribution advantage, and it may still provide a better agent experience down the line. But a distribution advantage isn't worth as much when your existing customer base is being disintermediated and the user base broadens and shifts. Many of the people using Pencil et al. were never Figma pro users to begin with. They are product engineers like myself, who perhaps had some grasp of Figma but always wished for something more tightly integrated with the codebase, with a faster feedback loop that didn't involve costly design handoffs. It makes sense for designs to live close to code now that agents can generate both — at a minimum it cuts down on synchronization issues and helps eliminate a large class of translation errors.
The obvious challenge for the new breed of Figma competitors, on the other hand, is vertical integration by foundation model providers — and indeed, Claude Design is already out in research beta. I like Pencil and its fellow challengers, but it's not obvious to me what's defensible about their products, or indeed how they're going to monetize. They're subject to significant supplier power: you can't use them without a foundation model. One option is to attempt to vertically integrate themselves; another is to bet on product focus and attention. Anthropic surely has the resources to build a similar offering, but does it have the leadership attention to build one of the same quality? History shows this is not always the case for incumbents.
Footnotes
[1] Not just because of limited visual intelligence, but also because of frequent API errors. In my experiments, the dominant class of errors Claude made turned out to be using the MCP incorrectly: syntax errors, unsupported operations, null errors. I'm guessing those will go away over time as Pencil optimizes the MCP.
[2] These post-mortems/self-analyses are insightful but must be taken with a grain of salt. Depending on how you frame the analysis, models will happily hallucinate themselves into an opinion that confirms your leading questions. Even a hint of preference will bias them.