Andreas Fragner

Using fewer parts

Fewer parts make for better software and better products.

The best-performing firms make a narrow range of products very well. The best firms’ products also use up to 50 percent fewer parts than those made by their less successful rivals. Fewer parts means a faster, simpler (and usually cheaper) manufacturing process. Fewer parts means less to go wrong; quality comes built in. And although the best companies need fewer workers to look after quality control, they also have fewer defects and generate less waste.

— Yvon Chouinard, Let my people go surfing

Chouinard’s observation applies to software products almost verbatim. Using fewer parts makes for better software: Easier to maintain, easier to extend, better margins. But what does “fewer parts” mean? And how do you know which ones to remove?

Fewer parts means making parts reusable. A good design minimizes the number of components at constant functionality. That means avoiding duplication and making things reusable. If you can reimplement a system with a smaller number of components (functions, classes, services, etc.), that’s a sign that the original solution was either over- or under-engineered. Over-engineered because it introduced abstractions that weren’t necessary; under-engineered because it failed to identify reusable parts. It can be tempting to make fewer but larger components, but those almost always end up being less reusable. You might have fewer functions in such a design, but you don’t have fewer parts.

Fewer parts means fewer representations of the data. All else equal, the amount of logic required to support n representations of the same data scales like n²: in the worst case, every pair of representations needs its own conversion logic. It’s not uncommon for teams to maintain protobuf models, SQL schemas, OpenAPI specs, GraphQL schemas, etc., all to support a single product. They might have a source of truth that defines the “core” data models (e.g. in protobuf), but still end up spending a ton of bandwidth on maintaining model converters and crafting migrations. Most people intuitively prefer to have fewer data representations, but the challenge is that different applications typically need different views or different derived properties of the data. That can lead to a proliferation of derived models which may not have strict one-to-one relationships with the original models.
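
To make this concrete, here is a minimal sketch (in Python, with made-up field names) of the pattern that keeps the representation count down: one canonical model as the source of truth, with application-specific views derived by pure functions rather than maintained as separate models that need converters and migrations.

```python
from dataclasses import dataclass
from datetime import date

# Canonical model: the single source of truth for an "Order".
@dataclass(frozen=True)
class Order:
    order_id: str
    customer_id: str
    amount_cents: int
    created_on: date

# Application-specific views are derived on demand, not stored as separate
# models, so there is no converter or migration logic to keep in sync.
def to_api_view(order: Order) -> dict:
    """Shape returned by a (hypothetical) public API."""
    return {
        "id": order.order_id,
        "amount": order.amount_cents / 100,  # expose dollars rather than cents
        "createdOn": order.created_on.isoformat(),
    }

def to_reporting_row(order: Order) -> tuple:
    """Flat row shape for a (hypothetical) reporting pipeline."""
    return (order.order_id, order.customer_id, order.amount_cents)
```

The views carry no independent state, so adding a field means touching one definition instead of n.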

Fewer parts means fewer languages and fewer tools. There is almost never a good enough reason to add another language to your stack. The increase in complexity and maintenance burden is consistently underestimated vs. the benefits. The same goes for databases. Performance reasons are often not strong enough to justify adding a new type of DB to cater to your latest special use case.

Fewer parts means smaller teams. Smaller teams spend less time coordinating and more time building and owning things. In most start-ups, a small number of engineers (3-4) build the first iteration of the product, which often ends up generating 80% of its lifetime value. It’s clearly possible to build complex things with a small, focused team. But as more money is raised, engineering teams balloon because they lose focus and add components that are not directly aligned with creating customer value. It’s Parkinson’s law at work: companies perceive things to be mission-critical for the product, craft a budget based on that, feel compelled to use the budget once it’s allocated, and so hire more people, who then produce yet more parts, and so on.

Fewer parts means fewer counterparties. Most things break at the boundaries (especially if they’re external). The greater the surface area, the riskier a system becomes and the harder it is to maintain. Prefer to deal with a small number of high-quality vendors, and be prepared to pay a premium. The obvious interjection here is concentration risk: If a key vendor goes into administration or decides to drop the product you rely on, that might pose an existential risk to you. Such counterparty risk can indeed matter greatly and needs to be considered, but I’ve found in practice it’s often more manageable than people think. There are SLAs and contractual notice periods, and the majority of counterparties will honor them, giving you time to adjust. If you do need to replace a vendor, you start out with a much clearer picture of the requirements and the scope of the integration, which cuts down on time-to-market.

If using fewer parts is a good idea, how come modern software production appears to be so bloated? Dozens of vendors, a stack that’s 7 layers deep and includes 4 languages, teams of 60+ developers, etc. feel like the norm. Clearly, companies believe they need this many parts to deliver value to customers; few people deliberately set out to waste resources, after all. The problem is that people lose sight of which activities actually create value. As a company grows, a disconnect develops between the activities performed by its employees and the value delivered to customers. In a 10-person firm, everyone speaks to customers, everyone knows the value chain, and everyone uses the product. In a 1,000-person firm, by definition most employees have never spoken to customers and may work on parts of the system that are increasingly far removed from what the customer sees. This is one instance where great management can make a huge difference: in well-managed firms, management goes to great lengths to communicate the link between firm activities and value creation. The focus is on customers and the problems they face, rather than on process and efficiency gains. If you focus on serving your customers better, efficiency will take care of itself.

A few principles I follow to keep the number of parts small:

  1. Hire fewer but better people and pay them more.

  2. Work with fewer but better vendors and be willing to pay a premium. Be systematic about selecting them and understand the risks.

  3. Each project you decide to allocate resources to must have a 3-4 sentence description of how it creates value for customers. People often struggle with this if the work is abstract or far removed from what the customer sees (say, work on infrastructure) but I’ve found it’s always possible if the work is worth pursuing.

Andreas Fragner

Early-stage engineering

Early on you need to be fast. And to do that you have to have the confidence to break with best practices.

Early on you need to be fast. Your team, your stack, your infrastructure — they all need to be set up for that. To do that, you have to have the confidence to break with best practices. That confidence comes from knowing what risks actually matter in your context. The risks you care about when you’re building version 1.0 are very different from the risks a large organization cares about. The way you approach engineering has to reflect that.


When you start out building something new, everything is in flux. Your requirements aren’t understood yet, your data model will evolve, your API boundaries will shift, your interfaces haven’t firmed up yet, etc. That’s natural, and it’s key to embrace that uncertainty when working on something new. At that point it’s all about optimizing feedback loops. Make them as tight and fast as possible. What does that look like in practice?

  • Instrument everything — If you move fast, things break more often. You need to be able to figure out quickly and easily what’s wrong. People often interject here that adding instrumentation is extra work you can’t afford at this stage. The trick is to make it a total no-op: it should take minimal developer effort to get things like tracing, metrics and log aggregation in place. Like, zero is the goal here. Tracing in particular is such an easy win because it doesn’t require any thinking. Adding a span to a method takes at most one or two lines of (templated) code (see the first sketch after this list). Over time you’ll want to capture more information on a span and that takes more thinking, but just knowing the code path something took typically solves like 80% of the puzzle.

  • Make deployments automated and continuous — This one should be uncontroversial at this point. Every merge into main should trigger an image build, which gets deployed automatically. No action required. No release cycles. At Kappa we do on the order of 100s of “deployments” a day. A change is live in the dev cluster within 2 minutes of being merged. You get (near) immediate feedback.

  • Make running things locally easy and cheap — Even faster than deploying things is to just run them locally. Make it as easy as possible to run services locally and to connect to other services running remotely. Running a whole cluster locally can sometimes be hard given hardware constraints, but that’s almost never necessary (Corollary: Buy good machines for everyone, see below). One interesting development here is services like Modal, which try to abstract away the gap between local and cloud infra completely.

  • Make writing tests easy and cheap — The reason people don’t write more tests is that it’s hard and takes time. So it makes sense to invest to bring that cost down — e.g. by auto-generating mocks for your services, or writing sample data generators that give you representative data for your domain (see the second sketch after this list). It’s pretty clear at this point that LLMs have changed the game for unit tests. It takes all of two clicks / two copy-pastes now to generate a reasonable test suite. Most of the time there are issues or mistakes the model makes, but they’re typically easy to fix. Net-net it can still be a big time saver.

  • Integration tests over unit tests — Integration tests that run on every build or multiple times a day give you fast, meaningful feedback. Modern systems are distributed, and it’s at the boundaries where most of the bugs sit. Unit tests are fine, but if you have to choose what to spend your time on, write integration tests. The components of your software obviously need to work in isolation, but it’s really the interactions where things go wrong. Especially if those interactions are asynchronous.

  • Minimize wait times — Waiting for CI to finish, waiting for a code review, waiting for something to build, etc. — these things are especially detrimental to productivity because they keep you from getting closure on one piece of work and discourage you from moving on to the next task. Even if the work itself is done, it still lingers until it’s deployed. This is one strong argument for choosing a language that compiles and builds quickly.

  • No branch protections — One way to eliminate PR approval wait times is to not require them. Sounds crazy, but you wouldn’t believe how much time is wasted waiting for a review on a trivial change (the true cost is even higher than wall time because waiting (and checking) breaks your flow state). So trust your engineers. We’re all adults here. If your team is 5 experienced people, you can often coordinate your work well enough without PRs, just over Slack. You do end up with merge issues at times, but they’re typically infrequent and easy to resolve because early on people tend to work on fairly orthogonal things. The time spent resolving those is easily made up for by the increase in velocity.

  • Minimize task overhead — This one is almost tautological at this point. Maximize uninterrupted blocks of time for people to focus. Minimize meetings and process.

  • Automate stack upgrades — A lot of time can be wasted when you don’t update dependencies until you’re forced to for compatibility reasons. That’s when you have to deal with a potentially large number of issues all at once, usually at the worst possible time. This is easy to fix: Just set up Dependabot.

  • Buy good machines for everyone — The added cost of getting high-spec machines for everyone amortizes literally in a day. Remove the constraint of local hardware as much as you can. The added cost for a team of 5 is totally negligible compared to what you pay on cloud compute.

  • Hire owners and generalists — The person making a change is also responsible for ensuring that it actually works once deployed. Integration tests go a long way here, but sometimes you actually have to make an API call or open the app and check UX impact. If you wait for QA to catch issues, you’ve wasted 3 days to find out you had a bug somewhere. And because you’re often out of context at that point, it becomes harder to fix.

  • Understand your team’s strengths — While everyone agrees that hiring great ICs is important, far too little thought goes into team composition. In fact, it’s often completely absent from recruitment plans. This is strange, since in areas outside software engineering, like professional sports, it gets at least as much attention. Building a technical product from scratch is a high performance team sport. You need great individual performers, but you also need them to complement each other, technically and personality-wise.
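
To make the instrumentation point concrete: below is roughly what “one or two lines per method” looks like, sketched with OpenTelemetry’s Python API (the service and function names are made up).

```python
from opentelemetry import trace

tracer = trace.get_tracer("checkout-service")  # hypothetical service name

def process_payment(order_id: str) -> bool:
    # The per-method instrumentation cost: one context manager.
    with tracer.start_as_current_span("process_payment") as span:
        span.set_attribute("order.id", order_id)  # optional extra context
        ...  # business logic goes here
        return True
```

Even with no attributes set, you get the code path and timing of every call, which, per the point above, already solves most of the puzzle.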
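
And for the testing point: a sketch of a cheap sample-data generator in plain Python (the Trade domain object is hypothetical). Once representative objects are one function call away, the fixed cost of writing a test drops sharply.

```python
import random
import string
from dataclasses import dataclass

@dataclass
class Trade:  # hypothetical domain object
    trade_id: str
    symbol: str
    quantity: int
    price: float

def make_trade(**overrides) -> Trade:
    """Return a realistic-looking Trade; tests override only the fields they care about."""
    defaults = dict(
        trade_id="".join(random.choices(string.ascii_uppercase + string.digits, k=8)),
        symbol=random.choice(["AAPL", "MSFT", "GOOG"]),
        quantity=random.randint(1, 1_000),
        price=round(random.uniform(10, 500), 2),
    )
    defaults.update(overrides)
    return Trade(**defaults)

def test_notional_is_quantity_times_price():
    # Only the fields under test are spelled out; everything else is generated.
    trade = make_trade(quantity=10, price=5.0)
    assert trade.quantity * trade.price == 50.0
```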

Some of the points above may sound crazy to someone in a mature engineering org. And for good reason! Your approach has to evolve as your product matures. The key is to understand what risks you need to care about at the point you’re at. Zero-risk deployments, well-managed sprints, carefully groomed tickets, etc. — these things all sound great in isolation, but the risk-adjusted return of doing them is just too low in the beginning.

The only risks you should care about early on are existential ones: (1) Running out of cash before you launch, (2) launching too late to get enough proof points, (3) shipping too late to iterate meaningfully, (4) being too slow to incorporate feedback. The risks that are considered existential in a larger org are just fundamentally different. Reputation, competitive threats, losing customers, losing market share, product stability, service uptime — those things matter when you have an existing product with good traction. But early on, you don’t have many customers yet, and those you do have are (hopefully) more forgiving. There’s typically also little to no competition to worry about. If not, you may want to reconsider what you’re working on.

I believe a significant number of startups die because they cling to all the best practices of later-stage engineering — doing what big companies do. What these companies do is solve for their problems, not yours. Blindly following their advice means you end up over-indexing on the wrong risks. The material in books and on blogs is heavily biased towards late-stage engineering. People simply have more time to write when there’s an existing product with stable cash flow and growth. And that’s why it’s so important to think for yourself and understand your idiosyncratic risks.

Andreas Fragner

Overconfidence and future-proofing

“Just to future-proof things” is perhaps the most common argument I hear to defend questionable design decisions. Future-proofing only makes sense if you have some idea of what the future will look like and you understand what the tradeoffs are.

Kevin Kelly points out that trying to future-proof your technology stack is often a bad investment. Since it’s hard to predict how technology will evolve, you’re better off buying just-in-time. Similarly, since it’s hard to predict how your requirements will change, you might be better off building just-in-time. It’s an interesting question, then, why future-proofing is so pervasive, and whether we indeed do too much of it.

Future-proofing makes sense if you can predict the future in some way. If you have an idea of how your needs will change, it makes sense to invest today to reduce or avoid the cost of making changes later. Future-proofing can also make sense as a hedge: if the impact of a change in requirements is catastrophic, it’s sensible to take measures now, even if you can’t estimate the probability of that change happening. For example, turning customers away because of stock-outs (→ manage your inventory more conservatively), losing user trust by turning off a service when it’s most needed (→ provision more compute than you need), or trading on stale information due to a data feed outage (→ have multiple redundant sources).

In writing software, future-proofing takes many forms:

  • Make an interface more general than it needs to be

  • Decouple components more than you need to

  • Provision more storage and compute than you need

  • Make scalability the primary design goal, even if you’re nowhere near scale

  • Make performance a primary design goal, even if it’s not important to your value prop

  • Implement speculative features instead of improving the core of your product

“Just to future-proof things” is perhaps the most common argument I hear to defend questionable design decisions. And conversely, “over-engineering” is the most common counterargument.

I think there are two reasons why we future-proof too much:

  • We’re overconfident in our ability to predict the future — Even with a lot of relevant experience and information, people routinely fail to anticipate future needs correctly. One aspect of this is over-optimism. We tend to overweight positive scenarios (e.g. we will grow and scale rapidly) and underweight negative ones (e.g. we will fail to find product-market fit and run out of money). Future-proofing tends to be concerned only with the former.

  • We’re ignorant of the tradeoffs — Arguments in favor of future-proofing align with our inherent bias for risk aversion (it’s the safe thing to do), so they’re easy to get on board with. Understanding the costs and tradeoffs is much more subtle and harder to build intuition for.

The second error — ignoring the tradeoffs — is especially detrimental in early-stage ventures where opportunity cost can be infinite. If things don’t ship in time, the whole venture can fail; then not only the future-proofing effort but everything else built alongside it is wasted. Shipping something that works reasonably well today is often a far better outcome than shipping something late that works in all future scenarios. While this seems to be well understood, it’s amazing how few founding teams manage to internalize it. A huge amount of great technology has been built that never made it into customers’ hands.

This is not to say we should never future-proof. The key is to appreciate the tradeoffs of doing so — and our biases. Great founders — great engineers — understand the constraints the business as a whole is under. A rough expected ROI analysis can go a long way. While it’s impossible (and pointless) to try to get an exact answer, it forces you to think about costs and the parts of the distribution you’d typically ignore. How much is doing X going to cost us in engineering time (NB: multiply by 3)? If we don’t do X, will the product still work? What’s the impact on our value prop? If we don’t do X, what else could we do? These are the kinds of questions you need to ask. Don’t future-proof because it’s the right thing to do in isolation. Understand the context.
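
To make the rough-ROI idea concrete, here is a toy back-of-envelope comparison in Python. Every number is a hypothetical input you would estimate for your own case: the cost of building the future-proof version now versus the probability-weighted cost of retrofitting later.

```python
# Toy expected-ROI sketch: is future-proofing X worth it?
# Every number below is a hypothetical estimate, not a recommendation.

eng_week_cost = 5_000        # fully loaded cost of one engineer-week (USD)

build_now_weeks = 2 * 3      # naive estimate of 2 weeks, times the "multiply by 3" rule
retrofit_weeks = 8           # estimated cost of retrofitting if the need materializes
p_need = 0.25                # how likely that future requirement actually is

cost_now = build_now_weeks * eng_week_cost
expected_cost_later = p_need * retrofit_weeks * eng_week_cost

print(f"Future-proof now:       ${cost_now:,.0f}")
print(f"Expected retrofit cost: ${expected_cost_later:,.0f}")
# Here that is $30,000 now vs. an expected $10,000 later, before even counting
# the opportunity cost of delaying whatever else those six weeks could have shipped.
```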

Andreas Fragner

The LLM productivity puzzle

Code generation is arguably one of the most interesting applications of LLMs, and one of the first with real commercial use (Copilot/Codex, Codegen, etc.). If you spend time on the internet these days you’ll see people claim productivity gains ranging from 0 to 100x, selection-biased to the high end (1). Whenever you see several orders of magnitude of disagreement, it’s worth trying to understand why.

While the extremes can almost certainly be explained as either deliberate hyperbole (promoters with no real experience writing code) or uninformed contrarianism (naysayers who have not made any serious attempt at using LLMs), there is a simpler and less cynical explanation for the divergence: It’s a reflection of the diversity of tasks involved in writing software.

Software development means a lot of different things, and it’s only natural to expect a new tool to be more suited to some types of engineering than others. If you build a standard component from scratch (e.g. a web dashboard with simple UI), odds are the requirements can be specified in a reasonably sized prompt. If you’re building on top of decades of legacy code with lots of non-obvious design decisions baked in, then (a) communicating that context to the model is hard (i.e. it would require a long sequence of carefully crafted prompts), and (b) even if you manage to, it might not be able to make sense of it. As far as we know, LLMs don’t understand the structure of code at any fundamental level, and it’s not clear that they can pick up on the non-local context required to speed up development on complex tasks by, say, 10x.

All this means that your mileage will vary depending on the kind of engineering you do. From what I’ve seen, LLM enthusiasts tend to work on things that have a high degree of isolation and require relatively little context, while the naysayers work on systems with lots of proprietary frameworks. To be sure, LLMs can be useful for either type of work but it’s clear that you’ll find it easier to get good results on the former. The key lies in using your intuition as an engineer — and your understanding of how LLMs work — to pick the right tasks.

A simple example that highlights the divergence in perceived usefulness is code translation. People report everything from perfect results (translated code compiles and works as intended) to useless fragments (translated code doesn’t run, needs a lot of fixes). I’ve experienced both ends of the spectrum, even within the same language pair. Translating utility functions almost always works flawlessly. On the other hand, a recent attempt at translating a method from Node to Go using ChatGPT failed miserably because the function was using protobuf-generated objects and the model wasn’t able to figure out how attribute assignment differed between the Node and Go bindings.

It’s early days for LLM code generation and I’m certain we’ll see a lot of improvement over time. How quickly this happens remains to be seen. The fact that LLMs perform well on program synthesis is considered to be the result of “emergence”: Training on large amounts of commented code gives the model a weak supervised signal for code generation (2). If you make the model large enough and the datasets big enough, the ability to generate code from prompts emerges. I remain skeptical that the kind of understanding of non-local context needed for complex engineering tasks can emerge simply by scaling to ever larger models and datasets (3).


If you write a lot of code and use LLMs to do it, reach out on Twitter or email me. I’m keen to collect more data and hear about other people’s experiences.

Footnotes

(1) The loudest people tend to either have had the most success in using them or have a vested interest in raising attention (e.g. promoting their coding bootcamp or YouTube channel).

(2) https://arxiv.org/abs/2203.13474

(3) [Edit 2023-04-01] Steve Yegge at Sourcegraph highlights an intriguing approach to overcoming context size limitations of LLMs — using code search to optimally populate context for a given prompt.
