Using fewer parts
Fewer parts make for better software and better products.
The best-performing firms make a narrow range of products very well. The best firms’ products also use up to 50 percent fewer parts than those made by their less successful rivals. Fewer parts means a faster, simpler (and usually cheaper) manufacturing process. Fewer parts means less to go wrong; quality comes built in. And although the best companies need fewer workers to look after quality control, they also have fewer defects and generate less waste.
— Yvon Chouinard, Let My People Go Surfing
Chouinard’s observation applies to software products almost verbatim. Using fewer parts makes for better software: Easier to maintain, easier to extend, better margins. But what does “fewer parts” mean? And how do you know which ones to remove?
Fewer parts means making parts reusable. A good design minimizes the number of components at constant functionality. That means avoiding duplication and making things reusable. If you can reimplement a system with a smaller number of components (functions, classes, services, etc.), that’s a sign that the original solution was either over- or under-engineered. Over-engineered because it introduced abstractions that weren’t necessary; under-engineered because it failed to identify reusable parts. It can be tempting to make fewer but larger components, but those almost always end up being less reusable. You might have fewer functions in such a design, but you don’t have fewer parts.
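A minimal sketch of the difference, with hypothetical names: collapsing two near-duplicate exporters into one reusable function reduces the part count, whereas merging them into a single do-everything function would merely hide the parts inside a bigger one.

```go
// Hypothetical sketch; the names and types are illustrative.
package main

import (
	"fmt"
	"strings"
)

// Before: one exporter per record type, with duplicated logic.
//   func exportUsersCSV(rows [][]string) string  { ... }
//   func exportOrdersCSV(rows [][]string) string { ... }

// After: one reusable part, same functionality.
func exportCSV(header []string, rows [][]string) string {
	var b strings.Builder
	b.WriteString(strings.Join(header, ",") + "\n")
	for _, row := range rows {
		b.WriteString(strings.Join(row, ",") + "\n")
	}
	return b.String()
}

func main() {
	fmt.Print(exportCSV(
		[]string{"id", "name"},
		[][]string{{"1", "ada"}, {"2", "bob"}},
	))
}
```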
Fewer parts means fewer representations of the data. All else equal, the amount of logic required to support n representations of the same data scales like n²: in the worst case, every pair of representations needs its own conversion logic. It’s not uncommon for teams to maintain protobuf models, SQL schemas, OpenAPI specs, GraphQL schemas, etc., all to support a single product. They might have a source of truth that defines the “core” data models (e.g. in protobuf), but still end up spending a ton of bandwidth maintaining model converters and crafting migrations. Most people intuitively prefer to have fewer data representations, but the challenge is that different applications typically need different views or different derived properties of the data. That can lead to a proliferation of derived models which may not have strict one-to-one relationships with the original models.
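One pattern that helps, sketched below with hypothetical types: designate a single source-of-truth model and compute derived views from it on demand. Each new representation then costs one derivation function instead of another hand-maintained model plus converters.

```go
// Minimal sketch with hypothetical types: one source-of-truth model,
// derived views computed from it rather than maintained in parallel.
package main

import "fmt"

// Source of truth.
type User struct {
	ID        int
	FirstName string
	LastName  string
	Email     string
}

// A derived, application-specific view. It is always computed from
// User, never stored or migrated separately.
type UserSummary struct {
	ID          int
	DisplayName string
}

func Summarize(u User) UserSummary {
	return UserSummary{
		ID:          u.ID,
		DisplayName: u.FirstName + " " + u.LastName,
	}
}

func main() {
	u := User{ID: 1, FirstName: "Ada", LastName: "Lovelace", Email: "ada@example.com"}
	fmt.Printf("%+v\n", Summarize(u))
}
```

With n peer models converted pairwise you maintain on the order of n² converters; with one source of truth you maintain n-1 derivations.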
Fewer parts means fewer languages and fewer tools. There is almost never a good enough reason to add another language to your stack. The added complexity and maintenance burden are consistently underestimated relative to the benefits. The same goes for databases: performance reasons are rarely strong enough to justify adding a new type of DB to cater to your latest special use case.
Fewer parts means smaller teams. Smaller teams spend less time coordinating and more time building and owning things. In most start-ups, a small number of engineers (3-4) build the first iteration of the product, which ends up generating 80% of its lifetime value. It’s clearly possible to build complex things with a small, focused team. But as more money is raised, engineering teams balloon, because they lose focus and add components that are not directly aligned with creating customer value. It’s Parkinson’s law at work: companies perceive things to be mission-critical for the product, craft a budget based on that perception, and once the budget is allocated it must be spent, so more people are hired, who then produce yet more parts, and so on.
Fewer parts means fewer counterparties. Most things break at the boundaries (especially if they’re external). The greater the surface area, the riskier a system becomes and the harder it is to maintain. Prefer to deal with a small number of high-quality vendors, and be prepared to pay a premium. The obvious interjection here is concentration risk: If a key vendor goes into administration or decides to drop the product you rely on, that might pose an existential risk to you. Such counterparty risk can indeed matter greatly and needs to be considered, but I’ve found that in practice it’s often more manageable than people think. There are SLAs and contractual notice periods, and the majority of counterparties will honor them, giving you time to adjust. And if you do need to replace a vendor, you start out with a much clearer picture of the requirements and the scope of the integration, which cuts down on time-to-market.
If using fewer parts is a good idea, how come modern software production appears to be so bloated? Dozens of vendors, a stack that’s 7 layers deep and includes 4 languages, teams of 60+ developers: that feels like the norm. Clearly, companies believe they need this many parts to deliver value to customers. Few people are deliberately trying to waste resources, after all. But the problem is that people lose sight of which activities actually create value. As a company grows, a disconnect develops between the activities performed by its employees and the value that is delivered to customers. In a 10-person firm, everyone speaks to customers, everyone knows the value chain, and everyone uses the product. In a 1000-person firm, by definition, most employees have never spoken to customers and may work on parts of the system that are increasingly far removed from what the customer sees. This is one instance where great management can make a huge difference. In well-managed firms, management goes to great lengths to communicate the link between firm activities and value creation. The focus is on customers and the problems they face, rather than on process and efficiency gains. If you focus on serving your customers better, efficiency will take care of itself.
A few principles I follow to keep the number of parts small:
Hire fewer but better people and pay them more.
Work with fewer but better vendors and be willing to pay a premium. Be systematic about selecting them and understand the risks.
Each project you decide to allocate resources to must have a 3-4 sentence description of how it creates value for customers. People often struggle with this if the work is abstract or far removed from what the customer sees (say, work on infrastructure), but I’ve found it’s always possible if the work is worth pursuing.
The LLM productivity puzzle
Code generation is arguably one of the most interesting applications of LLMs, and one of the first with real commercial use (Copilot/Codex, Codegen, etc.). If you spend time on the internet these days you’ll see people claim productivity gains ranging from 0 to 100x, selection-biased to the high end (1). Whenever you see several orders of magnitude of disagreement, it’s worth trying to understand why.
While the extremes can almost certainly be explained as either deliberate hyperbole (promoters with no real experience writing code) or uninformed contrarianism (naysayers who have not made any serious attempt at using LLMs), there is a simpler and less cynical explanation for the divergence: It’s a reflection of the diversity of tasks involved in writing software.
Software development means a lot of different things, and it’s only natural to expect a new tool to be more suited to some types of engineering than others. If you build a standard component from scratch (e.g. a web dashboard with a simple UI), odds are the requirements can be specified in a prompt of reasonable size. If you’re building on top of decades of legacy code with lots of non-obvious design decisions baked in, then (a) communicating that context to the model is hard (i.e. it would require a long sequence of carefully crafted prompts), and (b) even if you manage to, the model might not be able to make sense of it. As far as we know, LLMs don’t understand the structure of code at any fundamental level, and it’s not clear that they can pick up on the non-local context required to speed up development on complex tasks by, say, 10x.
All this means that your mileage will vary depending on the kind of engineering you do. From what I’ve seen, LLM enthusiasts tend to work on things that have a high degree of isolation and require relatively little context, while the naysayers work on systems with lots of proprietary frameworks. To be sure, LLMs can be useful for either type of work but it’s clear that you’ll find it easier to get good results on the former. The key lies in using your intuition as an engineer — and your understanding of how LLMs work — to pick the right tasks.
A simple example that highlights the divergence in perceived usefulness is code translation. Reports range from perfect results (translated code compiles and works as intended) to useless fragments (translated code doesn’t run and needs a lot of fixes). I’ve experienced both ends of the spectrum, even within the same language pair. Translating utility functions almost always works flawlessly. On the other hand, a recent attempt at translating a method from Node to Go using ChatGPT failed miserably because the function used protobuf-generated objects, and the model wasn’t able to figure out how attribute assignment differed between the Node and Go bindings.
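For a flavor of what tripped it up, here is a hedged reconstruction using a stand-in Order type (the real messages were generated from proprietary schemas): the Go bindings expose plain struct fields, while the Node bindings generate accessor methods, a convention the model had to know rather than infer.

```go
// Hedged reconstruction with a stand-in type; protoc-gen-go would
// normally generate this struct from a .proto file.
package main

import "fmt"

// Stand-in for a generated message: the Go bindings expose plain
// exported struct fields.
type Order struct {
	CustomerId string
}

func main() {
	// Go: direct field assignment on the generated struct.
	order := &Order{}
	order.CustomerId = "c-42"

	// The Node bindings (google-protobuf) instead generate accessors,
	// roughly:
	//   const order = new proto.Order();
	//   order.setCustomerId("c-42");
	// A line-by-line translation that keeps the setter style does not
	// compile in Go, and vice versa.
	fmt.Println(order.CustomerId)
}
```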
It’s early days for LLM code generation and I’m certain we’ll see a lot of improvement over time. How quickly this happens remains to be seen. The fact that LLMs perform well on program synthesis is considered to be the result of “emergence”: Training on large amounts of commented code gives the model a weak supervised signal for code generation (2). If you make the model large enough and the datasets big enough, the ability to generate code from prompts emerges. I remain skeptical that the kind of understanding of non-local context needed for complex engineering tasks can emerge simply by scaling to ever larger models and datasets (3).
If you write a lot of code and use LLMs to do it, reach out on Twitter or email me. I’m keen to collect more data and hear about other people’s experiences.
Footnotes
(1) The loudest people tend to either have had the most success using them or have a vested interest in drawing attention (e.g. promoting their coding bootcamp or YouTube channel).
(2) Nijkamp et al., “CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis”, https://arxiv.org/abs/2203.13474
(3) [Edit 2023-04-01] Steve Yegge at Sourcegraph highlights an intriguing approach to overcoming context size limitations of LLMs — using code search to optimally populate context for a given prompt.