The siren call of wrong-way dependencies
Dependencies that point in the wrong direction are one of the easiest and most costly design mistakes one can make. Easy because they’re often the path of least resistance. Costly because they make it harder to make changes.
What’s a wrong-way dependency? A higher-level, more abstract component depending on a lower-level, more concrete one. Dependencies should point inwards, in the sense that the more fundamental parts of your system sit on the inside and the implementation details on the outside [1].
Introducing such wrong-way dependencies is extremely easy and I see it done all the time, even by experienced engineers. They give you short-term gain for long-term pain.
Consider the following example: Suppose your system interfaces with a bunch of different third-party service providers which add different functionality to your product (say, different payment gateways for different payment methods). Further, suppose that your customers need to be registered in advance with each provider prior to being able to use the functionality offered by it. For any given customer, you need to be able to tell at runtime whether the functionality is available to them (e.g. they can use a given payment method). There are two ways to address this:
Option 1: Add a boolean property to your customer model for each provider that indicates whether a customer has been registered with that provider, defaulting to false.
Option 2: Add a way to query each service provider for whether a customer has been registered with it. The provider’s API might support this directly, or if not we’d wrap it and maintain a list of registered customers ourselves.
Option 1 is tempting because it involves less work at the outset. But now suppose we want to make any of the following changes:
Introduce a new service provider
Change the identifier or reference for a service provider
Add more fine-grained permissions
All of these now entail touching customer code or data, even though they have no bearing on the core logic for managing customers. Changes are now more costly and riskier. Option 1 was faster to implement but has increased the blast radius of changes significantly.
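To make Option 2 concrete, here is a minimal sketch in Python (PaymentProvider, ExampleGateway and the customer_exists call are hypothetical names, not a real SDK): the core logic depends only on an abstraction, and each concrete provider sits behind it.

```python
from abc import ABC, abstractmethod


class PaymentProvider(ABC):
    """Abstraction the core customer logic depends on; concrete providers stay on the outside."""

    @abstractmethod
    def is_registered(self, customer_id: str) -> bool:
        ...


class ExampleGateway(PaymentProvider):
    """Wraps one third-party provider; the client and its customer_exists call are assumed."""

    def __init__(self, client):
        self._client = client

    def is_registered(self, customer_id: str) -> bool:
        # Ask the provider directly, or consult a registration list we maintain ourselves.
        return self._client.customer_exists(customer_id)


def available_payment_methods(customer_id: str, providers: dict[str, PaymentProvider]) -> list[str]:
    """Core logic: no per-provider boolean flags on the customer model."""
    return [name for name, provider in providers.items() if provider.is_registered(customer_id)]
```

Introducing a new provider now means adding another PaymentProvider implementation; the customer model and the core logic for managing customers stay untouched.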
How do you avoid wrong-way dependencies? Think about what kinds of product changes you will need to make in the future, then avoid dependencies that make them difficult. Of course it’s hard to predict how requirements might change, but the big axes are usually fairly obvious. Avoid speculation and focus on the obvious ones.
[1] Bob Martin makes this point in many ways in Clean Architecture. “Depend in the direction of stability” is a particularly succinct one.
Summer reading list 2024
Books have a nice kind of survivorship bias: If something is still being read after decades or centuries, it must contain some universal truths or be useful at a fundamental level. So focusing on the classics tends to be good reading advice. While I don’t have any actual classics on this year’s list, many books do turn out to be 30+ years old and that’s probably not a coincidence.
Sam Walton — Made in America: Walton wrote his autobiography in the final year of his life when he was terminally ill, but you wouldn’t be able to tell from reading it. Full of optimism, drive and passion for “minding the store”, Walton was a world-class operator who understood the principles of great business are simple yet hard to execute well: Customer obsession, controlling your costs, hiring the right people at the right time and making everyone an owner.
Akio Morita — Made in Japan: Written 6 years before Walton’s autobiography, this is Morita’s account of how he started Sony in the rubble of post-war Japan and built it into one of the largest consumer electronics companies at the time. Sony’s rise mirrors that of the Japanese economy so this reads well as a front row account of the country’s evolution from low-cost producer of low-value-add goods to industrial powerhouse — stopping at its de-facto peak in 1986.
Bernard Moitessier — The Long Way: Moitessier’s account of his 1968/69 non-stop solo sail around the world. I’ve long been fascinated by endurance, physical and mental, and what drives people to seek out such extremes. The Long Way proved to be especially insightful, capturing the psychological push and pull of being out at sea for months without contact with the rest of the world. One with his boat and one with the universe, Moitessier describes even the most dangerous situations with a meditative calm.
Edward Chancellor (ed.) — Capital returns: Collection of investment reports from Marathon Asset Management. Capital cycle theory — the lagged relationship between supply and demand, and the resulting oscillation between periods of underinvestment/high returns and overinvestment/low returns — is another great example of a set of simple yet powerful ideas that are hard to execute on. I came across it when reading Nick Sleep and Qais Zakaria’s collection of investor letters earlier this year (also highly recommended).
Kent Beck — Tidy first: I’ve been writing code for 15+ years but only recently started to think about software design more systematically. Beck’s book is known for its simple and pragmatic take on improving software quality (first half of the book), and it’s indeed refreshingly actionable if you’ve read any of the standard works on the topic. But what really stands out is the second half. Using ideas from option pricing, Beck shows that well-designed software is valuable primarily because it increases optionality. But whether the premium you pay for that option is worth it depends on how volatile the requirements are. At sub-150 pages, it takes the prize for highest value per page this year.
It’s hard to grow without trust
Ben Kuhn makes a great observation in his latest post: Trust is a key enabler of growth (and conversely, lack of trust a key inhibitor). If you don’t trust someone’s work, you’re going to spend extra effort double-checking their output — reviewing their designs, second-guessing their technical decisions, going over their PRs with a fine-tooth comb, etc. This can dramatically impact the bandwidth and output of your team. The worst case occurs when trust is low and the cost of verifying correctness is high. People will then just default to rolling their own solutions instead of building on top of other people’s work, resulting in lots of duplicated effort. Lack of trust is also why the true cost of a bad hire far exceeds the nominal value of their salary and the opportunity cost of the hiring decision — they consume more bandwidth than they add because their work can’t be trusted.
How do you build trust? There’s no shortcut — trust must be earned. To become a trusted engineer or operator, you need to:
Consistently produce high quality work
Pay attention to details, think about edge cases, anticipate failure modes
Be highly responsive and take the initiative — be the person on the support rota who makes everyone sleep well at night
Be serious about reviewing other people’s work and give detailed, actionable feedback
I think the last point is generally underappreciated. The most effective engineers and operators are not just good ICs but also actively work to build trust in others by helping them improve. They want to be trusted, but they also want to be able to trust others because they know it will make the whole team more productive. A high degree of ownership ties in directly with this. If you take ownership of your team’s product/system, you’ll spend more time on quality control and ensuring changes to it can be trusted.
Care about how the work is done
High performance can manifest itself in different ways and it’s important to recognize that. But one thing that all high performers I know do is to approach their work with a kind of metacognition: They care deeply not just about doing the work but how it’s done. They obsess about efficiency — from small details (keyboard shortcuts, editor configuration, etc.) to general strategies (how to structure their day, how to minimize distractions, knowing when they’re at peak performance, etc.). They understand the power of iteration and that you can only get better if you constantly examine and revise your own work. Great engineers, for example, have a habit of reviewing their own PRs because it gives them a top-down view of their output and a chance to see whether they could’ve solved the problem differently.
Importantly, high performers take a long-term view in their pursuit of efficiency. They view their work as a craft to be refined over years, not something that has a definitive end state. This long-term orientation means they tend to strike a better balance between exploration and exploitation. Sometimes it makes sense to take the longer route because you get to learn a new technique that will speed you up in the future. The dividends of such calibrated detours often materialize more quickly than people think, but the pressure to use what you already know to ship something now can be immense.
People often brush aside the kinds of details high performers care about as not mattering in the bigger scheme of things. I’d argue these people don’t understand how high performance comes about. It’s the accumulation of a hundred micro-optimizations. No single one of them makes a difference on its own, but combined they compound and get you from the 70th to the 99th percentile. Better keyboard shortcuts and a clean desk, for example, won’t magically make you produce outstanding work, but they reduce context switching and keep you in flow state, which is essential to solving hard problems.
Because people tend to misunderstand how high performance is achieved, its markers are also not tested for during interviews. Things like typing speed and how quickly someone moves between their editor and browser are rarely looked at, but in my experience they can give useful signal. Plus, the nice thing about these aspects is that you can glean them passively by observation. No need to come up with tricky questions to uncover them.
So, if you care about performing at the highest level, analyze your own output and pay careful attention to how you do things. And just to state the obvious: Don’t mistake the map for the territory. Obsessing about efficiency and self-improvement are markers of high performance but they are not the end goals. Know what you want to achieve first, then work on getting there faster.
Notes:
Obsessing about efficiency is something that high-performing knowledge workers have in common with top athletes. Details matter. Practice and constant tweaking get you there. But it happens to be a lot harder to recognize that this is true for knowledge work compared to sports. I suspect this might be because performance is harder to measure and there is no direct, one-to-one competition involved. But while “winning” may not be as clearly defined for knowledge work, approaching it with a similar rigor as an athlete can still lead to dramatically different outcomes over the course of a career — regardless of whether measured in wealth, happiness, impact or whatever else matters to you.
Using fewer parts
The best-performing firms make a narrow range of products very well. The best firms’ products also use up to 50 percent fewer parts than those made by their less successful rivals. Fewer parts means a faster, simpler (and usually cheaper) manufacturing process. Fewer parts means less to go wrong; quality comes built in. And although the best companies need fewer workers to look after quality control, they also have fewer defects and generate less waste.
— Yvon Chouinard, Let my people go surfing
Chouinard’s observation applies to software products almost verbatim. Using fewer parts makes for better software: Easier to maintain, easier to extend, better margins. But what does “fewer parts” mean? And how do you know which ones to remove?
Fewer parts means making parts reusable. A good design minimizes the number of components at constant functionality. That means avoiding duplication and making things reusable. If you can reimplement a system with a smaller number of components (functions, classes, services, etc.), that’s a sign that the original solution was either over- or under-engineered. Over-engineered because it introduced abstractions that weren’t necessary; under-engineered because it failed to identify reusable parts. It can be tempting to make fewer but larger components, but those almost always end up being less reusable. You might have fewer functions in such a design, but you don’t have fewer parts.
Fewer parts means fewer representations of the data. All else equal, the amount of logic required to support n representations of the same data scales like n², since every pair of representations potentially needs its own conversion and consistency logic. It’s not uncommon for teams to maintain protobuf models, SQL schemas, OpenAPI specs, GraphQL schemas, etc. all to support a single product. They might have a source of truth that defines the “core” data models (e.g. in protobuf), but still end up spending a ton of bandwidth on maintaining model converters and crafting migrations. Most people intuitively prefer to have fewer data representations, but the challenge is that different applications typically need different views or different derived properties of the data. That can lead to a proliferation of derived models which may not have strict one-to-one relationships with the original models.
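As a toy sketch of what this can look like (the Customer model and view functions are made up for illustration), keep one core model and compute the views each application needs instead of maintaining parallel schemas with their own converters and migrations:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Customer:
    """Single source of truth for customer data."""
    id: str
    name: str
    signed_up: date


def to_api_view(customer: Customer) -> dict:
    # Derived representation for the public API, computed on demand rather than stored as a second schema.
    return {"id": customer.id, "name": customer.name}


def to_reporting_view(customer: Customer) -> dict:
    # Another derived view; a computed property instead of yet another model to migrate.
    return {"id": customer.id, "tenure_days": (date.today() - customer.signed_up).days}
```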
Fewer parts means fewer languages and fewer tools. There is almost never a good enough reason to add another language to your stack. The increase in complexity and maintenance burden is consistently underestimated vs. the benefits. The same goes for databases. Performance reasons are often not strong enough to justify adding a new type of DB to cater to your latest special use case.
Fewer parts means smaller teams. Smaller teams spend less time coordinating and more time building and owning things. In most start-ups, a small number of engineers (3-4) build the first iteration of the product, which ends up generating 80% of its lifetime value. It’s clearly possible to build complex things with a small, focused team. But as more money is raised, engineering teams balloon because they lose focus and add components that are not directly aligned with creating customer value. It’s Parkinson’s law at work. Companies perceive things to be mission-critical for the product, then craft a budget based on that, which must then be used once allocated, so more people are hired who then produce yet more parts, and so on.
Fewer parts means fewer counterparties. Most things break at the boundaries (especially if they’re external). The greater the surface area, the riskier and the harder to maintain a system becomes. Prefer to deal with a small number of high-quality vendors, and be prepared to pay a premium. The obvious interjection here is concentration risk: If a key vendor goes into administration or decides to drop the product you rely on, that might pose an existential risk to you. Such counterparty risk can indeed matter greatly and needs to be considered, but I’ve found in practice it’s often more manageable than people think. There are SLAs and contractual notice periods, and the majority of counterparties will honor them, giving you time to adjust. If you do need to replace a vendor, you start out with a much clearer picture of the requirements and the scope of the integration, which cuts down on time-to-market.
If using fewer parts is a good idea, how come modern software production appears to be so bloated? Dozens of vendors, a stack that’s 7 layers deep and includes 4 languages, teams of 60+ developers, etc. feel like the norm. Clearly, companies believe they need this many parts to deliver value to customers. Few people are deliberately trying to waste resources after all. But the problem is that people lose sight of what activities actually create value. As a company grows, a disconnect starts to develop between the activities performed by its employees and the value that is delivered to customers. In a 10 person firm, everyone speaks to customers, everyone knows the value chain and everyone uses the product. In a 1000 person firm, by definition most employees have never spoken to customers and may work on parts of the system that are increasingly far removed from what the customer sees. This is one instance where great management can make a huge difference. In well-managed firms, management goes to great lengths to communicate the link between firm activities and value creation. The focus is on customers and the problems they face, rather than process and efficiency gains. If you focus on serving your customers better, efficiency will take care of itself.
A few principles I follow to keep the number of parts small:
Hire fewer but better people and pay them more.
Work with fewer but better vendors and be willing to pay a premium. Be systematic about selecting them and understand the risks.
Each project you decide to allocate resources to must have a 3-4 sentence description of how it creates value for customers. People often struggle with this if the work is abstract or far removed from what the customer sees (say, work on infrastructure) but I’ve found it’s always possible if the work is worth pursuing.
Early-stage engineering
Early on you need to be fast. Your team, your stack, your infrastructure — they all need to be set up for that. To do that, you have to have the confidence to break with best practices. That confidence comes from knowing what risks actually matter in your context. The risks you care about when you’re building version 1.0 are very different from the risks a large organization cares about. The way you approach engineering has to reflect that.
When you start out building something new, everything is in flux. Your requirements aren’t understood yet, your data model will evolve, your API boundaries will shift, your interfaces haven’t firmed up yet, etc. That’s natural, and it’s key to embrace that uncertainty when working on something new. At that point it’s all about optimizing feedback loops. Make them as tight and fast as possible. What does that look like in practice?
Instrument everything — If you move fast, things break more often. You need to be able to figure out quickly and easily what’s wrong. People often interject here that adding instrumentation is extra work you can’t afford at this stage. The trick is to make it a total no-op. It should take minimal developer effort to get things like tracing, metrics and log aggregation in place. Like, zero is the goal here. Tracing in particular is such an easy win because it doesn’t require any thinking. Adding a span to a method takes at most one or two lines of (templated) code. Over time you’ll want to capture more information on a span and that takes more thinking, but just knowing the code path something took typically solves like 80% of the puzzle.
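As a sketch of how cheap this can be: with OpenTelemetry in Python (assuming the SDK and an exporter are wired up elsewhere; settle_payment is just a placeholder), adding a span is a couple of lines:

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


def settle_payment(payment_id: str) -> None:
    # One span is often enough to see which code path a request took.
    with tracer.start_as_current_span("settle_payment") as span:
        span.set_attribute("payment.id", payment_id)
        ...  # the actual work goes here
```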
Make deployments automated and continuous — This one should be uncontroversial at this point. Every merge into main should trigger an image build, which gets deployed automatically. No action required. No release cycles. At Kappa we do on the order of 100s of “deployments” a day. A change is live in the dev cluster within 2 minutes of being merged. You get (near) immediate feedback.
Make running things locally easy and cheap — Even faster than deploying things is to just run them locally. Make it as easy as possible to run services locally and to connect to other services running remotely. Running a whole cluster locally can sometimes be hard given hardware constraints, but that’s almost never necessary (corollary: buy good machines for everyone, see below). One interesting development here is services like Modal, which try to abstract away the gap between local and cloud infra completely.
Make writing tests easy and cheap — The reason people don’t write more tests is that it’s hard and takes time. So it makes sense to invest to bring that cost down — e.g. by auto-generating mocks for your services, or writing sample data generators to give you representative data for your domain. It’s pretty clear at this point that LLMs have changed the game for unit tests. It takes all of two clicks / two copy-pastes now to generate a reasonable test suite. Most of the time there are issues or mistakes the model makes, but they’re typically easy to fix. Net-net it can still be a big time saver.
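A sample data generator can be as simple as a factory with sensible defaults that individual tests override, for example (the Order model here is made up for illustration):

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Order:
    """Stand-in for one of your domain models."""
    id: str
    customer_id: str
    amount_cents: int
    created_at: datetime


def make_order(**overrides) -> Order:
    """Return a representative Order; tests override only the fields they care about."""
    defaults = dict(
        id=str(uuid.uuid4()),
        customer_id="cust_123",
        amount_cents=4_200,
        created_at=datetime.now(timezone.utc),
    )
    defaults.update(overrides)
    return Order(**defaults)


# In a test: make_order(amount_cents=0) exercises the zero-amount edge case with one line.
```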
Integration tests over unit tests — Integration tests that run on every build or multiple times a day give you fast, meaningful feedback. Modern systems are distributed and it’s the boundaries where most of the bugs sit. Unit tests are fine, but if you have to choose what to spend your time on, write integration tests. The components of your software obviously need to work in isolation, but it’s really the interactions where things go wrong. Especially if those interactions are asynchronous.
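A minimal sketch of such a test with pytest and httpx, assuming a service reachable at a dev URL (the endpoint and payload are hypothetical):

```python
import httpx
import pytest

BASE_URL = "http://localhost:8080"  # assumed dev deployment of the service under test


@pytest.mark.integration  # custom marker, registered in pytest.ini
def test_create_then_fetch_customer():
    # Exercise the real HTTP boundary between services instead of mocking it away.
    created = httpx.post(f"{BASE_URL}/customers", json={"name": "Ada"}, timeout=5).json()
    fetched = httpx.get(f"{BASE_URL}/customers/{created['id']}", timeout=5).json()
    assert fetched["name"] == "Ada"
```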
Minimize wait times — Waiting for CI to finish, waiting for a code review, waiting for something to build, etc. — these things are especially detrimental to productivity because they keep you from getting closure on one piece of work and discourage you from moving on to the next task. Even if the work itself is done, it still lingers until it’s deployed. This is one strong argument for choosing a language that compiles and builds quickly.
No branch protections — One way to eliminate PR approval wait times is to not require approvals at all. Sounds crazy, but you wouldn’t believe how much time is wasted waiting for a review on a trivial change (the true cost is even higher than wall time because waiting, and checking, breaks your flow state). So trust your engineers. We’re all adults here. If your team is 5 experienced people, you can often coordinate your work well enough without PRs, just over Slack. You do end up with merge issues at times, but they’re typically infrequent and easy to resolve because early on people tend to work on fairly orthogonal things. The time spent resolving those is easily made up for by the increase in velocity.
Minimize task overhead — This one is almost tautological at this point. Maximize uninterrupted blocks of time for people to focus. Minimize meetings and process.
Automate stack upgrades — A lot of time can be wasted when you don’t update dependencies until you’re forced to for compatibility reasons. That’s when you have to deal with a potentially large number of issues all at once, usually at the worst possible time. This is easy to fix: Just set up Dependabot.
Buy good machines for everyone — The added cost of getting high-spec machines for everyone amortizes literally in a day. Remove the constraint of local hardware as much as you can. The added cost for a team of 5 is totally negligible compared to what you pay on cloud compute.
Hire owners and generalists — The person making a change is also responsible for ensuring that it actually works once deployed. Integration tests go a long way here, but sometimes you actually have to make an API call or open the app and check UX impact. If you wait for QA to catch issues, you’ve wasted 3 days to find out you had a bug somewhere. And because you’re often out of context at that point, it becomes harder to fix.
Understand your team’s strengths — While everyone agrees that hiring great ICs is important, far too little thought goes into team composition. In fact, it’s often completely absent from recruitment plans. This is strange, since in areas outside software engineering, like professional sports, it gets at least as much attention. Building a technical product from scratch is a high performance team sport. You need great individual performers, but you also need them to complement each other, technically and personality-wise.
Some of the points above may sound crazy to someone in a mature engineering org. And for good reason! Your approach has to evolve as your product matures. The key is to understand what risks you need to care about at the point you’re at. Zero-risk deployments, well-managed sprints, carefully groomed tickets, etc. — these things all sound great in isolation, but the risk-adjusted return of doing them is just too low in the beginning.
The only risks you should care about early on are existential ones: (1) Running out of cash before you launch, (2) launching too late to get enough proof points, (3) shipping too late to iterate meaningfully, (4) being too slow to incorporate feedback. The risks that are considered existential in a larger org are just fundamentally different. Reputation, competitive threats, losing customers, losing market share, product stability, service uptime — those things matter when you have an existing product with good traction. But early on, you don’t have many customers yet, and those you do have are (hopefully) more forgiving. There’s typically also little to no competition to worry about. If not, you may want to reconsider what you’re working on.
I believe a significant number of startups die because they cling to all the best practices of later stage engineering — doing what big companies do. What these companies do is solve for their problems, not yours. Blindly following their advice means you end up over indexing on the wrong risks. The material in books and on blogs is heavily biased towards late stage engineering. People simply have more time to write when there’s an existing product with stable cash flow and growth. And that’s why it’s so important to think for yourself and understand your idiosyncratic risks.
Why I don’t give investment advice
When people ask for advice on personal investing, I’ve found they are either looking for confirmation that what they do is great (it’s usually not) or some sort of secret sauce for outperforming the market (which doesn’t exist). So people are inevitably disappointed by the answer. The best advice for personal investing is simple and boring: Diversify, invest with a long time horizon, minimize churn and transaction costs, know your risk profile and liquidity needs.
When it comes to investing, people are their own biggest enemy. Lured by the siren call of get-rich-quick schemes, they waste precious time and resources (and nerves) on things like chart reading, momentum-chasing or over-exposing themselves to the latest fad. Gambling is fine, but you should recognize it as such.
Another error I see people often make is trying to copy strategies they can’t possibly compete in. Personal investing is a fundamentally different game from professional investing. Many of the approaches developed in a professional context simply don’t work or don’t make sense for your personal account. They rely on superior market access (e.g. shorts available at low cost, cheap financing, OTC supply), data that is hard to get (e.g. proprietary data sources or very expensive non-proprietary ones) and technology that is hard to build (e.g. ML models capable of dealing with non-stationary effects). You can’t hope to compete on HFT market making or bond arbitrage with the resources of an individual investor. It is highly improbable that you’ll have an analytical or informational advantage here [1].
Strategies deployed by professional investors are also developed with very different objectives in mind. As a professional investing other people’s money, your objective is to maximize management and performance fee income. You (or your employer) run a business after all. Investment management is rife with agency problems — both due to the way fees are structured and the kinds of subscription and liquidity terms investors demand. The interesting thing is that these issues can work to your advantage as an individual investor “competing” against professional investors. Since fees are structured to accrue on a quarterly or annual basis and since clients typically insist on monthly liquidity, there’s a strong short-term bias in the kinds of strategies being used. If your performance is benchmarked against annual returns, investing in a way that optimizes for a 5-10 year horizon just doesn’t make sense. The intermittent drawdowns are going to take you out of business — even without leverage, as clients will see the relative underperformance and pull out. So this is a long-winded way of saying: take advantage of your longer time horizon if you can. This is the one dimension where professionals have a competitive disadvantage vs. you as a personal investor.
Five paragraphs in, I realize I’ve contradicted the title of this post. But better to give some advice in writing once and save the conversations with friends and family for more exciting topics.
Notes
[1] I am talking from my own experience as a public markets investor, but similar considerations around market access and data apply to private markets. Angel investing is an interesting case — I believe it’s possible to have an informational advantage if you have a personal connection to the founding team, or an analytical one if you’ve spent your career in the industry that the founders are trying to disrupt. But clearly those advantages don’t scale. I’m not aware of a good dataset on personal angel investing returns, but would venture to say most non-professional angels lose money in the long run, just like most day traders do in public markets.
Writing summaries is more important than reading more books
One thing I’ve learned over time is to read fewer books but to take the time to write summaries for the good ones. The ROI of spending 2h writing a synopsis is much higher than spending those 2h powering through the next book on your list. Reading is not about page count or speed [1]. What matters is how it changes your thinking and what you take away from it. Optimize for comprehension, not volume.
If your goal is to maximize comprehension, you need to ask questions while you read — questions that you yourself must try to answer in the course of reading. This is something I believe curious people do naturally. Forcing yourself to ask questions and to answer them also makes it easy to write a synopsis: When you’re done, simply write down the most important questions you’ve encountered and how the book has answered them. This is the template I use:
In 1-2 sentences, what is the book about as a whole?
What are the 3-4 central questions it tries to answer?
Summarize the answers in one paragraph each.
What are the most important things you have learned personally?
While the end product is short and concise, it takes time and focus to write it. Which is of course why it’s effective: It forces you to extract and reformulate the book’s insights in your own words.
Not coincidentally, I use a similar framework for writing essays: I structure them around questions I’m trying to answer, typically no more than 3-4. If I can’t formulate those concisely, or if there are more than 3-4, it’s usually not worth posting the piece. Without that clarity, it ends up either rambling or shallow and not offering any coherent insights.
Since summarizing leaves you with less time to read, you’ll have to get better at selecting books. I use a combination of two simple techniques for this, topical reading and inspectional reading [2]:
Topical Reading — Each quarter, I select 4-5 topics I care about and want to gain a deeper understanding of. Start wide and get a sense of what the important works are for each topic. Collate a broad list of works.
Inspectional Reading — Use inspectional reading to prune the list for each topic down to max. 2 books. Inspectional reading is simply systematic skimming or pre-reading: Read the summary on the back of the book, and the preface or introduction, study the table of contents to get a general sense of the book’s structure, read the summary statements at the beginning or end of each chapter. This typically takes no more than an hour and can be extremely effective at filtering out works that are not useful or irrelevant to you.
It’s surprising how even many of the most prolific readers I know are unaware of the value of inspectional reading. Most readers start on page one of a book and plow through until they’re done or decide to cut their losses — without ever reading the table of contents or the preface.
One great alternative to writing summaries is to talk about the books you’re reading. Explaining the ideas you’re reading about to someone else is one of the best ways to engage with the material, since (a) it forces you to formulate it in your own words, and (b) they might challenge the ideas and get you to examine them more critically.
Footnotes
[1] Speed reading — very hyped in tech circles a few years ago — is largely a scam in my view. Beyond a certain point, there is simply a hard tradeoff between speed and comprehension.
[2] Adler & Van Doren, How to read a book (1972)
Overconfidence and future-proofing
Kevin Kelly points out that trying to future-proof your technology stack is often a bad investment. Since it’s hard to predict how technology will evolve, you’re better off buying just-in-time. Similarly, since it’s hard to predict how your requirements will change, you might be better off building just-in-time. It’s an interesting question why future-proofing is so pervasive then, and if indeed we do too much of it.
Future-proofing makes sense if you can predict the future in some way. If you have an idea of how your needs will change, it makes sense to invest today to reduce or avoid the cost of making changes later. Future-proofing can also make sense as a hedge. If the impact of a change in requirements is catastrophic, it’s sensible to take measures now, even if you don’t have an idea of what the probability is of that change happening. For example, turning customers away because of stock-outs (→ manage your inventory more conservatively), losing user trust by turning off a service when it’s most needed (→ provision more compute than you need), or trading on stale information due to a data feed outage (→ have multiple redundant sources).
In writing software, future-proofing takes many forms:
Make an interface more general than it needs to be
Decouple components more than you need to
Provision more storage and compute than you need
Make scalability the primary design goal, even if you’re nowhere near scale
Make performance a primary design goal, even if it’s not important to your value prop
Implement speculative features instead of improving the core of your product
“Just to future-proof things” is perhaps the most common argument I hear to defend questionable design decisions. And conversely, “over-engineering” is the most common counterargument.
I think there are two reasons why we future-proof too much:
We’re overconfident in our ability to predict the future — Even with a lot of relevant experience and information, people routinely fail to anticipate future needs correctly. One aspect of this is over-optimism. We tend to overweight positive scenarios (e.g. we will grow and scale rapidly) and underweight negative ones (e.g. we will fail to find product-market fit and run out of money). Future-proofing tends to be concerned only with the former.
We’re ignorant of the tradeoffs — Arguments in favor of future-proofing align with our inherent bias for risk aversion (it’s the safe thing to do), so they’re easy to get on board with. Understanding the costs and tradeoffs is much more subtle and harder to get intuition for.
The second error — ignoring the tradeoffs — is especially detrimental in early-stage ventures where opportunity cost can be infinite. If things don’t ship in time, the whole venture can fail: not only is the effort that went into future-proofing wasted, all of the effort is. Shipping something that works reasonably well today is often a far better outcome than shipping something late that works in all future scenarios. While this seems to be well understood, it’s amazing how few founding teams manage to internalize it. A huge amount of great technology has been built that never made it into customers’ hands.
This is not to say we should never future-proof. The key is to appreciate the tradeoffs of doing so — and our biases. Great founders — great engineers — understand the constraints the business as a whole is under. A rough expected ROI analysis can go a long way. While it’s impossible (and pointless) to try to get an exact answer, it forces you to think about costs and the parts of the distribution you’d typically ignore. How much is doing X going to cost us in engineering time (NB: multiply by 3)? If we don’t do X, will the product still work? What’s the impact on our value prop? If we don’t do X, what else could we do? These are the kinds of questions you need to ask. Don’t future-proof because it’s the right thing to do in isolation. Understand the context.
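A back-of-envelope version of that reasoning might look like this (the numbers are invented purely for illustration):

```python
# Cost of future-proofing now, with the usual "multiply engineering estimates by 3".
cost_now_weeks = 2 * 3

# Expected cost of retrofitting later, weighted by how likely the need is to materialize.
p_requirement_materializes = 0.2
retrofit_later_weeks = 4
expected_cost_later_weeks = p_requirement_materializes * retrofit_later_weeks  # 0.8 weeks

# 6 weeks now vs. roughly 0.8 expected weeks later: skip the future-proofing and ship.
print(cost_now_weeks, expected_cost_later_weeks)
```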
The LLM productivity puzzle
Code generation is arguably one of the most interesting applications of LLMs, and one of the first with real commercial use (Copilot/Codex, Codegen, etc.). If you spend time on the internet these days you’ll see people claim productivity gains ranging from 0 to 100x, selection-biased to the high end (1). Whenever you see several orders of magnitude of disagreement, it’s worth trying to understand why.
While the extremes can almost certainly be explained as either deliberate hyperbole (promoters with no real experience writing code) or uninformed contrarianism (naysayers who have not made any serious attempt at using LLMs), there is a simpler and less cynical explanation for the divergence: It’s a reflection of the diversity of tasks involved in writing software.
Software development means a lot of different things, and it’s only natural to expect a new tool to be more suited to some types of engineering than others. If you build a standard component from scratch (e.g. a web dashboard with simple UI), odds are the requirements can be specified in a reasonably sized prompt. If you’re building on top of decades of legacy code with lots of non-obvious design decisions baked in, then (a) communicating that context to the model is hard (i.e. it would require a long sequence of carefully crafted prompts), and (b) even if you manage to, it might not be able to make sense of it. As far as we know, LLMs don’t understand the structure of code at any fundamental level, and it’s not clear that they can pick up on the non-local context required to speed up development on complex tasks by, say, 10x.
All this means that your mileage will vary depending on the kind of engineering you do. From what I’ve seen, LLM enthusiasts tend to work on things that have a high degree of isolation and require relatively little context, while the naysayers work on systems with lots of proprietary frameworks. To be sure, LLMs can be useful for either type of work but it’s clear that you’ll find it easier to get good results on the former. The key lies in using your intuition as an engineer — and your understanding of how LLMs work — to pick the right tasks.
A simple example that highlights the divergence in perceived usefulness is code translation. People report everything from perfect results (translated code compiles and works as intended) to useless fragments (translated code doesn’t run, needs a lot of fixes). I’ve experienced both ends of the spectrum, even within the same language pair. Translating utility functions almost always works flawlessly. On the other hand, a recent attempt at translating a method from Node to Go using ChatGPT failed miserably because the function was using protobuf-generated objects and the model wasn’t able to figure out how attribute assignment differed between the Node and Go bindings.
It’s early days for LLM code generation and I’m certain we’ll see a lot of improvement over time. How quickly this happens remains to be seen. The fact that LLMs perform well on program synthesis is considered to be the result of “emergence”: Training on large amounts of commented code gives the model a weak supervised signal for code generation (2). If you make the model large enough and the datasets big enough, the ability to generate code from prompts emerges. I remain skeptical that the kind of understanding of non-local context needed for complex engineering tasks can emerge simply by scaling to ever larger models and datasets (3).
If you write a lot of code and use LLMs to do it, reach out on Twitter or email me. I’m keen to collect more data and hear about other people’s experiences.
Footnotes
(1) The loudest people tend to either have had the most success in using them or have a vested interest in raising attention (e.g. promoting their coding bootcamp or youtube channel).
(2) https://arxiv.org/abs/2203.13474
(3) [Edit 2023-04-01] Steve Yegge at Sourcegraph highlights an intriguing approach to overcoming context size limitations of LLMs — using code search to optimally populate context for a given prompt.
Grit Multipliers
People sometimes argue that you’re more likely to build a successful business as a solo founder. The argument goes that without co-founders, you get faster decision making, and since speed is one of the most important advantages you have as a startup, you get a better shot at iterating yourself to success. There is an important point that’s often missed here: Having co-founders significantly lowers the probability that you’ll give up.
If you commit to building something big and risky, odds are that you’re mentally tough and resilient. Your psychology will, on average, push you to persevere. But there can be a lot of volatility around that average as you go from exuberance to desperation, sometimes over the course of a single day. When the odds are stacked against you, the temptation to give up can become overwhelming.
Every venture I’ve worked on, I’ve gotten within striking distance of giving up. Pretty much every founder I know will tell you the same (at least in private). When you hit a low, you start to rationalize quitting. It starts to feel like the right thing to do because, rationally, it often is. Having co-founders counteracts this thinking in two ways.
Rationality override
In a strong team, you don’t feel like you have a license to give up. There’s a sense of responsibility towards your co-founders that can override any rational argument in favor of quitting, even if quitting would maximize your own risk-return function in the short term. Compassion can act as a strong deterrent. And so does reputation, which shows that this behavior may not be entirely altruistic (or irrational): If you develop a reputation for leaving people hanging, it becomes much less likely that you’ll convince great people to work with you in the future. When I look for co-founders and team members, I index heavily on personal references. Having strong references from failed ventures is a much more powerful signal than having weak references from a successful one.
Volatility reduction
There is a second effect of having co-founders that makes you less likely to rationalize quitting to begin with. If your co-founders sense you’re at risk of quitting, the right thing for them to do is to project optimism. That’s especially true in the early days. It’s hard to survive the departure of a team member if that means you’ll lose 25% of your workforce. Your co-founders will actively (and often subconsciously) push your psychology to revert to its natural state of perseverance. I’ve seen this many times. When I’m gloomy, my co-founders tend to be optimistic, and vice versa. This reduces downside volatility of the emotional rollercoaster you’re on. The lows become less low (or shorter), so you don’t end up rationalizing quitting to begin with. (1)
Grit multipliers
Seeing these two effects in action is a sign of the strength of a founding team. Great co-founders act as grit multipliers. While every team member on their own would’ve probably given up long ago, together you push through. I think that’s where the real value of having co-founders lies. Complementary skillsets, diversity of thought, etc. — those things are important, but you can also get those by hiring well. Compared to other factors such as speed of decision-making, I think grit is a far bigger determinant of startup success. It seems true that in theory being faster means you can afford to be less gritty since the periods of pain can be shorter, but it’s not clear it reduces their intensity. And that’s what I’ve found matters most.
Epilogue
Grit multiplier effects also exist in other domains. In life partnerships, for example, where stressful experiences like raising kids become not only manageable but hugely enjoyable. My wife at least 10x’s my perseverance on that front. Another example is endurance sports with a team component, such as relay races or triathlons with team scoring. Even if there is no explicit team element, fellow racers (and spectators) can often have the same effect and help you push through the wall.
“I'm convinced that about half of what separates successful entrepreneurs from the non-successful ones is pure perseverance.” — Steve Jobs
“Not quitting is infectious. When you have a leader with that attitude [an absolute refusal to quit] at the head of the company, that is a big deal.” —Marc Andreessen
Thank you to Blaise Buma and Stephanie Ng for thoughtful comments on this post.
Notes:
(1) My co-founder Blaise points out that even if you are emotionally strongly-correlated with your co-founders, you tend to work on different aspects of the business, and good news in one part can offset bad news in another. Finally, people often react differently to idiosyncratic events, even if they react the same way on average. Emotional diversification at work.