Commentary:WhereDoesBadCodeComeFrom
This page contains my commentary on "Where Does Bad Code Come From" by CaseyMuratori.
Casey outlines several ways in which software has gotten worse over the last few decades:
- Binaries are huge and take forever to build (the example he cites is TensorFlow taking 4 hours to build on a modern machine)
- Dependency stacks are deep and brittle (imagine the typical NPM project, or a build that must be containerized to work reliably)
- Code is overly complex and software is slow at doing even simple tasks, like displaying a list of images.
Casey makes it clear that it's important to him to fix the problems he outlines, but also disclaims certain things:
- Fixing software's badness won't necessarily help software companies make money, because the bar for competition is so low.
- Fixing software's badness won't necessarily make software development more enjoyable (or less enjoyable).
His main claim is just that we are underperforming our potential as an engineering discipline. We could be making software that is more performant and easier to work with, but we aren't.
Casey lays the groundwork for the rest of the talk by establishing a couple of metaphors for software development. The first is what I'll call the "training metaphor", which describes how people learn to write code.
He says that learning to code requires two things: you have to learn to type syntactically valid programs into the computer, and you have to learn to filter out programs that are "bad" (ones that won't work, don't do anything useful, etc.). That is, much like training an AI, you need to train your generator function (which writes programs) and your fitness function (which evaluates programs). By writing programs and seeing them work (or not work) you get feedback that gradually improves these two functions until you're an expert programmer.
The second metaphor in the talk describes the experience of writing large, complex software systems. Casey likens software development to exploration in unfamiliar terrain. Explorers have a goal, perhaps a map, perhaps tools like a compass. Armed with these, they try to reach their destination. This is an iterative process: you don't know the exact route in advance, so you set out in what you hope is the right direction, constantly checking your tools and your map to see how you're doing. You may get lost and have to backtrack. Sometimes the route seems impassable. Sometimes you get hopelessly lost and the project fails.
With these metaphors established, Casey critiques popular software design heuristics (he picks on SOLID especially). He says these heuristics train our filter function to pattern-match on things that don't actually matter. When we evaluate whether code conforms to the SOLID principles, we aren't directly evaluating useful properties like readability, maintainability, or performance. We're evaluating proxies that might sometimes align with these goals.
He likens this to giving our metaphorical explorers tools that won't actually help them navigate. Instead of a map and a compass, we give them a hammer and a ruler.
He also criticizes SOLID for not stating what measurable outcome it's trying to achieve (personally, I think he overstates this case, as proponents of SOLID talk about readability, maintainability, and testability a lot). He suggests that we ought to measure the things we want, like execution performance and development time, and adjust our approach to optimize those metrics.
Though he claims not to like catchy acronyms, Casey comes up with a pretty good one: WARMED. This stands for the properties good code should have, the ones he says we should explicitly seek. WARMED is a list of things you should be able to do with code:
- Write
- Agree (on what it should look like)
- Read
- Modify
- Execute (Casey focuses on execution speed)
- Debug
He argues that this is better than SOLID because these are things we actually want to do with code, and moreover, they're measurable: we can measure how long it takes to write code, modify it, agree on it, read it, run it, or debug it.
The main point of the talk is that software development approaches are often CargoCults that have little to no leverage against the goals the software is trying to achieve (e.g. correctness, performance, maintainability). The main example given is that engineers misapply the SOLID principles in ways that do nothing to help the software project achieve its goals. Casey's proposal is that we should measure the desired outcomes and optimize our approaches toward those metrics.
I broadly agree that the problem of CargoCult software development is real, and that metrics may be helpful in solving it. However, I think there are a few flaws in Casey's argument:
Casey criticizes SOLID as if it's the primary cause of software's badness. But I can think of only one example from my experience of code that was bad because of a misapplication of SOLID.
Now, I'm not claiming (for the purposes of this discussion) that SOLID is good. Whether SOLID is good or bad actually isn't relevant, because SOLID doesn't seem to be widely known or used in the industry. Therefore I don't see how it could be a major contributor to software's badness. I'd bet money that if you surveyed the programmers currently working in Silicon Valley, no more than 10% of them would be able to name all five SOLID principles.
Moreover, there's no clear link between SOLID and the specific kinds of software badness Casey cites (huge codebases and binaries; slow TensorFlow builds; fragile, Jenga-like stacks of dependencies; webpages with loading spinners). Indeed, he identifies other causes, like proliferating glue code, to explain those effects. What problems does SOLID cause, exactly?
Casey mentions that programming for a 64k personal computer in the 80s constrained designs so much that bad code couldn't happen.
This argument only makes sense if you accept that:
- "bad" is synonymous with "big and slow". Other forms of badness (crashes, cryptic errors, lack of features, lack of process isolation) are irrelevant.
- personal computers are the computers that matter—bad code running e.g. on mainframes is irrelevant.
In other words: things were better in the 80s only in the sense that programs for PCs ran faster than similar programs do today. Slow programs for mainframes existed long before personal computers were even a thing—e.g. Tony Hoare wrote this about an ALGOL compiler written in the '60s:
Our delight was short-lived; the compiler could not be delivered. Its speed of compilation was only two characters per second which compared unfavorably with the existing version of the compiler operating at about a thousand characters per second. We soon identified the cause of the problem: It was thrashing between the main store and the extension core backing store which was fifteen times slower. It was easy to make some simple improvements, and within a week we had doubled the speed of compilation - to four characters per second. In the next two weeks of investigation and reprogramming, the speed was doubled again - to eight characters per second. We could see ways in which within a month this could be still further improved, but the amount of reprogramming required was increasing and its effectiveness was decreasing [...]
Casey suggests that by measuring the desirable attributes of our code (including performance and maintainability) we can reduce software's badness. He claims that too many developers focus on abstract principles, e.g. SOLID, when making design decisions, instead of focusing on what actually matters.
I agree that we should always remain focused on the attributes of software that actually matter, and that measuring those attributes should be part of the solution. However, measurement alone clearly isn't enough, because:
- It can only tell us in hindsight whether we did well or poorly. By itself, it can't help us choose a course of action. We need some heuristics to tell us which search paths to prioritize.
- It is only really helpful if we can compare measurements across projects, because comparing different approaches within a project isn't feasible while we're also trying to actually develop a product. E.g. if we're trying to improve our code's maintainability, we're not going to A/B test five different design approaches to see which one results in the fastest feature development, because that would take more time than just picking one at random. Comparing measurements across projects is hard, though: you've almost always got different people, solving a different problem, using different tools and techniques.
- Teams already measure this stuff. We measure performance in production. We measure development velocity. Casey's claim that our code is bad is made about a world in which this measurement is already happening.
- Measurements give us a number, but don't tell us what we ought to expect the number to be—e.g. it took someone 6 hours to read the code that does X. Is that good or bad? It took this page 2 seconds to load—is that good or bad? It took 10 days to implement this feature—good or bad? Very often, teams that do measure such things say "this is good enough" when it is clear to me that it isn't good enough and that there is a way to do it better.
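To make the last point concrete, here is a minimal Python sketch. The names `measure`, `verdict`, and `sum_of_squares`, and both budget values, are hypothetical and chosen only for illustration; none of them come from the talk. The measurement yields a number, but whether that number is "good" depends entirely on a budget that has to come from somewhere else.

```python
import time

def measure(fn, *args):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def verdict(elapsed_seconds, budget_seconds):
    """Compare a measurement against a budget that a human had to choose."""
    return "within budget" if elapsed_seconds <= budget_seconds else "over budget"

def sum_of_squares(n):
    # Stand-in workload; any function would do.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    total, elapsed = measure(sum_of_squares, 100_000)
    print(f"measured: {elapsed:.6f}s")
    # One measurement, two arbitrary budgets. The measurement alone
    # cannot say which budget is the right one to hold ourselves to.
    for budget in (1.0, 0.000001):
        print(f"budget {budget}s -> {verdict(elapsed, budget)}")
```

The timing code is the easy part. Nothing in it says which `budget_seconds` to pass to `verdict`.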
The core of the problem is that no one knows how much anything in software development should cost.
- this is true for both DevelopmentTime and ExecutionTime
- if we did know how much stuff should cost, then we could set a budget, and say "let's try to find a way to do Y in X amount of time". We try to do this sometimes with estimates, but research has shown that software estimates are subject to very strong anchoring effects—i.e. we give the estimates we think people expect. That is very different from knowing how much something should really cost.
- people don't grok how fast computers are.
- as a result, they don't know what kind of performance to expect from their web server, test suite, or compiler.
- programmers attribute poor performance to the wrong things—e.g. they optimize computation instead of IO.
- they're wrong about how much complexity they can handle.
- they have low standards for comprehensibility and their own confidence in the code. "I tried it and it worked" is the industry standard level of pre-release software verification.
- they are wrong about what makes code easy to understand.
- e.g. they split complex code up into functions that look superficially simple but really just obfuscate the complexity for anyone who is actually trying to understand what's going on (e.g. the IdeaFragment anti-pattern)
- they don't know about KranzsLaw.
- they are wrong about what the most important activity of software development is—the part that's costly, but valuable.
- it's not typing code
- it's not, generally, thinking about algorithms (as Casey points out)
- it's thinking about the qualities and types of behavior that are required of the system and choosing the organizing principles that will lead to those qualities and behaviors.
- programmers often cannot describe what the organizing principles of their software are, beyond "it's a Rails app" or "it's a React app".
- Even if the programmers do choose good organizing principles initially, the requirements often change in a way that invalidates those decisions (see EssentialStateWantsToBeGlobal). Non-technical people have no way of understanding why the requested changes cost so much, and the programmers have no power to reject the proposed changes and, say, build another app with different organizing principles.
- we have no software architects (in the sense in which Fred Brooks used the term in The Mythical Man-Month)
- software architects are responsible for maintaining the conceptual integrity of the system—basically, for refusing to implement features that contradict the organizing principles of the software. This allows programmers to create software that is fast, reliable, and maintainable because it has consistent organizing principles.
- programmers keep hoping for a silver bullet
- whether that's functional programming, or OO, or TDD, or domain-driven design... you cannot just do the rituals associated with these things and expect that they will solve your problems. You have to understand why they work, what they're supposed to do, and in what circumstances they're applicable.
- people are wrong about how competent the average library developer is
- "code written by a random person on the internet is sure to be correct"
- people are wrong about how necessary third-party libraries and frameworks are
- Ruby on Rails made me fear HTTP. It made HTTP out to be this big complicated scary thing. Rails, and browser APIs, made me wonder "what is a cookie, really?". This fear and wonderment was dispelled years later by, of all things, actually reading the text of an HTTP request. Because an HTTP request is just slightly-structured text. You'd never guess that, from all the fuss the Rails APIs make of it.
- people fear gaining deep understanding of the technologies on which they depend
- there seems to be a moral aspect to people's thinking on this:
- "I shouldn't have to understand this"
- or "someone else has already understood this and made a library to abstract it away, so it is my duty not to understand it."
- or "I should write a library to do this so no one else will ever have to understand it."
- or even "How could a mere mortal like you or me possibly comprehend the designs of a god like KenThompson? How dare you try to understand Unix? It's blasphemy!" (okay, this is a bit of a parody, but I do think that people treat e.g. Unix as a god-given, inexplicable thing that has been handed down on stone tablets and can never be changed or questioned)
- because these moral beliefs are deeply held, people interpret events so as not to contradict the beliefs.
- "Of course the code is more readable like this—see how much like an English sentence it is?"
- "So what if the code for our build pipeline is a 2000-line YAML file that we can only test by spinning up 20 VMs and watching it run for 4 hours? It's declarative, so it must be easy to reason about!"
- "Everyone's app uses a ton of NPM packages, so if there are problems with that, they must be ones no one cares about."
- people underestimate the cost of glue code, and overestimate the cost of reimplementing third-party code
- is it really worth adding an extra build tool to your pipeline and suffering kludgy types to be able to import 50 lines of code from a 1MB JavaScript library?
- a library you write yourself will, in my experience, almost always be faster, smaller, and more suited to your project than one written by someone else.
- the exception is libraries for working with open standards—e.g. don't write your own JSON parser.
- people are wrong about how to use the work of the library developers who are competent
- The Java standard library is not the API against which you should build your application. It is a set of tools designed for maximum capability, which should be wrapped in utilities that are suitable for your application. See CapabilityVsSuitability.
- people are wrong about how good software could be
- most programmers have only ever seen bad code
- no one notices when software is good, precisely because good software is unremarkable.
- good software is just "software that does what you'd expect".
- the result of this is that most of the attention-moments that people pay to software go to bad software.
- most programmers never have to maintain code for more than a year or two, so they don't see the fallout of their bad decisions and can't learn from them.
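As an illustration of the earlier point that an HTTP request is just slightly-structured text, here is a minimal Python sketch that pulls the cookies out of a raw HTTP/1.1 request using nothing but string operations. The request text, host, and cookie names are made up, and `parse_request` and `parse_cookies` are hypothetical helpers, not part of any library. The sketch ignores request bodies, repeated headers, and malformed input, so treat it as a reading aid and not as a real parser.

```python
RAW_REQUEST = (
    "GET /images?page=2 HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Accept: text/html\r\n"
    "Cookie: session=abc123; theme=dark\r\n"
    "\r\n"
)

def parse_request(raw):
    """Split a raw HTTP/1.1 request into method, target, and headers."""
    # A blank line separates the head from the (here empty) body.
    head, _, _body = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, target, _version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return method, target, headers

def parse_cookies(header_value):
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in header_value.split(";"):
        name, _, value = pair.strip().partition("=")
        if name:
            cookies[name] = value
    return cookies

method, target, headers = parse_request(RAW_REQUEST)
cookies = parse_cookies(headers.get("cookie", ""))
print(method, target)
print(cookies)
```

Seen this way, a cookie sent by the browser is a `name=value` pair on a `Cookie:` header line.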