Confidence • Satisfaction • Joy
As you try to put these ideas into practice, you'll undoubtedly have questions. That is a good time to come back to the book and thoroughly understand the details.
This book is for application programmers working in most any modern programming language. Most of the examples in this book are in TypeScript.
My experience is primarily with web applications and command-line tools, so the examples will be drawn from those architectures. With a bit of adaptation, the concepts in this book generalize to mobile and desktop apps as well.
- frontend, backend, or fullstack
- greenfield or legacy
The focus on modern web applications implies that there will be at least some JavaScript, or a language that compiles to JavaScript. In 2022, TypeScript is the most popular such language. It has an extremely expressive type system that makes it suitable for doing type-driven design, an idea that appears in Part III of this book. It also supports the three major programming paradigms, so it is ideal for exploring the tradeoffs and synergies among them. Additionally, because TypeScript is so close to being a superset of JavaScript, we can compare TypeScript and JavaScript code side-by-side to see the tradeoffs between statically- and dynamically-typed code.
TypeScript isn't the only language in this book, though. I use examples from other languages, primarily Java, Go, and Ruby, to illustrate how these principles apply in other languages. I chose these other languages because:
- They are rather different from TypeScript.
- They have feature sets that make for interesting case studies.
- I know them well.
- They are reasonably popular.
- Their syntax is similar enough to TypeScript's that you can probably get something out of the examples even if you don't know the language.
In this book, I describe a way of creating simple, maintainable, bug-free code that I claim is timeless.
The very idea of a timeless way of programming may seem quaint or naive—certainly when viewed through the lens of web development, where seemingly every day sees new libraries and programming patterns vying for the limelight. Yet I claim that this turbid stream of novelty flows from an ancient source.
Most of the recent innovations in software are based on ideas that have been around for decades, but are good enough to be recycled into new technology. If you grok these perennial ideas, you'll find that it's easy to pick up whatever new tech comes along, whenever you need to. The "timeless way" doesn't mean using old technology. It means going with the flow, choosing the best tools for the job, and building on approaches that have proven time and again to work.
Which ideas do I think are timeless? Here's a short list:
- The Unix philosophy (best described in Eric S. Raymond's The Art of Unix Programming)
- Functional programming
- Test-driven development (best described in Kent Beck's Test-Driven Development by Example)
- Data normalization
- Abstraction (see David Parnas' 1972 paper On the Criteria To Be Used in Decomposing Systems into Modules)
- Message-passing, polymorphism, and encapsulation of state (i.e. object-oriented programming)
- Domain modeling
Each of these is a deep topic, about which many volumes have been written. The best I can hope to achieve in this book is a summary. The road to mastering them is long, and demands diligent practice; the purpose of this book is to point you in a good direction, and show you how these seemingly disparate ideas can complement each other.
There's no hype here. Just practical techniques distilled from 70 years of computing history, and filtered through my own decade of industry experience.
Computers have changed a lot since the 1950s. But people haven't. We still have the same psychological needs, the same strengths and weaknesses, and the same habits of thought that we had then. These unchanging human factors are what make programming really challenging. Phrasing code for the machine, once we know what code to write, is the easy part. The hard part is learning about the context in which our code will run, organizing our thoughts, designing for portability and longevity, and communicating with each other.
Early software pioneers were well aware of these difficulties, and developed ways to address them. John McCarthy's LISP, the first functional programming language, arrived on the scene in 1958. Alan Kay developed object-oriented programming based, in part, on a technique U.S. Air Force engineers used in the 1960s. And Tony Hoare, a veteran of early operating system projects, wrote this about a successful development process:
First, we classified our [Elliot 503 Mark II] customers into groups, according to the nature and size of the hardware configurations which they had bought [. . .]. We assigned to each group of customers a small team of programmers and told the team leader to visit the customers to find out what they wanted; to select the easiest request to fulfill, and to make plans (but no promises) to implement it. In no case would we consider a request for a feature that would take more than three months to implement and deliver. The project leader would then have to convince me that the customers’ request was reasonable, that the design of the new feature was appropriate, and that the plans and schedules for implementation were realistic. Above all, I did not allow anything to be done which I did not myself understand. It worked! The software requested began to be delivered on the promised dates.
—Tony Hoare, The Emperor's Old Clothes
In other words, they succeeded by using agile software development techniques. The year was 1965.
So, even though almost everything about the technology—programming language, architecture, user experience—has changed, old ideas about how to make software are often still good, if you distill them to their essence.
When people and computers interact, the result can be satisfaction or suffering. Too often, it is suffering.
(insert illustration of two human/computer systems: one with smooth information exchange (defect-free, clear, fast, focused, familiar, simple, self-verifying) producing satisfaction, joy, comfort, and mastery; and one with turbulent information exchange (buggy, opaque, slow, distracted, alienated, complex, "magic") producing doubt, loathing, exhaustion, and fear).
When programmers suffer, the software suffers—and vice versa. The resulting feedback loop destroys the system.
When programmers hate the code, they stop taking care of it. They'll hold out hope for a rewrite and discount any possibility that things could be incrementally improved. Improving bad code is often mind-numbingly tedious, slow work—and a little bit better is nowhere near good enough. So why bother?
The problem is, that rewrite may be a long time coming. In the meantime, programmers may simply leave the company so they don't have to deal with the code anymore. The only way out of this situation is catastrophe, or heroic effort.
When I say "effortless", I don't mean to imply "boring". I mean the effortlessness of playing a piece of music or a video game that is exactly at your skill level. You have to be engaged in it, or you'll fail—but engaged is all you have to be. As you proceed through the task, the information you need is ready to hand—things seem to be just where you need them to be. Each move you make is swift and sure. But you're not hyper-focused or in a trance. It feels like the most ordinary thing in the world.
(quote from Zen and the Art of Motorcycle Maintenance?) (the ancient masters were profound and subtle...)
If we are trying to get this feeling, few things matter more than the shape of the code and the tools we have for making sense of it.
If we are learning about an unfamiliar part of the codebase, the shape of the code dictates the shape of our (initial) thoughts. The concepts represented in the code, and the relationships among them, become the basis for our mental model of how the program works.
Bad code stymies our efforts to understand it, while good code helps.
Guidelines for how to write good code often get treated as hard-and-fast rules, and therefore misapplied.
There are many proposed guidelines for how to write "good code". But they're often misunderstood, and applied in ways that don't actually make the code easier to understand or change.
I think this happens mainly because the "rules" for writing good code aren't context-sensitive enough. The person who comes up with the rule and publishes it has tried it, perhaps, in only a few situations. They don't anticipate all the ways in which their context is different from other programmers' contexts, and so the rule gets disseminated without the necessary warning labels.
The truth is, there are no rules for how to write good code. There are only mental tools and techniques that are appropriate in certain contexts. This book makes those contexts explicit. For every guideline, I also list exceptions where it doesn't apply.
I also document the mental models of software that are enabled by certain coding patterns. This, I think, is crucial.
The purpose of "good code" is to enable us to learn faster and structure our thoughts better. But most books on code quality stop at "good code" and don't discuss "good thoughts". This omission is the second reason I think coding guidelines get misapplied: the programmers applying them don't have a clear idea of what the guidelines are supposed to achieve, so they have no way of accurately judging if their efforts to improve the code have made it better or worse.
This book goes beyond the code: it covers ways of thinking about software that will improve your ability to understand it, whatever its shape.
I call this book "A Timeless Way of Programming" because it really does describe a way of programming: a coherent philosophy. That philosophy is a chorus of harmonious ideas from the annals of computing history and beyond, which is why I claim that it is timeless. The ideas transcend the boundaries of technology and language. I expect they'll remain relevant far into the future.
I've used this way of programming in many languages and on many different projects. I hope this book helps you to do the same.
Always remember: the code works for you.
You don't work for the code.
All software was written by people just like you
and mostly, those people are trying to help you,
though often (being human) they fall short.
When you know that—really know it, deeply,
your code is truly yours.
Once you own it, you can grok it.
Having grokked it, you can change it.
Having changed it, you can judge the results.
If you don't like them... well, that's what undo is for.
This is the beginning of recovering control.
Make the code work for you.
Credit for the phrase "the code works for you, you don't work for the code" goes to GeePawHill.
This definition encompasses all kinds of different activities: QA with a formal test plan, casual poking around a dev environment, automated testing for a single function, and even performance benchmarking.
I divide testing along two axes, making four main branches:
| | Manual | Automated |
| --- | --- | --- |
| Formal | Traditional QA | "tests" |
| Informal | Poking around the UI, ExploratoryTesting, GAKTest | test scripts, most perf benchmarking |
A FormalTest specifies what is expected of the SystemUnderTest and unambiguously signals failure if those expectations aren't met. An InformalTest relies on human observation and judgment to determine if the software's Behavior is acceptable.
Automated tests run start-to-finish with no human intervention. Manual tests rely completely on a person interacting with the software. There's a grey area in between these extremes—semi-automated testing?—where a test might do some steps automatically but require human intervention for other steps.
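To make the "formal automated" quadrant concrete, here is a minimal sketch of such a test, using Node's built-in test runner and assert module (the `slugify` function is a made-up example, not from any real codebase):

```typescript
// A formal, automated test: the expectation is stated in code and failure is
// signalled unambiguously, with no human observation or judgment required.
import assert from "node:assert/strict";
import { test } from "node:test";

// A made-up function under test.
function slugify(title: string): string {
  return title.trim().toLowerCase().replace(/\s+/g, "-");
}

test("slugify joins words with dashes", () => {
  assert.equal(slugify("  A Timeless  Way "), "a-timeless-way");
});
```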
Most of the testing that programmers do is functional testing—testing what the application does, or whether it gives the right output for each input. Other types of testing measure "non-functional" attributes of the software, like performance, usability, security, and compatibility. This book briefly touches on performance testing, but other non-functional testing is beyond its scope. Both functional and non-functional tests may fall under any of the four branches of testing above.
When programmers in 2022 talk about "tests", we generally mean formal automated functional tests. That is the sense in which I use the noun "test" in this book.
System testing means any testing performed on a whole, running application. In system testing, the application is first compiled, installed, or deployed, and then run in an environment where a person could potentially interact with it. Often a person does interact with it, in the case of manual system testing. But system testing can also be automated, using tools like Selenium and ChromeDriver that simulate human interaction. End-to-end testing is a synonym for system testing. Acceptance testing and smoke testing are also sometimes treated as synonyms for system testing, although properly they are subsets of system testing which have specific goals.
Acceptance testing means testing performed by the customer—the person or group who writes the software team's paychecks. The purpose of acceptance testing is to answer the question "does it do what the customer wants?" Usually, acceptance tests are automated. The idea is that the customer phrases test cases in an English-like language, and programmers work behind the scenes to link the English statements to runnable test code. A typical acceptance test, written in the Gherkin language, might look like this:
```gherkin
Feature: login
  Scenario: Alicia logs in
    Given the user Alicia exists
    And I am on the "/login" page
    And I have entered "alicia@example.com" in the "email" field
    And I have entered "password123" in the "password" field
    When I click "Log in"
    Then I see the "/me" page
    And I see the text "Hello, Alicia"
```
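Behind the scenes, each English step is bound to runnable code. Here is a sketch of one such step definition using the @cucumber/cucumber API; the browser-driver helper on `this` is hypothetical:

```typescript
// Glue code linking a Gherkin step to runnable test code.
import { When } from "@cucumber/cucumber";

When("I click {string}", async function (this: any, label: string) {
  // `this.page` is a hypothetical browser-driver helper set up elsewhere
  // in the test suite's "world" object.
  await this.page.clickButton(label);
});
```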
Acceptance tests were popular in the early 2000s, largely because practitioners of Extreme Programming were pushing for closer collaboration between programmers and customers. The problem with acceptance tests, and the reason you don't see them much anymore, was that the customers didn't care about them. On the teams I worked on that used testing languages like Gherkin, the programmers wrote and maintained the "acceptance" tests. Non-technical people never looked at them. These types of "acceptance" tests are just system tests written in a funny-looking language.
Smoke testing is system testing done with few or no specific assertions about the application's functionality. The purpose of smoke testing is simply to check that the application works at all. A smoke test is thus more of a test of the build and deployment process than it is of the application. The term comes from electrical engineering: a "smoke test" is when you hook up the power to your device and see if any smoke comes out.
I use the term unit test to mean any formal automated functional test that isn't a system test. A unit test does not interact with a whole, deployed application, but instead calls bits and pieces of the application's code directly. I acknowledge that this definition is controversial, but it's the most useful one I've come up with, and aligns with colloquial usage of the term in 2022.
Some other definitions of "unit test" include:
- a test that is isolated from other tests. A unit test should be able to run on its own without depending on other tests for setup. Nor should it interfere with tests that may be run after it. Matt K. Parker points out that this is the original definition of the term "unit test". However, I find originalist arguments uncompelling if the original meaning of a term is not useful in the present day.
- I like test isolation and agree that a unit test ought to be isolated. The problem is, every type of testing benefits from isolation, making this definition so all-encompassing as to be useless. In most modern software dev shops, test isolation goes without saying.
- a test that invokes a relatively small amount of code when it runs
- I agree that it's often good to keep unit tests "small", but I think this definition is too vague about what a "small" amount of code is.
- a test that interacts with a single object, with all other objects replaced by mocks
- I find this definition harmful because it implies that in order to do unit testing properly you cannot refactor one object into many unless you also split up its tests and mock each object's collaborators. That makes refactoring laborious and error prone (since it is no longer covered by tests).
- a test that only invokes a single method of an object.
- This definition effectively prohibits object-oriented programming, because the only reason to have methods and objects is if the methods interact in an interesting way (via the object's state). To test the interesting behavior of an object, you need tests that invoke multiple methods.
I explicitly do not draw a distinction between unit tests and integrated tests—in fact, I don't use the term "integrated test" at all in this book, outside this paragraph. People who draw the distinction often use "unit test" to mean "a test for a single object or function" and "integrated test" to mean "a test which involves multiple objects and functions and which is therefore slow, brittle, and difficult to debug". The pejorative implications are unearned: I can point to quite a few "integrated" tests that don't suffer from these problems. But then again, I consider them to be unit tests.
"Integration test", on the other hand, is a useful term. An integration test is a non-functional test that verifies that the various modules, processes, or services that make up an application can talk to each other. While it may appear to be a functional test because it makes some assertions about behavior, its purpose is not to test functionality, but to check that things are "wired up" more or less correctly. While I think integration tests have an essential role to play in dynamically-typed systems, in well-designed TypeScript systems they are all but obsolete. The typechecker does a much more thorough job of verifying the wiring between components.
Other ways of categorizing testing practices include:
By who writes the tests:
- Programmer tests
- QA tests
- Acceptance tests
By goal:
By granularity:
By the type of requirement verified:
- FunctionalTesting
- IntegrationTesting
- PerformanceTesting
- LoadTesting
- StressTesting
- PenetrationTesting
- MigrationTesting
- RecoveryTesting
- ContractTesting
- ScreamTest
By verification strategy:
By technique:
Testing of any kind is learning. Testing means gaining information about what the software is currently capable of, and integrating that information into our mental model of the system. Testing is sometimes about discovery and sometimes about checking what we already know, but it is always about gaining information. A test that provides no information is a useless test.
When we test, whether as part of an automated continuous delivery pipeline or a manual QA process, we fundamentally want to know whether the software is suitable to be "promoted" to the next level of maturity—whether that means pushing to a shared Git branch, deploying to a staging environment, or releasing to users. Phrased in the inverse, we want to know if the software has any show-stopping bugs.
No feasible amount of testing can prove that software is bug-free. This is because the number of possible inputs to a typical software application—the number of things that users can do to it, multiplied by the number of states it can be in when they do those things—is astronomically vast. Each of those possible inputs could potentially cause some undesirable thing to happen, and there is no way we could ever test all of them.
Just to give a quick back-of-the-envelope example: suppose we have a function that takes two strings and returns a string, and we want to exhaustively test it. Suppose our language limits strings to be at most 4 gigabytes (2^32 bytes) in length.
With 8 bits per byte, the number of possible strings is 2^(8 × 2^32). The number of pairs of strings—that is, the number of possible inputs to our function—is the square of that: 2^(2 × 8 × 2^32). Written in decimal notation, that's a 1 with about 20 billion zeroes after it.
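If you want to check the arithmetic, a few lines of TypeScript will do it (purely illustrative):

```typescript
// Back-of-the-envelope: how many decimal digits does the input-space size have?
const bitsPerString = 8 * 2 ** 32;                           // up to 4 GiB of 8-bit bytes
const digitsPerStringCount = bitsPerString * Math.log10(2);  // log10(2^bits) ≈ 1.03e10
const digitsForAllPairs = 2 * digitsPerStringCount;          // squaring doubles the exponent
console.log(digitsForAllPairs); // ≈ 2.07e10: a 1 followed by about 20 billion zeroes
```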
This is why Edsger Dijkstra said:
Program testing can be used to show the presence of bugs, but never to show their absence!
And why Ben Moseley and Peter Marks said:
[T]esting for one set of inputs tells you nothing at all about the behaviour with a different set of inputs
And why Jim Coplien said:
Be humble about what your unit tests can achieve [. . .] Unit tests are unlikely to test more than one trillionth of the functionality of any given method in a reasonable testing cycle. Get over it.
I call this testing's epistemological (knowledge-related) problem. Testing even a moderately complex software system tells us almost nothing about it. It seems (at first glance) incredibly unlikely that our tests will ever catch a bug. Most bugs, the math seems to tell us, will occur in some untested region of the input space, and sneak right past our tests.
The epistemological problem with testing is solvable if we can make certain assumptions about the code.
Let's go back to the quote from Moseley and Marks:
Testing for one set of inputs tells you nothing at all about the behaviour with a different set of inputs.
Is it really true? Is it true in practice?
I think that in practice, we can draw reasonable conclusions about what code does, from even a small amount of testing. This is because we usually have access to the code. Even if we do not read it thoroughly, we can see how much of it there is, and based on that place a qualitative upper bound on its complexity.
To go back to our example of a function that takes two strings: whatever that function does, we know that the output is not going to be interestingly different for each of its roughly 10^(2 × 10^10) possible inputs. We know this because we know that the function cannot possibly have that many "if" statements in it, which is what would be needed to make its behavior interestingly different for all of those inputs. Most likely, the function is only a few lines long.
TODO: what about hash functions? Clarify what "interestingly different" means?
So, in practice, if we do a few tests and get a general picture of what the function does, most of the testing we do after that will simply confirm that picture. Which is to say, it will not tell us anything new or interesting about the code.
This does not, however, excuse random or haphazard testing. It matters which tests we choose to do.
TODO: is now the right time to talk about test coverage?
The key assumption is that of simplicity: that the code is the simplest implementation that passes all the tests. Here, simplicity means generality, which means fewer special cases where bugs can hide.
Testing is useful in practice when we know that the code is in some way general: that it doesn't consist of an astronomical number of "if" statements, but rather an algorithm that handles its infinity of possible inputs with a comparatively small amount of text.
This idea of generality, which is closely tied to the idea of simplicity, goes deep—very deep—and has its roots in the philosophy of science. We'll explore it thoroughly in the coming chapters.
Test-driven development—writing code to pass the tests, rather than tests to "cover" the code, plus refactoring to keep the code simple and general—is how we solve the epistemological problem.
Test-driven development is a process for writing code, (re-)discovered and popularized by Kent Beck. It's often described as a three-step-cycle:
- Write a test for functionality you wish you had. Watch it fail.
- Write the code to make the test pass.
- Refactor the code to simplify it, while keeping all the tests passing.
By only adding functionality to our code when a failing test forces us to, we ensure that all the functionality is tested. By refactoring code to remove hardcoded values and special-case "if" statements, we ensure that it generalizes beyond the specific cases we've tested.
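Here is a micro-example of one pass through the cycle (illustrative only; `formatPrice` isn't from any real codebase):

```typescript
import assert from "node:assert/strict";
import { test } from "node:test";

// 1. Red: write a test for functionality we wish we had, and watch it fail.
test("formats a price in cents as dollars", () => {
  assert.equal(formatPrice(1999), "$19.99");
});

// 2. Green: the quickest passing implementation might hardcode the answer:
//      function formatPrice(cents: number): string { return "$19.99"; }
// 3. Refactor: once further tests force generality, the special case goes away.
function formatPrice(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}
```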
TDD is, unfortunately, one of the most misused and maligned buzzwords in the software field today. Over the last two decades, lots of people have published their own spin on it, often saddling it with unhelpfully dogmatic baggage. If you learned TDD from one of those sources, you might have found it... well, unhelpfully dogmatic.
Even if you've hated your TDD experiences so far, I hope this book will convince you to give it another chance. To clarify what I think TDD is not, here is a short list:
Test-driven development is NOT:
- writing tests for every method of every class
- automated testing through the UI
- always writing tests before the production code
- 100% test coverage
- testing classes and functions in isolation by mocking out all their dependencies
- having to wait more than a fraction of a second for your tests to run
If these are the things about "TDD" that have vexed you, you might like the way this book treats it. I believe this treatment is aligned with Kent Beck's vision of the practice. Here's the man himself:
I get paid for code that works, not for tests, so my philosophy is to test as little as possible to reach a given level of confidence (I suspect this level of confidence is high compared to industry standards, but that could just be hubris). If I don't typically make a kind of mistake (like setting the wrong variables in a constructor), I don't test for it. I do tend to make sense of test errors, so I'm extra careful when I have logic with complicated conditionals. When coding on a team, I modify my strategy to carefully test code that we, collectively, tend to get wrong.
Different people will have different testing strategies based on this philosophy, but that seems reasonable to me given the immature state of understanding of how tests can best fit into the inner loop of coding. Ten or twenty years from now we'll likely have a more universal theory of which tests to write, which tests not to write, and how to tell the difference. In the meantime, experimentation seems in order.
—Kent Beck on Stack Overflow (2008, https://stackoverflow.com/a/153565)
But there is a second, more practical problem with testing: in many cases, testing a piece of code is somewhere between difficult and impossible.
If you've written many unit tests, you've probably experienced this. In order to test some piece of logic lodged deep in a hundred-line function, you have to churn out almost as many lines of test code. You have to create test data, get things into just the right state, set up mocks, call the function you're testing, and finally comb through its output for the value you want to assert on. Testing like this is difficult, frustrating, and error-prone.
In response to the frustration of unit testing, programmers often fall back to testing the whole system through the user interface—often manually. Such system tests have many downsides, but one of the big ones is that they take us right back to the epistemological problem. The system likely exhibits very complex behavior, and it likely contains a huge amount of code. If the code for the whole system is beyond our comprehension, we can't reasonably claim to have performed enough tests to cover all its interestingly different behaviors. Nor can we reasonably claim that the code for the whole system is the simplest code that passes all the tests. To rely on system testing alone, then, is to relinquish intellectual control over the system.
So we apparently have a dilemma: complicated unit tests on the one hand, incomplete system testing on the other. How should we choose between these two poisons? Fortunately, we don't have to.
Unit testing can be easy, confidence-building, and fun—but only if the code is designed to make it so.
GeePaw Hill calls this the "steering premise": the need for testability has to steer the evolution of the code. If we want our code to be easy to test, we have to design it that way. Testability must join correctness, maintainability, security, and performance as one of the criteria we use to evaluate software designs. If it can't be tested, it's no good—in the same way that "if it's not correct, it's no good" and "if it's too slow, it's no good".
This is often, incidentally, where I lose people when trying to "sell" TDD to them. My impression is that they think that testability will ride roughshod over all those other design criteria, turning the code into a maze of tiny objects, dependency injectors, and mocks. But that's certainly not what I want my code to be like, and TDD doesn't mean it has to be that way. The approach in this book achieves testability while improving readability—and often, security and performance as well.
The third problem with testing—unit testing, specifically—is that correctness of the parts does not imply correctness of the whole.
Even if we have a passing suite of unit tests showing that every part of the system is doing what we intended it to do, we can't conclude that the system as a whole is behaving correctly, or even working at all. After all, we might have put the pieces together wrong. If the pieces are incompatible, there might not be a right way to put them together. It seems that, even if we have comprehensive unit tests, we still need comprehensive system tests, too!
While it never hurts to have a few system tests (as in, a single-digit number of them), I prefer to rely on other strategies for verifying that I've assembled the pieces of my system correctly. Attempting comprehensive system testing is a losing game, thanks to the epistemological problem.
To verify the whole system, we need to complement the empirical spot-checking of testing with other, more mathematical approaches.
My approach to verifying system correctness is two-pronged:
- Use type checking to prove that the pieces can at least talk to each other, and type-driven design to rule out inconsistent or otherwise undesirable system states.
- Use compositional reasoning to convince myself that the aggregate of the pieces has the behavioral properties I desire.
The nice thing about type checking is that it's almost free: the TypeScript compiler (like compilers and typecheckers for many other languages) can infer the types of most variables and functions, so you don't have to write them explicitly. And typechecking, unlike testing, doesn't put you through hell if you don't design your code a specific way. You can add TypeScript types to JavaScript code of almost any shape or size.
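As a taste of what "ruling out inconsistent or otherwise undesirable system states" looks like in practice, here is a common TypeScript idiom (a sketch; `User` stands in for any payload type):

```typescript
// Instead of a bag of optional fields that permits nonsense combinations
// (loading AND loaded, data AND error), model a request as a union in which
// only the valid states are representable.
type User = { name: string };

type RemoteData<T> =
  | { status: "idle" }
  | { status: "loading" }
  | { status: "loaded"; data: T }
  | { status: "failed"; error: Error };

function describeRequest(r: RemoteData<User>): string {
  switch (r.status) {
    case "idle":    return "nothing requested yet";
    case "loading": return "still waiting";
    case "loaded":  return `hello, ${r.data.name}`; // data only exists here
    case "failed":  return r.error.message;         // error only exists here
  }
}
```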
Compositional reasoning is a little more nuanced. Imagine reading a very simple, but untested function, and believing it to be correct just on the basis of reasoning about the code. Essentially, compositional reasoning is the same, but at the system level. With compositional reasoning, we can look at a boxes-and-lines diagram of all the parts and how they interact, and be confident that, if each part is correct, the whole system works. This demands that we design the parts of the system to fit together in a certain way which makes the system self-evidently correct.
Compositional reasoning isn't just a fancy way of saying "cross your fingers and ship it". It means characterizing, quasi-mathematically, the way the parts of the system interact, and logically drawing conclusions about the behavior of the system as a whole.
Type-driven design and compositional reasoning will be fully covered in later chapters. For now, let's focus on testing.
The fourth and final problem with testing is that we can never be completely sure we've performed all the necessary tests.
Even when we bring to bear all of the techniques I've outlined above, we still can never be completely sure that our software is correct.
We just cannot be sure.
Every few months or so, I write a bug. Even though I used TDD. Even though my tests were passing and the typechecker was happy. Something hiding in one of my blind spots—something I didn't even know was possible—causes my code to do the wrong thing in production.
The silver lining is that the cause of the bug is nearly always obvious to me when I see the bug report, I can nearly always write a failing unit test to reproduce it, and fixing it usually takes less than an hour. When you have a comprehensive understanding of your software, "debugging" ceases to be part of your vocabulary. That comprehensive understanding is what the techniques in this book can give you. Then, when there's a problem, it's no big deal. You just fix it.
As Jim Coplien points out above, it behooves us to be humble about what our tests can achieve. We may feel confident in our passing test suite, but... how do we know we've written all the tests?
From the Dàodé Jīng:
When [the master] makes a mistake, he realizes it.
Having realized it, he admits it.
Having admitted it, he corrects it.
He considers those who point out his faults
as his most benevolent teachers.
He thinks of his enemy
as the shadow that he himself casts.
In spite of all its pitfalls, testing tells us something very valuable: that the software does what we meant it to do—at least in the cases we tested.
Ultimately, the answers to "Have we tested enough?" and "Can we ship?" come down to human judgment. We make the decision to ship, or not, based on our incomplete knowledge of what the system currently does. Tests are one side of the "scaffolding" that we stand on to build that knowledge. Types are the other side.
Tests and types make orthogonal statements about the system. Tests demonstrate that the software does the right thing in specific cases, while types prove that the software always does something (that is, every input has a well-defined output). Neither tests nor types, or even both together, can prove that the software always does the right thing. But they can strongly hint at it. Tests and types put definite bounds on what could go wrong—and that is immensely helpful.
A mental model is your inner picture of some aspect of reality. It tells you what kinds of things exist, and the various ways they can relate to each other. Mental models are what allow us to make accurate predictions about the world. When I use the word model in this book, I usually mean "mental model" (I'll tell you if I mean the software kind).
The models you use change how you "see" the code. A good model can make the difference between finding a given piece of code confusing and finding it clear. When we can model code accurately, we can change it easily and reliably, without introducing bugs.
To quote George Box, "All models are wrong, but some models are useful." Models are simplified, artificial views of reality. The simplification and artifice are what make the models comprehensible, and therefore useful, but they also mean that a given model won't accurately describe every situation. Throughout this book, as I refer to various models, I point out the situations where I know them to be wrong or incomplete. They may even have other flaws that I don't know about. But I hope that by calling your attention to the known flaws, I'll prepare you to spot the ones I don't know about yet.
Although all of the models are wrong, they are wrong in different ways, so they complement each other. Each fills gaps left by the others. By learning to use all of the models you can get a comprehensive picture of your software systems.
If the members of a team don't share mental models, the system starts to fall apart at the seams. Programmers retreat into their own isolated silos of knowledge, working on just the part of the code that they know well. Bugs creep in because the siloed chunks of code don't talk to each other in an organized way, and no one has a comprehensive picture of how the whole system works. To shield your software from this fate, you and your teammates must have similar mental models of the code.
If your teammates already have models of the code that are working well for them, don't try to impose mine. Learn theirs instead. The list of models in this book is almost certainly not complete, and their models probably work well too. They may be the only models that work for your codebase.
Likewise, if you discover new models that work well, don't hesitate to use them and share them with your team.
A system is a collection of interacting parts. I define a software system as one in which some of the parts are made out of computer code.
The parts of a software system, which I call software system components, can be people (usually the users of the software), processes (running programs), services (like a search engine, or a filesystem), hardware devices (like disk drives, cameras, keyboards, and printers), or aggregates of all of these. The way we divide up the system into parts is, to some extent, arbitrary, and depends on what aspects of it we care about modeling.
The simplest division of a software system is into two parts:
Define SoftwareSystem
Code that DoesWhatYouIntend.
This is an easier goal to agree on than code that is "correct" or "high-quality". What "correct" and "high-quality" mean will depend on your context. But we can all agree that if code doesn't do what we intended it to do, it's no good.
Definitions of SoftwareQuality often focus on conformance to requirements. For application software, this is problematic, because we almost never have a complete and correct description of "requirements" before we start writing code. We discover "requirements" as we build the software and observe people using it. Furthermore, the "requirements" are not really requirements, in the sense of "behaviors the software must exhibit to be considered a success". They're more like options: "this is one possible way of solving the user's problem". We're constantly weighing the cost and value of these options, some of which may be incompatible with each other, to design the product.
In order to know if a software system does what we intend, we first have to be able to describe what the system does. "What the system does" is called its behavior.
To model the behavior of a system, we first divide the system into components. Services, processes, devices, and people might all be modeled as components.
There are many possible divisions, at many different levels of granularity, depending on how detailed we're interested in getting. The simplest possible division is into two components: often the split is user+software, or client+server.
(Is it dehumanizing to model people as components? Well, as Alan Watts would say, that's the pessimist's view of it. The optimist's view is much more heartening, and will be elucidated toward the end of this book.)
The behavior of the system is the set of possible interactions among those components. An interaction is a sequence of discrete messages.
As an example, consider the behavior of a pocket calculator—the kind that a grade-school student might use.
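Here is a sketch of how one interaction with the calculator might be written down as a sequence of discrete messages (the message shapes are illustrative, not a prescribed notation):

```typescript
// One interaction: the user keys in "1 + 1 =" and the calculator answers.
type Message =
  | { from: "user"; press: string }
  | { from: "calculator"; display: string };

const interaction: Message[] = [
  { from: "user", press: "1" },
  { from: "calculator", display: "1" },
  { from: "user", press: "+" },
  { from: "user", press: "1" },
  { from: "calculator", display: "1" },
  { from: "user", press: "=" },
  { from: "calculator", display: "2" },
];
```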
This is part of why developing software is so hard. If we imagine the set of all message-sequences arranged in a hyperdimensional space, then the task of defining the software's behavior is equivalent to sculpting a very complicated, infinitely large "shape" in that hyperdimensional space. That "shape" is the boundary separating desirable interactions from undesirable ones. We have to describe this shape indirectly, by writing code that will generate it, and that code has to be simple enough to fit in our heads.
The task sounds impossible, and perhaps it is, in general. But the saving grace is usually that our software's users are human, too, and so the behavior has to fit in their heads. This means that (if the user experience is well-designed) it should always be possible to intuitively grasp the software's behavior. Formally defining the behavior is the part that takes a bit more work.
The "behavior" model is useful, first of all, because it allows us to communicate with some precision about what the software should and should not do.
Second, the behavior model is useful because we can translate interactions directly into automated tests. These tests demonstrate that, at least in the few situations we've tested, the software does exactly what we intended it to do.
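For example, the calculator interaction sketched earlier translates almost mechanically into a test (the `Calculator` class and its methods are hypothetical):

```typescript
import assert from "node:assert/strict";
import { test } from "node:test";
import { Calculator } from "./calculator"; // hypothetical module

test("typing 1 + 1 = displays 2", () => {
  const calc = new Calculator();
  for (const key of ["1", "+", "1", "="]) {
    calc.press(key);
  }
  assert.equal(calc.display(), "2");
});
```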
Of course, without knowing how the software is structured internally, we have no way of knowing if those tests say anything in general about the correctness of our code. If we have a test demonstrating that typing `1 + 1 =` into our calculator produces `2`, we can't necessarily conclude that `2 + 2 = 4` is working correctly. Ben Moseley and Peter Marks, in "Out of the Tar Pit", remark that "testing for one set of inputs tells you nothing at all about the behaviour with a different set of inputs," but I think they overstate their case. While their statement is formally true, in practice the situation is not quite as bleak as that. When code is simple, we can convince ourselves that it is correct without exhaustively testing every possible set of inputs. The importance of simplicity cannot be overstated. It is vital to science—in fact, we need simplicity to be able to form any kind of general picture of reality at all. The next section, which compares test-driven development to the scientific method, explains exactly how simplicity operates in the context of software development.
In other words, a test can quickly, reliably, and automatically tell you, "no, that won't work".
While the analogy between TDD and science is illuminating, it would be a mistake to take it too literally. Scientists investigate nature, which always truthfully answers questions that are put to it by experiment. But in TDD, the "nature" for which we are developing theories is our understanding of what users expect the software to do. If this understanding is wrong, we'll write the wrong test assertions. The resulting "theory"—the code generated from the tests—would then be wrong, in the sense that even though it passes all the tests, it doesn't do what the users want. TODO: revise this section; it's confusing. Mention oracles. Cite WhyMostUnitTestingIsWaste
This model is incomplete in the sense that there are often restrictions on the valid values that can be exchanged—restrictions that cannot be expressed in the type system. E.g. `{items: Array<Item>, activeItem: number}` may have the additional restriction that `activeItem` must be a valid index into the `items` array.
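One common workaround is a "smart constructor" that enforces the restriction at runtime, even though the type can't express it (a sketch; `Item` is assumed from the example above):

```typescript
type Item = { label: string };
type ItemList = { items: Array<Item>; activeItem: number };

// The type system can't say "activeItem must index into items",
// but a constructor function can at least check it when the value is built.
function makeItemList(items: Array<Item>, activeItem: number): ItemList {
  if (!Number.isInteger(activeItem) || activeItem < 0 || activeItem >= items.length) {
    throw new Error(`activeItem ${activeItem} is not a valid index into items`);
  }
  return { items, activeItem };
}
```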
Processes are a useful model for computation—so useful that they are reified in our operating systems. So as a step toward discussing computational processes generally, we will discuss Unix processes as an example.
A process is the dynamic instantiation of a static program. A process is born when you start the program running, and dies when the program finishes. In Unix-like OSes, you can also kill processes, forcing them to stop. Many processes created from the same program can be running at the same time on one computer.
The state of a process consists of an instruction pointer, which indicates which piece of code is to be executed next, and the information the process is storing in memory. (Strictly speaking, the state also includes the values stored in CPU registers, but when programming in TypeScript, we almost never have to think about that.)
Processes are deterministic. If you know the state of a process, and the program it spawned from, you can flawlessly predict what it will do in the future, up until the point where it receives information from some external source. That source could be a file it's reading from, the output of some other process, the current time, or input from the user.
This model is flawed because some things, like logging and performance monitoring, have effects on the world outside the process, but are allowed to occur in "pure" functions.
I call these special-case effects "instruments" since their purpose is almost always to probe the running system and get information about it.
In some circumstances, we might also model effectful components as merely stateful—such as when a component reads and writes a file on disk, but we can be reasonably certain that it is the only thing that will ever write to that file.
In all cases, the model of a given component's capabilities must be chosen based on how we want to reason about the system, not based on nitpicking about what the code is really doing.
Data, properly speaking, are records of observed facts. Changing a data value doesn't semantically make sense. It also doesn't make sense practically. The TypeScript typechecker simplifies its work by assuming that objects that are simply sets of key-value pairs are immutable. If you mutate these objects, you might introduce type errors that the typechecker won't catch! If you want mutability, use an object as defined in the Component Capabilities model, which only reveals its internal state in response to messages (realized in TS as method calls).
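Here is a small sketch of the kind of hole mutation can open up (the names are made up; the behavior is real TypeScript behavior):

```typescript
type Box = { value: string | number };

function clobber(b: Box): void {
  b.value = 42; // mutation through an alias
}

function shout(b: Box): void {
  if (typeof b.value === "string") {
    clobber(b);
    // The typechecker still believes b.value is a string here, so this
    // compiles without complaint; at runtime it throws, because numbers
    // have no toUpperCase method.
    console.log(b.value.toUpperCase());
  }
}

shout({ value: "hello" });
```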
An assumption required by this model: synchronous code runs so fast we can consider it instantaneous. For many applications, this is an okay assumption. In JavaScript, which is single-threaded, this way of thinking about time is almost forced on us: since synchronous computation blocks UI interaction, long-running computations must either be offloaded to worker threads (which the main thread communicates with asynchronously) or broken up into small chunks of work that yield control back to the browser every few milliseconds.
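A sketch of the "small chunks that yield control" approach (illustrative; the helper name is made up):

```typescript
// Process a large array without blocking the UI: do a slice of work, then
// yield back to the event loop before continuing.
async function processInChunks<T>(items: T[], work: (item: T) => void): Promise<void> {
  let lastYield = Date.now();
  for (const item of items) {
    work(item);
    if (Date.now() - lastYield > 10) {
      await new Promise((resolve) => setTimeout(resolve, 0)); // let the browser breathe
      lastYield = Date.now();
    }
  }
}
```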
This model is based on ArchitecturalLayers and CodeCapability. The conceit of the extract-transform-load model is that we can divide all synchronous computation into three distinct steps: get inputs (extract), process them to calculate some result (transform), and output that result somewhere (load). Use exceptions for infrastructural errors, and union types for domain errors (e.g. input and state validation).
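A sketch of the shape this gives a program (the function names are illustrative): the extract and load steps may throw on infrastructural failure, while the transform step reports domain errors through its return type.

```typescript
// Domain errors are ordinary values in the transform step's return type.
type ParseResult =
  | { ok: true; amountCents: number }
  | { ok: false; error: "not-a-number" | "negative-amount" };

// Transform: pure, no I/O; domain problems come back as data.
function parseAmount(raw: string): ParseResult {
  const n = Number(raw);
  if (Number.isNaN(n)) return { ok: false, error: "not-a-number" };
  if (n < 0) return { ok: false, error: "negative-amount" };
  return { ok: true, amountCents: Math.round(n * 100) };
}

// Extract and load would be hypothetical I/O helpers (readInput, writeRecord)
// that throw exceptions when the outside world misbehaves: network down,
// disk full, and so on.
```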
TODO: should this be model 1?
The time it takes to type in the code is insignificant compared to all of the other work—learning, communicating, and inventing. It's only 1 or 2 percent of the total.
With this in mind, it is obvious why Brooks' Law—that adding staff to a late software project makes it later—is true.
Most of our job is understanding—and 10 people can't understand something faster than one person can.
Given that learning and understanding are so fundamental to programming, let's spend a few pages considering the nature of understanding. The next chapter, on mental models, delves into this issue.
- InformalReasoning - c.f. OutOfTheTarPit
- GregWilson cited some research showing that reading the code finds more bugs per hour than testing
- SoftwareTesting - passing tests make us more confident
- AlgebraicType - proofs of some kinds of internal consistency, ruling out many errors that could happen in a program with dynamic types. If our typechecker outputs no errors, that makes us more confident.
- easiest to see in a language where we can do an apples-to-apples comparison of typed and untyped forms, e.g. TypeScript vs. JavaScript.
- CompositionalReasoning with AlgebraicProperties (i.e. "semi-formal" reasoning)
- InformalReasoning
- Supported by: StructuredProgramming, Symmetry
- limiting the number and Scope of Variables
- limiting the depth of hierarchies (call hierarchy, if statement nesting)
- Cohesion and avoiding IdeaFragments
- See: OutOfTheTarPit
- SoftwareTesting
- Flaw: spot-checking is not a proof.
- Quote Dijkstra, OutOfTheTarPit
- Complement with: TestDrivenDevelopment, Simplicity, OccamsRazor
- Complement with: AlgebraicTypes
- Flaw: process-external Effects are hard to test
- Complement with: TestDoubles and CompositionalReasoning
- ...or FunctionalProgramming, FunctionalCoreImperativeShell, Simplicity, and InformalReasoning (leaving the CompositionRoot and entrypoint Procedures without UnitTests, as GaryBernhardt does in Boundaries)
- Flaw: duplicate test coverage makes failures hard to interpret and coverage hard to analyze. The opposite, "over-mocking," leads to situations where all the tests pass but the system as a whole doesn't work.
- Fix with ShallowHierarchy (reducing the number of tests that duplicate coverage) and CompositionalReasoning + ContractTesting (making sure that mocked collaborators have easy-to-reason-about properties).
- Use AlgebraicTypes to get a baseline level of confidence that components can work together.
- Flaw: you're not the Oracle
- e.g. you need to call an API that returns some complicated data. It's not clearly documented and you misinterpret the meaning of one of the fields when creating a Stub of the API. So your UnitTests pass, but the system as a whole is wrong.
- Partial fix: ensure you only make this mistake once by transforming incoming data into a form that's self-documenting or otherwise well-documented, and hard to misuse. I.e. ParseDontValidate.
- Summary: testing is complemented by TDD (writing the simplest code that passes the tests), an architecture that pushes effects to the boundaries or abstracts them behind contracts with convenient algebraic properties, a shallowly layered architecture, and a discipline of understanding the data that's passed across architectural boundaries.
- AlgebraicType
- Flaw: certain generic interfaces are very difficult or impossible to express in certain type systems. E.g. generic variadic functions.
- This is a shortcoming of current, specific type system technologies, not the mathematics of types
- Even proponents of dynamic typing rely on the idea of types to make sense of their programs—they just don't have automated tools to check their thinking.
- Possible resolution: something like Clojure's `spec`? Then you can't write fewer tests, though.
- Summary: algebraic types are complemented by tests.
- AlgebraicProperty and CompositionalReasoning
- Flaw: error propagation threatens the simplicity of algebraic properties when the implementor has process-external Effects.
- Fix: DomainSandwich, CheckedExceptions considered harmful
The first goal is the sine qua non of the second.
A sense of oneness with your work. "Oneness" isn't quite the right word, and there are many possible near-synonyms: connectedness, identity, care, kindness. Oneness involves:
- a feeling of Mastery: you are confident that you can get the code to do what you need it to do. When you get a feature request, you can often develop plans of action that will work.
- a feeling of Compassion toward your fellow programmers, the users of your software, and the authors of your dependencies. Compassion also involves doing things that help your coworkers achieve oneness with their work (e.g. writing understandable code with APIs that are hard to misuse).
- a sense of responsibility. If the code's Messy or has a Bug, you take it seriously and fix it. This kind of responsibility can't be forced on you by others. You assume it, naturally and inevitably, when you care about the code and the people it affects. If the other qualities of oneness are present, you'll find it easy to fix any problems you cause, so the responsibility won't be a burden.
- non-Instrumentality. Instrumentality often appears when you try to get something for free, without putting time or attention into it, or identifying with it. E.g. suppose you use someone else's code that you don't understand very well to try to make a job easier. If what you try doesn't work, it's easy to get frustrated and blame the code or the other person, which causes everyone to suffer. An attitude of non-instrumentality both:
- recognizes that you may have to put some learning effort in to get the benefit of using the code. Nothing is free.
- is willing to let go of a dependency on bad code and reimplement the functionality, if that's the pragmatic option.
- continuous attention and course-correction as you work.