Book1

Software Application Engineering

How to Write Code You Know Will Work

About This Book

In this book, bold headlines alternate with supporting details.

To get the gist, you can just read the headlines.

As you try to put these ideas into practice, you'll undoubtedly have questions. That is a good time to come back to the book and thoroughly understand the details.

This book is for application programmers. My experience is primarily with web applications, so I will focus on those.

frontend, backend, or fullstack
greenfield or legacy

The programming language used in this book is TypeScript.

What is Software Engineering?

Engineering means applying science and mathematics to the design of practical things, in order to make them a better fit for the contexts of their creation, distribution, and use (and retirement).

The goal of software engineering is to create human+computer systems that are free of errors and annoyances, easy to change, and satisfying to work in. This book explains how to write code that promotes these qualities.

The fundamental techniques covered in this book are test-driven development and type-driven design. These techniques aid the creation of multi-paradigm code that can be mentally modeled and therefore known to behave a certain way. When we understand what code does, how, and why, we can change it without fear of breaking it. That makes the system satisfying to work on because the outcomes we produce are in proportion to our effort: working on the software creates value reliably and steadily, without turmoil, catastrophe, or heroism. All these qualities enable us to adapt the system to the changing needs of its users, so it stays useful and usable in the long run.

The term software engineering is a controversial one in 2022. Programmers, and others involved in the software trade, sometimes muse doubtfully about whether software engineering is a "real" engineering field. I contend that, while much of the software-making that takes place today is not engineering, some of it is, and software could indeed become an engineering discipline in the not-too-distant future. I hope this book will give you a good sense of what such a shift might look like, and get you excited about being involved in it.

We apply the scientific method to programming by testing our software.

I used to think that there wasn't much science in computer science, and not much interesting math in the average web application. What do science and math have to do with building better software?

A lot, it turns out. While "science" in the sense of "the study of the natural world" isn't really applicable to software, we can use a version of the scientific method to help us develop simple programs that exhibit the behavior required of them. That scientific method is called test-driven development.

A note on test-driven development: TDD is one of the most misused and maligned buzzwords in the software field today. If you've learned it from a source other than Kent Beck's original book Test Driven Development by Example, you've likely been exposed to some unhelpfully dogmatic ideas along with the good stuff. If you've tried TDD and hated it, well... I hope this book will convince you to give it a second chance. To clarify what I think TDD is not, here is a short list:

Test-driven development is NOT:

writing tests for every method of every class
automated testing through the UI
always writing tests before the production code
100% test coverage
testing units of code in isolation by mocking out all their dependencies
having to wait more than a fraction of a second for your tests to run

If these are the things about "TDD" that have vexed you (or your coworkers), you might like this book.

We apply mathematics to programming by using type algebra.

Type algebra is a system of logical rules for reasoning about types. A type is a set of possible values. So, for example, when we talk about the type of a variable, we're talking about the set of values that could be stored in that variable.

A type system is a language for expressing theorems about a program—statements like "the concat function is always called with two strings as arguments, and always returns a string". A type checker is a program we can run on our code, which tries to prove that all the theorems we've stated are true. If it can't prove some of them, we get a type error.

A note on type systems: many programmers' only exposure to types is via Java and C, which I think is dreadfully unfortunate. These languages have unfriendly, rather restrictive type systems. Their type annotations exist mainly to help the compiler optimize the code, not to help the programmer.

Much better type systems—ones that do substantially help the programmer—exist, and this book focuses on those. TypeScript is one of the good ones. It is an algebraic type system, which means its types can be composed to form more sophisticated types. For instance, you can have so-called union types like this, which allow a value to be any of a set of alternatives:

// The httpResponse variable can contain either the exact string
// "pending", an instance of the Error class, or a response
// object with a `data` property which holds a string.
let httpResponse: "pending" | Error | {data: string}

Unlike the type systems of Java and C, which are riddled with opportunities for NullPointerExceptions and segmentation faults, TypeScript can effectively rule out the equivalent errors from JavaScript code. What this means is that if the type checker accepts your code, you do not have to worry about those kinds of errors. That dramatically simplifies the process of reasoning about the software, and unlocks many new techniques. For example, to find out if a particular property of an object is used anywhere, you can simply delete the property from the type of that object and see if the type checker complains.

Another common criticism leveled at typed languages is that type checking (or compilation, which is closely related) is too slow to give you timely feedback while programming. In dynamic languages like Python, Ruby, or JavaScript, programs start in the blink of an eye, and thousands of tests can run in under a second. By comparision, just compiling a moderately-sized Java app can take many seconds.

I'm happy to report that the TypeScript typechecker does not suffer from this problem. Typically, typechecking takes a fraction of a second. It can be so fast because it doesn't check the entire program on every change. Instead, it watches your code for changes and re-checks just the parts that changed.

Tests and types are power tools for understanding and changing code.

Even with good tests and good type systems, many programmers still see them as an annoying stumbling block—just one more thing they have to deal with before they can ship their code. I used to have this adversarial relationship with tests and types, but over time I discovered that, with the right approach, they can be extremely useful. The test failures and type errors cease to be annoying once you gain intellectual control over them and begin to wield them as a tool. The error messages can form a kind of self-checking to-do list, reminding you what still needs to be fixed up after your last change. And there are subtler and more powerful benefits too, which will be explored in depth throughout the rest of this book.

But tests and types are not cure-alls. They provide more value for their cost when paired with particular design approaches.

GeePaw Hill calls this the DrivenPremise.

Designs that maximize the value of tests and types have another benefit as well: they are easier to reason about than typical designs.

In this book I present several mental models for reasoning about test-driven and type-driven code.

These models are the software equivalent of the plans and schematics that engineers in other fields draw.

I don't want UML, though. Software engineering is not a matter of thinking or acting according to rigid, bureaucratic procedures, and never will be.

The preceding description of models may, for some, call to mind unpleasant memories of enormous UML diagrams. The people who promulagated these diagrams seemed to think that programming could be "solved"—that with the right process, they could turn user requirements into diagrams which would turn into code, with very little creativity required on the part of the engineers. However, that is not what I have in mind at all. I don't think that totalitarian kind of approach will ever work for software engineering.

When something is completely algorithmically solvable in software, we tend to automate it. Case in point: compilation. It is likely that more and more programming tasks will be automated, at least partially, as AI becomes more capable. However, to the extent that there is human work involved in software, the job requirements will always include: creative problem-solving and communication ability, intuition, holistic awareness, judgment, ethical values, and knowledge of the world outside the machine. Beyond technical skills, those are the qualities a good engineer needs to possess.

Some people worry that if programming becomes engineering, all the fun will be taken out of it. I certainly hope not—and I don't think that's likely, anyway. At "worst," the superficial fun will be replaced by a much deeper joy. The worry seems to come from the idea—unfortunately reinforced by too much of the STEM curriculum in schools today—that math, science, and engineering are dry, soulless disciplines, lovable only by people who want to think like machines. That simply isn't accurate. Science and mathematics are, at their heart, the investigation of reality by engaged and curious minds—investigation that is made much easier by creativity and an appreciation of beauty. Nor is the reality that science reveals to us depressing. While twentieth-century philosophy has left us with the idea that reality is fundamentally machine-like and inhuman, closer investigation reveals this view to be miguided. There is nothing fundamentally true about the view that the universe is like a machine—that view is an incomplete mental model, like any other. A deep understanding of software has the power to reveal this to you, through quasi-mystical insight. Once you grok that insight, science and mathematics become a window through which you can glimpse the awe-inspiring and inexpressible metapattern that generates all experience. So don't worry!

Accordingly, this book does not present a "process" or "method" that you can follow step by step.

Instead, it presents mental models, heuristics for improving code, and techniques.

A mental model is your inner picture of some aspect of reality. It tells you what kinds of things exist, and the various ways they can relate to each other. Mental models are what allow us to make accurate predictions about the world. When I use the word model in this book, I usually mean "mental model" (I'll tell you if I mean the software kind).

The models you use change how you "see" the code. A good model can make the difference between finding a given piece of code confusing and finding it clear. When we can model code accurately, we can change it easily and reliably, without introducing bugs. Therefore, modeling is absolutely crucial to software engineering.

This book also contains many heuristics, which are rough guidelines for improving code by making it easier to model. Beware, though—they aren't hard and fast rules, and they don't improve the code in every situation. The important thing is not just to learn the heuristics, but to understand when and why they're useful. Most of the bad software I have worked on was created by applying the wrong "best practices" in situations where those practices weren't appropriate. Don't let your systems meet the same fate they did!

The techniques in this book are specific tricks you can use when programming to accomplish some short-term goal. Each of the techniques in this book helps with at least one of four things:

understanding programs (that is, building models)
designing programs so they can be modeled more effectively
predicting the results of changes to the code, or
changing the code.

The models in this book are incomplete, but they're all useful

To quote George Box, "All models are wrong, but some models are useful." Models are simplified, artificial views of reality. The simplification and artifice are what make the models useful, but they also mean that a given model won't accurately describe every situation. Throughout this book, as I refer to various models, I point out the situations where I know them to be wrong or incomplete. They may, of course, have other flaws that I don't know about. But I hope that by calling your attention to the known flaws, I'll prepare you to spot the ones I don't know about yet.

Although all of the models are wrong, they are wrong in different ways, so they complement each other. Each fills gaps left by the others. By learning to use all of the models you can get a comprehensive picture of your software systems.

To work well, models must be shared by all the programmers working on the software.

If the members of a team don't share mental models, the system starts to fall apart at the seams. Programmers retreat into their own isolated silos of knowledge, working on just the part of the code that they know well. Bugs creep in because the siloed chunks of code don't talk to each other in an organized way, and no one has a comprehensive picture of how the whole system works. To shield your software from this fate, you and your teammates must have similar mental models of the code.

If your teammates already have models of the code that are working well for them, don't try to impose mine. Learn theirs instead. The list of models in this book is almost certainly not complete, and their models probably work well too. They may be the only models that work for your codebase.

Likewise, if you discover new models that work well, don't hesitate to use them and share them with your team.

Model 1: The Behavior of a Software System

The first goal of software engineering is to make software that does what we intend.

Define SoftwareSystem

Code that DoesWhatYouIntend.

This is an easier goal to agree on than code that is "correct" or "high-quality". What "correct" and "high-quality" mean will depend on your context. But we can all agree that if code doesn't do what we intended it to do, it's no good.

Definitions of SoftwareQuality often focus on conformance to requirements. For application software, this is problematic, because we almost never have a complete and correct description of "requirements" before we start writing code. We discover "requirements" as we build the software and observe people using it. Furthermore, the "requirements" are not really requirements, in the sense of "behaviors the software must exhibit to be considered a success". They're more like options: "this is one possible way of solving the user's problem". We're constantly weighing the cost and value of these options, some of which may be incompatible with each other, to design the product.

In order to know if a software system does what we intend, we first have to be able to describe what the system does. "What the system does" is called its behavior.

To model the behavior of a system, we first divide the system into components. Services, processes, devices, and people might all be modeled as components.

There are many possible divisions, at many different levels of granularity, depending on how detailed we're interested in getting. The simplest possible division is into two components: often the split is user+software, or client+server.

(Is it dehumanizing to model people as components? Well, as Alan Watts would say, that's the pessimist's view of it. The optimist's view is much more heartening, and will be elucidated toward the end of this book.)

The behavior of the system is the set of possible interactions among those components. An interaction is a sequence of discrete messages.

As an example, consider the behavior of a pocket calculator—the kind that a grade-school student might use.

In general, the behavior is an infinite set of possible interactions.

This is part of why developing software is so hard. If we imagine the set of all message-sequences arranged in a hyperdimensional space, then the task of defining the software's behavior is equivalent to sculpting a very complicated, infinitely large "shape" in that hyperdimensional space. That "shape" is the boundary separating desirable interactions from undesirable ones. We have to describe this shape indirectly, by writing code that will generate it, and that code has to be simple enough to fit in our heads.

The task sounds impossible, and perhaps it is, in general. But the saving grace is usually that our software's users are human, too, and so the behavior has to fit in their heads. This means that (if the user experience is well-designed) it should always be possible to intuitively grasp the software's behavior. Formally defining the behavior is the part that takes a bit more work.

The "behavior" model is useful, first of all, because it allows us to communicate with some precision about what the software should and should not do.

Second, the behavior model is useful because we can translate interactions directly into automated tests. These tests demonstrate that, at least in the few situations we've tested, the software does exactly what we intended it to do.

Of course, without knowing how the software is structured internally, we have no way of knowing if those tests say anything in general about the correctness of our code. If we have a test demonstrating that typing 1 + 1 = into our calculator produces 2, we can't necessarily conclude that 2 + 2 = 4 is working correctly. Ben Moseley and Peter Marks, in "Out of the Tar Pit", remark that "testing for one set of inputs tells you nothing at all about the behaviour with a different set of inputs," but I think they overstate their case. While their statement is formally true, in practice the situation is not quite as bleak as that. When code is simple, we can convince ourselves that it is correct without exhaustively testing every possible set of inputs. The importance of simplicity cannot be overstated. It is vital to science—in fact, we need simplicity to be able to form any kind of general picture of reality at all. The next section, which compares test-driven development to the scientific method, explains exactly how simplicity operates in the context of software development.

Model 2: Test-Driven Development Is the Scientific Method

Code expresses a theory about how to solve a problem.

A test is a reproducible experiment that can disprove the theory expressed by the code.

In other words, a test can quickly, reliably, and automatically tell you, "no, that won't work".

In order for a test to be valuable, it must be possible for the test to fail.

Tests are worth more when they give understandable failure messages.

Therefore, test your tests by watching them fail when the code is broken.

Beware, though, of writing incorrect tests.

While the analogy between TDD and science is illuminating, it would be a mistake to take it too literally. Scientists investigate nature, which always truthfully answers questions that are put to it by experiment. But in TDD, the "nature" for which we are developing theories is our understanding of what users expect the software to do. If this understanding is wrong, we'll write the wrong test assertions. The resulting "theory"—the code generated from the tests—would then be wrong, in the sense that even though it passes all the tests, it doesn't do what the users want. TODO: revise this section; it's confusing. Mention oracles. Cite WhyMostUnitTestingIsWaste

Example: Testing the Calculator

Model 3: The Set of Messages Exchanged By Two Components Is a Type

This model is incomplete in the sense that there are often restrictions on the valid values that can be exchanged—restrictions that cannot be expressed in the type system. E.g. {items: Array<Item>, activeItem: number} may have the additional restriction that activeItem must be a valid index into the items array.

Model 4: Processes

Processes are a useful model for computation—so useful that they are reified in our operating systems. So as a step toward discussing computational processes generally, we will discuss Unix processes as an example.

A process is the dynamic instantiation of a static program. A process is born when you start the program running, and dies when the program finishes. In Unix-like OSes, you can also kill processes, forcing them to stop. Many processes created from the same program can be running at the same time on one computer.

The state of a process consists of an instruction pointer, which indicates which piece of code is to be executed next, and the information the process is storing in memory. (Strictly speaking, the state also includes the values stored in CPU registers, but when programming in TypeScript, we almost never have to think about that.)

Processes are deterministic. If you know the state of a process, and the program it spawned from, you can flawlessly predict what it will do in the future, up until the point where it receives information from some external source. That source could be a file it's reading from, the output of some other process, the current time, or input from the user.

Model Q: The Dependency Graph

DependencyGraph, CallGraph, TheDependencyGraphIsNotTheCallGraph

Model 4.1 Component Capabilities

This model is flawed because some things, like logging and performance monitoring, have effects on the world outside the process, but are allowed to occur in "pure" functions.

I call these special-case effects "instruments" since their purpose is almost always to probe the running system and get information about it.

In some circumstances, we might also model effectful components as merely stateful—such as when a component reads and writes a file on disk, but we can be reasonably certain that it is the only thing that will ever write to that file.

In all cases, the model of a given component's capabilities must be chosen based on how we want to reason about the system, not based on nitpicking about what the code is really doing.

Heuristic: Most messages sent to and from objects should contain only data, not object references.

Heuristic: Do Not Alter Data

Data, properly speaking, are records of observed facts. Changing a data value doesn't semantically make sense. It also doesn't make sense practically. The TypeScript typechecker simplifies its work by assuming that objects that are simply sets of key-value pairs are immutable. If you mutate these objects, you might introduce type errors that the typechecker won't catch! If you want mutability, use an object as defined in the Component Capabilities model, which only reveals its internal state in response to messages (realized in TS as method calls).

Model 4.2 Time is a Message

An assumption required by this model: synchronous code runs so fast we can consider it instantaneous. For many applications, this is an okay assumption. In JavaScript, which is single-threaded, this way of thinking about time is almost forced on us: since synchronous computation blocks UI interaction, long-running computations must either be offloaded to worker threads (which the main thread communicates with asynchronously) or broken up into small chunks of work that yield control back to the browser every few milliseconds.

Example: Oven Timer

Example: Game with a time limit

Example: Static Core Microprocessor

StaticCore

Model 5: Kinds of Tests

SoftwareTesting

Model 6: Organizing Code

Model 6.1: Model-View-Controller

Model 6.2 Policy and Mechanism

Model 6.3 Application and Infrastructure

Model 6.4 Reactive Architectures

Model 7: Extract-Transform-Load (Input-Processing-Output)

This model is based on ArchitecturalLayers and CodeCapability. The conceit of the extract-transform-load model is that we can divide all synchronous computation into three distinct steps: get inputs (extract), process them to calculate some result (transform) and output that result somewhere (load).

Heuristic: separate input, parsing, processing, presentation, and output.

Heuristic: Domain Logic, Infrastructure, and Error Handling

Use exceptions for infrastructural errors, and union types for domain errors (e.g. input and state validation).

Model 9: Software Development Activities

TODO: should this be model 1?

About half my programming time is spent learning.

A large chunk of the remaining time is spent communicating what I've learned.

The rest of the time I'm inventing new things—that is, solving novel problems computationally.

The time it takes to type in the code is insignificant compared to all of the other work—learning, communicating, and inventing. It's only 1 or 2 percent of the total.

With this in mind, it is obvious why Brooks' Law—that adding staff to a late software project makes it later—is true.

Most of our job is understanding—and 10 people can't understand something faster than one person can.

Given that learning and understanding are so fundamental to programming, let's spend a few pages considering the nature of understanding. The next chapter, on mental models, delves into this issue.

Sources of Confidence

InformalReasoning - c.f. OutOfTheTarPit
- GregWilson cited some reasearch showing that reading the code finds more bugs per hour than testing
SoftwareTesting - passing tests make us more confident
AlgebraicType - proofs of some kinds of internal consistency, ruling out many errors that could happen in a program with dynamic types. If our typechecker outputs no errors, that makes us more confident.
- easiest to see in a language where we can do an apples-to-apples comparison of typed and untyped forms, e.g. TypeScript vs. JavaScript.
CompositionalReasoning with AlgebraicProperties (i.e. "semi-formal" reasoning)

How the Confidence Sources complement each other

InformalReasoning
- Supported by: StructuredProgramming, Symmetry
- limiting the number and Scope of Variables
- limiting the depth of hierarchies (call hierarchy, if statement nesting)
- Cohesion and avoiding IdeaFragments
- See: OutOfTheTarPit
SoftwareTesting
- Flaw: spot-checking is not a proof.
  - Quote Dijkstra, OutOfTheTarPit
  - Complement with: TestDrivenDevelopment, Simplicity, OccamsRazor
  - Complement with: AlgebraicTypes
- Flaw: process-external Effects are hard to test
  - Complement with: TestDoubles and CompositionalReasoning
  - ...or FunctionalProgramming, FunctionalCoreImperativeShell, Simplicity, and InformalReasoning (leaving the CompositionRoot and entrypoint Procedures without UnitTests, as GaryBernhardt does in Boundaries)
- Flaw: duplicate test coverage makes failures hard to interpret and coverage hard to analyze. The opposite, "over-mocking," leads to situations where all the tests pass but the system as a whole doesn't work.
  - Fix with ShallowHierarchy (reducing the number of tests that duplicate coverage) and CompositionalReasoning + ContractTesting (making sure that mocked collaborators have easy-to-reason-about properties).
  - Use AlgebraicTypes to get a baseline level of confidence that components can work together.
- Flaw: you're not the Oracle
  - e.g. you need to call an API that returns some complicated data. It's not clearly documented and you misinterpret the meaning of one of the fields when creating a Stub of the API. So your UnitTests pass, but the system as a whole is wrong.
  - Partial fix: ensure you only make this mistake once by transforming incoming data into a form that's self-documenting or otherwise well-documented, and hard to misuse. I.e. ParseDontValidate.
- Summary: testing is complemented by TDD (writing the simplest code that passes the tests), an architecture that pushes effects to the boundaries or abstracts them behind contracts with convenient algebraic properties, a shallowly layered architecture, and a discipline of understanding the data that's passed across architectural boundaries.
AlgebraicType
- Flaw: certain generic interfaces are very difficult or impossible to express in certain type systems. E.g. generic variadic functions.
  - This is a shortcoming of current, specific type system technologies, not the mathematics of types
  - Even proponents of dynamic typing rely on the idea of types to make sense of their programs—they just don't have automated tools to check their thinking.
  - Possible resolution: something like Clojure's spec? Then you can't write fewer tests, though.
- Summary: algebraic types are complemented by tests.
AlgebraicProperty and CompositionalReasoning
- Flaw: error propagation threatens the simplicity of algebraic properties when the implementor has process-external Effects.
  - Fix: DomainSandwich, CheckedExceptions considered harmful

Programming Tactics

The True Goal

The first goal is the sine qua non of the second.

A sense of oneness with your work. "Oneness" isn't quite the right word, and there are many possible near-synonyms: connectedness, identity, care, kindness. Oneness involves:

a feeling of Mastery: you are confident that you can get the code to do what you need it to do. When you get a feature request, you can often develop plans of action that will work.
a feeling of Compassion toward your fellow programmers, the users of your software, and the authors of your dependencies. Compassion also involves doing things that help your coworkers achieve oneness with their work (e.g. writing understandable code with APIs that are hard to misuse).
a sense of responsibility. If the code's Messy or has a Bug, you take it seriously and fix it. This kind of responsibility can't be forced on you by others. You assume it, naturally and inevitably, when you care about the code and the people it affects. If the other qualities of oneness are present, you'll find it easy to fix any problems you cause, so the responsibility won't be a burden.
non-Instrumentality. Instrumentality often appears when you try to get something for free, without putting time or attention into it, or identifying with it. E.g. suppose you use someone else's code that you don't understand very well to try to make a job easier. If what you try doesn't work, it's easy to get frustrated and blame the code or the other person, which causes everyone to suffer. An attitude of non-instrumentality both:
- recognizes that you may have to put some learning effort in to get the benefit of using the code. Nothing is free.
- is willing to let go of a dependency on bad code and reimplement the functionality, if that's the pragmatic option.
continuous attention and course-correction as you work.