Abstraction

Ben Christel edited this page Dec 17, 2024 · 15 revisions

Pithy Aphorisms

A bad abstraction makes you wonder if you've missed something. A good abstraction makes it obvious that you haven't.

The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.

—Edsger Dijkstra

Don't abstract logic "away". Abstract it apart.

Abstraction is not about obscuring relationships, but about making the absence of certain relationships immediately, palpably obvious.

Discussion

A good abstraction leaves out everything inessential while precisely specifying everything that is essential.

A good set of abstractions is a high-level language in which the concrete behaviors required of our program are easy to express.

We often talk about abstractions "hiding implementation details" from their clients. But good abstractions go further: A good abstraction hides details about the clients from the implementation.

When great thinkers think about problems, they start to see patterns. They look at the problem of people sending each other word-processor files, and then they look at the problem of people sending each other spreadsheets, and they realize that there’s a general pattern: sending files. That’s one level of abstraction already.

Joel Spolsky

Code that sends files shouldn't care about whether those files are Word documents or Excel spreadsheets or plain text or anything else. The specific use case of the caller should be abstracted away from the low-level file-sending code.

Poor abstractions, by contrast, hide from the client what really ought to be under the client's control, while forcing the implementer to know about all the messy details of the clients' needs.

It is precisely the hiding of details from the implementer that makes abstractions reusable. If the implementation knows about the details of a specific use case, that implies it is coupled to that use case and can't easily be repurposed.
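As a rough illustration of the file-sending point, here is a minimal Python sketch. The names `InMemoryTransport` and `send_file`, and the chunking behavior, are hypothetical and chosen only for this example. The thing to notice is that `send_file` is given a name and some bytes and never learns what kind of document it is carrying.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryTransport:
    """Hypothetical stand-in for a network connection; it records each chunk sent."""
    chunks: list = field(default_factory=list)

    def send(self, name: str, chunk: bytes) -> None:
        self.chunks.append((name, chunk))


def send_file(transport, name: str, payload: bytes, chunk_size: int = 8) -> None:
    """Send any named sequence of bytes in fixed-size chunks.

    Nothing here knows whether the bytes are a document, a spreadsheet, or text.
    """
    for start in range(0, len(payload), chunk_size):
        transport.send(name, payload[start:start + chunk_size])


# Two different clients reuse the same low-level code unchanged.
transport = InMemoryTransport()
send_file(transport, "report.docx", b"word-processor bytes")
send_file(transport, "budget.xlsx", b"spreadsheet bytes")
```

Because the caller's use case never leaks into `send_file`, a third client (say, one sending plain text) would need no change to it.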

The word "abstraction" can refer to a lot of different patterns in programming that are used in very different ways. In the strictest sense of the term, even a variable is an abstraction because it expresses a generalization over a set of values. A function that takes parameters is an abstraction. An interface with many implementers is an abstraction.

Every abstraction expresses a generalization over the members of some set.

It's important, though, not to imagine that the abstraction is generalizing what it merely conceals. IndirectionIsNotAbstraction, though the two are often conflated.

For example, consider the following procedure call:

// Not an abstraction
users = database.findUsersWithPostsInLast10Days()

Some programmers call this an "abstraction" because it "hides implementation details". But this is not an abstraction: it generalizes nothing (since it takes no parameters); it merely abbreviates. The name of the procedure contains exactly the same information (we hope) as its implementation. If we don't want to read the implementation, hope is indeed our only option. The interface is opaque and provides no insight into its inner workings.

Contrast with this interface:

last10Days = DateRange(daysAgo(10), today())
users = database.find(UsersWithPostsIn(last10Days))

Here, we've separated concerns and introduced a couple of true abstractions. We now have a concept of a date range in which the specific dates have been abstracted away—that is, the concept generalizes to any dates. The database.find procedure accepts any query object—thus generalizing over the set of possible queries.
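One way the pieces above might fit together is sketched below in Python. This is an assumption-laden toy, not a real database API: `Database` filters an in-memory list, `User` and the `matches` method are invented for the example, and `days_ago` takes `today` as an explicit argument so the result is deterministic.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DateRange:
    """A closed range of dates; generalizes over any start and end."""
    start: date
    end: date

    def contains(self, day: date) -> bool:
        return self.start <= day <= self.end


def days_ago(n: int, today: date) -> date:
    return today - timedelta(days=n)


@dataclass(frozen=True)
class User:
    name: str
    post_dates: tuple


@dataclass(frozen=True)
class UsersWithPostsIn:
    """Query object: matches users with at least one post in the range."""
    date_range: DateRange

    def matches(self, user: User) -> bool:
        return any(self.date_range.contains(d) for d in user.post_dates)


class Database:
    """Hypothetical in-memory store; find() accepts any object with matches()."""

    def __init__(self, users):
        self._users = list(users)

    def find(self, query):
        return [user for user in self._users if query.matches(user)]


today = date(2024, 12, 17)
database = Database([
    User("ada", (date(2024, 12, 10),)),
    User("bob", (date(2024, 11, 1),)),
])
last_10_days = DateRange(days_ago(10, today), today)
users = database.find(UsersWithPostsIn(last_10_days))
```

In this sketch `Database.find` knows nothing about posts or dates, and `DateRange` knows nothing about users, so each can be reused and tested on its own.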

Here is another pattern that people often say is abstraction:

// Not an abstraction
users = database.findActiveUsers()

Suppose this findActiveUsers procedure actually gets the users with posts in the last 10 days—same as the other examples. People tend to think this is good because now we've identified a domain concept: we've said what the "posts in last 10 days" requirement means to our application. In doing so, we are supposedly "free to change implementation details" like the criteria by which we determine whether a user is active.

In fact, we are very likely not free to change implementation details—the feature this is supporting (maybe it's sending emails or something) is only supposed to be turned on for users with posts in the last 10 days, dammit, and the product manager will know and care if that definition changes. If we reuse the "active" concept—maybe we only show links to the profiles of active users—the criteria for that specific usage are likely to change independently of other usages. That is, the PM will eventually request a change like "show profile links for users who have logged in in the last month, but only send emails to users who have posted in the last week" and the developers will look at each other and groan.

Kranz's First Corollary: If something can be known statically, it will be depended on statically. If the effect of our code is that users who have posted in the last ten days get emails, then something, some human or machine somewhere in the system, is going to assume that that will continue to be true unless it is explicitly decided and communicated that it will change. It's our job as engineers to make that concretion as visible, intelligible, and easy to change as possible—and the way we do that is by creating abstractions (like DateRange, daysAgo, and UsersWithPostsIn(dateRange)) that let us express the concretion in the highest-level language we can.
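To make the PM's hypothetical request concrete, here is a small Python sketch in which each feature states its own criterion out loud instead of sharing an "active" concept. The helper names (`posted_within`, `logged_in_within`), the dictionary-shaped users, and the reading of "last week" as 7 days and "last month" as 30 days are all assumptions made for illustration.

```python
from datetime import date, timedelta


def posted_within(days: int, today: date):
    """Build a predicate: has the user posted in the last `days` days?"""
    cutoff = today - timedelta(days=days)
    return lambda user: any(d >= cutoff for d in user["posts"])


def logged_in_within(days: int, today: date):
    """Build a predicate: has the user logged in within the last `days` days?"""
    cutoff = today - timedelta(days=days)
    return lambda user: user["last_login"] >= cutoff


today = date(2024, 12, 17)
all_users = [
    {"name": "ada", "posts": [date(2024, 12, 15)], "last_login": date(2024, 12, 16)},
    {"name": "bob", "posts": [date(2024, 10, 1)], "last_login": date(2024, 12, 1)},
    {"name": "cy", "posts": [], "last_login": date(2024, 9, 1)},
]

# Each feature names its own concretion, so each can change independently.
gets_email = posted_within(7, today)
gets_profile_link = logged_in_within(30, today)

emailed = [u["name"] for u in all_users if gets_email(u)]
linked = [u["name"] for u in all_users if gets_profile_link(u)]
```

With this shape, changing the email rule is a one-line edit at the point where `gets_email` is defined, and the profile-link rule is untouched.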

Why Abstractions are Useful

Since abstractions generalize, they have a better chance of being Stable in RobertCMartin's sense of the word—they have many callers. Going back to the database.find example: we've likely seen database.find in other parts of our application and we trust it. It's battle-tested. If there are bugs in this code, we know where to look: the UsersWithPostsIn(dateRange) object.

If people have to read a Routine to understand it, it should be inlined. DuplicationIsCheaper than ill-conceived indirection.

Don't Increase Connascence By Duplicating

Please don't read this as "duplicate ALL the things!" Duplication can allow code that needs to be in sync to get out of sync.

For example, suppose we have the following two code snippets, potentially in far-distant parts of the codebase:

quiz = new Quiz(problems=10)
lastProblem = quiz.problems[9]

It's clearly an assumption of our system that quizzes are going to have 10 problems. If that number changes, just doing a find-replace for the number 10 isn't going to cut it, because the second snippet encodes the same assumption as the index 9.

In this case, we should extract a constant:

quiz = new Quiz(problems=PROBLEMS_PER_QUIZ)
lastProblem = quiz.problems[PROBLEMS_PER_QUIZ - 1]

Clearly, it would be better if these two code snippets were localized (perhaps in the Quiz class?) and moving connascent values closer together is always a good thing. But sometimes, you just can't avoid referencing the same concept in multiple places, so extracting a constant is the lesser evil.
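A possible shape for the "localize it in the Quiz class" option is sketched below in Python. The `last_problem` property and the placeholder problem strings are hypothetical; the idea is only that callers stop computing an index from the problem count at all.

```python
PROBLEMS_PER_QUIZ = 10  # the single place the assumption is written down


class Quiz:
    def __init__(self, problem_count: int = PROBLEMS_PER_QUIZ):
        # Placeholder problems; a real quiz would hold richer objects.
        self.problems = [f"problem {i + 1}" for i in range(problem_count)]

    @property
    def last_problem(self) -> str:
        # Derived from the list itself, so it cannot drift from the count.
        return self.problems[-1]


quiz = Quiz()
last_problem = quiz.last_problem
```

Here the knowledge that the last problem sits at position "count minus one" lives next to the list it describes, so a change to `PROBLEMS_PER_QUIZ` needs no matching edit elsewhere.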

Why is this different from the active users example? Well, in the active users example we were yoking multiple features together that could otherwise change independently without causing problems. By doing so, we unnecessarily constrained the space of changes we could easily adapt to. Here we're making connections between multiple pieces of code related to the same feature. If these didn't stay in sync, there would definitely be bugs. So here we're keeping our options open—making sure that we can make a certain type of change safely.