09 Structure and modularity

Structure and modularity guidance

Please bear with me while I set the scene by seeming to state the glaringly obvious. Thereafter I'll provide the analysis supporting my conclusions. Alternatively you could skip this and read the last paragraph at the very bottom of this page.

When considering the behaviour (actual or desired) of any complex system, we first and foremost need sufficient insight into that system - what it does and why. This is analysis. Once you are armed with an appropriate depth of analysis, large-scale design can begin, and top of the list of design activities is system decomposition. During decomposition we make choices that break up large, complex objectives into smaller, simpler ones, iterating until we reach simple steps that we can express as functions or methods. The earlier choices should lead to (sub-)component and (sub-)system decompositions. Attention to data often yields a complex picture that can also be made more accessible by thinking about, and then choosing, how to group and separate data elements.

Whether you treat these activities as a precursor to writing code, or feel you need to start implementing first in order to gather enough insight to make your analysis and design effective, depends on many factors and a few preferences. You can develop in a waterfall style (sequentially), adopt agile (do everything iteratively), or do something in between. You might select top-down, bottom-up, or meet-in-the-middle design.

Regardless of the approach, it is undeniable that two important attributes of the eventual solution are strongly correlated with how well analysis and design have actually been done. These attributes are:

  • the development cost, time to delivery, and quality of the eventual solution
  • maintenance costs

I don't think I need to argue this axiom. In agile, maintenance begins when the first line of code is written, so maintenance costs boil down to development costs.

If your goal is to minimize life-cycle cost and maximize quality, then you should be interested in predictive measurements that are strong indications of good, bad or indifferent analysis and design. This is especially true in the agile world, where saving money is frequently claimed as a reason to omit any formal expression of analysis and design outcomes from projects. That often leaves quality assessors in a desert, empty of evidence.

This is where structure and modularity assessment comes in. If you assess the structure and modularity of your implementation, e.g. by peer review, you will have a measure of a factor that correlates with good analysis and design.

To do this you need some guidance to differentiate good structure and modularity from bad. Below I propose a few candidate measures that are cheap and easy to monitor. They do not necessarily predict good quality, but low scores are suggestive of bad quality, and simply by attending to an assessment or measurement you should gain some insight.

Predictive factors for lifecycle cost and quality

  • decomposition depth
  • interface counts and complexity
  • encapsulation and complexity hiding

Decomposition depth

Design decomposition is very frequently represented (if it is expressed at all) as a single flat picture showing a dozen or more interacting parts. Frequently 'joining-up' lines show lots of interactions between these parts, and some interactions are left out "to avoid linking everything to everything". It is worth emphasizing to programmers (who are often puzzle aficionados and complexity athletes) that "Keep It Simple, Stupid" is expert advice on best practice. The human brain can typically cope with about seven simultaneous concepts, and works more effectively with fewer. So this typical flat picture is very far from optimal. It is arbitrary and non-rigorous, and it essentially fails to qualify as modular or encapsulated. It is no more developed than a first impression. A superficial preparation such as this is a strong predictor of spaghetti code - high cost and high fault content.

Benefit would arise from (sub-)component grouping so that the top-level component count is five or fewer, and if necessary from increasing the depth of the design model from (the usual) one to two or more. Consider that a balanced decomposition into five component parts, repeated to a depth of five, would offer up 3125 objects. If each object was a class with, say, again an 'optimal' five methods, we would have over 15,000 methods and (at a few dozen lines per method) all-in something between half a million and a million lines of code. This is upwards of 50 engineering-years of work, so small to medium sized projects can easily be accommodated within this guidance. Imagine navigating through this without a strong intuitive feel for where in the picture you need to look to find a specific feature implementation or bug. Yet we can effortlessly remember that Kinshasa is in the DR Congo, in Africa, and if we needed to find someone or an address there we would immediately know where to start.

Of course, when considering decomposition, the first concern is the problem space. There's nothing wrong with a flat decomposition into 26 sub-components that hold logic to process individual letters of the alphabet, and it would be nonsense to limit that to five. When considering depth, there are also a limited number of platform, language and tools related containers for systems, subsystems and components. We need to use these containers in a hierarchy to provide depth; some (like libraries) are comfortable and easy to use, while others can be brought to serve but are less obvious.

We have operating system components ('applications', 'services', 'devices'); in Visual Studio, solutions and projects, programs and libraries; and in the C++ language, namespaces, classes, structs, unions and enums, which can be assembled in source and header files. Any of these can be co-opted into a discipline or scheme to represent a level in decomposition. Still, libraries stand out (because they nest easily).
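
As a purely illustrative sketch (the names company, imaging, capture and Camera are invented and not taken from App3Dev), the nesting below shows how C++ namespaces - typically one per library project - can carry the levels of a decomposition, so that the build structure mirrors the design.

```cpp
// Hypothetical decomposition hierarchy expressed with nested namespaces.
// In practice each of the outer levels would usually map to its own library
// project, making the hierarchy visible in both the solution and the source.
namespace company             // level 1: the system
{
   namespace imaging          // level 2: a sub-system domain
   {
      namespace capture       // level 3: a component within that domain
      {
         class Camera         // level 4: a class inside the component
         {
         public:
            void acquire();   // level 5: a method - a simple, single-purpose step
         };
      }
   }
}
```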

Regardless of how it is achieved, decomposition to five or fewer can be a driving factor to ensure that some design effort is expended on structure and modularity. Assessing compliance with 'five or fewer' (sensibly, by peer review) can enforce adequate attention to strategies that make the code base smaller, simpler and more easily navigable. That is a key cost factor when extending somebody else's code or tracking down a bug.

The App3Dev example exhibits the following structure and modularity strengths:

  1. A top level with two (<5) source files.
  2. A depth of two (>1) by using component projects/libraries.
  3. The top level links to two (<5) support libraries.
  4. The support libraries each contain three (<5) source (.cpp) files.
  5. All support sources represent a single class, each with a single internal impl class.

There are also acceptable weaknesses, which you are invited to identify and assess for yourself.

Of course it isn't this end destination that matters, but rather the continual, iterative process of analysis and design that led to this outcome, which I want to encourage by example here. On this basis I submit that the decomposition depth of App3Dev is a strong indicator that at least some analysis and design effort has been applied, and I claim that this is time well spent, yielding benefits for code structure and accessibility. Judge these benefits from the results, and assess these suggestions for yourself.

As design recommendations

  • Use decomposition in depth to create a hierarchy of domains.
  • Domain decomposition should try to reflect recognizable concepts and details of the problem space.
  • Ideally components should align with specific expertise or knowledge areas.
  • Leave plenty of slack during initial design to accommodate complexity growth over the life-cycle.

As implementation recommendations

  • Select a container from the available system, tooling, or language options to represent decomposition layers.
  • Apply regular peer review to assess decomposition to five or fewer (and other criteria).
  • Refactor as frequently as necessary to keep modularity advantages.

Interface counts and complexity

  • header files
  • class headers
  • project references

Here I want to emphasize that an optimal design has the minimum number of interfaces between components, and that these interfaces are as simple as they can be. Paying attention to managing this design aspect (or failing to do so) is the difference between modular design and spaghetti.

I can't suggest a magic number for interface counts. I do propose that frequent peer review (during initial development, bug fixing and feature addition) counts and keeps track of the three bullet items above, and attends to emerging trends. Managing these trends is done by changing the system decomposition, moving objects around, or reconsidering their scope and boundaries.

An additional reason for the 'five or fewer' decomposition guidance is that reorganizing and relocating logic to respect it simultaneously attends to managing interface size and complexity.

You should understand that by driving down the number and complexity of interfaces, you are automatically managing solution size (and complexity). There is less code to build, manage, understand and maintain, and less cost.
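
As a minimal sketch of what driving down interface count and complexity looks like in practice (the names Widget and Renderer are invented for the example), the header below forward-declares its dependency instead of including another header, so clients are exposed to exactly one type and no transitive includes.

```cpp
// widget.hpp (illustrative only) - a deliberately thin interface.
// Forward-declaring Renderer keeps this header's include count at zero and
// hides Renderer's own complexity from every client that includes widget.hpp.
class Renderer;   // forward declaration: the full definition is only needed in widget.cpp

class Widget
{
public:
   explicit Widget(Renderer& renderer);
   void draw() const;

private:
   Renderer* m_renderer;   // a pointer member does not require the complete type
};
```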

Encapsulation and complexity hiding

One of the most useful ideas currently trending is the PIMPL idiom. It decouples implementation from client code and (by using a pointer to implementation) ensures a strong separation and independent evolution of client and implementation code. This has the effect of hiding the complexity of the implementation almost effortlessly.
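
A minimal PIMPL sketch follows (invented names, not code lifted from App3Dev): the public header exposes only an opaque pointer, and everything behind it can change without recompiling clients.

```cpp
// gadget.hpp (illustrative only) - what clients see
#include <memory>

class Gadget
{
public:
   Gadget();
   ~Gadget();                      // defined in the .cpp, where impl is a complete type
   void do_work();

private:
   class impl;                     // the private implementation class
   std::unique_ptr<impl> m_pimpl;  // pointer to implementation
};

// gadget.cpp (illustrative only) - what clients never see
class Gadget::impl
{
public:
   void do_work() { /* platform-specific details live here */ }
};

Gadget::Gadget() : m_pimpl(std::make_unique<impl>()) {}
Gadget::~Gadget() = default;
void Gadget::do_work() { m_pimpl->do_work(); }
```

The one subtlety the idiom demands is that the destructor is defined in the implementation file, where impl is a complete type, so that std::unique_ptr can delete it.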

PIMPL also has great value on Windows when using the UTF-8 everywhere paradigm, as the term 'everywhere' can reasonably be adjusted to mean all code outside the (internal) implementation class, including the interface to the implementation. Relaxing the UTF-8 everywhere rules inside the implementation of objects that build on (Windows) platform features makes adopting and reusing legacy code, and interacting with operating system calls, a much more practical proposition than tightly wrapping every individual platform library call that your software makes. You can see that most clearly (e.g. in device.cpp) where Win32 calls designed for Unicode are required.
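
The sketch below (with an invented helper and function name, not the repository's device.cpp) illustrates that relaxation: client-facing code passes plain UTF-8 std::string, and the conversion to UTF-16 happens inside the implementation, right where a Unicode Win32 call is made.

```cpp
// Illustrative only: inside an implementation (impl) class, wide strings and raw
// Win32 calls are permitted; "UTF-8 everywhere" applies to everything outside.
#include <string>
#include <windows.h>

// hypothetical helper: convert UTF-8 to the UTF-16 that Win32 "W" APIs expect
static std::wstring to_utf16(const std::string& utf8)
{
   if (utf8.empty()) return std::wstring();
   const int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), static_cast<int>(utf8.size()), nullptr, 0);
   if (len <= 0) return std::wstring();
   std::wstring wide(static_cast<size_t>(len), L'\0');
   MultiByteToWideChar(CP_UTF8, 0, utf8.data(), static_cast<int>(utf8.size()), &wide[0], len);
   return wide;
}

// hypothetical impl method: the UTF-8 path arrives unchanged from the public interface
HANDLE open_device(const std::string& device_path_utf8)
{
   const std::wstring wide_path = to_utf16(device_path_utf8);
   return CreateFileW(wide_path.c_str(), GENERIC_READ | GENERIC_WRITE,
                      0, nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
}
```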

Do not lose sight of the fact that structure and modularity choices are made expressly to hide complexity from client code, in a way that promotes actionable assignments. Problems should be assignable to engineers with specific knowledge and expertise that is well aligned with the purpose of the component or subsystem domain.

Again, frequent peer review should assess at least:

  • decomposition to five or fewer
  • containers selected (e.g. libraries or other) to implement design layers
  • application of PIMPL to decouple, encapsulate and hide complexity