-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathintro.tex
259 lines (218 loc) · 12.5 KB
/
intro.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
%!TEX root = paper.tex
\chapter{Introduction}
\label{sec:intro}
% Here is a problem
% It's an interesting problem
% It's an unsolved problem
% Here is my idea
% Describe the problem
% State your contributions
% Here are the parameters of the problem:
% \begin{itemize}
% \item We want to support separate modular development in Haskell,
% for all the reasons we've stated above (ability to abstract over
% implementation, giving you the ability to swap out components.
% Extreme implementation changes.)
% \item What do we want to abstract over? We want to abstract
% over packages. This is because packages are the way large
% scale software is distributed.
% \item But there is an abstraction barrier between the compiler
% and the package system. How to design a system that respects
% this barrier. (Why is this hard?)
% \end{itemize}
% The case for separate modular development
% The case for separate modular development at the package level
% The basic idea
% Large software is made of packages.
% Packages are not modular.
% Idea of separate modular development.
% SMD is not done at package level
The essence of modular development is to decompose a program into
collection of smaller modules separated by abstraction barriers: the
defining characteristic of a modular program is the ability of a module to
operate without knowing the implementation details of another. Modular
programs are easier to understand; opportunities to reuse modular
components are everywhere:
\begin{itemize}
\item Suppose you are writing an application which makes a request to
some external backend service---perhaps a database or a credit card
processor. You might want to modularize your
application so that you can \emph{swap} out the real implementation
of a service with a fake, so that you can test your application without
charging your customers.
\item Suppose you are writing a library that handles strings in some
way---perhaps a parser or a string processing library. Usually, your
library doesn't require a specific representation; in fact, you would
like your library to be \emph{parametric} over all the string representations
your clients might use (whether it is linked lists of
characters, packed bytestrings, packed UTF-16 strings, packed UTF-8
strings\ldots).
\item Suppose you have two implementations of an API\@: a simple but
inefficient reference implementation and a complicated but
fast production implementation. You might like to \emph{instantiate}
an application with both implementations and then run them in
parallel, comparing their outputs to verify the correctness of the
production implementation.
\end{itemize}
%
Typically, programming languages provide support for both \emph{incremental
modular development}, where a module can be developed only when implementations
of its dependencies are available, and \emph{separate modular development},
where a module can be developed with respect to an \emph{interface}
and instantiated later. Haskell's existing module system is a mechanism
for incremental modular development---it is purely a namespace management
mechanism with a proper concept of interface---while Haskell's type classes are a
mechanism for separate modular development. Separate modular development
offers more abstraction, as the interface characterizes
what is needed from a dependency.
Unfortunately, type classes are ill-suited for certain cases of
separate modular development:
\begin{itemize}
\item From a code perspective, type class parametric code is often
harder to use than monomorphic code. For an inexperienced
Haskeller, the proliferation of constraints and type parameters in the
type signatures of functions can make an otherwise straightforward
API impenetrable:\footnote{\url{https://hackage.haskell.org/package/regex-posix-0.95.2/docs/Text-Regex-Posix-Wrap.html}}
\begin{lstlisting}
(=~) :: (RegexMaker Regex CompOption ExecOption source,
RegexContext Regex source1 target)
=> source1 -> source -> target -- from regex-posix-0.95.2
\end{lstlisting}
Furthermore, type classes work best when exactly a single type
parameter is involved in instance resolution. If there aren't any type
parameters, you must introduce a proxy type to drive instance resolution; if
there are multiple parameters~\cite{lfp92}, you often need to resolve ambiguity
by introducing functional
dependencies~\cite{Jones:2000:TCF:645394.651909} or replacing
parameters with associated types~\cite{towards-open-type-functions-haskell}.
\item Type classes work best when it is clear what methods they
should support. However, for many interfaces, it is not obvious
what set of methods should be put into an interface; for example,
there are many functions which an interface for strings might
support---which ones go in the type class? It is inconvenient
to define a new type class (and new instances) just to add a
few more methods, and thus this is rarely done in practice.
\item From a performance perspective, code which uses a type class
is compiled without knowledge of how the type class's
methods are implemented. This can be quite costly in a language
like Haskell, where inlining definitions is essential to achieving
good performance.~\cite{PeytonJones:2002:SGH:968417.968422} This problem can
be mitigated by specializing code to a particular type, but if this
specialization occurs at use-site, the compiler ends up repeatedly reoptimizing
the same function, blowing up the code size of
a program. (C++ templates suffer from similar problems.\footnote{\url{https://gcc.gnu.org/onlinedocs/gcc/Template-Instantiation.html}})
\end{itemize}
%
In short, Haskell would be well served by an extension to its module
system to support the cases of separate modular development which are
currently poorly served by type classes.
The Backpack package system by Scott Kilpatrick et al.~\cite{backpack}
(hereafter called \OldBackpack{}) broke new ground, arguing that \emph{mixin
packages} could be a good fit for providing modularity in Haskell.
The two words ``mixin'' and ``package'' capture the key properties of \OldBackpack{}:
\begin{itemize}
\item \emph{Mixins}. Mixin modules~\cite{bracha+:modularity,ancona+:cms,flatt+:units,duggan:mixin} are characterized by components which both \emph{provide} and \emph{require}
modules: these components can be ``mixed'' together, linking
together provisions and requirements which have
the same name. Unlike the more conventional approach of
\emph{parametrized} modules (ala ML~\cite{milner+:def-of-sml-revised}),
mixin modules natively support recursive linking.
Furthermore, mixins help reduce
the preponderance of \emph{sharing constraints} that occur with ML
functors, since requirements can be mixed together by name.
(For more discussion on this, see Appendix~\ref{sec:mixins-reduce-sharing-constraints})
\item \emph{Packages}. Rather than introduce a new module language
to the compiler, \OldBackpack{} proposed that mixins be added to
Haskell's package system. There are multiple benefits to this choice.
First, it allows us to retrofit Haskell to support separate modular
development without requiring ``yet another type system extension.''
Second, it means that Haskell code can be made more
modular without having to adopt the ``fully-functorized'' style~\cite{structabshotlang}
commonly associated with ML functors.
Finally, it brings modularity to the place where it matters most:
software development in the large; after all, \emph{packages} are how
software systems are organized at the largest scale.
\end{itemize}
%
There was just one problem: despite its emphasis on being a practical
design, \OldBackpack{} could not be implemented in a real world compiler
like GHC\@. Why not? The
semantics of \OldBackpack{}, especially its package-level semantics,
were closely entwined with the semantics of Haskell itself, violating
the traditional abstraction barrier between the compiler and package
manager. Without tightly coupling GHC (the Haskell compiler) and Cabal
(the Haskell package manager), there was no way to directly implement
\OldBackpack{}.
In this thesis, we show how to refactor
\OldBackpack{} to respect the abstraction barrier between the package
manager and the compiler, giving us a new system: \Backpack{}. In
particular, we divide \Backpack{} into two parts: \emph{mixin linking}
which is indifferent to Haskell source code, and \emph{typechecking against
interfaces}, which is purely the concern of the compiler.
Specifically, the contributions of this thesis are as follows:
\begin{itemize}
\item We describe a package language which can be \emph{mixin
linked} without knowing anything about the Haskell programming
language. (\cref{sec:mix-in}) This gives us a language agnostic mechanism for
expressing separate modular development, in contrast to
\OldBackpack{}, whose semantics critically relies on the import structure
of Haskell modules. As a result, \Backpack{}
employs a far simpler mixin linking procedure essentially equivalent
to early descriptions of mixin linking in the literature.~\cite{cardelli:linksets}
In doing so, we adjust \OldBackpack{}'s primary technical device,
the \emph{module identity} into a new concept, the \emph{unit identity} (\cref{sec:uid}),
which can be computed by language agnostic mixin linking.
Our new package language is order-independent, which is
essential for backwards-compatibility with Haskell's existing
package format.
\item To express the results of mixin linking, we introduce the
\emph{\unit{} language}, an intermediate representation that
represents all instantiations explicitly. This language is reminiscent of
ML-style applicative functors, but with a twist:
unfilled requirements of dependency are propagated to the requirements
of the enclosing package via the process of \emph{signature merging}.
The effect is that \Backpack{}, as a whole, has similar
semantics to the ones that were pioneered by \OldBackpack{}.
\item We describe how to typecheck the \emph{\unit{} language} in the
real world (\cref{sec:compiler}), typechecking the source into semantic objects that
faithfully reflect the full glory of GHC Haskell's type system.
We give an unsound but pragmatic mechanism for supporting \emph{type classes}
(it is unsolvable in the presence of open type families), support
the use of \emph{type synonyms} to implement abstract data
(not supported by \OldBackpack{}), and describe
some tricky interactions with GHC's type inference algorithm and
roles annotations.
\item This is not a paper design: Backpack has been fully
implemented GHC 8.2 and cabal-install 2.0.
We give a tour of the modifications we made to GHC to
implement the necessary instantiation and merging operations
required by the \unit{} language, as well as the architectural
changes that were necessary in Cabal (the package manager)
to orchestrate the typechecking and building of the mixin
packages. (\cref{sec:implementation})
\item Does \Backpack{} work?
We have done a number of case studies to give evidence
that \Backpack{} works, including conversions of complex,
real-world packages to use \Backpack{}. (\cref{sec:evaluation})
% Among other things, these case studies
% have motivated the introduction of \emph{signature thinning}
% and how Backpack interoperates with Cabal's existing dependency
% solver.
\end{itemize}
%
% In this thesis, we do not address
% mutual recursion between packages, allowing us to avoid some orthogonal
% technical complications. We believe that in practice this is a minor
% limitation.
Although the idea of separating the core language and the module/package
language is not novel~\cite{leroy:modular,milner+:def-of-sml-revised,rossberg+:f-ing}, we are the first to exploit these ideas in service
of an actual implementation, retrofitting strong modularity to an existing
programming language. The true test of \Backpack{} will be whether or not
the larger Haskell community adopts it, which can only be seen in the
coming years, but we have enough experience working with \Backpack{}
that we believe that this design is practical and solves many problems
that Haskell programmers face today.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "paper"
%%% End: