-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathoverview.tex
728 lines (643 loc) · 32.6 KB
/
overview.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
%!TEX root = paper.tex
\chapter{The pipeline}
\label{sec:overview}
\begin{figure}
\center\includegraphics{diagrams/pipeline.pdf}
\caption{Diagram of the pipeline}\label{fig:pipeline}
\end{figure}
\begin{figure}
\begin{mdframed}
\begin{description}
\item[Source component]
The smallest unit of user-written source which can have its
dependencies resolved, e.g., a library, an executable or a test
suite. A \textbf{source package} contains one or more source
components, and is the unit of code that may be distributed
(e.g., on Hackage). A source package is identified by a
\textbf{package identifier} consisting of a package name and
package version (e.g., \verb|p-0.1|); a source component is in
turn identified by a \textbf{source component identitifer}
consisting of the package identifier with the name of the
component.
\item[\Ccomp{}] ($\Scomp$ in \cref{fig:rcomponents}) The output of dependency
resolution on a source component. A \ccomp{} is the
input source with direct dependencies resolved to
other \ccomp{}s, and \Backpack{}-irrelevant Cabal fields dropped.
As dependency resolution is
nondeterministic, a \ccomp{} is identified by a \textbf{\cid}
($p$), which augments the source component identifier with the
resolved \cid{}s of the direct dependencies. (Thus, the \cid{}
identifies the entire transitive source a component depends on.)
In \Backpack{}, we treat \cid{}s as opaque strings.
\item[\Unit{}] ($\Uunit$ in \cref{fig:lcomponents}) The output
of mixin linking on a \ccomp{}, where we have computed the ``wiring
diagram'' for the direct dependencies of the unit, describing how
requirements of the \ccomp{}'s direct dependencies have been
(partially) filled. Mixin linking is
deterministic; thus, a \unit{} is also identified by a \cid{}. A
\unit{} can be typechecked by the compiler.
% \item[Instantiated component] The output of instantiating a
% \unit{}. It is identified by an \textbf{\uid{}} ($P, Q ::= \I{p}[S]$,
% also known as a \emph{unit identifier})
% which is simply a \cid{} equipped with a module substitution.
% If the substitution has no module holes, an instantiated component
% can be compiled; otherwise, the type of an instantiated component
% can be computed by appropriately substituting over the type of the \unit{}. Two types are equivalent if and only if
% they are defined by the same module from the same \unit{}s instantiated in the same
% way.
% \item[Module substitution] ($S ::= \overline{m \mapsto M}$) A module substitution
% is a mapping from module names to module identifiers. These are most
% commonly used to say how a component is instantiated.
% \item[Module name] ($m$) Source-level module name, e.g., \verb|Data.Map|.
% \item[Module identifier] ($M ::= P\!:\!m \,|\, \holevar{m}$) A fully
% qualified name (by $P$) for a module, or a module hole
% $\holevar{m}$ (which can be instantiated by filling the
% requirement at $m$).
\end{description}
\caption{Glossary of representations in \Backpack{}}
\label{fig:glossary}
\end{mdframed}
\end{figure}
The implementation and semantics of \Backpack{} is structured as a
series of passes (\cref{fig:pipeline}) on several intermediate
languages (\cref{fig:glossary}).
In this section, we will give a detailed summary of all the phases in
this pipeline, starting with the source package language described in
the previous section, and continuing on to two intermediate languages
that will serve as the basis for our semantic account for \Backpack{}.
\section{Dependency resolution to a \ccomp{}}
\begin{figure}
\[ \DIGatoms{} \]
\[ \DIGsource{} \]
\[ \DIGresolved{} \]
\caption{\Ccomp{}s. While in this syntax declarations like \texttt{exposed-module}
embed the corresponding Haskell source code, in practice, this source code is recorded in a separate file.}\label{fig:rcomponents}
\end{figure}
\begin{figure}
\begin{verbatim}
component base
exposed-module: W module W where
data I = MkI I
f (MkI i) = i
component p
signature: A signature A where data J
signature: B signature B where
data K
g :: K
exposed-module: Y module Y where
import A
import B
data L = MkL J K
component q
signature: A signature A where data I = MkI I
exposed-module: X module X (I(..), K, g) where
import A
type K = I
g = MkI g
component r
mixin: base (W)
mixin: p (Y) requires (B)
mixin: q (X as B)
signature: A signature A (I(..)) where
import W(I(..))
reexported-module: Y as Z
\end{verbatim}
\caption{Our running example, with the package language displayed
on the left, and the Haskell source code displayed on the right.}\label{fig:resolved-example}
\end{figure}
The first step in
processing a source component is \emph{dependency resolution},
the existing Cabal phase that
selects the versions and conditional flags of all direct dependencies of
a source component (and their transitive direct dependencies as well).
The result of dependency resolution is a \emph{\ccomp}
(\cref{fig:rcomponents}) which has all conditionals and fields
irrelevant to \Backpack{} eliminated, and all direct dependencies
resolved to \cid{}s. In this thesis, we define a \ccomp{} to be precisely the fragment of the
Cabal package language which is relevant to \Backpack{}, written in a
form convenient for processing (e.g., each \verb|exposed-module| is
specified individually and explicitly associated with its source).
Dependency resolution in Cabal is a separate topic in its own
right~\cite{well-typed-solver, well-typed-qualified}, beyond the scope of this thesis.
Instead, we simply assume that we have a black box which
produces \ccomp{}s. One thing that is worth noting is that
every package \verb|p| mentioned in \verb|build-depends|
that is not mentioned in \verb|mixins| translates to
\verb|mixin: p|---this behavior maintains backwards compatibility
with existing Haskell packages which use \verb|build-depends| to
bring modules of packages into scope.
\section{Mixin linking to a \unit{}}
\label{sec:overview-mixin}
\begin{figure}
\[ \DIGuid{} \]
\[ \DIGshape{} \]
\[ \DIGmixed{} \]
\caption{Component shapes and \unit{}s.}\label{fig:lcomponents}
\end{figure}
\begin{figure}
\[
\begin{array}{l}
\UsynUnitH{\cidl{base}}{} \haspr \{\provMod{W}{\icid{base}{}}{W}\} \\
%\qquad \UsynLet{W}{\Mod{\icid{base}{}}{W}} \\
\qquad \UsynMod{\modname{W}(\texttt{I}(..))}{\texttt{data I = MkI I; f (MkI i) = i}} \\
\\
\UsynUnitH{\cidl{p}}{\hv{\modname{A}}~\hv{\modname{B}}}
\haspr \{
\provMod{Y}{%
\icid{\cidl{p}}{ \subst{A}{\holevar{A}}, \subst{B}{\holevar{B}} }%
}{Y} \} \\
\qquad \UsynSig{A}{\texttt{data J}} \\
\qquad \UsynSig{B}{\texttt{data K; g ::}~\texttt{K}} \\
%\qquad \UsynLet{Y}{\MOD{p}{\substHole{A}, \substHole{B}}{Y}} \\
\qquad \UsynMod{Y}{\texttt{import A; import B; data L = MkL J K}} \\
\\
\UsynUnitH{\cidl{q}}{\hv{\modname{A}}}
\haspr \{ \provMod{X}{\icid{\cidl{q}}{\subst{A}{\holevar{A}}}}{X} \}
\\
\qquad \UsynSig{\modname{A}~(\texttt{I}(..))}{\texttt{data I = MkI I}} \\
%\qquad \UsynLet{X}{\MOD{q}{\substHole{A}}{X}} \\
\qquad \UsynMod{\modname{X}~(\texttt{I}(..), \texttt{K}(..))}{\texttt{import A; type K = I; g = MkI g}} \\
\\
\UsynUnitH{\cidl{r}}{\hv{\modname{A}}}
\haspr \{
\provMod{Z}{\icid{\cidl{p}}{%
\subst{A}{\holevar{A}},
\substMod{B}{\icid{\cidl{q}}{\subst{A}{\holevar{A}}}}{X}
}}{Y} \}
\\
\qquad \UsynDep{\icid{base}{}}{ \rename{W}{W} } \\
%\qquad\qquad \textsf{with}~\Mod{\icid{\cidl{q}}{\substHole{A}}}{A},
% \Mod{\icid{\cidl{p}}{\substHole{A}, \subst{B}{\ldots}}}{A} \\
\qquad \UsynDep{\icid{\cidl{q}}{\substHole{A}}}{ \rename{X}{B} } \\
\qquad \UsynDep{\icid{\cidl{p}}{\substHole{A}, \substMod{B}{\icid{\cidl{q}}{\substHole{A}}}{X}}}{ \rename{Y}{Y} } \\
\qquad \texttt{signature}~\modname{A}~(\texttt{I}(..))~\{~\texttt{import W(I(..))}~\} \\
\end{array}
\]
\caption{\Unit{}s. Each component is annotated with its provided modules, as
$\texttt{component}~p~\overline{\protect\hv{m}}\haspr\{\provs\}$.}\label{fig:linked-example}
\end{figure}
The next step is \emph{mixin linking}, which
elaborates a \ccomp{} into a \emph{\unit{}} and computes its
\emph{component shape} ($\lctx ::= \lctxpair{\provs}{\reqs}$).
A component shape describes precisely what module identities the
component requires ($\reqs$) and provides ($\provs$).
A \unit{} essentially describes the set of command line flags that will
eventually be passed to GHC\@. The syntax of \unit{}s is given in
\cref{fig:lcomponents}:
\begin{itemize}
\item $\UsynDep{\UP}{r}$ indicates an external component dependency,
which should be brought into scope for imports according to the
renaming $r$. $P$ is an \uid{}; a \cid{} augmented with a
\emph{module substitution} $S ::= \overline{\subst{m}{M}}$
specifying how every requirement of the component is instantiated.
A \uid{} need not be fully instantiated: we can also leave some
requirements uninstantiated by recording a \emph{module hole}
($\hv{m}$) in the module substitution.
(Compare to the \texttt{mixin} in a \ccomp{}, which only gives a \cid{}).
\item \UsynMod{m}{\Uhsbody} indicates that this component defines
a module named $m$ with Haskell source code $\Uhsbody$.
\item \UsynSig{m}{\Uhssig} indicates that this component has a signature
named $m$.
\end{itemize}
%
The most important part of mixin linking is the computation of \uid{}s in
\texttt{dependency} declarations. While a syntactic semantics
for mixin linking is given in \cref{sec:mix-in},
the mixin linking algorithm
can actually be understood entirely pictorially:
\begin{figure}[H]
\center\includegraphics{diagrams/mixin-diagram.pdf}
\end{figure}
\noindent
Here, we look at the process of mixin linking \cidl{p} and \cidl{q} in
the component \cidl{r} in our running example.
The includes of \cidl{p} and \cidl{q} are represented as boxes, each of
which have an input port for every required module (\modname{A} for \cidl{q}
and \modname{A}, \modname{B} for \cidl{p}), and an output
port for every provided module (\modname{X} for \cidl{q} and \modname{Y}
for \cidl{p}). Every port is labeled with a module name, which may
be renamed according to the \texttt{mixin} (e.g., \modname{X} from \cidl{q}
is renamed to \modname{B}).
These two components are linked in two ways: first, the (renamed) output
of \cidl{q} is linked to the input of \cidl{p} which has the same
\modname{B}; second, the input \modname{A} of \cidl{p}
and \cidl{q} are merged together to form a single input for \cidl{r}.
The result is a complete wiring diagram for \cidl{r}\footnote{Well, mostly
complete; \cidl{base} is omitted since it doesn't contribute in any
interesting way to linking.}:
\begin{figure}[H]
\center\includegraphics{diagrams/uid-diagram.pdf}
\end{figure}
\noindent
With this wiring diagram in hand, we can say what the shape of \cidl{r}
is by successively computing \emph{module identities} ($M$) and
\emph{\uid{}s} ($P$) for the modules and components defined in the
component. In \cidl{r}, we first bind a \emph{module variable} \hv{A}
(a module identifier) for our unfilled requirement \modname{A}. \hv{A}
feeds into the input port for \cidl{q}: thus we give \cidl{q} the
\uid{} $\cidl{q}[\subst{A}{\hv{A}}]$, mapping each of its input ports (\modname{A})
to the module identifier (\hv{A}) that filled it. As \modname{X} is defined
in \cidl{q}, it then gets the module identifier $\Mod{\cidl{q}[\subst{A}{\hv{A}}]}{X}$.
We continue computing identities until we know the identity of every exported
or reexported module in \cidl{r}; in this case, \modname{Y} from \cidl{p}
(which is reexported as \modname{Z}). Then the component shape of
\cidl{r} is $\forall \hv{A}.\, \{ \prov{\modname{Z}}{\Mod{\cidl{p}[\ldots]}{Y}} \}$.
The elaborated \unit{} for \cidl{r} can be seen in \cref{fig:linked-example};
it too follows straightforwardly from the wiring diagram.
% \section{Ordering a \unit{}}
% There is a minor step which must be done before typechecking a
% \unit{}: we must \emph{order} the declarations of a \unit{} since declarations in the
% earlier languages in the pipeline are unordered. At this point,
% we step into the realm of the GHC compiler, as this ordering depends
% on the \texttt{import} statements within (Haskell-level) modules and
% signatures. We can define a topological ordering on declarations in a
% \unit{} as follows:
% \begin{itemize}
% \item Obviously, every module/signature declaration depends on
% any module/signature/dependency declaration which defines a
% module it imports.
% \item Every module declaration implicitly depends on every signature
% declaration in the component. (This means that it is illegal for a
% signature to import a locally defined module. In the absence of
% mutual recursion such signatures would not be implementable.)
% \item A dependency declaration $\textsf{dependency}~P$ depends
% on the signature declarations for every free module variable
% in $P$. For example, in the definition of component \cidl{r} in
% \cref{fig:linked-example},
% \textsf{signature}~\modname{A} must precede the dependencies
% on \cidl{p} and \cidl{q}, because both mention \hv{A}; but
% can come after the \cidl{base} dependency which does not
% depend on \modname{A}.
% % \item A signature $m$ depends on another signature if it
% % transitively imports it, either locally in this module, or
% % within any of the dependencies (since dependencies there
% % merge into our local dependency).
% % A signature $m$ which is merged from $\Mod{p[S, \subst{m'}{\hv{m}}, S']}{m'}$
% % depends on the signatures of any free module variables in $S$, where $S$
% % is the set of requirements that signature $m'$ in $p$ transitively depends on.
% % This is because the required interface for this
% % component is not well-defined before we know the types of $S$.
% % \Red{Forward reference to worked example here.}
% \end{itemize}
% %
% A dependency cycle indicates mutual recursion and is not allowed by GHC\@.
% (In principle, other compilers might be more permissive.)
\section{Typechecking a \unit{}}
\label{sec:overview-compiler}
% \begin{figure}
% \[ \DIGinterface{} \]
% \caption{The semantic objects of GHC Haskell.}
% % We say $\tilde{\tau}$ to refer to the $\UNs$ of a $\tau$. Export specifications $\UNs$ have the invariant that every $n$ is distinct; so we may say $\Un \mapsto \UN \in \UNs$ to match the specific $\UN$ with $\Un$.
% \label{fig:haskell-semantics}
% \end{figure}
\begin{figure}
\[
\cidl{base}
: \left\{
\modname{W}
: T_W^+
\right\},\,
\cidl{p}~:
\left\{
\modname{A}~:T_{PA}^-,\,
\modname{B}~:T_{PB}^-,\,
\modname{Y}~:T_Y^+
\right\},\,
\cidl{q}~:
\left\{
\modname{A}~:T_{QA}^-,\,
\modname{X}~:T_X^+
\right\},\,
\cidl{r}~:
\left\{
\modname{A}~:T_{RA}^-
\right\}
\]
\[
\begin{array}{rcl}
T_W &=& \begin{array}{l}
\UobjIface\: (\MOD{base}{}{W}.\texttt{I}, \MOD{base}{}{W}.\texttt{f} ) \\
\qquad\texttt{data I = MkI}\ \MOD{base}{}{W}.\texttt{I} \\
\qquad\texttt{f ::}~\MOD{base}{}{W}.\texttt{I}~\rightarrow~\MOD{base}{}{W}.\texttt{I}
\end{array} \\
T_{PA} &=& \begin{array}{l}
\UobjIface\: (\nhv{\texttt{A.J}}) \\
\qquad\texttt{data J}
\end{array} \\
T_{PB} &=& \begin{array}{l}
\UobjIface\: (\nhv{\texttt{B.K}}, \nhv{\texttt{B.g}}) \\
\qquad\texttt{data K} \\
\qquad\texttt{g ::}~\nhv{\texttt{B.K}}
\end{array} \\
T_Y &=& \begin{array}{l}
\UobjIface\: (\MOD{p}{\substHole{A}, \substHole{B}}{Y}.\texttt{L}) \\
\qquad\texttt{data L = MkL \nhv{\texttt{A.J}} \nhv{\texttt{B.K}}}
\end{array} \\
T_{QA} &=& \begin{array}{l}
\UobjIface\: (\nhv{\texttt{A.I}}) \\
\qquad\texttt{data I = MkI}\ \nhv{\texttt{A.I}}
\end{array} \\
T_X &=& \begin{array}{l}
\UobjIface\: (\nhv{\texttt{A.I}}, \MOD{q}{\substHole{A}}{X}.\texttt{K}) \\
\qquad\texttt{type K = \nhv{\texttt{A.I}}} \\
\qquad\texttt{g ::}~\nhv{\texttt{A.I}}
\end{array} \\
T_{RA} &=& \begin{array}{l}
\UobjIface\: (\MOD{base}{}{W}.\texttt{I}, \nhv{\texttt{A.J}}) \\
\qquad\texttt{data J }
\end{array} \\
\end{array}
\]
\caption{Example types of our running example (\cref{fig:linked-example}).
Note that the reexported \texttt{Z} from \texttt{r} is not shown in the type of \texttt{r}, as the \emph{type} of a component encompasses only the modules it defines, not those it reexports.}
\label{fig:typing-example}
\end{figure}
\begin{figure}
\center\includegraphics{figures/pqr-type-diagram.pdf}
\caption{Pictorial depiction of the types in our running example
(\cref{fig:linked-example}), with all original names replaced with
pointers to the declaration of the semantic object.
Lightened declarations (seen to the left of the holes for \texttt{A} and
\texttt{B} in \texttt{q} and \texttt{p}) specify requirements which
were subsequently filled by instantiation or signature merging.}
\label{fig:typing-example-diagram}
\end{figure}
After mixin linking has produced a \unit{} definition from a \ccomp{}
definition, the component can now be \emph{typechecked}, producing \emph{module types}
which record the type information
of all top-level declarations in a module.
GHC Haskell's module types resemble Haskell source code,
with two differences:
first, all declarations have been annotated with full type and kind
information; second, all identifiers have been renamed into \emph{original
names} which identify the exact module which originally defined the
entity. The types of our running example can be found
in \cref{fig:typing-example}; to see the full grammar for module
types skip ahead to \cref{fig:semantic-objects}.
In the rest of this section, we'll describe the interface and semantic
representations in more detail, and outline how each type of declaration
in a \unit{} can be typechecked.
\subsection{Typechecking a module}
A module is typechecked to produce a \emph{module
type}, consisting of an \emph{export
specification} (what gets brought into scope when
you import this module) and a set of \emph{defined entity specifications}
(the types and values defined by this module).
% Syntactically, these are represented with
% a notation reminiscent of Haskell module declarations:
% $\UobjTau{\UNs}{\overline{\Uty}}$.
For example, the module
\modname{W} in \cidl{base} (\cref{fig:linked-example}) type
checks to the following module type (see also \cref{fig:typing-example}):
\begin{figure}[H]
\center\includegraphics{figures/base-types.pdf}
\end{figure}
\noindent
In the diagram, we depicted the type of \verb|box| pictorially
by wiring up the enboxed references of \texttt{I} and \texttt{f} to their definitions.
The enboxed references \texttt{I} and \texttt{f} are
not unresolved Haskell-source level names,
but \emph{original names} $N ::=
M.n$, which serve as unique identifiers for any entities; in the
diagram, the original names have been abbreviated for concision. An
original name for an entity defined in a module is generated by joining
the identity of the module that defines it---$\MOD{base}{}{W}$ in this
case--- with the Haskell source level name (e.g., \texttt{I},
\texttt{f}).
If, when typechecking another module, we need to look up the kind of
$\MOD{base}{}{W}.\texttt{I}$, we look up the module type of
$\MOD{base}{}{W}$ from the context and then find the defined entity
specification for \texttt{I} within the module type.
\subsection{Typechecking a signature}
Typechecking \cidl{q} (\cref{fig:linked-example}) from our running example requires
us to deal with a signature declaration. Here is a
representation of the final type of \cidl{q}:
\begin{figure}[H]
\center\includegraphics{figures/q-types.pdf}
\end{figure}
\noindent
In the diagram, the declarations from the signature are
drawn externally from \cidl{q}. We do not know what module
will be used to fill the requirement \texttt{A}, and
consequently, we don't know what original name will provide an
implementation of \texttt{I}, although we do have a declarative
specification which any implementation must satisfy. When we type check against the
signature, we use this required declaration ``as if'' it were
a declaration of its own.
In mixin linking, we bound a module variable \hv{A} to represent the
unknown module identifier. In typechecking, we bind a \emph{name
variable} $\nhv{m.n}$ to represent the unknown entity $n$ required by
requirement $m$.
You may be wondering why name variables are not simply defined as
$\hv{m}.n$. Read literally, names of this form demand that $n$ be defined
by $\hv{A}$ itself, ruling out the possibility that it \emph{reexports}
$n$ from another module.
In the diagram above, the results of typechecking the module \texttt{X}
are also shown. As before, entities that are defined in the module
are given original names combining the module identifier (\MOD{q}{\subst{A}{\hv{A}}}{X})
and their source level name, while reexported entities retain the original
name of their source.
\subsection{Typechecking a dependency}
Now we can define the key operation for typechecking in \Backpack{}:
how to compute the type of \cidl{q} instantiated with an implementation
for its requirements. These instantiations are specified to the
compiler via \texttt{dependency} declarations.
Let's first consider a simple one: filling \cidl{q}'s
requirement with \modname{W} from \cidl{base}, i.e., $\UsynDepEmpty{\cidl{q}[\subst{A}{\Mod{\cidl{base}[]}{W}}]}$.
\begin{figure}[H]
\center\includegraphics{figures/base-q-types.pdf}
\end{figure}
\noindent There are two key steps. First, we take the
required export specification from \cidl{q}, and the provided export
specification from \cidl{base}, and we perform \emph{export matching},
matching every required entity with an exported entity with the same
name from the implementing module. In our example, we learn that
\nhv{A.I} is instantiated with \MOD{base}{}{W}.\texttt{I}; in the graph
representation, all previous references to \nhv{A.I} are rewritten so
that they now point to the declaration for \MOD{base}{}{W}.\texttt{I}.
Additionally, we must apply the module substitution to the original names of all
entities defined in the instantiated component: e.g., we apply the
substitution $\subst{A}{\Mod{\cidl{base}[]}{W}}$ to the original name
\MOD{q}{\subst{A}{\hv{A}}}{X}.\texttt{K}.
When we do this, we must check if this instantiation is
well-typed via \emph{signature matching}: do the
declarations provided by \cidl{base} match the required declarations of
\cidl{q}? This proceeds by comparing the declarations of each required
entity with the declaration of the corresponding provided entity.
In our example, the implementation of \texttt{I} is an algebraic data
type which matches the required declaration exactly; the implementation
would also have successfully matched an abstract data type \texttt{data I}.
It is important that export matching happen before
signature matching, as the process of export matching may reveal more
type equalities. This can be seen in the instantiation of \cidl{p} in
our running example:
\begin{figure}[H]
\center\includegraphics{figures/instantiation-with-synonyms.pdf}
\end{figure}
\noindent
In this example, it is not syntactically obvious that \verb|g :: I|
and \verb|g :: K| are matching declarations. \verb|I|
and \verb|K| are definitionally equal because the abstract data type \verb|K|
was implemented by the type synonym \verb|type K = I|:
\begin{figure}[H]
\center\includegraphics{figures/instantiation-with-synonyms-reduced.pdf}
\end{figure}
\noindent
By wiring up everything beforehand, we ensure that signature matching
sees the correct equalities.
% This does mean that we may construct some
% types in the graph representation that are ill-kinded before discovering
% that signature matching fails!
\subsection{Signature merging}
\label{sec:signature-merging}
The distinguishing characteristic of mixin
linking is not just the linking that occurs when a module is brought into
scope with the same name as a signature, but also the \emph{signature merging}
that occurs when two signatures are brought into scope under the same name.
Like dependency instantiation, we must take the required exports of each
signature being merged and merge them with the identically
named entities from other signatures.
% Unlike dependency instantiation,
% this matching process is \emph{bidirectional}, as can be seen in the
The subtlety of signature merging is determining whether or not the declarations
of a type in two signatures are compatible with one another. Consider
the following example (which is adapted from the MixML paper~\cite{rossberg+:mixml}):
\begin{tabular}{p{0.45\textwidth} p{0.45\textwidth}}
\begin{lstlisting}
signature A where
type T = Int
data U
f :: Int -> U
\end{lstlisting}
&
\begin{lstlisting}
signature A where
data T
type U = Bool
f :: T -> Bool
\end{lstlisting}
\end{tabular}
\noindent
It is not obvious that \verb|Int -> U| has the same type as \verb|T -> Bool|,
unless we refine \verb|U| to \verb|Bool| and \verb|T| to \verb|Int| in
both signatures.
\begin{figure}[H]
\center\includegraphics{figures/signature-merging.pdf}
\end{figure}
\noindent
To merge these signatures,
we first take the required export specifications from each signature and
match entities which have the same names.
For every entity which occurs in multiple signatures,
we pick out the most-defined
declaration from each signature. In this example, the choice is easy:
\verb|type T = Int| is more defined than the abstract \verb|data T|,
so we pick it for the merged signature.
If there is not a unique choice, we pick an arbitrary one; we will
subsequently check that all of the entities are equivalent.\footnote{You might
wonder why we cannot designate a specific entity to use. Because
signature merging is unordered, the only way to treat signatures equally
would be to define some merging operation on them. However, this is
technically challenging to do, as we do not yet know what the refined
types of the entities are. Picking an arbitrary entity cuts the
gordian knot, while only introducing nondeterminism in a benign way.}
Then, we check if each of the original declarations are compatible with
the merged declaration. The important twist is that whenever we
see a name in one of the original declarations, we must resolve it
to its definition in the \emph{merged} signature: in the diagram above,
this is depicted by pointing the enboxed references to the central
declarations. Now we can see that both instances of \verb|f| have a type that
is definitionally equal to \verb|Int -> U|.
% \paragraph{Signature merging}
% There is one final detail: how to merge requirements from
% the dependencies of a component. If
% we know the required module types of all our dependencies, the merge is
% straightforward. For example, here are the module types to be merged
% in \cidl{r} for the signature \modname{A}:
% \[
% \begin{array}{lll}
% &\begin{array}{l}
% \UobjIface\: (\MOD{base}{}{W}.\texttt{I}) \\
% \end{array} & \text{Local signature} \\
% \oplus&
% \begin{array}{l}
% \UobjIface\: (\nhv{\texttt{A.J}}) \\
% \quad\texttt{data J :: *}
% \end{array} & \text{From \cidl{p}} \\
% \oplus&
% \begin{array}{l}
% \UobjIface\: (\nhv{\texttt{A.I}}) \\
% \quad\texttt{data I :: * = MkI}\ \nhv{\texttt{A.I}}
% \end{array} & \text{From \cidl{q}} \\
% =&
% \begin{array}{l}
% \UobjIface\: (\MOD{base}{}{W}.\texttt{I}, \nhv{\texttt{A.J}}) \\
% \quad\texttt{data J :: *}
% \end{array} & \text{Final requirement for \cidl{r}}
% \end{array}
% \]
% %
% The export lists are unioned together, and we discover that \nhv{A.I}
% actually must implemented with $\MOD{base}{}{W}.\texttt{I}$. And since
% the entity \texttt{I} is already implemented, we additionally
% drop its specification from the resulting requirement (having checked
% that its specification from \MOD{base}{}{W} matches that from \cidl{q}).
% Unfortunately, this definition of signature merging is circular: to
% compute the instantiated type of a component, we need to know the (full)
% types of our requirements; but to compute the type of a merged
% requirement, we need to compute the instantiated types of the components
% which merge into the requirement. \Scott{I'm not sure what this means. Can
% you identify how the problem manifests in the example so far?}
% The solution to this problem is to
% define \emph{partial instantiation}, which only instantiates a component
% type as much as is necessary to determine the type of the requirement in
% question. We give the technical details in \cref{sec:compiler}.
% \paragraph{Compiling instantiated components}
% \Red{Need to rewrite this section, motivate why the semantics split
% into separate type checking and compiler.}
% When a \unit{} has no
% requirements, it can be compiled. In this case, compilation is very
% trivial: the dependencies of \unit{}s can be successively instantiated
% by Cabal to form a graph of instantiated components, which GHC can
% simply compile in order. In this way, the soundness of Haskell with \Backpack{}
% trivially reduces to the soundness of Haskell without \Backpack{}.
% While compilation of \Backpack{} is trivially sound, we would like to
% relate type checking and compilation with a property that says that if a
% unit typechecks compositionally, it also typechecks monolithically
% (i.e., when you re-typecheck the entire transitive closure.)
% Unfortunately, in full Haskell it is impossible for this to be the case
% with our current signature language and semantics. In Section \Red{XXX}
% we give some examples and conjecture under what circumstances this
% property does hold.
\section{Instantiation and compilation}
\label{sec:overview-instantiate}
Typechecking is all very well and good, but what about
running our programs? For both soundness reasons (\cref{sec:limitations})
and performance reasons, it is desirable to defer compiling
until we know how all the requirements of a component are to be
filled.
When we have a \unit{} which has no unfilled requirements,
the package manager computes the
\emph{instantiated component graph}, reflecting all of the
concrete code dependencies of the component. Specifically, given a closed instantiated component identity $p[S]$, for every
\texttt{dependency} $P$ of $p$'s mixed component, recursively
instantiate $\substw{P}{S}$ ($P$ after applying the $S$
substitution). Furthermore, for every $\substMod{m}{P}{m'} \in S$,
recursively instantiate $P$. Then, we compile each instantiated component
in topological order. This compilation is exactly like ordinary
compilation without \Backpack. The only new feature is that signatures
compile into trivial modules which reexport entities from
the backing implementation; otherwise, modules that imported the signature might see
entities from the backing implementation that were not exported by the
signature.
It is important that the package manager computes
the instantiated component graph: \emph{applicative}
semantics of instantiation mean that completely independent
components could instantiate a library in the same way: in such
cases we should share the compiled code. A traditional compiler cannot
do this: it is solely responsible for transforming source
files into object code.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "paper"
%%% End: