intro.tex

%!TEX root = paper.tex
\chapter{Introduction}
\label{sec:intro}

% Here is a problem
% It's an interesting problem
% It's an unsolved problem
% Here is my idea

% Describe the problem
% State your contributions


%   Here are the parameters of the problem:

%   \begin{itemize}
%       \item We want to support separate modular development in Haskell,
%       for all the reasons we've stated above (ability to abstract over
%       implementation, giving you the ability to swap out components.
%       Extreme implementation changes.)

%       \item What do we want to abstract over? We want to abstract
%       over packages.  This is because packages are the way large
%       scale software is distributed.

%       \item But there is an abstraction barrier between the compiler
%       and the package system.  How to design a system that respects
%       this barrier.  (Why is this hard?)
%   \end{itemize}


% The case for separate modular development
% The case for separate modular development at the package level

% The basic idea


% Large software is made of packages.
% Packages are not modular.
% Idea of separate modular development.
% SMD is not done at package level

The essence of modular development is to decompose a program into
collection of smaller modules separated by abstraction barriers: the
defining characteristic of a modular program is the ability of a module to
operate without knowing the implementation details of another.  Modular
programs are easier to understand; opportunities to reuse modular
components are everywhere:

\begin{itemize}
    \item Suppose you are writing an application which makes a request to
    some external backend service---perhaps a database or a credit card
    processor.  You might want to modularize your
    application so that you can \emph{swap} out the real implementation
    of a service with a fake, so that you can test your application without
    charging your customers.

    \item Suppose you are writing a library that handles strings in some
    way---perhaps a parser or a string processing library.  Usually, your
    library doesn't require a specific representation; in fact, you would
    like your library to be \emph{parametric} over all the string representations
    your clients might use (whether it is linked lists of
    characters, packed bytestrings, packed UTF-16 strings, packed UTF-8
    strings\ldots).

    \item Suppose you have two implementations of an API\@: a simple but
    inefficient reference implementation and a complicated but
    fast production implementation.  You might like to \emph{instantiate}
    an application with both implementations and then run them in
    parallel, comparing their outputs to verify the correctness of the
    production implementation.
\end{itemize}
%
Typically, programming languages provide support for both \emph{incremental
modular development}, where a module can be developed only when implementations
of its dependencies are available, and \emph{separate modular development},
where a module can be developed with respect to an \emph{interface}
and instantiated later.  Haskell's existing module system is a mechanism
for incremental modular development---it is purely a namespace management
mechanism with a proper concept of interface---while Haskell's type classes are a
mechanism for separate modular development.  Separate modular development
offers more abstraction, as the interface characterizes
what is needed from a dependency.

Unfortunately, type classes are ill-suited for certain cases of
separate modular development:

\begin{itemize}
    \item From a code perspective, type class parametric code is often
    harder to use than monomorphic code.  For an inexperienced
    Haskeller, the proliferation of constraints and type parameters in the
    type signatures of functions can make an otherwise straightforward
    API impenetrable:\footnote{\url{https://hackage.haskell.org/package/regex-posix-0.95.2/docs/Text-Regex-Posix-Wrap.html}}

\begin{lstlisting}
    (=~) :: (RegexMaker Regex CompOption ExecOption source,
             RegexContext Regex source1 target)
         => source1 -> source -> target -- from regex-posix-0.95.2
\end{lstlisting}

    Furthermore, type classes work best when exactly a single type
    parameter is involved in instance resolution. If there aren't any type
    parameters, you must introduce a proxy type to drive instance resolution; if
    there are multiple parameters~\cite{lfp92}, you often need to resolve ambiguity
    by introducing functional
    dependencies~\cite{Jones:2000:TCF:645394.651909} or replacing
    parameters with associated types~\cite{towards-open-type-functions-haskell}.

    \item Type classes work best when it is clear what methods they
    should support.  However, for many interfaces, it is not obvious
    what set of methods should be put into an interface; for example,
    there are many functions which an interface for strings might
    support---which ones go in the type class?  It is inconvenient
    to define a new type class (and new instances) just to add a
    few more methods, and thus this is rarely done in practice.

    \item From a performance perspective, code which uses a type class
    is compiled without knowledge of how the type class's
    methods are implemented.  This can be quite costly in a language
    like Haskell, where inlining definitions is essential to achieving
    good performance.~\cite{PeytonJones:2002:SGH:968417.968422}  This problem can
    be mitigated by specializing code to a particular type, but if this
    specialization occurs at use-site, the compiler ends up repeatedly reoptimizing
    the same function, blowing up the code size of
    a program. (C++ templates suffer from similar problems.\footnote{\url{https://gcc.gnu.org/onlinedocs/gcc/Template-Instantiation.html}})
\end{itemize}
%
In short, Haskell would be well served by an extension to its module
system to support the cases of separate modular development which are
currently poorly served by type classes.

The Backpack package system by Scott Kilpatrick et al.~\cite{backpack}
(hereafter called \OldBackpack{}) broke new ground, arguing that \emph{mixin
packages} could be a good fit for providing modularity in Haskell.
The two words ``mixin'' and ``package'' capture the key properties of \OldBackpack{}:

\begin{itemize}
    \item \emph{Mixins}. Mixin modules~\cite{bracha+:modularity,ancona+:cms,flatt+:units,duggan:mixin} are characterized by components which both \emph{provide} and \emph{require}
    modules:  these components can be ``mixed'' together, linking
    together provisions and requirements which have
    the same name.  Unlike the more conventional approach of
    \emph{parametrized} modules (ala ML~\cite{milner+:def-of-sml-revised}),
    mixin modules natively support recursive linking.
    Furthermore, mixins help reduce
    the preponderance of \emph{sharing constraints} that occur with ML
    functors, since requirements can be mixed together by name.
    (For more discussion on this, see Appendix~\ref{sec:mixins-reduce-sharing-constraints})

    \item \emph{Packages}. Rather than introduce a new module language
    to the compiler, \OldBackpack{} proposed that mixins be added to
    Haskell's package system.  There are multiple benefits to this choice.
    First, it allows us to retrofit Haskell to support separate modular
    development without requiring ``yet another type system extension.''
    Second, it means that Haskell code can be made more
    modular without having to adopt the ``fully-functorized'' style~\cite{structabshotlang}
    commonly associated with ML functors.
    Finally, it brings modularity to the place where it matters most:
    software development in the large;  after all, \emph{packages} are how
    software systems are organized at the largest scale.
\end{itemize}
%
There was just one problem: despite its emphasis on being a practical
design, \OldBackpack{} could not be implemented in a real world compiler
like GHC\@.  Why not? The
semantics of \OldBackpack{}, especially its package-level semantics,
were closely entwined with the semantics of Haskell itself,  violating
the traditional abstraction barrier between the compiler and package
manager.  Without tightly coupling GHC (the Haskell compiler) and Cabal
(the Haskell package manager), there was no way to directly implement
\OldBackpack{}.

In this thesis, we show how to refactor
\OldBackpack{} to respect the abstraction barrier between the package
manager and the compiler, giving us a new system: \Backpack{}.  In
particular, we divide \Backpack{} into two parts: \emph{mixin linking}
which is indifferent to Haskell source code, and \emph{typechecking against
interfaces}, which is purely the concern of the compiler.

Specifically, the contributions of this thesis are as follows:
\begin{itemize}

    \item We describe a package language which can be \emph{mixin
    linked} without knowing anything about the Haskell programming
    language. (\cref{sec:mix-in})  This gives us a language agnostic mechanism for
    expressing separate modular development, in contrast to
    \OldBackpack{}, whose semantics critically relies on the import structure
    of Haskell modules.  As a result, \Backpack{}
    employs a far simpler mixin linking procedure essentially equivalent
    to early descriptions of mixin linking in the literature.~\cite{cardelli:linksets}
    In doing so, we adjust \OldBackpack{}'s primary technical device,
    the \emph{module identity} into a new concept, the \emph{unit identity} (\cref{sec:uid}),
    which can be computed by language agnostic mixin linking.
    Our new package language is order-independent, which is
    essential for backwards-compatibility with Haskell's existing
    package format.

    \item To express the results of mixin linking, we introduce the
    \emph{\unit{} language}, an intermediate representation that
    represents all instantiations explicitly.  This language is reminiscent of
    ML-style applicative functors, but with a twist:
    unfilled requirements of dependency are propagated to the requirements
    of the enclosing package via the process of \emph{signature merging}.
    The effect is that \Backpack{}, as a whole, has similar
    semantics to the ones that were pioneered by \OldBackpack{}.

    \item We describe how to typecheck the \emph{\unit{} language} in the
    real world (\cref{sec:compiler}), typechecking the source into semantic objects that
    faithfully reflect the full glory of GHC Haskell's type system.
    We give an unsound but pragmatic mechanism for supporting \emph{type classes}
    (it is unsolvable in the presence of open type families), support
    the use of \emph{type synonyms} to implement abstract data
    (not supported by \OldBackpack{}), and describe
    some tricky interactions with GHC's type inference algorithm and
    roles annotations.

    \item This is not a paper design: Backpack has been fully
    implemented GHC 8.2 and cabal-install 2.0.
    We give a tour of the modifications we made to GHC to
    implement the necessary instantiation and merging operations
    required by the \unit{} language, as well as the architectural
    changes that were necessary in Cabal (the package manager)
    to orchestrate the typechecking and building of the mixin
    packages. (\cref{sec:implementation})

    \item Does \Backpack{} work?
    We have done a number of case studies to give evidence
    that \Backpack{} works, including conversions of complex,
    real-world packages to use \Backpack{}. (\cref{sec:evaluation})

%   Among other things, these case studies
%   have motivated the introduction of \emph{signature thinning}
%   and how Backpack interoperates with Cabal's existing dependency
%   solver.

\end{itemize}
%
%   In this thesis, we do not address
%   mutual recursion between packages, allowing us to avoid some orthogonal
%   technical complications. We believe that in practice this is a minor
%   limitation.
Although the idea of separating the core language and the module/package
language is not novel~\cite{leroy:modular,milner+:def-of-sml-revised,rossberg+:f-ing}, we are the first to exploit these ideas in service
of an actual implementation, retrofitting strong modularity to an existing
programming language.  The true test of \Backpack{} will be whether or not
the larger Haskell community adopts it, which can only be seen in the
coming years, but we have enough experience working with \Backpack{}
that we believe that this design is practical and solves many problems
that Haskell programmers face today.


%%% Local Variables:
%%% mode: latex
%%% TeX-master: "paper"
%%% End: