compiler.tex

%!TEX root = paper.tex
\chapter{Type checking}
\label{sec:compiler}

We now give a formal model of the type checking process on mixed
units.  This is challenging, as GHC Haskell's type system is quite
complex; a full formalization of all aspects of GHC Haskell is
beyond the scope of this thesis.  Still, we must capture enough
of GHC Haskell's type system to demonstrate how \Backpack{}
interacts with features like type classes and type synonyms.

%   It's not possible to
%   give a full semantics, since that would involve formalizing all of
%   Haskell as implemented by GHC, but we will informally explain the
%   operation of the judgments we assume are given to us.

\section{Semantic objects}
\label{sec:semantic-objects}

\begin{figure}
\[ \DIGinterface{} \]
\caption{Semantic objects of GHC Haskell with \Backpack{}}
\label{fig:semantic-objects}
\end{figure}

The semantic objects of GHC Haskell, extended with \Backpack{}, are
given in \cref{fig:semantic-objects}.  These semantic objects
correspond closely to the top-level declarations supported by
Haskell's syntax, but with a few key differences: first, every
identifier is resolved to an \emph{original name} ($N$) which identifies
the exact module that holds the declaration describing this identifier;
second, all declarations are explicitly annotated with types ($\tau$), kinds ($\kappa$) and
roles ($\rho$).\footnote{Type families are not annotated with roles, because their
type parameters are always at nominal role.}  If the kinds or roles are not
relevant to a judgment, we may elide them.

The most important semantic object is the module type ($T$; also referred
to as an \emph{interface} in GHC).  A module
type consists of four components:

\begin{enumerate}

    \item The export list ($\UNs ::= \overline{N}$), which
    specifies what entities are brought into scope when a module is
    imported.  Every original name in an export list has a unique
    occurrence name (e.g., $\UNs(n) = N$).

    \item The type declarations ($\Utys$), which give
    definitions for each entity provided by a module.  Every defined
    entity list defines a particular entity $n$ (shaded in gray
    in \cref{fig:semantic-objects}).  Like export lists,
    each declaration in a module has a unique occurrence name
    (e.g., $\Utys(n) = \Uty$).

    \item A set of instances ($\Uinsts$), which specifies the class
    and family instances defined in this module.

    \item The set of modules transitively imported by this module
    ($\Uimps$), which controls the set of \emph{orphan instances}
    (instances which may not necessarily be in scope) which are in scope
    when performing type class resolution.  If the module in question
    defines orphan instances, its identity is included in this list.

\end{enumerate}
%
Below is a simple example of a module's Haskell source code and its
corresponding module type:

\vspace{-1em}
\begin{figure}[H]
\centering
\begin{shortmath}
\begin{tabular}{p{0.30\textwidth} p{0.30\textwidth}}
\begin{lstlisting}
module A where
  data T = MkT
  f = MkT
  instance Eq T where
    MkT == MkT = True
\end{lstlisting}
&
\vspace{-12pt}
\[
\begin{array}{l}
    \UobjIface\: (\Mod{P_0}{A}.\texttt{T}, \Mod{P_0}{A}.\texttt{f}) \\
    \qquad\texttt{data T where MkT} :: \Mod{P_0}{A}.\texttt{T} \\
    \qquad\texttt{f} :: \Mod{P_0}{A}.\texttt{T} \\
    \qquad\texttt{instance} :: N_{Eq}~\Mod{P_0}{A}.\texttt{T} \\
\end{array}
\]
\end{tabular}
\end{shortmath}
\end{figure}

\vspace{-2em}
\noindent
Here, $P_0$ represents the unit identifier that this module was
being typechecked as a part of (for example, if \verb|module A| was
in the component \verb|p| with no holes, then $P_0 = \uidl{p}{}$),
and $N_\texttt{Eq}$ is the original name of the equality type
class (not shown, but part of Haskell's Prelude, a set of declarations
which are always in scope).  Even though the original module left its
exports implicit, the module type explicitly specifies each export.

The module type of a signature is quite similar, although there are few differences:

\vspace{-1em}
\begin{figure}[H]
\centering
\begin{shortmath}
\begin{tabular}{p{0.30\textwidth} p{0.30\textwidth}}
\begin{lstlisting}
signature A where
    data T
    f :: T
    instance Eq T
\end{lstlisting}
&
\[
\begin{array}{l}
    \UobjIface\: (\nhv{A.T}, \nhv{A.f}) \\
    \qquad\texttt{data T} \\
    \qquad\texttt{f} :: \nhv{A.T} \\
    \qquad\texttt{instance} :: N_{Eq}~\nhv{A.T}
\end{array}
\]
\end{tabular}
\end{shortmath}
\end{figure}

\vspace{-2em}
\noindent
Instead of original names based off of the ambient unit identifier $P_0$,
every defined entity is allocated a fresh name hole (e.g., $\nhv{A.T}$),
which may be subsequently substituted with the actual original name of the
implementing declaration.

A collection of module types forms a component ($\Xi$), where each module
type is annotated with a $+$ or $-$ to indicate if it is a module or
signature.   Although the syntax of component types suggests that a module
type can be put into the context at any name, a particular module
identity is baked into the module type (as can seen in the example
above); a module type must be placed in the context consistently with
the module identifier it was typechecked with.  (We could have avoided
this constraint by using name holes for all self-references and
applying a ``selfification'' step prior to adding modules to the
context, but the current presentation is more direct and accurately
describes how GHC is implemented.)

\section{Typing rules}

We organize our formalization of \Backpack{} typing into a few
concerns:

\begin{itemize}
    \item The assumed Haskell judgments (\cref{typing:haskell}) are the
        Haskell type-checking judgments we assume are available, but which
        we do not formalize.

    \item The top-level typing rules (\cref{typing:main}) tell us
        how to typecheck the declarations of a component and form the final
        component type.

    \item The type lookup and renaming rules (\cref{typing:lookup}) tell us how to
        get the type of a module by looking up the original type from
        the context, and then \emph{renaming} it according to the substitution
        recorded in the module identifier.

    \item The subtyping rules (Figures~\ref{typing:top-subtyping}--\ref{typing:subtyping})
        specify when one module type is a subtype of another, and is used
        by both signature merging and dependency matching.

    \item The merging rules (\cref{typing:merging}) specify how the
        module types are merged together during signature merging.

\end{itemize}
Most typing rules require the following three pieces of context:

\begin{itemize}
    \item The external component type context $\Gamma ::= \overline{p : \Xi}$,
        which records the types of all components we have previously
        typechecked.  These components are typed but not instantiated.
        For example, the context
        $\Gamma = \mathtt{base} : \Xi_b, \mathtt{containers} : \Xi_c$
        would be one where the components \verb|base| and \verb|containers| (with
        component types $\Xi_b$ and $\Xi_c$ respectively) were in the context.
        A component with holes, e.g., \verb|regex-indef| (\cref{sec:functorizing-the-matcher}),
        would occur only \emph{once} in this context (even if we have a dependency
        on an instantiated version of it.)

    \item The local type context $\Delta :: \overline{m : T^s}$,
        which records the types of the modules and signatures we have typechecked
        from the current package.  The polarity $s$ distinguishes between a
        module and a signature.  For example, if we typechecked a component
        consisting of a signature \verb|Str| and a module \verb|Regex|,
        the final local type context would be $\Delta = \texttt{Str} : T_S^-, \texttt{Regex} : T_R^+$,
        where $T_S$ and $T_R$ are the types of the signature and module (respectively).

    \item The current unit identifier $P_0$, which identifies the component
        we are typechecking. It is used to generate the original names
        of declarations in modules and determine when a lookup should be
        done in the local type environment.  For a component with no holes,
        e.g., \verb|str-string|, this unit identifier will have an
        empty substitution, e.g.,
        $\uidl{str-string}{}$; otherwise, each of the holes is
        uninstantiated, e.g., $\uidl{regex-indef}{\subst{Str}{\hv{Str}}}$.
\end{itemize}
Some rules require some extra context:

\begin{itemize}
    \item The logical context $\mathcal{L} ::= \overline{m \mapsto M}$ specifies
        which module identity is brought into scope when a module name is imported.
        It is needed when typechecking unrenamed Haskell source code, but is
        otherwise not needed for typechecking.  For example, if the logical context
        is $\mathcal{L} = \texttt{Data.Map} \mapsto \MOD{containers}{}{Data.Map}$,
        it means that if you write \verb|import Data.Map| in your Haskell source code,
        the exports of $\MOD{containers}{}{Data.Map}$ are brought into scope.

    \item The shape context $\shctx ::= \overline{p : \lctx}$ specifies what
        module identities are provided by a package; this is used to resolve
        a list of \texttt{dependency} declarations into the logical context $\mathcal{L}$.
        For example, given the shape context $\shctx = \cidl{containers} : \{ \texttt{Data.Map} \mapsto \MOD{containers}{}{Data.Map} \}$, adding a dependency on \verb|containers|
        would result in \verb|Data.Map| being brought into scope for import
        (in the logical context.)
\end{itemize}

\subsection{Assumed Haskell judgments}
\label{sec:typing-haskell}

\input{typing/haskell}

We assume some judgments for various operations on
Haskell, which we do not formalize.

\paragraph{Module and signature typechecking}
Module and signature typechecking are judgments which typecheck module
(\I{hsmod}) and signature (\I{hssig}) source into module types.
Alongside the usual context, these judgments also require the logical
context $\mathcal{L}$ so that they can resolve imports, and the name $m$
of the module they are typechecking (along with $P_0$, this will
determine what original names of the declarations in the module and
signatures will be: $\Mod{P_0}{m}.n$ in the case of a module, and
$\nhv{m.n}$ in the case of a signature.)

\emph{Example.} Recall the following source module (left) and its module type (right).

\vspace{-1em}
\begin{figure}[H]
\centering
\begin{shortmath}
\begin{tabular}{p{0.30\textwidth} p{0.30\textwidth}}
\begin{lstlisting}
module A where
  data T = MkT
  f = MkT
  instance Eq T where
    MkT == MkT = True
\end{lstlisting}
&
\vspace{-12pt}
\[
\begin{array}{l}
    \UobjIface\: (\Mod{P_0}{A}.\texttt{T}, \Mod{P_0}{A}.\texttt{f}) \\
    \qquad\texttt{data T where MkT} :: \Mod{P_0}{A}.\texttt{T} \\
    \qquad\texttt{f} :: \Mod{P_0}{A}.\texttt{T} \\
    \qquad\texttt{instance} :: N_{Eq}~\Mod{P_0}{A}.\texttt{T} \\
\end{array}
\]
\end{tabular}
\end{shortmath}
\end{figure}

\vspace{-1em}
\noindent
To illustrate all of the contextual information needed to typecheck
this source module, we give one example context which would successfully
typecheck this module:

\begin{itemize}
    \item $\Gamma = \cidl{base} : \{ \modname{Prelude} : T_P^+, \modname{GHC.Classes} : T_C^+ \}$.
        Although there are no imports, this source module implicitly makes
        use of types from the Haskell Prelude (a module which is implicitly
        imported); the module type of this module and any it transitively
        imports are recorded in the context.
        $T_P$ may contain many exports, but for this particular module
        it must export \verb|Eq| and \verb|True| (identifiers which are used
        inside \verb|A|).  Let us suppose
        $\mathsf{exports}(T_P)(\mathtt{Eq}) = \MOD{base}{}{GHC.Classes}.\mathtt{Eq}$;
        this means the declaration of the type class \verb|Eq|
        is in $T_C$: $\mathsf{decls}(T_C)(\mathtt{Eq}) =
        \mathtt{class}~\mathtt{Eq}~(a ::
        \star)~\mathtt{where}~\I{clinfo}$ (we leave $\I{clinfo}$
        unspecified, but it would contain information about the methods and
        default implementations of the \verb|Eq| type class.)  This declaration
        would be used when ensuring that the \verb|Eq T| instance in \verb|A|
        is well-typed and implements all necessary methods.
    \item $\Delta$ is empty; \verb|A| has no local imports.
    \item $P_0$ as itself.  Module typechecking is indifferent to the choice
    of current unit identifier, so in the example above we've kept it a
    metavariable.
    \item $\mathcal{L} = \modname{Prelude} \mapsto \MOD{base}{}{Prelude},
    \modname{GHC.Classes} \mapsto \MOD{base}{}{GHC.Classes}$.  Given a
    dependency on \verb|base|, this means that all the modules that are
    publically exported (this includes the internal-looking \verb|GHC.Classes|)
    are in scope for import.  The implicit \modname{Prelude} import will
    be resolved using this context.
    \item $m = \modname{A}$.  The current module that is being typechecked
    is \verb|A|; in the output module type, we can see that this module
    name is used to form the original names of entities declared in this
    module.
\end{itemize}

\paragraph{Type and kind equality}
We need judgments for testing type and kind equality.
These require a context, because type synonyms and type families
can introduce nontrivial definitional equalities between types that
are syntactically dissimilar.

\emph{Example.}  Obviously, type equality is reflexive.  It's an easy
matter to determine equality without consulting the context
(here, $M_T$ is the module identity of the module which originally
defines \verb|Int|):

\[ \ctx \vdash M_T.\mathtt{Int} =_\textsf{hs} M_T.\mathtt{Int} \]

To give an example of a nontrivial type equality, suppose that we have \verb|type A = Int|, e.g.,
$\mathsf{decls}(T_M)(\mathtt{A}) = \mathtt{type}~\mathtt{A} = M_T.\mathtt{Int}$,
where $T_M$ is the module type for $\Mod{P_0}{M}$.  Then the following type equality
also holds:

\[ \Gamma; \modname{M} : T_M, \Delta; P_0 \vdash \Mod{P_0}{M}.\mathtt{A} =_\textsf{hs} M_T.\mathtt{Int} \]

\paragraph{Instance resolution} Finally, we need a judgment for testing
if a type class instance is derivable from the instances available in
the context, plus some set of orphan instances (indicated by \I{orphs}).
Informally, this judgment collects all instances defined in \I{orphs},
as well as the modules which define all of the
types mentioned in \I{inst}, and then runs the instance constraint
solver to see if \I{inst} is implied by these instances.

\emph{Example.}  Suppose that we would like to show that there is an
instance of \verb|Show| for \verb|Maybe Int|, and there are no orphan
instances in scope.  Then we would have the following
judgment:

\[
\ctx \vdash \cdot~\textsf{solves}~\mathtt{instance} :: M_C.\mathtt{Eq}~(M_B.\mathtt{Maybe}~M_T.\mathtt{Int})
\]

where $M_C$, $M_B$ and $M_T$ are the modules which define \verb|Eq|,
\verb|Maybe| and \verb|Int|, respectively.  One situation we would
expect this judgment to hold is if $M_B$ has the instance
\verb|Eq a => Eq (Maybe a)| and $M_T$ has the instance \verb|Eq Int| (none of these
instances are orphans, since they are defined in the same module which
defines a type in the instance itself).  In more formal terms, the
judgment would hold if:
\[
\mathtt{instance}~::~\forall a.\, M_C.\mathtt{Eq}~a \Rightarrow
M_C.\mathtt{Eq}~(M_B.\mathtt{Maybe}~a) \in \mathsf{insts}(T_B)
\]
and
\[
\mathtt{instance}~::~
M_C.\mathtt{Eq}~M_T.\mathtt{Int} \in \mathsf{insts}(T_T)
\]
where $T_B$ and $T_T$ are the module types of $M_B$ and $M_T$.

Orphan instances are relevant when some of the needed instances
in the context are orphans: that is, they are defined in a different
module than the classes/types it mentions.  For example, suppose
we moved \verb|Eq Int| from $M_T$ to some alternate module called
$M_X$.  Then the above judgment would not hold, but this
judgment (which includes $M_X$ as an orphan instance module in scope)
would:

\[
\ctx \vdash M_X~\textsf{solves}~\mathtt{instance} :: M_C.\mathtt{Eq}~(M_B.\mathtt{Maybe}~M_T.\mathtt{Int})
\]

\subsection{Top-level typing rules}
\label{sec:typing/main}

\input{typing/main}

Broadly speaking, the process of typechecking a component involves
bringing all of the modules from its dependencies into scope
($\mathcal{L} ::= \overline{m \mapsto M}$), and then type checking each
module and signature in this context.  \textsc{TComp} computes the set
of in-scope modules by looking up the provided modules from the
component shape context (\textsf{exposes}).  As we typecheck each module
(\textsc{TModule}) and signature (\textsc{TSignature}), \textsc{TSeq}
adds their module types to the home module context ($\Delta ::=
\overline{m : T^s}$), so that their types are available for subsequent
modules and signatures.

The most complicated rule is the rule for typechecking signatures
(\textsc{TSignature}).  Like \textsc{TModule}, we first defer to an assumed
judgment to typecheck the source \I{hssig}.  However, after the module
type for the \emph{local} signature is computed, we must \emph{merge} it
with all of the inherited signatures (computed by \textsf{inherits} and
recorded in $\mathcal{D} ::= \overline{m \mapsto \overline{M}}$).

\emph{Example.}  To make things more concrete, we'll work through
the typechecking process for package \verb|r| from \cref{fig:linked-example}.
For your convenience, we've reproduced the \unit{} below:

\[
    \begin{array}{l}
      \UsynUnitH{\cidl{r}}{\hv{\modname{A}}}
        \\
      \qquad \UsynDep{\icid{base}{}}{ \rename{W}{W} } \\
      %\qquad\qquad \textsf{with}~\Mod{\icid{\cidl{q}}{\substHole{A}}}{A},
      %           \Mod{\icid{\cidl{p}}{\substHole{A}, \subst{B}{\ldots}}}{A} \\
      \qquad \UsynDep{\icid{\cidl{q}}{\substHole{A}}}{ \rename{X}{B} } \\
      \qquad \UsynDep{\icid{\cidl{p}}{\substHole{A}, \substMod{B}{\icid{\cidl{q}}{\substHole{A}}}{X}}}{ \rename{Y}{Y} } \\
      \qquad \texttt{signature}~\modname{A}~(\texttt{I}(..))~\{~\texttt{import W(I(..))}~\} \\
    \end{array}
    \]

Type-checking begins in the top-level judgment:

\begin{enumerate}
    \item \emph{Compute the logical context} ($\mathcal{L} = \bigoplus_i \textsf{exposes}(\shctx, P_i, r_i) $).  There shouldn't be much surprise here: we take each of the dependencies and brings the appropriate modules into scope according to the shape context and the renamings on the dependencies.  In this case, given the following shape context:
    \[
      \begin{array}{rcl}
      \shctx &=& \cidl{base}{} \haspr \{\provMod{W}{\icid{base}{}}{W}\}, \\
             && \cidl{p} \haspr \forall \hv{\modname{A}}\, \hv{\modname{B}}.\,
                            \{
                            \provMod{Y}{%
                                \icid{\cidl{p}}{ \subst{A}{\holevar{A}}, \subst{B}{\holevar{B}}  }%
                            }{Y} \}, \\
            && \cidl{q} \haspr \forall \hv{\modname{A}}.\,
                            \{ \provMod{X}{\icid{\cidl{q}}{\subst{A}{\holevar{A}}}}{X} \}
                \\
      \end{array}
    \]
    We will compute that:
    \[
    \begin{array}{rcl}
    \mathcal{L} &=& \modname{W} \mapsto \Mod{\icid{base}{}}{W}, \\
                && \modname{B} \mapsto \Mod{\icid{\cidl{q}}{\substHole{A}}}{X}, \\
                && \modname{Y} \mapsto \Mod{\icid{\cidl{p}}{\substHole{A}, \substMod{B}{\icid{\cidl{q}}{\substHole{A}}}{X}}}{Y}
    \end{array}
    \]

    \item \emph{Compute the inherited requirements} ($\mathcal{D} =
    \bigoplus_i \textsf{inherits}(P_i)$).  The requirements of a package
    consist of any signatures they locally defined, as well as the set
    of requirements inherited from dependencies.  For example, the
    dependency on $\icid{\cidl{q}}{\substHole{A}}$ means we are
    obligated to merge in the requirement from
    $\Mod{\icid{\cidl{q}}{\substHole{A}}}{A}$, so that \verb|r|
    accurately reflects the requirements of all of its dependencies.
    It's worth noting that \textsf{inherits} is recursive: if a
    component is instantiated with a component that itself has a
    requirement, we still must inherit it!  For \verb|r|, we will compute that
    the signature \verb|A| will inherit requirements from \verb|q| and \verb|p|:

    \[
    \mathcal{D} = \modname{A} \mapsto (\Mod{\icid{\cidl{q}}{\substHole{A}}}{A}, \Mod{\icid{\cidl{p}}{\substHole{A}, \substMod{B}{\icid{\cidl{q}}{\substHole{A}}}{X}}}{A})
    \]

    \item \emph{Compute the current unit identifier.} ($P_0 = p[\overline{m_j=\hv{m_j}}]$)
    This is pretty straightforward: $P_0 = \icid{r}{\substHole{A}}$.

    \item \emph{Typecheck the declarations.} ($\Gamma; P_0; \mathcal{L}; \mathcal{D} \vdash \overline{\I{decl}} : \Xi$)  Ordinarily, this would involve successively typechecking each declaration
    and adding it to the local context $\Delta$; eventually returning $\Delta$ as the final
    component type. In this case, there is only one declaration, a signature.  After typechecking
    the signature, we take the types of each requirement in $\mathcal{D}(\modname{A})$
    ($\Mod{\icid{\cidl{q}}{\substHole{A}}}{A}, \Mod{\icid{\cidl{p}}{\substHole{A}, \substMod{B}{\icid{\cidl{q}}{\substHole{A}}}{X}}}{A}$) and merge these types all together.  We'll discuss
    the merging process in more detail later.

    \item \emph{Typecheck the dependencies.} ($\forall i.~ \Gamma; \Xi; P_0 \vdash P_i ~\textsf{well-typed}$)  Finally, we have to recursively check that each of the dependency is instantiated in a
    well-typed fashion (since mix-in linking may have instantiated a
    component in an ill-typed way).  In the case of \verb|r|, we have to
    check that $\icid{\cidl{q}}{\substHole{A}}$ fills \verb|B| of
    \verb|p|, and $\hv{A}$ fills \verb|A| of \verb|p| and \verb|q| (the
    type of $\hv{A}$ is the result of signature merging, which motivates
    why dependency typechecking must be delayed to the very end).
    Well-typed instantiations are checked by testing if the type of
    the instantiating module (e.g., $\hv{A}$) is a subtype of
    the required module (e.g., \Mod{\icid{\cidl{q}}{\substHole{A}}}{A}).
    We'll describe \Backpack{}'s subtyping relation in more detail later.
\end{enumerate}

%   \[
%   \frac{
%   \begin{array}{c}
%   %\shctx \vdash \overline{\I{dep_i}} \leadsto (\mathcal{L}, \mathcal{D}) \qquad
%   \mathcal{L} = \textsf{exposes}(\shctx, \icid{base}{}, \rename{W}{W})
%          \oplus \textsf{exposes}(\shctx, \icid{\cidl{q}}{\substHole{A}}, \rename{X}{B} )
%          \oplus \textsf{exposes}(\shctx, \icid{\cidl{p}}{\substHole{A}, \substMod{B}{\icid{\cidl{q}}{\substHole{A}}}{X}}, \rename{Y}{Y} ) \\
%   \mathcal{D} = \textsf{inherits}(\icid{base}{})
%          \oplus \textsf{inherits}(\icid{\cidl{q}}{\substHole{A}})
%          \oplus \textsf{inherits}(\icid{\cidl{p}}{\substHole{A}, \substMod{B}{\icid{\cidl{q}}{\substHole{A}}}{X}})\\
%   P_0 = p[\substHole{A}] \\
%   \Gamma; P_0; \mathcal{L}; \mathcal{D} \vdash \overline{\I{decl}} : \Xi \\
%   \Gamma; \Xi; P_0 \vdash  ~\textsf{well-typed} \\
%   \Gamma; \Xi; P_0 \vdash  ~\textsf{well-typed} \\
%   \Gamma; \Xi; P_0 \vdash  ~\textsf{well-typed}
%   \end{array}
%   }{
%   \shctx; \Gamma \vdash \UsynUnit{p}{\overline{\hv{m_j}}}{\overline{\I{\UsynDep{P_i}{r_i}}}; \overline{\I{decl}}} : \{ p : \Xi \}
%   }
%   \]

\subsection{Type lookup rules}
\label{sec:typing/lookup}

\input{typing/lookup}

During the course of typechecking a Haskell module, we will often have
to lookup the type of an original name.  In plain
Haskell, the type of each original name is recorded directly
in the context; in \Backpack{}, we may need to \emph{substitute} the
type declaration, substituting any name holes embedded within it.
\textsc{TUnit} contains the key rule: to determine the type of a module
$\Mod{p[S]}{m}$, lookup the type and apply its substitution.

Substitutions on semantic objects are not ordinary substitutions: not
only do we substitute all module identities (as in
\textsc{SName}), but we must also substitute over names
(\textsc{SNameHole}), according to the exports of the modules we are
substituting with: substitution requires us to recursively look up the
types of these modules.  Additionally, when we do substitutions on
\I{orphs}, to maintain transitivity, we must substitute in all of
the transitively imported orphans of the implementing module.

%   During the course of typechecking, ordinary type lookup will only ever
%   refer to signature types in $\Delta$; signature types of components are
%   used only during the process of signature merging, in order to form a
%   merged signature $m : T^- \in \Delta$, which incorporates the
%   requirements of all the (transitive) dependencies of the component.
%   In particular, if we are looking up the type of \nhv{m.n} (\textsc{TNameHole}),
%   this is done by looking up the type in the \emph{local} requirement
%   \hv{m}, not the requirement of the package where \nhv{m.n} came
%   from (indeed, the original name doesn't provide this information!)

Ordinarily, type lookup will only look up \emph{modules} (types
with polarity $+$) from the component environment.  However, when merging
signatures, we must also lookup signatures from the component environment
for merging.  In this case, there are two further possible cases which
must be handled: \textsc{SRnOrphan} and \textsc{SRnNameHole} (greyed in
the figure).  These rules handle the situation when the there is no
type for the target of the substitution in the local context $\Delta$.
In these cases, we just directly rename without consulting the context:
signature merging proper handles updating any name holes into their
final identities.  Orphans are handled similarly.

% Key points:
%   - Signatures process one-by-one, as an earlier signature
%     can cause a later one to merge.  Same with exports.
% Properties we want: (1) signature can import other signature, (2) signature sees the same view of an import A as other modules would, (3) semantics should be forwards compatible with a recursive semantics, (4) signature should be able to refine type so that merging that would otherwise fail succeeds
%   - Non-recursive case is easier because we don't have to preemptively
%     merge all types (running afowl of (4))
%   - Thinning doesn't work too well: want to thin BEFORE you typecheck
%   so that you avoid typechecking things you don't know.
%   - Thinning has a relationship with explicit signature exports


\paragraph{Simple examples} Before looking in detail in the complex case when there
is a module substitution, we give three examples of type lookup when
things are simple:

\begin{itemize}
    \item \emph{Lookup of a local module}
        ($\Gamma; \modname{M} : T_M^+, \Delta; P_0 \vdash \Mod{P_0}{M} : T_M$).
        To lookup the type of a module from $P_0$, we read it off directly
        from the local type context.  It's impossible for $P_0$ to be instantiated
        in any way, since that would require a self-referential \texttt{dependency}
        declaration, which is always illegal.

    \item \emph{Lookup of a hole}
        ($\Gamma; \modname{A} : T_A^-, \Delta; P_0 \vdash \hv{A} : T_A$).
        To lookup the type of a hole variable (e.g., \hv{A}), we look up
        the type from the local type context.  This implies an ordering constraint
        on the order we process local declarations: we must process a
        signature before any other signature or module which could
        possibly reference an entity from that signature (this includes
        references from \emph{inherited} requirements.)

    \item \emph{Lookup of external type with empty module substitution}
        ($\cidl{p} : \{ \modname{M} : T_M^+ \}, \Gamma ; \Delta; P_0 \vdash \MOD{p}{}{M} : T_M$).
        If we are looking up the type of a module from a component with no
        hole substitution, things are easy: just take the type directly out
        from the context.
\end{itemize}

\paragraph{Example with substitution}

Type lookup with a nontrivial module substitution is more involved.
For the purposes of this example, suppose that in our context, we have the type
$\Gamma(\cidl{p})(\modname{B}) = T_B$ for the uninstantiated module
$\MOD{p}{\substHole{A}}{B}$, where:

\[
\begin{array}{rcl}
    T_{B} &=& \begin{array}{l}
                \UobjIface\: (\nhv{\texttt{A.T}}, \MOD{p}{\substHole{A}}{B}.\texttt{S}) \\
                \qquad\texttt{data S = MkS}\ \nhv{\texttt{A.T}} \\
                \qquad\texttt{orphan}~\hv{A}
            \end{array} \\
            \end{array}
        \]

Let's suppose we need to lookup the module type for $\MOD{p}{\substMod{A}{\icid{\cidl{q}}{}}{A}}{B}$,
where $\MOD{q}{}{A}$, has this type:

\[
\begin{array}{rcl}
    T_{A} &=& \begin{array}{l}
                \UobjIface\: (\MOD{q}{}{A}.\texttt{T}) \\
                \qquad\texttt{data T = MkT} \\
                \qquad\texttt{orphan}~\MOD{q}{}{X} \\
                \qquad\texttt{orphan}~\MOD{q}{}{Y}
            \end{array} \\
            \end{array}
        \]

We apply a substitution on $T_B$, proceeding functorial until we encounter a name.
When we encounter $\nhv{\texttt{A.T}}$, the exports of $T_A$ tells us to substitute
this name hole with $\MOD{q}{}{A}.\texttt{T}$.  In contrast, when we
encounter $\MOD{p}{\substHole{A}}{B}.\texttt{S}$, we simply apply the module
substitution $\substMod{A}{\icid{q}{}}{A}$ to the unit identifier.  Finally,
$\texttt{orphan}~\hv{A}$ must be replaced with all the orphans from $T_A$,
so that we continue to accurately record all transitively reachable orphan modules.

After these substitutions, the final type is:

\[
\begin{array}{rcl}
    T_{BS} &=& \begin{array}{l}
                \UobjIface\: (\MOD{q}{}{A}.\texttt{T}, \MOD{p}{\substMod{A}{\icid{\cidl{q}}{}}{A}}{B}.\texttt{S}) \\
                \qquad\texttt{data S = MkS}\ \MOD{q}{}{A}.\texttt{T} \\
                \qquad\texttt{orphan}~\MOD{q}{}{X} \\
                \qquad\texttt{orphan}~\MOD{q}{}{Y}
            \end{array} \\
            \end{array}
        \]

\paragraph{Example with signature merging}  Ordinarily, we only need to lookup
module types when renaming and typechecking a Haskell module.  However,
there is one more situation when type lookup occurs: signature merging.
Suppose that we have a \texttt{dependency} declaration on
$\icid{\cidl{p}}{\modname{A}=\hv{A}}$:  this dependency means that we
inherit the requirement $\Mod{\icid{\cidl{p}}{\modname{A}=\hv{A}}}{A}$.
Before signature merging, we must look up the type of this requirement
so that it can be merged with the other requirements at this name.

Although type lookup for signature merging bears many similarities
to regular type lookup (and we have formalized it with the same set
of rules), there is an important case where the usual rules do not
work.  In the previous example, what should the substitution for $\nhv{A.T}$ be?
Ordinarily, we would lookup the exports of $\hv{A}$ to determine
what the original name is.  But at this point, no local type for $\hv{A}$
is available (that's what we are trying to compute!)  Nor is the
substitution necessary: signature merging is responsible for
working out what the final exports of the merged export are (refining
$\nhv{A.T}$ in the process.)

To work around this, \textsc{SRnNameHole} says that if the type $\hv{A}$
is not available, we simply leave $\nhv{A.T}$ unsubstituted.
(Actually, the full rule is more nuanced: if
the requirement is $\Mod{\icid{\cidl{p}}{\modname{A}=\hv{B}}}{A}$,
we must rename $\nhv{A.T}$ to $\nhv{B.T}$ to maintain the invariant
that $\nhv{B.T}$ is defined in the module $\hv{B}$.)  A similar case
applies to orphans as well.

\subsection{Subtyping rules}
\label{sec:subtyping}

\input{typing/subtyping}

The subtyping judgment specifies when one module type
is a subtype of another under some context (\textsc{SubMod}).
Subtyping must be performed under a context, because type equality
cannot be computed without knowing, e.g., the definitions of any
type synonyms referenced inside the module type.  To test for
subtyping, we first check for \emph{export subtyping}, checking
the export list is a superset and allowing for the subtype to have
refined the identities of hole names.  We apply the resulting
name substitution to all other parts of the module type before
checking for subtyping.

\paragraph{Example}  Here is an example of two module types
which, given an appropriate context, are subtypes of each other:

\[
\MOD{q}{}{M}~~:
\begin{array}{rcl}
\begin{array}{l}
    \UobjIface\: (\MOD{q}{}{M}.\texttt{T}, \MOD{r}{}{N}.\mathtt{U}, \MOD{q}{}{M}.\texttt{f}) \\
    \qquad\texttt{type T} = N_{Int} \\
    \qquad\texttt{f} :: \MOD{q}{}{M}.\texttt{T} \rightarrow \MOD{q}{}{M}.\texttt{T} \\
\end{array}
&
\le_\modname{A}%
&
\begin{array}{l}
    \UobjIface\: (\nhv{A.T}, \MOD{r}{}{N}.\mathtt{U}) \\
    \qquad\texttt{data T} \\
    \qquad\texttt{instance} :: N_{Eq}~\nhv{A.T} \\
\end{array}
\end{array}
\]

A subtyping rule like this might arise when checking that $\modname{A}=\MOD{q}{}{M}$
is a valid instantiation of some component (whose requirement is depicted
on the right hand side).

Subtyping proceeds as follows:

\begin{itemize}
    \item \emph{Implementation (left-hand side) type lookup.} ($\ctx \vdash M : \UobjTau{\UNs'}{\Utys'~\Uinsts'~\Uimps'}$)
        For technical reasons, we've written our subtyping rule so that the ``implementor''
        is specified as a module identifier $M$ rather than a type; thus, we must lookup $\MOD{q}{}{M}$
        from the context.  The rule
        is written in this ``asymmetric'' way to emphasize that $\MOD{q}{}{M}$ must
        be contained \emph{in the context}; this is important, because $\MOD{q}{}{M}$
        may define extra type equalities that are necessary for the rest of the subtyping
        tests to succeed.  We'll see this later in this example.

    \item \emph{Export matching}. ($\UNs' \le_m \UNs \leadsto \USn'$)
        Compare the exports the implementation and the requirement,
        unifying any name holes of the form $\nhv{\modname{A}.n}$ with
        the identities specified in the implementation ($\UNs'$),
        giving us a name substitution ($\USn'$).  In this case,
        the resulting name substitution is just $\nhv{A.T} \mapsto
        \MOD{q}{}{M}.\texttt{T}$.  Obviously, the implementation can
        export more entities than the requirement, but not vice-versa.
        We apply the resulting name substitution to all of the other
        semantic objects we subsequently test for subtyping.

    \item \emph{Declaration subtyping}. ($\forall \I{decl} \in \I{decls}.\, \Gamma; \Delta; P_0 \vdash \UNs'(\textsf{occname}(\Uty)) \le \substw{\Uty}{\USn'}$)  The most
        obvious subtyping test we have to do is ensuring each declaration
        in the requirement is adequately implemented on the implementation
        side.  In this case, we need to test that \verb|type T = Int| is
        an adequate implementation of the abstract \verb|data T|; in particular,
        that they have compatible kinds and type synonyms are valid
        implementations of abstract data (they are, if the right-hand side of
        the type synonym has no type applications; see \textsc{SubTypeAbsData}).  Notice
        that, like the module subtyping judgment, the declaration subtyping
        judgment is also asymmetric.  This is helpful, because the original
        definition of an export from an implementing module may not reside
        in that module at all; in this case, we have to look it up again
        (which is exactly what \textsc{SubDecl} does first before the other tests).
        Ignore the conditions about roles for now; we will discuss them later.

    \item \emph{Instance subtyping}. ($\forall \I{inst} \in \I{insts}.\, \Gamma; \Delta; P_0 \vdash \Uimps' \cup \{ M \} ~\textsf{solves}~ \substw{\I{inst}}{\USn'}$)  For each instance
        in the requirement, we have to check that the instance is solvable in
        the context of the implementing module.  There is some nuance to this:
        it's not simply a matter of checking that the implementing module
        provides each instance word-for-word, since (1) the instance may not
        be defined in the implementing module (as is the case here), and
        (2) the instance may only be solvable by composing multiple instances
        from the environment (e.g., \verb|Show [Char]|).  The other subtlety
        is that this instance resolution needs to take orphans into account:
        we must consider exactly the orphans transitively imported by the
        implementing module, as well as the module itself.

        Returning to the example,
        suppose that in the implementing module of \verb|Int|
        the instance \verb|Eq Int| is defined (this is a non-orphan instance).
        After the name substitution, we must show that the instance $N_{Eq}~\MOD{q}{}{M}.\texttt{T}$
        is solvable.  Because $\MOD{q}{}{M}.\texttt{T}$ is recorded in the
        context (remember our asymmetric rule?) as being a type synonym of
        $N_{Int}$, the instance solver treats this as equivalent to \verb|Eq Int|
        and solves it accordingly.

    \item \emph{Orphan subtyping}. ($\Uimps' \supseteq \Uimps$)  The set of orphan
    modules a module transitively imports is an important part of its type,
    as it affects the instances in scope.  Thus, we require that the implementing module
    import a superset of the orphan modules the
    signature transitively imports.  In this example, there aren't any orphans, so
    this test trivially passes.
\end{itemize}

\paragraph{Declaration subtyping and pre-subtyping}
In our presentation, declaration subtyping is organized in a subtyping
relation (\textsc{SubDecl}) and a pre-subtyping relation: the subtyping
relation tests if the kinds of the declarations are equal and the roles
are compatible, and then defers to the pre-subtyping relation for
case-by-case subtyping on each different type of declaration.  The
non-reflexive rules are boxed, and specify which declarations can
validly implement abstract data, classes and closed type families.

\input{typing/kinding}

\paragraph{Subroling, subtyping and representational injectivity}
The condition on roles in \textsc{SubDecl} is quite interesting:
it states that if $\Uty'$ is \emph{representationally injective},
to show that $\Uty \le \Uty'$,
is sufficient to show that $\overline{\rho'_i \le \rho_i}$: i.e.,
the roles of the supertype are subroles of the subtype.
Subroling, whose definition we inherit from~\cite{Breitner:2014:SZC:2692915.2628141},
is oriented in the opposite direction of the subtyping relation.

To explain why the rule is setup this way, we first have to briefly
explain
the function of roles in Haskell.  In Haskell, there are two notions of
equality: nominal equality and representational equality.\footnote{There is also
a third notion, phantom equality, which is degenerate: all types are phantom
equal to all other types.}  Nominal
equality is the traditional notion of type equality, whereas
representational equality specifies when the underlying representation
of a type are equal.  Nominal equality implies representational
equality, but a \verb|newtype| can introduce a nominal distinction between
two types without changing the underlying representation. If I declare
\verb|newtype Age = Age Int|, I now have a representational equality
between \verb|Age| and \verb|Int| (written as $\texttt{Age}
\sim_\textsf{R} \texttt{Int}$), but no nominal equality (written as $\texttt{Age}
\not\sim_\textsf{N} \texttt{Int}$).

Roles are associated with the type parameters of type constructor and
are used in two ways:

\begin{enumerate}
    \item \emph{Application.} If we would like to show $\texttt{T}~\tau \sim_\textsf{R}
    \texttt{T}~\sigma$, we must show that $\tau \sim_\rho \sigma$, where
    $\rho$ is the \emph{role} of the parameter of the type constructor
    \verb|T|.  A data type like \verb|data T a = T a| will have its parameter
    at representational role, since you only require $\tau \sim_R
    \sigma$, while a data type that uses a type family to match on
    nominal identity of the type parameter will have nominal role,
    requiring us to show that $\tau \sim_N \sigma$.

    \item \emph{Decomposition.}  If we know that $\texttt{T}~\tau \sim_\textsf{R}
    \texttt{T}~\sigma$, we can infer that $\tau \sim_\rho \sigma$ (where $\rho$
    is the role of the parameter of the type constructor),
    but only if \verb|T| is \emph{representationally injective}.  Data types
    are representationally injective, but newtypes are not: if you
    have \verb|newtype T a = MkT Int|,
    we have $\texttt{T}~\tau \sim_\textsf{R} \texttt{Int}$ for any $\tau$,
    even if the type parameter of \verb|T| is declared
    to be nominal.  In this case,
    it would be unsound to assume that given
    $\texttt{T}~\tau \sim_\textsf{R} \texttt{T}~\sigma$, $\tau \sim_\textsf{N} \sigma$ holds!
\end{enumerate}
%
Roles follow a subroling relationship, which specifies that if you have
a data type which a type parameter with role $\rho$, it is valid to
replace the role with a role $\rho'$, where $\rho' \le \rho$; for example,
even though \verb|data T a = T a| is representational in its first
argument by default, we can explicitly override it to be nominal in its
first argument, since $\textsf{N} \le \textsf{R}$.  Do subroles induce
a supertyping relationship?  If we consider only \emph{applications}, it would
seem this should be the case:

\vspace{-1em}
\begin{figure}[H]
\begin{tabular}{p{0.45\textwidth} p{0.45\textwidth}}
\begin{lstlisting}
signature A where
    type role T nominal
    data T a
\end{lstlisting}
&
\begin{lstlisting}
module A where
    type role T representational
    data T a = MkT a
\end{lstlisting}
\end{tabular}
\end{figure}
\vspace{-1em}
%
\noindent
Under the signature, given $\tau \sim_\textsf{N} \sigma$, we can infer
$\texttt{T}~\tau \sim_\textsf{R} \texttt{T}~\sigma$; however,
$\tau \sim_\textsf{R} \sigma$ would not be sufficient.
Under the module, both conditions are sufficient to infer
$\texttt{T}~\tau \sim_\textsf{R} \texttt{T}~\sigma$: thus,
more programs typecheck when a type argument is at representational role
than when is at a nominal role.

However, with decompositions, the opposite holds. Suppose that \verb|T|
were representationally injective: under the signature, given
$\texttt{T}~\tau \sim_\textsf{R} \texttt{T}~\sigma$, we could
show $\tau \sim_\textsf{N} \sigma$; we could \emph{not} infer
this under the module.
Fortunately, abstract data is \emph{not}
representationally injective (\cref{fig:representational-injectivity}), as it can be implemented via
a newtype (\textsc{SubNewtypeAbsData} in \cref{typing:subtyping}),
so this counterexample does not hold.

\paragraph{Subtyping synonyms}
The subtyping rules for type synonyms (\textsc{SubTypeAbsData} and
\textsc{SubTypeAbsClass}) are worth extra comment, because the implementing
type synonyms are required to be \emph{nullary}: for example, \verb|type M a = a|
is not a valid implementation of the abstract \verb|data M a|,
but \verb|type M = Maybe :: * -> *| is.  This restriction
stems from an old design decision in Haskell to not support \emph{type level
lambdas}.  This restriction greatly helps type inference, since given the
type equality $t~a = s~b$, we can now conclude that $t = s$ and $a = b$
(this property is called \emph{generativity}).  Thus, GHC restricts type
synonym applications to be fully saturated (unlike data declarations, which can
be partially applied); similarly, a type synonym can only be used in place
of a data declaration if this could not result in partially applied type
synonyms.

\subsection{Merging rules}
\label{sec:typing/merging}

\input{typing/merging}

Finally, \emph{merging} finds the greatest lower bound on our subtyping relation,
finding a signature which is a subtype of all the original signatures (or failing
if no such signature exists).  Let's recall our example from \cref{sec:signature-merging}
(now written as module types), extended slightly with some instances:

\[
\begin{array}{rcl}
\begin{array}{l}
    \UobjIface\: (\nhv{A.T}, \nhv{A.U}, \nhv{A.f}) \\
    \qquad\texttt{type T} = N_{Int} \\
    \qquad\texttt{data U} \\
    \qquad\texttt{f} :: N_{Int} \rightarrow \nhv{A.U} \\
    \qquad\texttt{instance} :: N_{Eq}~\nhv{A.U} \\
\end{array}
&
\oplus%
&
\begin{array}{l}
    \UobjIface\: (\nhv{A.T}, \nhv{A.U}, \nhv{A.f}) \\
    \qquad\texttt{data T} \\
    \qquad\texttt{type U} = N_{Int} \\
    \qquad\texttt{f} :: \nhv{A.T} \rightarrow N_{Int} \\
    \qquad\texttt{instance} :: N_{Eq}~\nhv{A.T} \\
\end{array}
\end{array}
\]

\textsc{MSigType} specifies how to merge these two signatures:

\begin{enumerate}
    \item We perform \emph{export matching}, computing the merged
    export list, where name holes are subsumed by reexported original
    names. The new export list induces a name substitution.
    In the example above, the export matching and resulting name
    substitution is trivial; but if, for example, we merged in
    another signature which exported $\MOD{p}{}{M}.\mathtt{f}$,
    we would have the substitution $\nhv{A.f} \mapsto \MOD{p}{}{M}.\mathtt{f}$.

    \item We merge the declarations of the module type together,
    producing a temporary signature $T_P$ which doesn't contain
    any instances.  This merging operation prefers non-abstract versions
    of declarations over abstract ones, but otherwise picks a declaration
    arbitrarily.  In the example above, we get:
\[
\begin{array}{l}
    \UobjIface\: (\nhv{A.T}, \nhv{A.U}, \nhv{A.f}) \\
    \qquad\texttt{data T} = N_{Int} \\
    \qquad\texttt{type U} = N_{Int} \\
    \qquad\texttt{f} :: \nhv{A.T} \rightarrow N_{Int} \\
\end{array}
\]

    \item With this temporary signature in the context, we
    merge instances, deduplicating them based on type equality.
    In our example, we are able to see $\nhv{A.T} =_\textsf{hs} \nhv{A.U} =_\textsf{hs} N_{Int}$,
    so we keep only one copy of the two instances from the
    original modules.

    \item We now extend our temporary signature with the merged
    instances and orphans, giving us
    a candidate merged signature $T_M$; it is a candidate signature
    because during this step because we do not yet know if the
    declaration merge was successful or not.
\[
\begin{array}{l}
    \UobjIface\: (\nhv{A.T}, \nhv{A.U}, \nhv{A.f}) \\
    \qquad\texttt{data T} = N_{Int} \\
    \qquad\texttt{type U} = N_{Int} \\
    \qquad\texttt{f} :: \nhv{A.T} \rightarrow N_{Int} \\
    \qquad\texttt{instance} :: N_{Eq}~N_{Int} \\
\end{array}
\]

    \item Finally, we check that the merged type is a subtype of each of the
    (substituted) input signature types, with the merged type added to the
    context.

    Why is it necessary to add the merged type to context?  This is perfectly
    illustrated by our example.  Consider one of the two subtyping tests
    we must carry out:

\[
\begin{array}{rcl}
\begin{array}{l}
    \UobjIface\: (\nhv{A.T}, \nhv{A.U}, \nhv{A.f}) \\
    \qquad\texttt{data T} = N_{Int} \\
    \qquad\texttt{type U} = N_{Int} \\
    \qquad\texttt{f} :: \nhv{A.T} \rightarrow N_{Int} \\
    \qquad\texttt{instance} :: N_{Eq}~N_{Int} \\
\end{array}
&
\le_\modname{A}%
&
\begin{array}{l}
    \UobjIface\: (\nhv{A.T}, \nhv{A.U}, \nhv{A.f}) \\
    \qquad\texttt{type T} = N_{Int} \\
    \qquad\texttt{data U} \\
    \qquad\texttt{f} :: N_{Int} \rightarrow \nhv{A.U} \\
    \qquad\texttt{instance} :: N_{Eq}~\nhv{A.U} \\
\end{array}
\end{array}
\]

When we check \verb|f| for compatible, we must show
$\nhv{A.T} \rightarrow N_{Int} =_\textsf{hs} N_{Int} \rightarrow \nhv{A.U}$.
This is only true if $\texttt{type U} = N_{Int}$ is in the context!
The presence of the merged type is also important when testing if instances
are solvable.
\end{enumerate}
%   Both module
%   types are assumed to have had a name substitution applied to them, so that
%   subtyping on exports can be checked using a simple superset operation.

\paragraph{A little metatheory}
We would like it if our merging operation were both sound (if it
succeeds, the new module type is the greatest lower bound of the two
types under the name substitution induced by the exports of the merged
module type) and complete (if there exists a greatest lower bound, we
find it) with respect to the subtyping relation.  Unfortunately,
while our operation is sound, it is not complete.

\begin{lemma}[Name substitution absorption]
\label{lem:absorb}
If $\UNs' \le_m \UNs$, then $\substw{\substw{x}{\textsf{nsubst}(m, \UNs)}}{\textsf{nsubst}(m, \UNs')} = \substw{x}{\textsf{nsubst}(m, \UNs')}$.
\end{lemma}

\begin{lemma}[Export merging is a greatest lower bound on export subtyping.]
\label{lem:export}
If $\UNs_M = \bigoplus_i \UNs_i$ for the signature $m$, then
$\overline{\UNs_M \le_m \UNs_i}$; furthermore,
for all $\UNs'_M$, if
$\overline{\UNs'_M \le_m \UNs_i}$, then
$\UNs'_M \le_m \UNs_M$
\end{lemma}

%   \begin{proof}
%   To show $\UNs_M \supseteq \substw{\UNs_i}{\textsf{nsubst}(m,\UNs_M)}$,
%   it is sufficient to show that for all $N \in \UNs_i$, it is the
%   case that $N \in \UNs_M$, except
%   when $N$ has the form $\nhv{m.n}$, in which case we must show that $n \in \textsf{dom}(\UNs_M)$.
%   This is clear by inspection of the rules for export merging: by \textsc{MName} merging always exactly preserves any $N$ which is not of the form $\nhv{m.n}$; by \textsc{MExport} merging never drops a hole name from the domain of
%   the resulting exports.

%   Now suppose there existed some $\UNs'_M$ such that
%   $\overline{\UNs'_M \supseteq \substw{\UNs_i}{\textsf{nsubst}(m,\UNs_M)}}$.  To show
%   $\UNs'_M \supseteq \substw{\UNs_M}{\textsf{nsubst}(m,\UNs'_M)}$,
%   it is sufficient to show that for all $N \in \UNs_M$, $N \in \UNs'_M$, except
%   when $N$ has the form $\nhv{m.n}$, in which case we must show that $n \in \textsf{dom}(\UNs'_M)$.  Consider the first case. By the definition of $\UNs_M$, we know there must be some $i$ where $N \in \UNs_i$.  Then by the subtyping relation on $\UNs'_M$, it must be the case that $N \in \UNs'_M$.  The second case is similar: $\nhv{m.n}$ must exist in some signature, so $n$ must be in the domain of $\UNs'_M$.
%   \end{proof}

\begin{lemma}[Representational injectivity conservation]
\label{lem:inj}
For any $\Uty$ and $\Uty'$, $\Uty \oplus \Uty'$ is representationally injective
if and only if either $\Uty$ or $\Uty'$ (or both) is representationally
injective.
\end{lemma}

\begin{lemma}[Role merging is the least upper bound on subroling]
\label{lem:role}
If $\rho_M = \bigoplus_i \rho_i$, then
$\overline{\rho_M \ge \rho_i}$; furthermore,
for all $\rho'_M$, if $\overline{\rho'_M \ge \rho_i}$,
then $\rho'_M \ge \rho_M$.
\end{lemma}

\begin{theorem}[Merging is the greatest lower bound on subtyping]
If $\ctx; m \vdash \bigoplus_i T_i \leadsto T_M$, then
$\overline{\Gamma; \Delta, m : T^-_M; P_0 \vdash \hv{m} \le T_i}$; furthermore, for all
$T'_M$, if $\overline{\Gamma; \Delta, m: T'^-_M \vdash \hv{m} \le T_i}$, then $\Gamma; \Delta, m : T'^-_M \vdash \hv{m} \le T_M$.
\end{theorem}

\begin{proof}
If our merging algorithm succeeds, the merged type is trivially a
subtype of the input types (since we check this explicitly).  Now
suppose that there existed some type $T'_M$, such that $\overline{\Gamma; \Delta, m : T'^-_M; P_0 \vdash \hv{m} \le T_i}$.
We will show that $\Gamma;\Delta, m : T'^-_M; P_0 \vdash \hv{m} \le \substw{T_M}{\USn'}$,
where $\USn' = \textsf{nsubst}(m, \textsf{exports}(T'_M))$.
Additionally, let $\USn = \textsf{nsubst}(m, \textsf{exports}(T_M))$.

\begin{itemize}
    \item \emph{Exports.} By Lemma~\ref{lem:export}, $\UNs' \le_m \UNs$.

    \item \emph{Declarations.} Consider each $\I{decl} \in \textsf{decls}(\substw{T_M}{\USn'})$
    and let $n=\textsf{occname}(\I{decl})$.  To
    show declaration subtyping, we must show kind equality, role
    compatibility and pre-subtyping.
    We have $\I{decl'}$ such that $\Gamma; \Delta, m : T'^-_M; P_0 \vdash \UNs'(n) : \I{decl'}$; $\UNs'(n)$ is guaranteed to exist by export list subtyping.
    For any choice of $i$, let $\I{decl}_i = \textsf{decls}(T_i)(n)$ if it exists.
    \begin{itemize}
        \item \emph{Kind equality.} We have $\I{decl'} : \kappa'_M$.  By inversion, we know that there must be some
        $i$ such that $\Gamma; \Delta, m : T'^-_M; P_0 \vdash \kappa'_M =_\textsf{hs} \substw{\kappa_i}{\USn'}$,
        where $\I{decl}_i :: \kappa_i$, and which was preferred by merging
        so $\kappa_M = \substw{\kappa_i}{\USn}$.
        Then, by Lemma~\ref{lem:absorb}, we have $\substw{\kappa_M}{\USn'} = \kappa'_M$.

        \item \emph{Role compatibility.} There are two
        cases.  First suppose that $\I{decl}$ is
        representationally injective.  By Lemma~\ref{lem:inj}, we know
        that there must be an $i$ such that $\textsf{inj}(\I{decl}_i)$,
        and was preferred by merging to have the same roles as $\Uty$.
        By inversion, $\I{decl}'$ must also have exactly the same roles as the
        $\I{decl}_i$. Second, consider if $\I{decl}$ is not
        representationally injective.  Then it must be the case that
        all defined $\Uty_i$,
        are not representationally injective, and only
        the first two cases of declaration merging which apply.
        Lemma~\ref{lem:role}
        then establishes the desired result.
        \item \emph{Pre-subtyping} There are two cases.
        First, suppose that there exists a non-abstract declaration
        $\I{decl}_i$ which was preferred by merging so that $\I{decl} = \substw{\I{decl}_i}{\USn}$.
        By inversion on $\Gamma; \Delta, m : T'^-_M; P_0 \vdash \Uty'_M \le \substw{\Uty_i}{\USn'}$,
        we know all the components of $\Uty'_M$ are $=_\textsf{hs}$
        to their counterparts in $\substw{\Uty_i}{\USn'}$ under
        this context.
        Then, by Lemma~\ref{lem:absorb}, we know the components of $\Uty'_M$
        are $=_\textsf{hs}$ to their counterparts in $\substw{\Uty}{\USn'}$.
        Second, suppose for all $i$, $\Uty_i$ is abstract.
        Then, not only is each declaration identical (by inspection of subtyping rules
        for abstract types), they are identical with $\I{decl}$, giving us
        the desired subtyping.
    \end{itemize}

    \item \emph{Transitively imported modules.} Direct from properties of
    set union and subset.

    \item \emph{Instances.} Consider each $\I{inst}$ in $T_M$.
    There must exist $\I{inst}_i$ from merging such that $\I{inst} = \substw{\I{inst}_i}{\USn}$.
    We know that $\Gamma; \Delta, m : T'^-_M; P_0 \vdash \I{orphs'} \cup \{ \hv{m} \} ~\textsf{solves}~ \substw{\I{inst}_i}{\USn'}$, and
    must
    show that $\Gamma; \Delta, m : T'^-_M; P_0 \vdash \I{orphs'} \cup \{ \hv{m} \} ~\textsf{solves}~ \substw{\I{inst}}{\USn'}$.  By Lemma~\ref{lem:absorb}, we know
    $\substw{\substw{\I{inst}_i}{\USn}}{\USn'} = \substw{\I{inst}_i}{\USn'}$,
    giving us the desired result.
\end{itemize}
\end{proof}

\noindent
Merging is \emph{not} complete:  to begin
with, there may not necessarily exist a greatest upper bound:

\begin{tabular}{p{0.3\textwidth} p{0.3\textwidth} p{0.3\textwidth}}
\begin{verbatim}
signature A where
    data T

    data U = Mk T
\end{verbatim}
&
\begin{verbatim}
signature A where

    data S
    data U = Mk S
\end{verbatim}
&
\begin{verbatim}
signature A where
    data T
    type S = T
    data U = Mk T
\end{verbatim}
\end{tabular}
%
The third signature is indeed a subtype of the first two signatures, but
it is not a supertype of \verb|type T = S; data S = MkS; data U = MkT|, which
\emph{is} a subtype of the first two signatures.  With our subtyping rules,
there is no signature which we can write which is the greatest upper bound.
Indeed, merging will fail on these two signatures, complaining that \verb|T|
in the \verb|Mk| constructor does not match \verb|S|.  It is possible that
this problem could be solved by making GHC Haskell's semantic objects
more ``semantic'' (for example, removing type synonyms from the semantic
objects), but in existing GHC Haskell they exist as semantic entities, and in
\Backpack{} we sought not to make backwards incompatible changes to the
semantics of GHC Haskell.

Additionally, merging does not make any attempt to pick a ``better'' choice
among non-abstract declarations, which that if any choice could result in a
type synonym cycle, we must reject the signature entirely:

\vspace{-1em}
\begin{figure}[H]
\centering
\begin{shortmath}
\begin{tabular}{p{0.30\textwidth} p{0.30\textwidth}}
\begin{lstlisting}
signature A where
    data A
    type B = A
\end{lstlisting}
&
\begin{lstlisting}
signature A where
    type A = B
    type B = Int
\end{lstlisting}
\end{tabular}
\end{shortmath}
\end{figure}

\vspace{-2em}
\noindent
If we prefer the declarations from the second signature, there is no cycle;
but if we take \verb|type A = B| and \verb|type B = A|, there is a cycle.
The acyclicity check verifies that under \emph{all} possible merges, there
are no type synonym cycles.\footnote{This is not yet implemented in GHC.}

\paragraph{Merging instances}  In the definition of merging, we drop
any instances which have exactly duplicate dictionary types $\tau$.
Unlike declaration merging,
the instances produced by this procedure
will not necessarily let us form the subtype of the input modules.
For example, suppose we have \verb|instance K1 a => K a|
and \verb|instance K2 a => K a|.  Assuming that \verb|K1| and \verb|K2|
are distinct, these have different types, so when we merge we keep
both instances.  Unfortunately, putting both instances in the environment
means \emph{neither} instance is derivable, because both instances have
identical instance heads.  As Haskell's constraint solver is guess-free,
it will conservatively report that \verb|instance K2 a => K a| is not
solvable given the environment, because it has no way of telling whether
or not \verb|instance K1 a => K a| or \verb|instance K2 a => K a|.
If \verb|K2| is derivable from \verb|K1|, or vice versa, this instance
\emph{is} solvable, but we have no way of expressing this in Haskell's
language of semantic objects.

\paragraph{Behavior with ill-typed signatures}  In our implementation,
we noticed that the typing rules we have
described above have some surprising behavior when typechecking signatures that
are not well-typed: we may temporarily end up with types in our context
that are ill-kinded, or generally nonsensical.  For example, we might
merge \verb|data T; f :: T| and \verb|data T a; f :: T a| to get
\verb|data T; f :: T a|, which is obviously ill-kinded.  It seems difficult
to avoid behavior like this in the presence of mutual recursion (both MixML
and \OldBackpack{} also exhibit this behavior); fortunately,
this does not affect soundness (eventually we notice that \verb|data T|
and \verb|T a| are incompatible during the subtyping test and reject the merge).


\section{Substitutability and type inference}
\label{sec:substitutability}

One property of the GHC's type system which is required for
the soundness of \Backpack{} is \emph{substitutability}: given that
a module type checks under some type environment, if we apply a
valid name substitution to the environment, the module must
continue to typecheck.

In the absence of type classes and type families (see
\cref{sec:metatheory} for a detailed discussion on these
features), this property largely holds for Haskell's type system.
However, we found one subtle case related to GHC's treatment
of \emph{inaccessible code}.  Inaccessible code occurs when
GHC is able to prove that a constraint is unsatisfiable; for
example:

\begin{lstlisting}
    f :: (Int ~ Bool) => a -> b
    f x = x
\end{lstlisting}
%
Here, the constraint that \verb|Int| be nominally equal to \verb|Bool|
can never be fulfilled, so GHC accepts the otherwise ill-typed
function definition.  In $\mathrm{OutsideIn(X)}$~\cite{Vytiniotis:2011:OMT:2139531.2139533},
this behavior occurs when we apply the \textsc{faildec} constraint
solving rule, which states that
constraint solving should immediately fail if we attempt to unify
two distinct type constructors (i.e., two type constructors with
different original names), on a \emph{given
constraint}: constraint solving failure on a given constraint means that
the code behind the constraint is inaccessible.

Unfortunately, this rule is unsound in the presence of \Backpack{}:

\begin{lstlisting}
    signature A where
        data T
        data S
    module B where
        import A
        f :: (T ~ S) => a -> b
        f x = x
\end{lstlisting}

As \verb|T| and \verb|S|, at the point of typechecking \verb|B|, have
distinct original names, the constraint solver might conclude that
\verb|f|'s body is inaccessible.  But if we subsequently set
\verb|type S = T|, now \verb|T ~ S| does indeed hold!

Fortunately, as the $\mathrm{OutsideIn(X)}$ paper observes, it is sound
to \emph{not} eagerly fail when simplifying givens.  So the fix is simple:
don't eagerly fail if the constraint involves an abstract type.
Intuitively, abstract types in \Backpack{} behave more like \emph{skolem
variables}, in the sense that they don't unify with anything, but cannot
be assumed to be distinct from all types (the client will instantiate
the skolem variable to whatever they please.)

\iffalse%
\section{Role subtyping}

Though not formalized above, Haskell's type system also includes the
concept of \emph{roles}~\cite{Weirich:2011:GTA:1925844.1926411,Breitner:2014:SZC:2692915.2628141}.  The parameters of all type constructor are annotated with roles,
which mark the distinction of whether or not the type constructor
cares about the \emph{nominal} or \emph{representational} identity
of a type at that position.

\Red{Finish this after we decide it's good}
\fi

% Things that are missing
%   - Component sequencing OK
%   - Functorial rules for rename/renamesig OK
%   - Import subtyping / instantiation
%   - Info equality OK
%   - Acyclicity OK
%
% Should add labels

% Things to talk about:
%   - Temporary bogus types. MixML and Backpack have this problem
%     (Backpack in particular: if you define a type and refer to it,
%     but that type is already in context, which do you use.  Name
%     is consistent, but not necessarily definition!)
%   - Cyclic exports when you add signature reexports to recursive
%   Backpack
%   - Recursive design space is big. hs-boot for hsigs? Do signature
%   declarations become before or after dependencies?  Signature
%   should see anything, but also refine recursive occurences of self?