diff --git a/main/functional-correspondence.tex b/main/functional-correspondence.tex index 6714c74..e37f078 100644 --- a/main/functional-correspondence.tex +++ b/main/functional-correspondence.tex @@ -11,26 +11,12 @@ \chapter{The Functional Correspondence}\label{chapter:functional-correspondence} The first one exposes the control structure of the evaluator; the second replaces function values with first-order data structures and their applications with calls to a first-order global function. In the remainder of this chapter I will describe those transformations and illustrate their behavior using the running example of an evaluator for \LC{} from Section \ref{sec:idl}. -The abstract syntax and the corresponding meta-language records are shown in Figure \ref{fig:lambda-calc-stx}. +Let us recall the previously described meta-circular interpreter of Figure \ref{fig:lambda-calc-interp}. The variables are represented as \lstinline!String!s of characters. -We will begin with a standard meta-circular interpreter shown in Figure \ref{fig:lambda-calc-interp} and arrive at a variation of the CEK machine. Since every expression in \LC{} may only evaluate to a function, the values produced by the interpreter are represented as meta-language functions. The interpreter uses environments represented as partial functions from variables to values to handle binding of values to variables during application. The application in object-language is interpreted using application in meta-language so the defined language inherits call-by-value, left-to-right evaluation order. - -\begin{figure} - \begin{center} - \begin{tabular}{r l r} - \LC & Representation & Name \\ - \hline - $x$ & \lstinline!String! & Variable \\ - $ \lambda x . e $ & \lstinline!{Abs String Term}! & Abstraction \\ - $ e \, e $ & \lstinline!{App Term Term}! & Application - \end{tabular} - \end{center} - \caption{\LC{} syntax and its representation} - \label{fig:lambda-calc-stx} -\end{figure} +At the end of this chapter we shall arrive at the CEK machine. \section{Continuation-Passing Style} The first step towards building the abstract machine is capturing the control-flow characteristics of the defined language. @@ -103,6 +89,8 @@ \section{Continuation-Passing Style} \begin{figure} \centering \begin{lstlisting} +(def-data Term ...) + (def eval (env term k) (match term ([String x] (k (env x))) @@ -134,8 +122,7 @@ \section{Defunctionalization} Lastly, we must somehow know for each application point which functions may be applied. The first two challenges can be solved with a static, syntactic analysis of the interpreter. The other challenge can be solved using control-flow analysis as described in Section \ref{sec:selective-defun}. -For the purposes of this example we observe that there are two function spaces with anonymous functions: continuations and environments. -The first is inhabited by th +For the purposes of this example we observe that there are three function spaces with anonymous functions: continuations, representation of abstractions and environments. \begin{figure} \centering @@ -185,43 +172,46 @@ \section{Defunctionalization} \end{itemize} It is worth noting that defunctionalization preserves the tail-call property of a program in CPS. -Applying the defunctionalization procedure to the interpreter yields the definitions in Figure \ref{fig:defun-interp} having performed manual simplification and with carefully chosen names. -We can see that environments form a linked list of pairs variable-value and that the \texttt{lookup} function (which is a generated \textit{apply} function) searches this list for the first matching variable. -The defunctionalized continuations form a stack with \texttt{\{finish\}} at the bottom. -The generated \texttt{continue} function pops a frame from that stack and continues execution. -Evaluating function expression in defined language now produces a closure record which holds the environment and the name of the variable to be bound. -The machine we obtained is similar to the CEK \cite{Felleisen} machine but with explicit handling of environment extension and lookup. +After applying the defunctionalization procedure to functions representing lambda abstractions and to continuations we obtain (again with a bit of manual cleanup) an encoding ot the CEK machine \cite{Felleisen} in Figure \ref{fig:abstract-machine-cek}. +It uses a stack \lstinline!Cont! (which are defunctionalized continuations) to handle the control-flow and \lstinline!Closure!s (which are defunctionalized lambda abstractions) to represent functions. +The environment is left untouched and is still encoded as a partial function. +The machine has two classes of states: \lstinline!eval! and \lstinline!continue!. +In \lstinline!eval! mode the machine dispatches on the shape of the term and either switches to \lstinline!continue! mode when it has found a value (either a variable looked up in the environment or an abstraction) or pushes a new continuation onto the stack and evaluates the expression in function position. +The \lstinline!continue! function is the \textit{apply} function generated by the defunctionalization procedure. +In \lstinline!continue! mode the machine checks the continuation and proceeds accordingly: when it reaches the bottom of the continuation stack \lstinline!Halt! it returns the final value \lstinline!val!; when the continuation is \lstinline!App1! it means that \lstinline!val! holds the function value which will be applied once \lstinline!arg! is computed; the stack frame \lstinline!App2! signifies that \lstinline!val! holds the computed argument and the machine calls a helper function \lstinline!apply! (the second generated \textit{apply} function) to unpack the closure in \lstinline!fn! and evaluate the body of the closure in the extended environment. + \begin{figure} - \centering - \begin{verbatim} -(def extend (env x v k) - (continue k {extend env x v})) - -(def lookup (env y k) - (case env - ({extend env x v} - (if (== x y) (continue k v) (lookup env y k))) - ({init} (error "empty environment")))) - -(def continue (k value) - (case k - ({eval-fun env e k} (eval e env {eval-app value k})) - ({eval-app f k} (apply f value k)) - ({eval-body e k} (eval e value k)) - ({finish} value))) - -(def apply (f v k) - (case f - ({closure env x} (extend env x v {eval-body e k})))) - -(def eval (e env k) - (case e - ({var x} (lookup env x k)) - ({fun x e} (continue k {closure env x})) - ({app f e} (eval f env {eval-fun env e k})))) - -(def main (e) (eval e {init} {finish})) - \end{verbatim} - \caption{A defunctionalized interpreter} - \label{fig:defun-interp} +\begin{lstlisting} +(def-data Term ...) + +(def-data Cont + {Halt} + {App1 arg env cont} + {App2 fn cont}) + +(def-struct {Closure body env x}) + +(def init (x) (error "empty environment")) +(def extend (env k v) ...) + +(def eval (env term cont) + (match term + ([String x] (continue cont (env x))) + ({Abs x body} (continue cont {Closure body env x})) + ({App fn arg} (eval env fn {App1 arg env cont})))) + +(def apply (fn v cont) + (let {Fun body env x} fn) + (eval (extend env x v) body cont)) + +(def continue (cont val) + (match cont + ({Halt} val)) + ({App1 arg env cont} (eval env arg {App2 val cont})) + ({App2 fn cont} (apply fn val cont))) + +(def main ([Term term]) (eval {Init} term {Halt})) +\end{lstlisting} +\caption{An encoding of the CEK machine for \LC{}} +\label{fig:abstract-machine-cek} \end{figure} diff --git a/main/introduction.tex b/main/introduction.tex index a7e6d55..fa3637d 100644 --- a/main/introduction.tex +++ b/main/introduction.tex @@ -4,13 +4,13 @@ \chapter{Introduction}\label{chapter:introduction} The field of formal semantics of programming languages seeks to provide tools to answer such a question. Denotational semantics allow one to relate programs to mathematical objects which describe their behavior. Operational semantics provide means to characterize evaluation of programs by building a relation between programs and final values in case of big-step semantics; by defining a relation allowing for step-by-step transformation of programs in case of small-step semantics; or by specifying an abstract machine with a set of states and a transition function. -These semantic formats all allow for systematic definition of programming languages but differ in style and type of reasoning as well as their limitations. +These semantic formats all allow for systematic definition of programming languages but differ in style and type of reasoning they allow as well as their limitations. Another approach is to provide an interpreter for the language in question (which I will call the \emph{object}-language) written in another language (to which I will refer as the \emph{meta}-language). These definitional interpreters \cite{reynolds} may also be classified into groups similar to the formal semantics. Starting with the most abstract and concise meta-circular interpreters which use meta-language constructs to interpret same constructs in object-language (e.g. using anonymous functions to model functional values, using conditionals for \textit{if} expressions). Via various evaluators with some constructs interpreted by simpler language features (e.g. with environments represented as lists or dictionaries instead of functions) but still relying on the evaluation order of the meta-language. -And ending with a first-order machine-like interpreters which use an explicit stack for handling control-flow of the object-language. +Ending with a first-order machine-like interpreters which use an explicit stack for handling control-flow of the object-language. In his paper \cite{reynolds} Reynolds summarizes techniques which allow one to transform high-level definitional interpreters into lower-level ones using two well-known program transformations: transformation into continuation-passing style and defunctionalization. This connection between evaluators on different levels of abstraction has later been studied by Ager et al.\cite{functional-correspondence} who use it to relate several abstract machines for \LC{} with interpreters embodying their evaluation strategies and called it the functional correspondence. @@ -27,9 +27,13 @@ \chapter{Introduction}\label{chapter:introduction} In the remainder of this chapter I introduce the \textit{Interpreter Definition Language} which is the meta-language accepted by the transformer and will be used in example evaluators throughout the thesis; I also compare the semantics formats with styles of interpreters to which they correspond. In Chapter \ref{chapter:functional-correspondence} I describe the functional correspondence and its constituents. In Chapter \ref{chapter:transformer} I show the algorithmic characterization of the correspondence. -In Chapter \ref{chapter:evaluation} I discuss the performance of the tool and the related work. +In Chapter \ref{chapter:evaluation} I evaluate the performance of the tool and discuss related work. Appendices contain user's and developer's manual for the semantic transformer. +I assume that the reader is familiar with \LC{} and its semantics (both normal (call-by-name) and applicative (call-by-value) order reduction). +Familiarity with formal semantics of programming languages (both denotational and operational) is also assumed although not strictly required for understanding of the main subject of this thesis. +The reader should also be experienced in using a higher-order functional language with pattern matching. + \section{Interpreter Definition Language}\label{sec:idl} The \emph{Interpreter Definition Language} or \emph{IDL} is the meta-language used by \texttt{semt} -- a semantic transformer. It is a purely functional, higher-order, dynamically (strongly) typed language with strict evaluation order. @@ -45,7 +49,7 @@ \section{Interpreter Definition Language}\label{sec:idl} The \lstinline!match! expression matches an expression against a list of patterns. Patterns may be variables (which will be bound to the value being matched), wildcards \lstinline!_!, base type patterns e.g. \lstinline![String x]! or record patterns e.g. \lstinline!{Abs x body}!. The \lstinline!fun! form introduces anonymous function, \lstinline!error "..."! stops execution and signals the error. -Finally, application of a function is written as in \textit{Scheme} -- i.e. as a list of expressions (for example \lstinline!(eval init term)!). +Finally, application of a function is written as in \textit{Scheme}, i.e. as a list of expressions (e.g. \lstinline!(eval init term)!). \begin{figure} \centering @@ -84,7 +88,7 @@ \subsection*{Denotational Semantics} Denotational semantics are considered to be the most abstract way to specify behavior of programs and can lead to very concise definitions. The drawback is that interesting language features such as loops and recursion require more complex mathematical theories to describe the denotations, in particular domain theory and continuous functions. In terms of interpreters, the denotational semantics usually correspond to evaluators that heavily reuse features of the meta-language in order to define the same features of object-language e.g. using anonymous functions to model functional values, using conditionals for if expressions, etc. -This style of interpreters is sometimes called \textit{meta-circular} due to the recursive nature of language definition. +This style of interpreters is sometimes called \textit{meta-circular} due to the recursive nature of the language definition. On the one hand these definitional interpreters allow for intuitive understanding of object-language's semantics given familiarity with meta-language. On the other hand, the formal connection of such an interpreter with the denotational semantics requires formal definition of meta-language and in particular understanding of the domain in which denotations of meta-language programs live. The evaluator of Figure \ref{fig:lambda-calc-interp} is an example of the meta-circular approach. @@ -95,13 +99,13 @@ \subsection*{Big-step Operational Semantics} The format of big-step operational semantics, also known as natural semantics allows for specification of behavior of programs using inference rules. These rules usually decompose terms syntactically and give rise to a relation between programs and values to which they evaluate. The fact of evaluation of a program to a value is proven by showing a derivation tree built using the inference rules. -Non-terminating programs therefore have no derivation tree which makes this semantic-format ill-suited for describing divergent or infinite computations. +Non-terminating programs therefore have no derivation tree which makes this semantic format ill-suited to describing divergent or infinite computations. The interpreters which correspond to big-step operational semantics usually have a form of recursive functions that are not necessarily compositional. The natural semantics may be non-deterministic and relate a program with many results. When turning nondeterministic semantics into an evaluator (in a deterministic programming language) one has to either change the formal semantics or model the nondeterminism explicitly. Let us now turn to a simple interpreter embodying the natural semantics for an imperative language \textit{IMP} shown in Figure \ref{fig:evaluator-imp}. -\begin{figure}\label{fig:evaluator-imp} +\begin{figure}[h] \begin{lstlisting} (def-data AExpr String @@ -141,6 +145,7 @@ \subsection*{Big-step Operational Semantics} (eval init-state cmd)) \end{lstlisting} \caption{An interpreter for \textit{IMP} in the style of natural semantics} +\label{fig:evaluator-imp} \end{figure} Datatypes \lstinline!AExpr!, \lstinline!BExpr! and \lstinline!Cmd! describe abstract syntax of arithmetic expressions, boolean expressions and commands. @@ -157,51 +162,59 @@ \subsection*{Abstract Machines} As with big-step operational semantics, an abstract machine may be nondeterministic. Usually elements of the machine-state tuple are simple and first-order e.g. terms of the object-language, numbers, lists, etc. One way of encoding a deterministic abstract machine in a programming language is to define a function for each subset of machine states with similar structure. -The exact state is determined by the actual parameters of the function at run-time. +The exact configuration is determined by the actual parameters of the function at run-time. The bodies of these (mutually recursive) functions encode the transition function. -Figure \ref{fig:abstract-machine-cek} contains an interpreter corresponding to the CEK abstract machine for \LC{}. -It has been obtained from the denotational semantics of Figure \ref{fig:lambda-calc-interp}. -It uses a stack of continuations \lstinline!Cont! to handle the control-flow and \lstinline!Closure!s to represent functions. +Figure \ref{fig:krivines-machine} contains an interpreter corresponding to Krivine's machine performing normal order (call-by-name) reduction of \LC{} with de Bruijn indices. +It uses two stacks: \lstinline!Cont!inuation and \lstinline!Env!ironment. +Both of them contain \lstinline!Thunk!s -- not-yet-evaluated terms paired with their environment. +The object-language functions are represented as \lstinline!Closure!s -- function bodies paired with their environment. The machine has two classes of states: \lstinline!eval! and \lstinline!continue!. -In the first state the machine dispatches on the shape of the term and either switches to \lstinline!continue! mode when it has found a value or pushes a new continuation onto the stack. -In \lstinline!continue! mode the machine checks the continuation and proceeds accordingly: when it reaches the bottom of the continuation stack \lstinline!Halt! it returns the final value \lstinline!val!; when the continuation is \lstinline!App1! it means that \lstinline!val! holds the function value which will be applied once \lstinline!arg! is computed; the stack frame \lstinline!App2! signifies that \lstinline!val! holds the computed argument and the machine calls a helper function \lstinline!apply! to unpack the closure in \lstinline!fn! and evaluate the body of the closure in extended environment. +The initial configuration is \lstinline!(eval term {Nil} {Halt})! -- i.e. eval with the term of interest and empty stacks. +There are four transitions from \lstinline!eval! configuration. +The first two search for the \lstinline!Thunk! corresponding to the variable (de Bruijn index) in the environment and then evaluate it with old stack but with restored environment. +The third transition switches to \lstinline!continue! configuration with the closure is created by pairing current environment with abstraction's body. +The fourth transition pushes a \lstinline!Thunk! onto continuation stack and begins evaluation of function expression. +In \lstinline!continue! configuration the machine inspects the stack. +If it is empty then the computed function \lstinline!fn! is the final answer which is returned. +Otherwise an argument is popped from the stack and the machine switches to \lstinline!eval!uating the \lstinline!body! of the function in the restored environment extended with \lstinline!arg!. \begin{figure} \begin{lstlisting} (def-data Term - String - {Abs String Term} + Integer + {Abs Term} {App Term Term}) -(def-data Cont - {Halt} - {App1 arg env cont} - {App2 fn cont}) +(def-struct {Closure body env}) +(def-struct {Thunk env term}) -(def-struct {Closure body env x}) +(def-data Env + {Nil} + {Cons Thunk Env}) -(def init (x) (error "empty environment")) -(def extend (env k v) ...) +(def-data Cont + {Push Thunk Cont} + {Halt}) -(def eval (env term cont) +(def eval ([Term term] [Env env] cont) (match term - ([String x] (continue cont (env x))) - ({Abs x body} (continue cont {Closure body env x})) - ({App fn arg} (eval env fn {App1 arg env cont})))) - -(def apply (fn v cont) - (let {Fun body env x} fn) - (eval (extend env x v) body cont)) - -(def continue (cont val) + (0 (match env + ({Nil} (error "empty env")) + ({Cons {Thunk env term} _} (eval term env cont)))) + ([Integer n] (eval (- n 1) env cont)) + ({Abs body} (continue cont {Closure body env})) + ({App fn arg} (eval fn env {Push {Thunk env arg}})))) + +(def continue (cont fn) (match cont - ({Halt} val)) - ({App1 arg env cont} (eval env arg {App2 val cont})) - ({App2 fn cont} (apply fn val cont))) + ({Push arg cont} + (let {Closure body env} fn) + (eval body {Cons arg env} cont)) + ({Halt} fn))) -(def main ([Term term]) (eval {Init} term {Halt})) +(def main ([Term term]) (eval term {Nil} {Halt})) \end{lstlisting} -\caption{An encoding of the CEK machine for \LC{}} -\label{fig:abstract-machine-cek} +\caption{An encoding of Krivine's machine} +\label{fig:krivines-machine} \end{figure} \ No newline at end of file diff --git a/thesis.pdf b/thesis.pdf new file mode 100644 index 0000000..7765078 Binary files /dev/null and b/thesis.pdf differ