---
title: TAL inference
numbersections: true
author: Lily Lin
geometry: margin=3cm
output: pdf_document
colorlinks: true
urlcolor: blue
---
Code: https://github.com/rctcwyvrn/tal-inference
Run the Dockerfile with: `run.sh`
In the TAL paper we saw that typed assembly language provides a number of useful safety properties, such as programs never getting stuck and never accessing unallocated memory. However, we also saw that TAL requires a set of label types, either generated by a compiler or handwritten by the programmer. Many existing assembly programs lack these annotations, and many compilers aren't designed to output types from their back ends. An interesting idea, then, is to implement type inference for TAL.
This project attempts to do exactly that: it infers all types for TAL with jumps and pointers. The goal is to prevent the same classes of errors that the original TAL and the ATAPL version of TAL rule out, namely getting stuck and out-of-bounds memory accesses.
The instructions are:

- `add r1 r2 v`
- `sub r1 r2 v`
- `mov r1 r2`
- `bnz r2 v`
- `load r1 r2 i`
- `store r1 i r2`
- `store-strong r1 i r2`
- `malloc r1 n`
- `commit r1`

These all follow the same runtime semantics as in ATAPL. The only difference is the `store-strong` instruction, which supports changing the type of the stored value.
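To make the syntax concrete, here is a minimal Rust sketch of an AST and a line parser for these instructions (the checker itself is written in Rust). The names `Instr`, `Value` and `parse_instr` are hypothetical. So is the operand syntax the sketch assumes: registers are written as `rN`, integers are immediates, and anything else is a label. This is not the repository's actual parser.

```rust
// Hypothetical AST for the instruction set; the real project may differ.

/// Operand `v`. Assumed syntax: `rN` is a register, an integer literal
/// is an immediate, and anything else is a code label.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Reg(usize),
    Imm(i64),
    Label(String),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Instr {
    Add(usize, usize, Value),         // add r1 r2 v
    Sub(usize, usize, Value),         // sub r1 r2 v
    Mov(usize, usize),                // mov r1 r2
    Bnz(usize, Value),                // bnz r2 v
    Load(usize, usize, usize),        // load r1 r2 i
    Store(usize, usize, usize),       // store r1 i r2
    StoreStrong(usize, usize, usize), // store-strong r1 i r2
    Malloc(usize, usize),             // malloc r1 n
    Commit(usize),                    // commit r1
}

/// Parse a register token of the form `rN`.
fn reg(tok: &str) -> Option<usize> {
    tok.strip_prefix('r')?.parse().ok()
}

/// Parse a non-negative integer (field index or allocation size).
fn num(tok: &str) -> Option<usize> {
    tok.parse().ok()
}

/// Parse an operand: try a register first, then an immediate, else a label.
fn value(tok: &str) -> Option<Value> {
    if let Some(r) = reg(tok) {
        Some(Value::Reg(r))
    } else if let Ok(n) = tok.parse::<i64>() {
        Some(Value::Imm(n))
    } else {
        Some(Value::Label(tok.to_string()))
    }
}

/// Parse one instruction. Returns `None` on an unknown opcode or wrong arity.
pub fn parse_instr(line: &str) -> Option<Instr> {
    let toks: Vec<&str> = line.split_whitespace().collect();
    match toks.as_slice() {
        ["add", r1, r2, v] => Some(Instr::Add(reg(r1)?, reg(r2)?, value(v)?)),
        ["sub", r1, r2, v] => Some(Instr::Sub(reg(r1)?, reg(r2)?, value(v)?)),
        ["mov", r1, r2] => Some(Instr::Mov(reg(r1)?, reg(r2)?)),
        ["bnz", r2, v] => Some(Instr::Bnz(reg(r2)?, value(v)?)),
        ["load", r1, r2, i] => Some(Instr::Load(reg(r1)?, reg(r2)?, num(i)?)),
        ["store", r1, i, r2] => Some(Instr::Store(reg(r1)?, num(i)?, reg(r2)?)),
        ["store-strong", r1, i, r2] => {
            Some(Instr::StoreStrong(reg(r1)?, num(i)?, reg(r2)?))
        }
        ["malloc", r1, n] => Some(Instr::Malloc(reg(r1)?, num(n)?)),
        ["commit", r1] => Some(Instr::Commit(reg(r1)?)),
        _ => None,
    }
}
```

For example, `parse_instr("load r1 r2 0")` yields `Some(Instr::Load(1, 2, 0))`.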
Our basic types are:

- $int$
- $Top$
- $UniqPtr(\{ l_i : \tau_i \}, \rho)$
- $Ptr(\{ l_i : \tau_i \}, \rho)$
- $Code(\{ r_i : \tau_i \})$
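One way to represent these types as data is sketched below. The representation is an illustrative assumption, not the project's definition. Record labels and register numbers are `usize`, and type and rho variables are plain numeric ids. The `Display` impl prints types roughly in the notation above.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Hypothetical representation of the basic types.
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Int,
    Top,
    Var(usize),                          // unification variable (assumed)
    UniqPtr(BTreeMap<usize, Ty>, usize), // {l_i : tau_i}, rho id
    Ptr(BTreeMap<usize, Ty>, usize),     // {l_i : tau_i}, rho id
    Code(BTreeMap<usize, Ty>),           // {r_i : tau_i}
}

/// Print a record as `{k: t, ...}`, with `prefix` before each key.
fn fmt_record(f: &mut fmt::Formatter<'_>, rec: &BTreeMap<usize, Ty>, prefix: &str) -> fmt::Result {
    write!(f, "{{")?;
    for (i, (k, t)) in rec.iter().enumerate() {
        if i > 0 {
            write!(f, ", ")?;
        }
        write!(f, "{}{}: {}", prefix, k, t)?;
    }
    write!(f, "}}")
}

impl fmt::Display for Ty {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Ty::Int => write!(f, "int"),
            Ty::Top => write!(f, "Top"),
            Ty::Var(v) => write!(f, "'t{}", v),
            Ty::UniqPtr(rec, rho) => {
                write!(f, "UniqPtr(")?;
                fmt_record(f, rec, "")?;
                write!(f, ", rho{})", rho)
            }
            Ty::Ptr(rec, rho) => {
                write!(f, "Ptr(")?;
                fmt_record(f, rec, "")?;
                write!(f, ", rho{})", rho)
            }
            Ty::Code(regs) => {
                write!(f, "Code(")?;
                fmt_record(f, regs, "r")?; // register keys print as r0, r1, ...
                write!(f, ")")
            }
        }
    }
}
```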
Let
As you might expect, the typing rules are very similar to those of the original TAL systems, except that they now generate constraints. First, we use the same rules as ATAPL for well-typed values, operands, heaps, and registers.
We have slightly different rules for instruction sequences. Firstly, an instruction sequence now generates a set of constraints
The interpretation of this rule is that an instruction sequence
An instruction sequence is then well-typed if we can type it and satisfy all of its constraints, starting from type variables as the initial state
where
The last piece of the puzzle is how to typecheck a `jump` terminal. As mentioned during the presentation, jumps involve subtyping. They also involve a bunch of very questionable checks that avoid creating recursive types while still constraining labels properly. I don't know how to write this out formally, so here's an approximation:
When typechecking `bnz v`/`jump v`:

- $\Psi, \Gamma \vdash v \Rightarrow T$
- Let $T' = Code(\{r : T_r \mid r \in R\})$, where each $T_r$ is a fresh type variable
- Add the constraint $T = T'$ to $C'$
- If $T$ is a variable corresponding to a parameter for this label, then for each register $r \in R$, add the constraint $\Gamma[r] = T_r$ to $C'$ if $\Gamma[r]$ is an $int$, $Ptr$, or $UniqPtr$
- Add the constraint $\Gamma <: \{r : T_r \mid r \in R\}$ to $C'$

The `bnz`/`jump` then has type $(\Gamma, C) \rightarrow (\Gamma, C')$.
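The jump rule can be read as a small constraint generator. The sketch below follows the steps of the rule above under several simplifying assumptions:

- The inferred type $T$ of the target is passed in.
- A boolean flag says whether $T$ is one of the label's parameter variables.
- Pointer types carry no rho variable.
- The context subtyping constraint is stored as a pair of register vectors, to be checked after unification.

`constrain_jump_sketch` and `Constraint` are hypothetical names. They do not reflect the signature of `constrain_jump` in checker.rs.

```rust
// Hypothetical, simplified types: pointers carry no rho variable here.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Var(usize),
    Ptr(Vec<Ty>),
    UniqPtr(Vec<Ty>),
    Code(Vec<Ty>), // one entry per register r in R
}

#[derive(Debug, Clone, PartialEq)]
enum Constraint {
    /// T1 = T2, solved by unification.
    Eq(Ty, Ty),
    /// Gamma <: {r : T_r}, checked after unification.
    SubCtx(Vec<Ty>, Vec<Ty>),
}

/// Supply of fresh type variables.
struct Fresh(usize);

impl Fresh {
    fn var(&mut self) -> Ty {
        let v = self.0;
        self.0 += 1;
        Ty::Var(v)
    }
}

/// Constraints for `bnz v` / `jump v`, given the inferred type `target` of v.
/// `target_is_param` (assumed) says whether `target` is a variable standing
/// for one of this label's parameters.
fn constrain_jump_sketch(
    gamma: &[Ty],
    target: Ty,
    target_is_param: bool,
    fresh: &mut Fresh,
    cs: &mut Vec<Constraint>,
) {
    // T' = Code({r : T_r | r in R}) with a fresh T_r per register.
    let params: Vec<Ty> = gamma.iter().map(|_| fresh.var()).collect();
    // T = T'
    cs.push(Constraint::Eq(target, Ty::Code(params.clone())));
    // Jumping through a parameter: pin registers whose type is concrete.
    if target_is_param {
        for (g, t_r) in gamma.iter().zip(&params) {
            if matches!(g, Ty::Int | Ty::Ptr(_) | Ty::UniqPtr(_)) {
                cs.push(Constraint::Eq(g.clone(), t_r.clone()));
            }
        }
    }
    // Gamma <: {r : T_r | r in R}
    cs.push(Constraint::SubCtx(gamma.to_vec(), params));
}
```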
The plan is to use subtyping instead of polymorphism to allow "forgetting" that certain registers were set, similar to what was done in the original TAL. So we add a subtyping rule for contexts and pointers. Note, however, that there is no generic subsumption rule: subtyping only appears in the constraints generated by jumps, as seen in the rule above. We only require weakening, since pointer indices are inherently ordered, and this is done by quantifying over the labels on the right-hand side.
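Here is a minimal sketch of the kind of subtyping check this describes. It assumes three rules:

- Every type is a subtype of $Top$, which is how a register gets "forgotten".
- A pointer is a subtype of another pointer if every label on the right-hand side appears on the left with an equal field type.
- A context is a subtype of another if this holds register by register, for the registers on the right-hand side.

Comparing field types for equality rather than covariantly is an assumption here, since stores can write through a pointer. `subtype` and `ctx_subtype` are hypothetical names.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified types for the subtyping check.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Top,
    Ptr(BTreeMap<usize, Ty>), // label -> field type
}

/// a <: b under the assumed rules (no general subsumption).
fn subtype(a: &Ty, b: &Ty) -> bool {
    match (a, b) {
        // Anything can be forgotten to Top.
        (_, Ty::Top) => true,
        (Ty::Int, Ty::Int) => true,
        // Weakening: quantify over the labels on the right-hand side;
        // each must exist on the left with an equal (invariant) type.
        (Ty::Ptr(fa), Ty::Ptr(fb)) => fb.iter().all(|(l, tb)| fa.get(l) == Some(tb)),
        _ => false,
    }
}

type Ctx = BTreeMap<usize, Ty>; // register -> type

/// Gamma1 <: Gamma2: every register mentioned by Gamma2 must be a subtype.
fn ctx_subtype(g1: &Ctx, g2: &Ctx) -> bool {
    g2.iter()
        .all(|(r, t2)| g1.get(r).map_or(false, |t1| subtype(t1, t2)))
}
```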
We have a simple bidirectional inference rule for registers, which just looks the register up in the context. This always works because the context is filled with type variables at the start of each instruction sequence (the SeqTy rule).
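Here is a minimal sketch of the SeqTy setup and the register lookup. It assumes a context is a vector with one type per register. Since `initial_context` fills every slot with a fresh variable, `infer_register` is just an index and cannot fail for an in-range register. The names are hypothetical. In the repository, the register count comes from `MAX_REGISTER` in checker.rs.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Var(usize),
}

/// Hypothetical supply of fresh type variables.
struct VarSupply {
    next: usize,
}

impl VarSupply {
    fn fresh(&mut self) -> Ty {
        let v = self.next;
        self.next += 1;
        Ty::Var(v)
    }
}

/// SeqTy-style setup: every register starts as its own fresh type variable.
fn initial_context(supply: &mut VarSupply, num_regs: usize) -> Vec<Ty> {
    (0..num_regs).map(|_| supply.fresh()).collect()
}

/// Inference for a register operand is a plain lookup.
fn infer_register(ctx: &[Ty], r: usize) -> Ty {
    ctx[r].clone()
}
```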
Next we have the rules for checking instructions, starting with `mov` and arithmetic. Recall from the rule for instruction sequences that an instruction
Let
The memory-based instructions are a bit more complex, since they involve rho variables.
where
Most rules involve generating a fresh
You might notice that this rule appears to reject loading from a unique pointer. So how does the typechecker handle loads from unique pointers? The answer is a very sketchy rule in the unification algorithm that `UniqPtr <: Ptr`. This is really informal and should be fixed, either with a more formal subtyping relation or with two different load instructions, as is done for store.
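Here is a rough illustration of generating fresh variables for a memory instruction, using `load r1 r2 i` as the example. The sketch does four things:

- It creates a fresh type variable $\alpha$ for the loaded field.
- It creates a fresh rho variable $\rho$ for the rest of the record.
- It requires $\Gamma[r2] = Ptr(\{i : \alpha\}, \rho)$.
- It sets $\Gamma[r1] := \alpha$.

The exact form of the constraint is an assumption reconstructed from the surrounding prose, and `check_load` is a hypothetical name. Note that the constraint mentions `Ptr` only. This is why a `UniqPtr` in `r2` would rely on the `UniqPtr <: Ptr` rule in the unifier.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Var(usize),
    Ptr(BTreeMap<usize, Ty>, usize), // known fields + rho id for the rest
}

#[derive(Debug, PartialEq)]
enum Constraint {
    Eq(Ty, Ty),
}

/// Hypothetical supplies of fresh type and rho variables.
struct Supply {
    next_ty: usize,
    next_rho: usize,
}

impl Supply {
    fn ty(&mut self) -> Ty {
        let v = self.next_ty;
        self.next_ty += 1;
        Ty::Var(v)
    }
    fn rho(&mut self) -> usize {
        let r = self.next_rho;
        self.next_rho += 1;
        r
    }
}

/// Sketch of `load r1 r2 i`: Gamma[r2] = Ptr({i: alpha}, rho), then r1 : alpha.
fn check_load(ctx: &mut [Ty], r1: usize, r2: usize, i: usize, s: &mut Supply, cs: &mut Vec<Constraint>) {
    let alpha = s.ty();
    let rho = s.rho();
    let mut fields = BTreeMap::new();
    fields.insert(i, alpha.clone());
    cs.push(Constraint::Eq(ctx[r2].clone(), Ty::Ptr(fields, rho)));
    ctx[r1] = alpha;
}
```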
Speaking of store
Again, this internally uses the sketchy subtyping system to allow weak stores to unique pointers... Luckily, `store-strong` doesn't need it. We do, however, have to make sure we aren't copying a unique pointer. Note that `store-strong`
Finally, we have commit
The initial setup of the context for each labelled block is handled by `check_block` (checker.rs:303), which initializes the context with a type variable in each register.
Each rule is implemented straightforwardly in `check_instruction` (checker.rs:214) using the `constrain_register`, `constrain_value`, and `update_register` functions. Each function modifies the current context and constraint list as expected.
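To illustrate how helpers like these combine, here is a sketch of the arithmetic case (`add`/`sub r1 r2 v`). It assumes the standard TAL rule: `r2` and `v` must be integers, and `r1` is an integer afterwards. The helper names mirror the functions above, but their signatures here are guesses. `State` is a hypothetical stand-in for the checker's context and constraint list.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Var(usize),
}

enum Value {
    Reg(usize),
    Imm(i64),
}

#[derive(Debug, PartialEq)]
enum Constraint {
    Eq(Ty, Ty),
}

/// Hypothetical checker state: current context plus constraints so far.
struct State {
    ctx: Vec<Ty>,
    constraints: Vec<Constraint>,
}

impl State {
    /// Require register `r` to have type `t`.
    fn constrain_register(&mut self, r: usize, t: Ty) {
        let cur = self.ctx[r].clone();
        self.constraints.push(Constraint::Eq(cur, t));
    }

    /// Require operand `v` to have type `t`; immediates are always int.
    fn constrain_value(&mut self, v: &Value, t: Ty) {
        match v {
            Value::Reg(r) => self.constrain_register(*r, t),
            Value::Imm(_) => self.constraints.push(Constraint::Eq(Ty::Int, t)),
        }
    }

    /// Overwrite the type of register `r` after the instruction.
    fn update_register(&mut self, r: usize, t: Ty) {
        self.ctx[r] = t;
    }

    /// add/sub r1 r2 v: r2 and v must be int, and r1 is int afterwards.
    fn check_arith(&mut self, r1: usize, r2: usize, v: &Value) {
        self.constrain_register(r2, Ty::Int);
        self.constrain_value(v, Ty::Int);
        self.update_register(r1, Ty::Int);
    }
}
```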
Jumps are handled by `constrain_jump` (checker.rs:152), which implements the sketchy rules from above.
Unification is handled by `Unifier`, while `Satisfier` handles the subtyping rules (both in unify.rs).
The basic unification algorithm is fairly straightforward (unify.rs:220); however, things get more complicated when unifying pointers.
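For reference, the "fairly straightforward" part is textbook first-order unification. Below is a minimal sketch over integers, $Top$, type variables and code types. Pointers and rho variables are left out, since that is where `unify_ptrs` takes over. The sketch includes an occurs check that rejects recursive types. Whether unify.rs does the same is an assumption. `Unifier` here is a hypothetical stand-in, not the struct in unify.rs.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Top,
    Var(usize),
    Code(Vec<Ty>),
}

/// Hypothetical unifier: a substitution from variable ids to types.
#[derive(Default)]
struct Unifier {
    subst: HashMap<usize, Ty>,
}

impl Unifier {
    /// Follow variable bindings until reaching an unbound variable or a non-variable.
    fn resolve(&self, t: &Ty) -> Ty {
        match t {
            Ty::Var(v) => match self.subst.get(v) {
                Some(bound) => self.resolve(bound),
                None => t.clone(),
            },
            _ => t.clone(),
        }
    }

    /// Does variable `v` occur inside `t` (after resolution)?
    fn occurs(&self, v: usize, t: &Ty) -> bool {
        match self.resolve(t) {
            Ty::Var(w) => w == v,
            Ty::Code(rs) => rs.iter().any(|r| self.occurs(v, r)),
            _ => false,
        }
    }

    fn unify(&mut self, a: &Ty, b: &Ty) -> Result<(), String> {
        let (a, b) = (self.resolve(a), self.resolve(b));
        match (&a, &b) {
            (Ty::Var(x), Ty::Var(y)) if x == y => Ok(()),
            (Ty::Var(x), t) | (t, Ty::Var(x)) => {
                // Binding x to a type containing x would create a recursive type.
                if self.occurs(*x, t) {
                    return Err(format!("recursive type: '{} in {:?}", x, t));
                }
                self.subst.insert(*x, t.clone());
                Ok(())
            }
            (Ty::Int, Ty::Int) | (Ty::Top, Ty::Top) => Ok(()),
            (Ty::Code(ra), Ty::Code(rb)) if ra.len() == rb.len() => {
                for (x, y) in ra.iter().zip(rb.iter()) {
                    self.unify(x, y)?;
                }
                Ok(())
            }
            _ => Err(format!("cannot unify {:?} with {:?}", a, b)),
        }
    }
}
```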
A brief rundown of the implementation of pointer unification:

- The entry point is `unify_ptrs` (unify.rs:64). This function checks whether the two pointers have rho variables. It calls into `unify_rho_id_with_mapping` if there is one, or `unify_rho_mappings` if there are none.
- `add` (unify.rs:121) and `unify_subtract` (unify.rs:173) implement addition and subtraction on records. Subtraction may generate additional constraints, which is why it has the `unify_` prefix.
- `unify_rho_id_with_mapping` (unify.rs:197) takes in a rho variable and a record type that it has to satisfy. If the variable has already been bound, we check whether the current binding is satisfactory in `unify_rho_mappings`. Otherwise, we just bind the rho variable.
- `unify_rho_mappings` (unify.rs:143) takes two record types and checks whether they're equal by calling `unify_rho_entries` (unify.rs:158) on each element of the record. `unify_rho_entries` mostly just calls into `constrain`. It exists because `RhoEntry` contains the `Absent` variant. I initially added that variant because I thought it would be needed, but it turned out not to be, so this extra complexity is just a leftover.
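The record operations can be pictured as follows. This is a guess at their meaning, not a transcription of `add` or `unify_subtract`. Addition takes a disjoint union of labels. Subtraction removes the right-hand record's labels from the left-hand one and emits an equality constraint for each shared label, which is one way subtraction could "generate additional constraints". Records are assumed to map `usize` labels to types.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Var(usize),
}

/// Hypothetical record: label -> field type.
type Record = BTreeMap<usize, Ty>;

/// Record addition: disjoint union; a label defined on both sides is an error.
fn add(a: &Record, b: &Record) -> Result<Record, String> {
    let mut out = a.clone();
    for (l, t) in b {
        if out.insert(*l, t.clone()).is_some() {
            return Err(format!("label {} defined twice", l));
        }
    }
    Ok(out)
}

/// Record subtraction a - b: remove b's labels from a. Each shared label
/// yields an equality constraint (field type in a, field type in b).
fn subtract(a: &Record, b: &Record) -> Result<(Record, Vec<(Ty, Ty)>), String> {
    let mut rest = a.clone();
    let mut constraints = Vec::new();
    for (l, tb) in b {
        match rest.remove(l) {
            Some(ta) => constraints.push((ta, tb.clone())),
            None => return Err(format!("label {} missing", l)),
        }
    }
    Ok((rest, constraints))
}
```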
The post-unification mapping is closed by `chase_to_root` (unify.rs:263) and `chase_all_to_root` (unify.rs:297). This is where type variables are lifted to `Top` if they weren't bound (the mapping in the SeqTy rule). `chase_all_to_root` creates a direct mapping from each variable id to a `TyU`. `TyU` is short for "unified type"; it does not contain the `UnifVar` variant and instead contains `Any`. This mapping is then used to initialize a `Satisfier`.
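Here is a minimal sketch of this closing step. It assumes the substitution is acyclic, which an occurs check during unification would guarantee, and that unbound variables become $Top$ as described above. `chase` and `chase_all` are hypothetical names. The variable-free `Resolved` type loosely plays the role of `TyU`, but unlike `TyU` it has no `Any` variant.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Var(usize),
    Code(Vec<Ty>),
}

/// Variable-free result type (loosely modelled on TyU).
#[derive(Debug, Clone, PartialEq)]
enum Resolved {
    Int,
    Top,
    Code(Vec<Resolved>),
}

/// Follow bindings to the root, replacing unbound variables with Top.
/// Assumes `subst` is acyclic.
fn chase(subst: &HashMap<usize, Ty>, t: &Ty) -> Resolved {
    match t {
        Ty::Int => Resolved::Int,
        Ty::Var(v) => match subst.get(v) {
            Some(bound) => chase(subst, bound),
            None => Resolved::Top, // unbound: lift to Top
        },
        Ty::Code(rs) => Resolved::Code(rs.iter().map(|r| chase(subst, r)).collect()),
    }
}

/// Build a direct map from every variable id in 0..num_vars to its resolved type.
fn chase_all(subst: &HashMap<usize, Ty>, num_vars: usize) -> HashMap<usize, Resolved> {
    (0..num_vars).map(|v| (v, chase(subst, &Ty::Var(v)))).collect()
}
```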
The two subtyping rules are handled by `Satisfier` in `satisfy_jump` (unify.rs:458) and `satisfy_rho` (unify.rs:372). Both are called through `satisfy` (unify.rs:504), which simply loops through the subtyping constraints remembered from each jump and calls `satisfy_jump`.
The not-equal constraints are also checked by `Satisfier`, in `check_neq` (unify.rs:530).
- To change the test case that gets run, edit `test.rs` or `main.rs`.
- To change the number of registers, edit `MAX_REGISTER` in `checker.rs`.