Typechecking TAL0 in Agda.

This post is an annotated solution to problem 4.2.11 in the book Advanced Topics in Types and Programming Languages from the chapter Typed Assembly Language by Gerg Morrisett.

Introducing TAL0

I’ll do my best to give an informal introduction to TAL0 right here — if you wish to read something more polished I suggest you consult the original source.

TAL0 aims to provide control flow safety to simple, imperative programs. Control flow safety implies that a program is allowed to jump only to well-defined points, labels, in the code. This is easy to verify if all jump instructions take a constant label as an argument — we only have to verify that the label exists. But we’d like to go further than that and be able to dynamically jump to a location held in a register. With this freedom, the problem becomes a little more interesting.

TAL0 is a toy assembly language. You have a finite number of registers, each of which can hold either an integer or a label. You have the following instructions:

  • jump label jumps to label label.
  • reg0 ←r reg1 moves the value in reg0 to reg1.
  • reg ←l label moves label label to the register reg.
  • reg ←n integer moves integer integer to the register reg.
  • reg0 ← reg1 ⟨ op ⟩ reg2 performs operation op (anything with the type ℕ → ℕ → ℕ) on the integers stored in reg1 and reg2 and stores the result in reg0.
  • if reg0 jump reg1 jumps to the label stored in reg1 if the value in reg0 is not 0.

Informally, we’d like to design a type system that ensures that all the above operations are always well-defined. The type-system should ensure that reg0 ← reg1 ⟨ op ⟩ reg2 doesn’t try to add 42 to a label, for example.

Labels are sequences of instructions. I have take the liberty to constrain sequences to always end with a static jump instruction. I don’t think we lose any generality due to this.

The only mutable “state” in the system are the registers.

The type system

In essence, the approach described in the paper assigns a type to the register file (the array of registers) at each instruction.

A label’s type is a vector of types, one for each register. This label-type, which we’ll call Γ, encodes what the sequence of instructions expects the register file to look like on entry. If, for instance, Γ says that r0 contains a label and r1 an integer on entry, we know that if r1 jump r0 is safe, if it is the first instruction in the sequence. We then morph Γ as we go through the instructions in the sequence based on how each instruction affects the register file. For example, we know that after executing r0 ←n 42, Γ[r0] (the type at r0 in Γ) is Nat. In other words, we assign a (possibly different) type to the register file at each instruction. The type of each register (at a specific instruction) can either be Nat, denoting that it contains an integer or block { r0 : t0, r1 : t1 ... }, denoting that it contains a label to a block (or sequence) of instructions which, on entry, expects r0 to contain a value of type t0, r1 to contain a value of type t1 and so on.

There is one subtlety, however. What type do you assign to this sequence of instructions?

  if r1 jump r0
  jump some_label

You see, going naively by the above scheme, the type r0 has at the point where the if_jump_ has to contain itself. A little more formally, if Γ is the type of the register file at the location of the if_jump_ then Γ[r0] = Γ. One obvious solution is to allow Γ to be an infinite type. Another approach (the one used in the paper) uses a polymorphic type and instantiates it with itself when required. I found this approach a little hard to implement in Agda. Specifically, I ran into issues with positivity of types, something I don’t fully understand, and decided to implement a simpler (but more limited) approach.

I use an anything type that denotes that a particular register can contain, really anything. The type-system is still sound since TAL0 can only manipulate values of this type opaquely — it can neither jump to registers of type anything nor can it do math on them. In the motivating example, we’d set Γ[r0] = label { r0 = anything, r1 = ... }. The type of the r0 is a label, which allows r0 to contain any value whatsoever. Since r0 can contain anything before the jump, it can very well contain a value of type label { r0 = anything, r1 = ... }. The crucial point is that we don’t really allow any kind of self-nesting or recursion here. I found it helpful to think about anything being — the result of an infinite recursion.

The implementation

The first step is declaring Γ, the type of a register file. This is a mutually recursive definition:

mutual
  Γ′ : ℕ → Set
  Γ′ n = Vec Type n

  Γ = Γ′ size

  data Type : Set where
    natType : Type
    blockType : Γ′ size → Type
    anything : Type

Here size is a value of type which is basically the number of registers we have. Secondly, we define a function safeToJump' that checks if it is feasible to jump to the label contained in a specific register from the current point in the instruction stream:

safeToJump : {n : ℕ} → (to : Γ′ n) → (from : Γ′ n) → Set
safeToJump [] [] = ⊤
safeToJump (anything ∷ xs) (x' ∷ xs') = safeToJump xs xs'
safeToJump (x ∷ xs) (x' ∷ xs') =
  if (equalT x x') then (safeToJump xs xs') else ⊥

safeToJump' : Γ → Reg → Set
safeToJump' γ idx with lookup idx γ
... | natType = ⊥
... | blockType γ′ = safeToJump γ′ γ
... | anything = ⊥

equalT is a trivial function that compares types for equality. safeToJump' merely asserts that it is never safe to jump to an integer or an anything. When it sees a block-type it does a straightforward deep comparison, keeping in mind the anything semantics discussed above. Returning and allows us to have an implicit argument of the type safeToJump' γ r and have Agda fill in the value automatically for and fail typechecking for .

The meat of the implementation is a Block data-type. A value of this type is also the proof of the consistency of the stream of instructions it represents. Each sequence is expressed as a discrete value and a program is a mutually recursive set of such values. For instance, here are two labels which just keep jumping to each other:

mutual
  α : Block (natType ∷ natType ∷ [])
  α = jump (♯ β)
  β : Block (natType ∷ natType ∷ [])
  β = jump (♯ α)

Agda’s mixfix syntax allows us to really go wild when defining Block:

data Block : Γ → Set where
  jump : {γ₀ γ₁ : Γ} → ∞ (Block γ₀) -> { _ : safeToJump γ₀ γ₁ } → Block γ₁
  _←r_▶_ : {γ : Γ} → (dest : Reg) → (src : Reg) →
            Block (γ [ dest ]≔ (lookup src γ)) →  Block γ
  _←l_▶_ : {γ γ₀ : Γ} → (dest : Reg) → (src : ∞ (Block γ₀)) →
            Block (γ [ dest ]≔ (blockType γ₀)) →  Block γ
  _←n_▶_ : {γ : Γ} → (dest : Reg) → (src : ℕ) →
            Block (γ [ dest ]≔ natType) →  Block γ
  _←_⟨_⟩_▶_ : {γ : Γ} → (dest : Reg) → (op₀ : Reg) →
               (fn : ℕ → ℕ → ℕ) → (op₁ : Reg) →
               ⦃ _ : lookup op₀ γ ≡ natType ⦄ →
               ⦃ _ : lookup op₁ γ ≡ natType ⦄ →
               Block (γ [ dest ]≔ natType) → Block γ
  if_jump_▶ : {γ : Γ} → (src : Reg) → (dest : Reg) →
              ⦃ _ : lookup src γ ≡ natType ⦄ →
              { _ : safeToJump' γ dest } →
              Block γ → Block γ

Coinduction (the ) is needed to allow, for instance, jump instructions to the same block. We want to be able to type b = jump (♯ b). The ⦃ ... ⦄ is Agda syntax for instance arguments.

We build the blocks bottom up. Every instruction (other than jump which is always the last instruction) takes a block of type Γ₀ and produces a block of type Γ₁, where running the instruction will change a register file of type Γ₁ to a register file of type Γ₀.

The individual rules are rather simple and I will annotate only one of them: the _←_⟨_⟩_▶_ rule says that we expect both op₀ and op₁ to contain integers and the rest of the block will see an integer at the destination register. It results in a block that may not contain an integer at the destination register. The approach may look a little mysterious (and ass-backwards), but Agda is perfectly capable of doing this kind of type inference, if it knows the expected type of resultant Block. It isn’t hard to see how — given γ, it is always possible to compute γ [ dest ]≔ natType.

And yeah, this is about it! The full code (with a few trivial additions, like the equalT function) is available at https://github.com/sanjoy/Snippets/blob/master/TAL0.agda

Programming in Agda is awesome to the point of being unfair.

Advertisements
This entry was posted in Computers and tagged , , , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s