Type inferencing

Introduction

We want to infer types for variables so that the rule has the most chances to apply therefore we want to maximize the inferred sorts. Because of the way we mix user syntax with K syntax, we introduce a new subsort between K and KBott for every user sort. To make this easier to visualize we can think of all the subsort relations as a DAG where K is at the top and KBott is the smallest element. Inferring a sort means finding the highest sort from the graph which meets all the constraints.

Let's take an example (1):

// A < B < C < D
syntax D ::= C
syntax C ::= B
syntax B ::= A
syntax A ::= f(C)
rule f(X) => .K

The sort of X can be any of A, B or C. But to maximize matching we must infer C.

Why we need a constraint solver to disambiguate properly?

Let's take this example (2):

syntax Exps  ::= Exp “,” Exps | Exp
syntax Exp   ::= Id
syntax Id    ::= [A-Za-z][A-Za-z0-9]*
syntax Stmt  ::= “val” Exps “;” Stmt
syntax KBott ::= “(“ K “)” | “(“ Id “,” Stmt “)”
syntax K     ::= Exps | Exp | Id
rule val X; S => (X,S)
rule =>(val_;_(X:Exps, S:Stmt), 
        amb((_)(_,_(X:Exp, S:Exps))
            (_,_)(  X:Id,  S:Stmt))

Based on each appearance of each variable and ambiguities we have the following constraints:

variant 1: - no solution
  X:{Exps, Exp}
  S:{Stmt, Exps}
variant 2:
  X:{Exps, Id}
  S:{Stmt, Stmt}

For variant 1, we can infer X:Exp as being the most general sort that fits all constraints, but for S we end up with KBott which is not going to match on anything. So we have to discard this solution. Variant 2 will infer X:Id, which is smaller than the previous solution, but we can find a valid option for S:Stmt this time. We can take this solution and, with a simple visitor, can eliminate the rest of the ambiguities from the AST.

Type inference here can be viewed as a constraint solving problem where we must solve X and S in:

X:Exps /\ S:Stmt /\ ((X:Exp /\ S:Exps) \/ (X:Id /\ S:Stmt))

Modeling in SMT

After playing with Z3 for a bit, we ended up working with CVC4 because of their builtin support for Sets.

First step in modeling the problem was to represent the sorts and the subsort relations in SMT2.

(declare-datatypes ((Sort 0))
   (((A) (B) (C) (D))))

; A < B < C < D
(define-fun tsubs () (Set (Tuple Sort Sort))
  (tclosure (insert ; initial subsorts
    (mkTuple A B)
    (mkTuple B C) (singleton
    (mkTuple C D)))))

We introduce the sort names as datatypes and the subsort relations as a set of tuples, transitively closed.

(define-fun isSubsortedStrict ((x Sort) (y Sort)) Bool
   (member (mkTuple x y) tsubs))
(define-fun isSubsorted ((x Sort) (y Sort)) Bool
   (or (= x y) (isSubsortedStrict x y)))

Subsorting can be represented as uninterpreted functions which just simply check membership of the relation in question in the tsubs set.

Modeling the constraints on X is as simple as:

(declare-const X Sort)
(assert (and (isSubsorted X C))) ; constraints on variable

At this point, the SMT solver would find any of the A, B, or C sorts as satisfiable. To make sure we get the maximum, we must add a new constraint:

(assert (not (exists ((y Sort))  ; maximality
                (and (isSubsorted y C)
                     (isSubsortedStrict X y)))))

which says that there is no y which is strictly bigger than X and meets the same constraints.

The final step is to model multiple variables at the same time. For this, we define a solution as being a tuple of Sorts on which we apply the same restrictions as above.

Here is how example (2) would look like:

(set-logic ALL)
(set-info :status sat)
; use finite model finding heuristic for quantifier instantiation
(set-option :finite-model-find true)
(set-option :produce-models true)
(declare-datatypes ((Sort 0))
   (((Id) (Exp) (Exps) (Stmt))))

; Id < Exp < Exps

; subsorts POSet
(define-fun tsubs () (Set (Tuple Sort Sort))
    (tclosure (insert
        (mkTuple Id Exp) (singleton
        (mkTuple Exp Exps)))))

(define-fun isSubsortedStrict ((x Sort) (y Sort)) Bool
   (member (mkTuple x y) tsubs))
(define-fun isSubsorted ((x Sort) (y Sort)) Bool
   (or (= x y) (isSubsortedStrict x y)))

; constraints predicate: rule var X; S => (X, S)
(define-fun constraints ((x Sort) (s Sort)) Bool
    (and (isSubsorted x Exps)
         (isSubsorted s Stmt)
         (or (and (isSubsorted x Exp)
                  (isSubsorted s Exps))
             (and (isSubsorted x Id)
                  (isSubsorted s Stmt)))))

; maximality
(define-fun maximality ((x Sort) (s Sort)) Bool
    (not (exists ((xp Sort) (sp Sort))
                (and (constraints xp sp)
                     (or (isSubsortedStrict x xp)
                         (isSubsortedStrict s sp))))))

(define-fun isSol ((x (Tuple Sort Sort))) Bool
    (and (constraints ((_ tupSel 0) x) ((_ tupSel 1) x))
         (maximality  ((_ tupSel 0) x) ((_ tupSel 1) x))))

; solution to be found
(declare-fun solSet () (Set (Tuple Sort Sort)))

; completeness
(assert (forall ((x (Tuple Sort Sort)))
              (and (=> (isSol x) (member x solSet))
                   (=> (member x solSet) (isSol x)))))

(check-sat)
(get-model)

Output: (define-fun solSet () (Set (Tuple Sort Sort)) (singleton (mkTuple Id Stmt)))

Since we don't always have a top sort, we may end up with multiple solutions. CVC4 allows us to find all the solutions by adding a "completeness" assert.

(set-logic ALL)
(set-info :status sat)
; use finite model finding heuristic for quantifier instantiation
(set-option :finite-model-find true)
(set-option :produce-models true)
(declare-datatypes ((Sort 0))
   (((A) (B) (C))))

; A < B C

; subsorts POSet
(define-fun tsubs () (Set (Tuple Sort Sort))
    (tclosure (insert
        (mkTuple A B) (singleton
        (mkTuple A C)))))

(define-fun isSubsortedStrict ((x Sort) (y Sort)) Bool
   (member (mkTuple x y) tsubs))
(define-fun isSubsorted ((x Sort) (y Sort)) Bool
   (or (= x y) (isSubsortedStrict x y)))

; constraints predicate
(define-fun constraints ((x Sort)) Bool
    (or (isSubsorted x B)
        (isSubsorted x C)))

; maximality
(define-fun maximality ((x Sort)) Bool
    (not (exists ((xp Sort))
                (and (constraints xp)
                     (isSubsortedStrict x xp)))))

(define-fun isSol ((x (Tuple Sort))) Bool
    (and (constraints ((_ tupSel 0) x))
         (maximality  ((_ tupSel 0) x))))

; solution to be found
(declare-fun solSet () (Set (Tuple Sort)))

; completeness
(assert (forall ((x (Tuple Sort)))
              (and (=> (isSol x) (member x solSet))
                   (=> (member x solSet) (isSol x)))))

(check-sat)
(get-model)

Output: (define-fun solSet () (Set (Tuple Sort)) (union (singleton (mkTuple B)) (singleton (mkTuple C))))

Extra details

The CVC4 option is important: (set-option :finite-model-find true) use finite model finding heuristic for quantifier instantiation.
To extract the solution just take every tuple from the solSet and match it with the same variables used at generation time.
Not modeled here, but it is possible to have no solution for certain variables. In order to still provide the user with useful feedback, we can forcefully add a KBott sort as a bottom, which ensures that the SMT solver will always find a solution. Then it's just a matter of post-processing to find the correct solution and return an error when needed.

TODO:

Overloaded productions

syntax Exp  ::= Id
syntax Ids  ::= Id  "," Ids  | Id
syntax Exps ::= Exp "," Exps | Exp | Ids
rule A, B => .K

rule =>(amb(ids(A:Id, B:Ids), exps(A:Exp, B:Exps)), .K)
variant 1:
  A:{Id}
  B:{Ids}
variant 2:
  A:{Exp}
  B:{Exps}

Both solutions are valid, but the second one is more general than the first (all sorts are subsorts) so we choose variant 2, then prune all ambiguities. It's possible that in other parts of the rule we have additional constraints and end up choosing variant 1. Then we are still left with ambiguities, then we apply a filter for overloaded productions. Prod1 < Prod2 if all NT1 <= NT2. In this case ids < exps.

Priority and associativity

syntax Exp ::= Exp "*" Exp [left]
             > Exp "+" Exp [left]
rule 1 + 2 * 3 => .K
rule =>(amb(plus(1, mul(2,3)), mul(plus(1,2),3)), .K)

we eliminate the second branch simply because mul is a direct parent of plus. Associativity looks at direct parent-child relations in the left hand side or right hand side only.

Interesting examples

Parametric sorts and productions

  syntax KItem ::= "start"
  configuration
    <T> 
      <k> start </k>
      <env> 1 |-> ListItem(2) ListItem(3) ListItem(4)
            5 |-> ListItem(6) ListItem(7)
            8 |-> ListItem(9) </env>
    </T>

  //rule <k> start => E1 </k> // lookup the first element from the mapping
  //     <env> 1 |-> ListItem(E1) LRest:List ...</env>
 
  rule <k> start => {Mp [1]}:>List [0] </k>
       <env> Mp </env>
  //rule <k> start => Mp [1] [0] </k> // no sort annotation is desired
  //     <env> Mp </env>

Right now the Map.lookup and List.get don't have sort information:

syntax KItem ::= Map  "[" KItem "]"         [function, hook(MAP.lookup), klabel(Map:lookup), symbol]
syntax KItem ::= List "[" Int   "]"         [function, hook(LIST.get), klabel(List:get), symbol]
syntax List  ::= ListItem(KItem)            [function, functional, hook(LIST.element), klabel(ListItem), symbol]
syntax Map   ::= Map Map                    [function, hook(MAP.concat), klabel(_Map_), symbol, left, assoc, comm, unit(.Map), element(_|->_)]
syntax List  ::= List List                  [function, functional, hook(LIST.concat), klabel(_List_), symbol, left,  smtlib(smt_seq_concat), assoc, unit(.List), element(ListItem)]

but with parametric sorts, they can be defined as:

syntax {K,V}    V  ::= Map{K,V}  "[" K "]"  [function, hook(MAP.lookup), klabel(Map:lookup), symbol]
syntax {S}      S  ::= List{S} "[" Int "]"  [function, hook(LIST.get), klabel(List:get), symbol]
syntax {S} List{S} ::= ListItem(S)          [function, functional, hook(LIST.element), klabel(ListItem), symbol]
syntax Map{K,V}    ::= Map{K,V} Map{K,V}    [function, hook(MAP.concat), klabel(_Map_), symbol, left, assoc, comm, unit(.Map), element(_|->_)]
syntax List{S}     ::= List{S} List{S}      [function, functional, hook(LIST.concat), klabel(_List_), symbol, left,  smtlib(smt_seq_concat), assoc, unit(.List), element(ListItem)]

From the configuration I should be able to do a bottom-up traversal and conclude that the sort of cell is Map{Int,List{Int}}. For the rules, I have to do a top-down traversal which carries the sort restrictions for every child term, and a bottom-up traversal to determine the sort of each production by finding the LUB of all children.

Parsing becomes interesting, because a full type erasure may introduce too many ambiguities. Maybe a partial type erasure is better, where we keep the head of the sort (Map{Int, List{Int}} -> Map) and keep low the productions we try at each step.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly