-
Notifications
You must be signed in to change notification settings - Fork 4
Type inferencing
We want to infer types for variables so that the rule has the most chances to apply therefore we want to
maximize the inferred sorts. Because of the way we mix user syntax with K syntax, we introduce a new subsort
between K
and KBott
for every user sort. To make this easier to visualize we can think of all the subsort relations as a DAG where K
is at the top and KBott
is the smallest element. Inferring a sort means finding the highest sort from the graph which meets all the constraints.
Let's take an example (1):
// A < B < C < D
syntax D ::= C
syntax C ::= B
syntax B ::= A
syntax A ::= f(C)
rule f(X) => .K
The sort of X
can be any of A
, B
or C
. But to maximize matching we must infer C
.
Let's take this example (2):
syntax Exps ::= Exp “,” Exps | Exp
syntax Exp ::= Id
syntax Id ::= [A-Za-z][A-Za-z0-9]*
syntax Stmt ::= “val” Exps “;” Stmt
syntax KBott ::= “(“ K “)” | “(“ Id “,” Stmt “)”
syntax K ::= Exps | Exp | Id
rule val X; S => (X,S)
rule =>(val_;_(X:Exps, S:Stmt),
amb((_)(_,_(X:Exp, S:Exps))
(_,_)( X:Id, S:Stmt))
Based on each appearance of each variable and ambiguities we have the following constraints:
variant 1: - no solution
X:{Exps, Exp}
S:{Stmt, Exps}
variant 2:
X:{Exps, Id}
S:{Stmt, Stmt}
For variant 1, we can infer X:Exp
as being the most general sort that fits all constraints, but for S
we end up with KBott
which is not going to match on anything. So we have to discard this solution.
Variant 2 will infer X:Id
, which is smaller than the previous solution, but we can find a valid option for S:Stmt
this time. We can take this solution and, with a simple visitor, can eliminate the rest of the ambiguities from the AST.
Type inference here can be viewed as a constraint solving problem where we must solve X
and S
in:
X:Exps /\ S:Stmt /\ ((X:Exp /\ S:Exps) \/ (X:Id /\ S:Stmt))
After playing with Z3 for a bit, we ended up working with CVC4 because of their builtin support for Sets.
First step in modeling the problem was to represent the sorts and the subsort relations in SMT2.
(declare-datatypes ((Sort 0))
(((A) (B) (C) (D))))
; A < B < C < D
(define-fun tsubs () (Set (Tuple Sort Sort))
(tclosure (insert ; initial subsorts
(mkTuple A B)
(mkTuple B C) (singleton
(mkTuple C D)))))
We introduce the sort names as datatypes and the subsort relations as a set of tuples, transitively closed.
(define-fun isSubsortedStrict ((x Sort) (y Sort)) Bool
(member (mkTuple x y) tsubs))
(define-fun isSubsorted ((x Sort) (y Sort)) Bool
(or (= x y) (isSubsortedStrict x y)))
Subsorting can be represented as uninterpreted functions which just simply check membership of the relation in question in the tsubs
set.
Modeling the constraints on X
is as simple as:
(declare-const X Sort)
(assert (and (isSubsorted X C))) ; constraints on variable
At this point, the SMT solver would find any of the A
, B
, or C
sorts as satisfiable. To make sure we get the maximum, we must add a new constraint:
(assert (not (exists ((y Sort)) ; maximality
(and (isSubsorted y C)
(isSubsortedStrict X y)))))
which says that there is no y
which is strictly bigger than X
and meets the same constraints.
The final step is to model multiple variables at the same time. For this, we define a solution as being a tuple of Sorts on which we apply the same restrictions as above.
Here is how example (2) would look like:
(set-logic ALL)
(set-info :status sat)
; use finite model finding heuristic for quantifier instantiation
(set-option :finite-model-find true)
(set-option :produce-models true)
(declare-datatypes ((Sort 0))
(((Id) (Exp) (Exps) (Stmt))))
; Id < Exp < Exps
; subsorts POSet
(define-fun tsubs () (Set (Tuple Sort Sort))
(tclosure (insert
(mkTuple Id Exp) (singleton
(mkTuple Exp Exps)))))
(define-fun isSubsortedStrict ((x Sort) (y Sort)) Bool
(member (mkTuple x y) tsubs))
(define-fun isSubsorted ((x Sort) (y Sort)) Bool
(or (= x y) (isSubsortedStrict x y)))
; constraints predicate: rule var X; S => (X, S)
(define-fun constraints ((x Sort) (s Sort)) Bool
(and (isSubsorted x Exps)
(isSubsorted s Stmt)
(or (and (isSubsorted x Exp)
(isSubsorted s Exps))
(and (isSubsorted x Id)
(isSubsorted s Stmt)))))
; maximality
(define-fun maximality ((x Sort) (s Sort)) Bool
(not (exists ((xp Sort) (sp Sort))
(and (constraints xp sp)
(or (isSubsortedStrict x xp)
(isSubsortedStrict s sp))))))
(define-fun isSol ((x (Tuple Sort Sort))) Bool
(and (constraints ((_ tupSel 0) x) ((_ tupSel 1) x))
(maximality ((_ tupSel 0) x) ((_ tupSel 1) x))))
; solution to be found
(declare-fun solSet () (Set (Tuple Sort Sort)))
; completeness
(assert (forall ((x (Tuple Sort Sort)))
(and (=> (isSol x) (member x solSet))
(=> (member x solSet) (isSol x)))))
(check-sat)
(get-model)
Output: (define-fun solSet () (Set (Tuple Sort Sort)) (singleton (mkTuple Id Stmt)))
Since we don't always have a top sort, we may end up with multiple solutions. CVC4 allows us to find all the solutions by adding a "completeness" assert.
(set-logic ALL)
(set-info :status sat)
; use finite model finding heuristic for quantifier instantiation
(set-option :finite-model-find true)
(set-option :produce-models true)
(declare-datatypes ((Sort 0))
(((A) (B) (C))))
; A < B C
; subsorts POSet
(define-fun tsubs () (Set (Tuple Sort Sort))
(tclosure (insert
(mkTuple A B) (singleton
(mkTuple A C)))))
(define-fun isSubsortedStrict ((x Sort) (y Sort)) Bool
(member (mkTuple x y) tsubs))
(define-fun isSubsorted ((x Sort) (y Sort)) Bool
(or (= x y) (isSubsortedStrict x y)))
; constraints predicate
(define-fun constraints ((x Sort)) Bool
(or (isSubsorted x B)
(isSubsorted x C)))
; maximality
(define-fun maximality ((x Sort)) Bool
(not (exists ((xp Sort))
(and (constraints xp)
(isSubsortedStrict x xp)))))
(define-fun isSol ((x (Tuple Sort))) Bool
(and (constraints ((_ tupSel 0) x))
(maximality ((_ tupSel 0) x))))
; solution to be found
(declare-fun solSet () (Set (Tuple Sort)))
; completeness
(assert (forall ((x (Tuple Sort)))
(and (=> (isSol x) (member x solSet))
(=> (member x solSet) (isSol x)))))
(check-sat)
(get-model)
Output: (define-fun solSet () (Set (Tuple Sort)) (union (singleton (mkTuple B)) (singleton (mkTuple C))))
- The CVC4 option is important:
(set-option :finite-model-find true)
use finite model finding heuristic for quantifier instantiation. - To extract the solution just take every tuple from the
solSet
and match it with the same variables used at generation time. - Not modeled here, but it is possible to have no solution for certain variables. In order to still provide the user with useful feedback, we can forcefully add a
KBott
sort as a bottom, which ensures that the SMT solver will always find a solution. Then it's just a matter of post-processing to find the correct solution and return an error when needed.
syntax Exp ::= Id
syntax Ids ::= Id "," Ids | Id
syntax Exps ::= Exp "," Exps | Exp | Ids
rule A, B => .K
rule =>(amb(ids(A:Id, B:Ids), exps(A:Exp, B:Exps)), .K)
variant 1:
A:{Id}
B:{Ids}
variant 2:
A:{Exp}
B:{Exps}
Both solutions are valid, but the second one is more general than the first (all sorts are subsorts) so we choose variant 2, then prune all ambiguities. It's possible that in other parts of the rule we have additional constraints and end up choosing variant 1. Then we are still left with ambiguities, then we apply a filter for overloaded productions. Prod1 < Prod2 if all NT1 <= NT2. In this case ids < exps.
syntax Exp ::= Exp "*" Exp [left]
> Exp "+" Exp [left]
rule 1 + 2 * 3 => .K
rule =>(amb(plus(1, mul(2,3)), mul(plus(1,2),3)), .K)
we eliminate the second branch simply because mul is a direct parent of plus. Associativity looks at direct parent-child relations in the left hand side or right hand side only.
syntax KItem ::= "start"
configuration
<T>
<k> start </k>
<env> 1 |-> ListItem(2) ListItem(3) ListItem(4)
5 |-> ListItem(6) ListItem(7)
8 |-> ListItem(9) </env>
</T>
//rule <k> start => E1 </k> // lookup the first element from the mapping
// <env> 1 |-> ListItem(E1) LRest:List ...</env>
rule <k> start => {Mp [1]}:>List [0] </k>
<env> Mp </env>
//rule <k> start => Mp [1] [0] </k> // no sort annotation is desired
// <env> Mp </env>
Right now the Map.lookup and List.get don't have sort information:
syntax KItem ::= Map "[" KItem "]" [function, hook(MAP.lookup), klabel(Map:lookup), symbol]
syntax KItem ::= List "[" Int "]" [function, hook(LIST.get), klabel(List:get), symbol]
syntax List ::= ListItem(KItem) [function, functional, hook(LIST.element), klabel(ListItem), symbol]
syntax Map ::= Map Map [function, hook(MAP.concat), klabel(_Map_), symbol, left, assoc, comm, unit(.Map), element(_|->_)]
syntax List ::= List List [function, functional, hook(LIST.concat), klabel(_List_), symbol, left, smtlib(smt_seq_concat), assoc, unit(.List), element(ListItem)]
but with parametric sorts, they can be defined as:
syntax {K,V} V ::= Map{K,V} "[" K "]" [function, hook(MAP.lookup), klabel(Map:lookup), symbol]
syntax {S} S ::= List{S} "[" Int "]" [function, hook(LIST.get), klabel(List:get), symbol]
syntax {S} List{S} ::= ListItem(S) [function, functional, hook(LIST.element), klabel(ListItem), symbol]
syntax Map{K,V} ::= Map{K,V} Map{K,V} [function, hook(MAP.concat), klabel(_Map_), symbol, left, assoc, comm, unit(.Map), element(_|->_)]
syntax List{S} ::= List{S} List{S} [function, functional, hook(LIST.concat), klabel(_List_), symbol, left, smtlib(smt_seq_concat), assoc, unit(.List), element(ListItem)]
From the configuration I should be able to do a bottom-up traversal and conclude that the sort of cell is Map{Int,List{Int}}
.
For the rules, I have to do a top-down traversal which carries the sort restrictions for every child term, and a bottom-up traversal to determine the sort of each production by finding the LUB of all children.
Parsing becomes interesting, because a full type erasure may introduce too many ambiguities. Maybe a partial type erasure is better, where we keep the head of the sort (Map{Int, List{Int}}
-> Map
) and keep low the productions we try at each step.