Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Prelude extractor #5621

Merged
merged 57 commits into from
Aug 30, 2024
Merged

Conversation

RustanLeino
Copy link
Collaborator

Description

This PR adds a way to extract a Dafny model into set of Boogie declarations. This new feature can be of general use, but its main application is to justify various axioms in Dafny's DafnyPrelude.bpl by providing a model for them.

Impact on the Dafny tool

The model for Dafny's underlying type Seq is given in Source/DafnyCore/Prelude/Sequences.dfy.

The file Source/DafnyCore/DafnyPrelude.bpl should no longer be edited by hand. Instead, it is now generated from Source/DafnyCore/Prelude/PreludeCore.bpl and from the model Sequences.dfy (and other models in the future, for example for multisets).

To generate a new version of DafnyPrelude.bpl, run make extract in the Source/DafnyCore folder. This will run make in the Source/DafnyCore/Prelude folder and will then copy the generated Prelude/DafnyPrelude.bpl to DafnyPrelude.bpl.

CLI option

This PR adds a CLI option --extract:<file> (where <file> is allowed to be - to denote standard output, just like for the various print options). This option works with the dafny verify command.

Caveats:

This gets the job done, but is probably not the best way of doing it. For instance, one could imagine a dafny extract command that performs this task.

I'm unsure of the code I added to handle this CLI option. In particular, I don't understand if I have added it just for the dafny verify command or also for other commands, and I don't understand if I have made the option available via the language server (or if we care).

Changes to DafnyPrelude.bpl

This PR contains changes to DafnyPrelude.bpl. The claim is that these changes make no semantic difference. The changes are:

  • differences in whitespace
  • other changes to format the code like the Boogie /print option does (for example, s: Seq, t: Seq instead of s, t: Seq)
  • in the Seq part of DafnyPrelude.bpl, the declarations that are not part of the model in Sequences.dfy have
    been moved to the end, so that the extracted declarations can be in one contiguous part of the file

The third of these bullets do change the order in which things are sent to the SMT solver, and thus this may impact some users.

Extract mechanism

Upon successful verification, the new extract mechanism added by this PR will visit the AST of the given program. For any module marked with {:extract}, the extract-worthy material from the module is output. The output declarations will be in the same order as they appear textually in the module (in particular, the fact that module-level Dafny declarations are collected in an internal class _default has no bearing on the output order).

Three kinds of declarations are extract-worthy:

  • A type declaration A<X, Y, Z> that bears an attribute {:extract_name B} is extracted into a Boogie type declaration type B _ _ _;.

    The definition of the type is ignored. (The intended usage for an extracted type is that the Dafny program give a definition for the type, which goes to show the existence of such a type.)

  • A function declaration F(x: X, y: Y): Z that bears an attribute {:extract_name G} is extracted into a Boogie function declaration function G(x: X, y: Y): Z;.

    The body of the Dafny function is ignored. (The intended usage for an extracted function is that the Dafny program give a definition for the function, which goes to show the existence of such a function.)

  • A lemma declaration L(x: X, y: Y) requires P ensures Q that bears an attribute {:extract_pattern ...} or an attribute {:extract_used_by ...} is extracted into a Boogie axiom. The axiom has the basic form axiom (forall x: X, y: Y :: P ==> Q);.

    If the lemma has an attribute {:extract_used_by F}, then the axiom will be emitted into the uses clause of the Boogie function generated for Dafny function F.

    If the lemma has no in-parameters, the axiom is just P ==> Q.

    If the lemma has in-parameters, then any attribute {:extract_pattern E, F, G} adds a matching pattern { E, F, G } to the emitted quantifier. Also, any attribute {:extract_attribute "name", E, F, G} adds an attribute {:name E, F, G} to the quantifier.

Expressions

The pre- and postconditions of extracted lemmas turn into analogous Boogie expressions, and the types of function/lemma parameters and bound variables are extracted into analogous Boogie types. The intended usage of the extract mechanism is that these expressions and types do indeed have analogous Boogie types.

At this time, only a limited set of expressions and types are supported, but more can be added in the future.

Any forall and exists quantifiers in expressions are allowed to use :extract_pattern and :extract_attribute attributes, as described above for lemmas.

Some extracted expressions are simplified. For example, true && !!P is simplified to P.

Soundness

The Dafny program that is used as input for the extraction is treated like any other Dafny program. The intended usage of the extraction mechanism is to prove parts of the axiomatization in DafnyPrelude.bpl to be logically consistent. Whether or not the extracted Boogie declarations meet this goal depends on the given Dafny program. For example, if the given Dafny program formalizes sequences in terms of maps and formalizes maps in terms of sequences, then the extraction probably does not provide guarantees of consistency.

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

The difference between the legacy/refresh resolver runs is that some bound variables and an empty set are typed with …Node… in legacy and …Node?… in refresh.
Copy link
Member

@atomb atomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks great. The one change I think would be good to include is the one you suggest in the summary: to make it a separate top-level command. Along the way, I think it might be better to use a more descriptive term than "extract". I was thinking "extract-boogie-axioms", though it could be just "extract-axioms" if there's a possibility it might support other languages eventually.

@@ -28,6 +36,13 @@ public static class VerifyCommand {
public static readonly Option<string> FilterPosition = new("--filter-position",
@"Filter what gets verified based on a source location. The location is specified as a file path suffix, optionally followed by a colon and a line number. For example, `dafny verify dfyconfig.toml --filter-position=source1.dfy:5` will only verify things that range over line 5 in the file `source1.dfy`. In combination with `--isolate-assertions`, individual assertions can be verified by filtering on the line that contains them. When processing a single file, the filename can be skipped, for example: `dafny verify MyFile.dfy --filter-position=:23`");

public static readonly Option<string> ExtractTarget = new("--extract", @"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer for this to be a top-level command, and perhaps a little more explanatory. One possibility would be dafny extract-boogie-axioms. (The types and functions are missing definitions, and are essentially axioms, as well.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible downside is that I'm not sure whether commands can be hidden...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I, too, would like to see this as a top-level command. However, since I don't understand how commands are added to the source or how we make a command do certain things, I'm leaving this as is for now. I will ask @keyboardDrummer for help to make such a change in a separate PR.

Source/DafnyDriver/Commands/VerifyCommand.cs Outdated Show resolved Hide resolved
@@ -21,19 +21,19 @@ Fuel.legacy.dfy(324,21): Related location
Fuel.legacy.dfy(313,41): Related location
Fuel.legacy.dfy(335,26): Error: function precondition could not be proved
Fuel.legacy.dfy(324,21): Related location
Fuel.legacy.dfy(314,72): Related location
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Someday I want to refactor this test so that the output doesn't get reordered every time we breathe on the verifier. :)

invariant forall j :: 0 <= j < i ==> j % 2 == 1 ==> ret[j] == "odd"
invariant forall j :: 0 <= j < i ==> j % 2 == 0 ==> ret[j] == "even"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm continually amazed by the edits that can resolve brittleness issues!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, here, I think it picked a different path or picked a different failing value for j.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... or what can cause them in the first place (here, some reordering of axioms in DafnyPrelude.bpl).

import opened Boxes

// boogie: type Seq;
type {:extract_name "Seq"} Seq = List<Box>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type {:extract_name "Seq"} Seq = List<Box>
type {:boogie_name "Seq"} Seq = List<Box>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And so on, elsewhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed :extract_name to :extract_boogie_name, and changed (the unused) :extract to :extract_boogie. I also modified the source to define these as constants in BoogieExtractor.cs, so they can easily be renamed in the future.

// Seq#Contains(Seq#Drop(s, n), x) <==>
// (exists i: int :: { Seq#Index(s, i) }
// 0 <= n && n <= i && i < Seq#Length(s) && Seq#Index(s, i) == x));
lemma {:extract_pattern Contains(Drop(s, n), x)} DropContains(s: Seq, n: int, x: Box)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to have a different name for this attribute, too, though I'm not sure what...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left these as is. If we think of a better naming scheme in the future, we can change them then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe BoogieExtractor.cs?

return null;
}

private Boogie.Expr ExtractExpr(Expression expr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a possibility of making this share more with the verifier?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't clear. For example, the verifier doesn't offer to rename types and functions using the extract names. At least the subset of Dafny used in these model files will be small.

One thing that is shared is the pretty printing of the Boogie itself. For example, BoogieExtractor.cs here does not need to worry about operator precedence in Boogie, since the Boogie pretty printer will add parentheses where they are needed.

RustanLeino and others added 5 commits August 20, 2024 14:18
# Conflicts:
#	Source/DafnyDriver/Commands/VerifyCommand.cs
#	Source/IntegrationTests/TestFiles/LitTests/LitTest/dafny1/SchorrWaite.dfy.expect
#	Source/IntegrationTests/TestFiles/LitTests/LitTest/dafny1/SchorrWaite.dfy.refresh.expect
atomb
atomb previously approved these changes Aug 21, 2024
Copy link
Member

@atomb atomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm excited about this!

# Conflicts:
#	Source/IntegrationTests/TestFiles/LitTests/LitTest/dafny0/SubsetTypes.dfy.expect
@RustanLeino RustanLeino enabled auto-merge (squash) August 30, 2024 00:46
@RustanLeino RustanLeino merged commit b3f70a8 into dafny-lang:master Aug 30, 2024
22 checks passed
@keyboardDrummer
Copy link
Member

I'm a bit spooked by the edits to SchorrWaite.dfy. The method SchorrWaite now has both {:isolate_assertions} and focus blocks, so I imagine there's large amount of assertions that are verified as asserts, multiple times.

@@ -21,6 +22,13 @@ static VerifyCommand() {
OptionRegistry.RegisterOption(FilterSymbol, OptionScope.Cli);
OptionRegistry.RegisterOption(FilterPosition, OptionScope.Cli);
OptionRegistry.RegisterOption(PerformanceStatisticsOption, OptionScope.Cli);
OptionRegistry.RegisterOption(ExtractTarget, OptionScope.Cli);

DafnyOptions.RegisterLegacyBinding(ExtractTarget, (options, f) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using the legacy way of introducing new options here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I've tried, I still don't understand where to put these options to make them fire for the right commands. I would very much welcome an improvement of how extraction is handled. As mentioned above, it would be nice if it were a verb extract rather than an option for the verify verb.

@@ -35,6 +43,13 @@ static VerifyCommand() {
public static readonly Option<string> FilterPosition = new("--filter-position",
@"Filter what gets verified based on a source location. The location is specified as a file path suffix, optionally followed by a colon and a line number. For example, `dafny verify dfyconfig.toml --filter-position=source1.dfy:5` will only verify things that range over line 5 in the file `source1.dfy`. In combination with `--isolate-assertions`, individual assertions can be verified by filtering on the line that contains them. When processing a single file, the filename can be skipped, for example: `dafny verify MyFile.dfy --filter-position=:23`");

public static readonly Option<string> ExtractTarget = new("--extract", @"
Extract Dafny types, functions, and lemmas to Boogie.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do? It's still not clear to me how this is different from --bprint. Does this ignore the input program and only output something relates to Dafny internals? If so, why is it part of dafny verify ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extraction is the output that the Dafny tool will incorporate into DafnyPrelude.bpl. These are not verification conditions but axioms and other definitions. In contrast, --bprint shows the verification conditions that are sent to Boogie.

To accomplish its task, extraction pays attention to :extract_boogie_name attributes and renames things accordingly. Extraction turns lemmas into universally quantified axioms (whereas VC generation turns them into Boogie procedures). Finally, extraction looks at the input Dafny expressions as they were Boogie expressions, more or less. So, the extraction output is free of well-formedness conditions and $Is predicates and other such things that are part of VC generation.

var extractedProgram = BoogieExtractor.Extract(resolution.ResolvedProgram);
engine.PrintBplFile(options.BoogieExtractionTargetFile, extractedProgram, true, pretty: true);
} catch (ExtractorError extractorError) {
options.OutputWriter.WriteLine($"Boogie axiom extraction error: {extractorError.Message}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you resolve these warnings?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying to remove this try catch? These errors are generated if the input to the extractor is invalid, so they need to stay.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the build was outputting warnings because this was using WriteLine instead of WriteLineAsync

@RustanLeino RustanLeino deleted the prelude-extractor branch September 27, 2024 17:39
keyboardDrummer added a commit that referenced this pull request Sep 30, 2024
Touchup PR for #5621

### Description
- Change the --extractTarget option into a hidden command
- Make `--help` work when used on hidden commandds

### How has this been tested?
- Ran `make extract` in DafnyCore

<small>By submitting this pull request, I confirm that my contribution
is made under the terms of the [MIT
license](https://github.com/dafny-lang/dafny/blob/master/LICENSE.txt).</small>
RustanLeino added a commit that referenced this pull request Dec 9, 2024
## Description

This PR adds models for `set` and `multiset` and proves from those
models the consistency of the `set`/`multiset` axioms in the Dafny
prelude. This is accomplished using the extract-to-Boogie functionality
that was added in #5621, which had also defined a model for `seq`.

In addition to adding and using these models, the PR

* Improves `Source/DafnyCore/Prelude/Makefile`
    - Put generated files into `Output` folder
- Add `SourceModuleName` parameter to `dafny extract SourceModuleName
TargetBoogieFile DafnyFiles`
- Run extraction during CI and confirm that the checked-in
`DafnyPrelude.bpl` has been generated. (This was missing from before,
and when I started working on this PR, I found that Dafny had got out of
synch with the models and was crashing on them.)
* Fixes several occurrences in `DafnyPrelude.bpl` and the
`BoogieGenerator` where a `Set#...` symbol was used instead of
`ISet#...`. (This was previously undetected, because the `Set` and
`ISet` types were both synonyms for `[Box]bool`.
* Makes small changes to a few test files to make their proofs more
stable.

## Details of the models

In order to prove the extensionality property of `set` and `multiset`,
it's important to have a canonical representation of sets and multisets
in the model. To do this, I postulated (without a supporting model) a
total order on the type `Box`. (Such postulates are justified by the
Axiom of Choice. That seems okay, since Dafny's logic already allows you
to prove the Axiom of Choice.)

The aim was for the generated models to be exactly like the handwritten
ones. This was not possible, for minor annoying reasons:
- The exaction mechanism does not offer a way to output a Boogie type
synonym (the handwritten prelude had contained `type Set = [Box]bool;`).
- The handwritten declarations sometimes omitted parameter names and
sometimes gave one `: Type` to two variables. Boogie allows these, but
the extraction mechanism always generates `x: Type` for each parameter
and bound variable.

If token-by-token equality with the handwritten prelude was
non-negotiable, then these minor annoyances could have been fixed by
embellishing the extractor. However, there is a good reason to change
this part of the axiomatizations. Previously, both `set` and `multiset`
had been defined in terms of Boogie's built-in maps. Thus, to really
prove the models consistent, one would have to prove that the additional
axioms are a conservative extension of Boogie's maps. This is an
unnecessary proof burden. So, instead, the new `set` and `multiset`
models generate `Set` and `MultiSet` as uninterpreted types.

There are three consequences of using uninterpreted types rather than
Boogie's built-in map types:
- This affects other places in the handwritten `DafnyPrelude.bpl` that
assumed maps (for example, `Map#Domain(m)[x]` now is written
`Set#IsMember(Map#Domain(m), x)`) and affects what `BoogieGenerator`
emits. The PR includes these changes. They have been tested by running
the test suite to make sure the correct symbols are emitted.
- The `multiset` model used both Boogie's map-select (`_[_]`) operator
and its map-store (`_[_:=_]`) operator. Thus, the "select of store"
axioms had to be added to the model (where they are proved as lemmas).
- The `set` model used only the map-select operator. However, the
translation into Boogie also made use of Boogie's map comprehensions
(`lambda` binders). This is convenient for `BoogieGenerator`, because
Boogie performs lambda lifting on these `lambda` expressions. The right
thing to do is to move this lambda lifting into Dafny instead of Boogie,
and thus to not make use of Boogie's `lambda` expressions for Dafny set
comprehensions. This PR does _not_ do that and instead leaves this as a
`FIXME`. Instead, this PR adds a function `Set#FromBoogieMap` function,
which converts a Boogie `[Box]bool` map to a `Set`. This is _not_ sound
(indeed, this was not sound in the handwritten axioms, either, which I
realized as I was going along). The reason is that one can construct a
Boogie `[Box]bool` map that gives `true` for an infinite number of
`Box`es. As a temporary solution (before we implement the necessary
set-comprehension lifting directly in Dafny), this seems alright. First,
this is no worse than before, since `lambda` expressions were used
before, too. Second, to discover the unsoundness, one needs induction,
and neither Boogie nor the SMT solver knows about induciton.

Note that `iset` still uses the `[Box]bool` representation in Boogie,
since `iset` is not part of this PR.

## Missed opportunities

* The parameters and bound variables don't follow a consistent naming
convention. When I started writing the models, I tried to follow exactly
what was in the handwritten `DafnyPrelude.bpl`, rather than doing such
clean-up. However, now that we're changing a lot of things anyway, it
would have been nice to have cleaned up those names.
* In a similar way, I'm thinking it would be nice to change the
capitalization of `MultiSet` to `Multiset`.
* There's a predicate `$IsGoodMultiSet` in the model. I _think_ it had
been introduced in the handwritten `DafnyPrelude.bpl` to fix a soundness
issue with the axioms. Surprisingly, I was able to define it as `true`.
In other words, the predicate is effectively useless and the lemmas in
the model all hold even if this predicate were removed. That would be
nice, because it simplifies Dafny's axiomatization slightly. I did not
think about this deeply (because I couldn't find the unsoundness Issue
that I seem to recall that `$IsGoodMultiSet` was there to fix), but
think the reason it's okay to leave it out now is that the new multiset
axioms don't use Boogie's built-in map type and operators.

## Note about reviewing this PR

The `set` and `multiset` models comprise more than 2000 lines of Dafny.
However, they need not be reviewed line by line, since Dafny checks the
proofs. It suffices to review that no cheating is going on (e.g.,
`assume` statements or body-less lemmas), and this is easy since the
`Makefile` invokes `dafny extract` without `--allow-warnings` or the
like. It would be nice if I could also say "The definition of the
functions and the statement of the lemmas also don't have to be checked
in details--it suffices to check that the extracted Boogie declarations
are identical to the previous handwritten ones.". Alas, in addition to
whitespace changes and the minor annoyances mentioned above, the new
axioms use `Set#IsMember`, `MultiSet#Multiplicity`, and
`MultiSet#UpdateMultiplicity` in lieu of `_[_]`, `_[_]`, and `_[_:=_]`,
respectively. Regrettably, these changes make it much harder to compare
the new definitions with the old. If it's of any consolation, I made the
whitespace changes are very carefully as I could to avoid errors and did
the renamings only after I had made the whitespace changes. Plus, the
test suite still passes after these changes.

A good thing to think about during the review is if it is really okay
that `$IsGoodMultiSet` is defined to be `true`. It is not needed for
proving the other axioms. So why had it been introduced in the first
place? (I thought it was to fix a previously discovered unsoundness
among the `multiset` axioms, but I didn't find the exact Issue and its
PR fix.)

<small>By submitting this pull request, I confirm that my contribution
is made under the terms of the [MIT
license](https://github.com/dafny-lang/dafny/blob/master/LICENSE.txt).</small>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants