Add a section to the manual for Cryptol->Coq extraction.

Other minor manual fixes.
GaloisInc · Nov 10, 2021 · 58fcd63 · 58fcd63
1 parent a12b6b1
commit 58fcd63
Show file tree

Hide file tree

Showing 3 changed files with 249 additions and 29 deletions.
diff --git a/doc/manual/galois-whitepaper.cls b/doc/manual/galois-whitepaper.cls
@@ -82,7 +82,7 @@
 \rhead{\sffamily \textit{\@title{}}}
 \makeatother
 
-\lfoot{\sffamily\color{lightgray}\small\copyright{} 2019 Galois, Inc.}
+\lfoot{\sffamily\color{lightgray}\small\copyright{} 2021 Galois, Inc.}
 
 \rfoot{
   % \sffamily \color{galoislightgray}\small

diff --git a/doc/manual/manual.md b/doc/manual/manual.md
@@ -74,7 +74,7 @@ command-line options:
 
 `-b path, --java-bin-dirs`
 
-  ~ Specfy a colon-delimited list of paths to search for a Java
+  ~ Specify a colon-delimited list of paths to search for a Java
     executable.
 
 `-d num, --sim-verbose=num`
@@ -498,7 +498,7 @@ understand the role of the Cryptol language in SAW.
 
 # Cryptol and its Role in SAW
 
-Cyptol is a domain-specific language originally designed for the
+Cryptol is a domain-specific language originally designed for the
 high-level specification of cryptographic algorithms. It is general
 enough, however, to describe a wide variety of programs, and is
 particularly applicable to describing computations that operate on
@@ -1082,19 +1082,19 @@ Whether proving small lemmas (in the form of rewrite rules) or a
 top-level theorem, the process builds on the idea of a *proof script*
 that is run by one of the top level proof commands.
 
-* `prove_print : ProofScript SatResult -> Term -> TopLevel Theorem`
+* `prove_print : ProofScript () -> Term -> TopLevel Theorem`
 takes a proof script (which we'll describe next) and a `Term`. The
 `Term` should be of function type with a return value of `Bool` (`Bit`
 at the Cryptol level). It will then use the proof script to attempt to
 show that the `Term` returns `True` for all possible inputs. If it is
 successful, it will print `Valid` and return a `Theorem`. If not, it
 will abort.
 
-* `sat_print : ProofScript SatResult -> Term -> TopLevel ()` is similar
+* `sat_print : ProofScript () -> Term -> TopLevel ()` is similar
 except that it looks for a *single* value for which the `Term` evaluates
 to `True` and prints out that value, returning nothing.
 
-* `prove_core : ProofScript SatResult -> String -> TopLevel Theorem`
+* `prove_core : ProofScript () -> String -> TopLevel Theorem`
 proves and returns a `Theorem` from a string in SAWCore syntax.
 
 ## Automated Tactics
@@ -1197,11 +1197,11 @@ provers is to unfold everything before sending a goal to a prover.
 However, with some provers it is possible to indicate that specific
 named subterms should be represented as uninterpreted functions.
 
-* `unint_cvc4 : [String] -> ProofScript SatResult`
+* `unint_cvc4 : [String] -> ProofScript ()`
 
-* `unint_yices : [String] -> ProofScript SatResult`
+* `unint_yices : [String] -> ProofScript ()`
 
-* `unint_z3 : [String] -> ProofScript SatResult`
+* `unint_z3 : [String] -> ProofScript ()`
 
 The list of `String` arguments in these cases indicates the names of the
 subterms to leave folded, and therefore present as uninterpreted
@@ -1218,15 +1218,15 @@ In addition to the built-in automated provers already discussed, SAW
 supports more generic interfaces to other arbitrary theorem provers
 supporting specific interfaces.
 
-* `external_aig_solver : String -> [String] -> ProofScript SatResult`
+* `external_aig_solver : String -> [String] -> ProofScript ()`
 supports theorem provers that can take input as a single-output AIGER
 file. The first argument is the name of the executable to run. The
 second argument is the list of command-line parameters to pass to that
 executable. Any element of this list equal to `"%f"` will be replaced
 with the name of the temporary AIGER file generated for the proof goal.
 The output from the solver is expected to be in DIMACS solution format.
 
-* `external_cnf_solver : String -> [String] -> ProofScript SatResult`
+* `external_cnf_solver : String -> [String] -> ProofScript ()`
 works similarly but for SAT solvers that take input in DIMACS CNF format
 and produce output in DIMACS solution format.
 
@@ -1237,15 +1237,15 @@ until a later time, there are functions to write the current goal to a
 file in various formats, and then assume that the goal is valid through
 the rest of the script.
 
-* `offline_aig : String -> ProofScript SatResult`
+* `offline_aig : String -> ProofScript ()`
 
-* `offline_cnf : String -> ProofScript SatResult`
+* `offline_cnf : String -> ProofScript ()`
 
-* `offline_extcore : String -> ProofScript SatResult`
+* `offline_extcore : String -> ProofScript ()`
 
-* `offline_smtlib2 : String -> ProofScript SatResult`
+* `offline_smtlib2 : String -> ProofScript ()`
 
-* `offline_unint_smtlib2 : [String] -> String -> ProofScript SatResult`
+* `offline_unint_smtlib2 : [String] -> String -> ProofScript ()`
 
 These support the AIGER, DIMACS CNF, shared SAWCore, and SMT-Lib v2
 formats, respectively. The shared representation for SAWCore is
@@ -1259,23 +1259,26 @@ listed in its first argument as uninterpreted functions.
 Some proofs can be completed using unsound placeholders, or using
 techniques that do not require significant computation.
 
-* `assume_unsat : ProofScript SatResult` indicates that the current goal
-should be assumed to be unsatisfiable. At the moment,
-`jvm_verify` and `llvm_verify` (described below) run
-their proofs in a satisfiability-checking (negated) context, so
-`assume_unsat` indicates that the property being checked should be
-assumed to be true. This is likely to change in the future.
+* `assume_unsat : ProofScript ()` indicates that the current goal
+should be assumed to be unsatisfiable. This is an alias for
+`assume_valid`. Users should prefer to use `admit` instead.
 
-* `assume_valid : ProofScript ProofResult` indicates that the current
-goal should be assumed to be valid.
+* `assume_valid : ProofScript ()` indicates that the current
+goal should be assumed to be valid.  Users should prefer to
+use `admit` instead
 
-* `quickcheck : Int -> ProofScript SatResult` runs the goal on the given
+* `admit : String -> ProofScript ()` indicates that the current
+goal should be assumed to be valid without proof. The given
+string should be used to record why the user has decided to
+assume this proof goal.
+
+* `quickcheck : Int -> ProofScript ()` runs the goal on the given
 number of random inputs, and succeeds if the result of evaluation is
 always `True`. This is unsound, but can be helpful during proof
 development, or as a way to provide some evidence for the validity of a
 specification believed to be true but difficult or infeasible to prove.
 
-* `trivial : ProofScript SatResult` states that the current goal should
+* `trivial : ProofScript ()` states that the current goal should
 be trivially true (i.e., the constant `True` or a function that
 immediately returns `True`). It fails if that is not the case.
 
@@ -1354,9 +1357,6 @@ operate on AIGs.
 "blasting" all of its primitive operations (things like bit-vector
 addition) down to the level of individual bits.
 
-* `cec : AIG -> AIG -> TopLevel ProofResult` compares two `AIG` values,
-returning a `ProofResult` representing whether the two are equivalent.
-
 * `load_aig : String -> TopLevel AIG` loads an `AIG` from an external
 AIGER file.
 
@@ -2757,3 +2757,223 @@ let main : TopLevel () = do {
 ~~~~
 
 [^4]: https://en.wikipedia.org/wiki/Salsa20
+
+
+# Extraction to the Coq theorem prover
+
+In addition to the (semi-)automatic and compositional proof modes
+already discussed above, SAW has experimental support for exporting
+Cryptol and SAWCore values as terms to the Coq proof assistant[^5].
+This is intended to support more manual proof efforts for properties
+that go beyond what SAW can support (for example, proofs requiring
+induction) or for connecting to preexisting formalizations in Coq
+of useful algorithms (e.g. the fiat crypto library[^6]).
+
+This support consists of two related pieces. The first piece is a library of
+formalizations of the primitives underlying Cryptol and SAWCore and
+various supporting concepts that help bridge the conceptual gap
+between SAW and Coq.  The second piece is a term translation
+that maps the syntactic forms of SAWCore onto corresponding
+concepts in Coq syntax, designed to dovetail with the concepts
+defined in the support library.  SAWCore is a quite similar language
+to the core calculus underlying Coq, so much of this translation is
+quite straightforward; however, the languages are not exactly
+equivalent, and there are some tricky cases that mostly arise from
+Cryptol code that can only be partially supported.  We will note
+these restrictions later in the manual.
+
+We expect this extraction process to work with a fairly wide range of Coq
+versions, as we are not using bleeding-edge Coq features. It has been
+most fully tested with Coq version 8.13.2.
+
+## Support Library
+
+In order to make use of SAW's extraction capabilities, one must first
+compile the support library using Coq so that the included definitions
+and theorems can be referenced by the extracted code. From the top
+directory of the SAW source tree, the source code for this support
+library can be found in the `saw-core-coq/coq` subdirectory.
+In this subdirectory you will find a `_CoqProject` and a `Makefile`.
+A simple `make` invocation should be enough to compile
+all the necessary files, assuming Coq is installed and `coqc` is
+available in the user's `PATH`. HTML documentation for the support
+library can also be generated by `make html` from the same directory.
+
+Once the library is compiled, the recommended way to import
+it into your subsequent development is by adding the following
+lines to your `_CoqProject` file:
+
+```
+-Q <SAWDIR>/saw-core-coq/coq/generated/CryptolToCoq CryptolToCoq
+-Q <SAWDIR>/saw-core-coq/coq/handwritten/CryptolToCoq CryptolToCoq
+```
+
+Here `<SAWDIR>` refers to the location on your system where the
+SAWScript source tree is checked out. This will add the relevant
+library files to the `CryptolToCoq` namespace, where the extraction
+process will expect to find them.
+
+The support library for extraction is broken into two parts: those
+files which are handwritten, versus those that are automatically
+generated. The handwritten files are generally fairly readable and
+are reasonable for human inspection; they define most of the
+interesting pipe-fitting that allows Cryptol and SAWCore definitions
+to connect to corresponding Coq concepts.  In particular the
+file `SAWCoreScaffolding.v` file defines most of the bindings of base
+types to Coq types, and the `SAWCoreVectorsAsCoqVectors.v` defines the
+core bitvector operations. The automatically generated files are direct
+translations of the SAWCore source files
+(`saw-core/prelude/Prelude.sawcore` and
+`cryptol-saw-core/saw/Cryptol.sawcore`) that correspond to the
+standard libraries for SAWCore and Cryptol, respectively.
+
+The autogenerated files are intended to be kept up-to-date with
+changes in the corresponding `sawcore` files, and end users should
+not need to generate them. Nonetheless, if they are out of sync for some
+reason, these files may be regenerated using the `saw` executable
+by running `(cd saw-core-coq/coq; saw saw/generate_scaffolding.saw)`
+from the top-level of the SAW source directory before compiling them
+with Coq as described above.
+
+You may also note some additional files and concepts in the standard
+library, such as `CompM.v`, and a variety of lemmas and definitions
+related to it. These definitions are related to the "heapster" system,
+which form a separate use-case for the SAWCore to Coq translation.
+These definitions will not be used for code extracted from Cryptol.
+
+## Cryptol module extraction
+
+There are several modes of use for the SAW to Coq term extraction
+facility, but the easiest to use is whole Cryptol module extraction.
+This will extract all the definitions in the given Cryptol module,
+together with it's transitive dependencies, into a single Coq module
+which can then be compiled and pulled into subsequence developments.
+
+Suppose we have a Cryptol source file named `source.cry` and we want
+to generate a Coq file named `output.v`.  We can accomplish this by
+running the following commands in saw (either directly from the `saw`
+command prompt, or via a script file)
+
+```
+enable_experimental;
+write_coq_cryptol_module "source.cry" "output.v" [] [];
+```
+
+In this default mode, identifiers in the Cryptol source will be
+directly translated into identifiers in Coq. This may occasionally
+cause problems if source identifiers clash with Coq keywords or
+preexisting definitions. The third argument to
+`write_coq_cryptol_module` can be used to remap such names if
+necessary by giving a list of `(in,out)` pairs of names.  The fourth
+argument is a list of source identifiers to skip translating, if
+desired.  Authoritative online documentation for this command can be
+obtained directly from the `saw` executable via `:help
+write_coq_cryptol_module` after `enable_experimental`.
+
+The resulting "output.v" file will have some of the usual hallmarks of
+computer-generated code; it will have poor formatting and, explicit
+parenthesis and fully-qualified names.  Thankfully, once loaded into
+Coq, the Coq pretty-printer will do a much better job of rendering
+these terms in a somewhat human-readable way.
+
+## Proofs involving uninterpreted functions
+
+It is possible to write a Cryptol module that references uninterpreted
+functions by using the `primitive` keyword to declare them in your
+Cryptol source. Primitive Cryptol declarations will be translated into
+Coq section variables; as usual in Coq, uses of these section
+variables will be translated into additional parameters to the
+definitions from outside the section.  In this way, consumers of
+the translated module can instantiate the declared Cryptol functions
+with corresponding terms in subsequent Coq developments.
+
+Although the Cryptol interpreter itself will not be able to compute
+with declared but undefined functions of this sort, they can be used
+both to provide specifications for functions to be verified with
+`llvm_verify` or `jvm_verify` and also for Coq extraction.
+
+For example, if I write the following Cryptol source file:
+
+```
+primitive f : Integer -> Integer
+
+g : Integer -> Bool
+g x = f (f x) > 0
+```
+
+After extraction, the generated term `g` will have Coq type:
+
+```
+(Integer -> Integer) -> Integer -> Bool
+```
+
+## Translation limitations and caveats
+
+Translation from Cryptol to Coq has a number of fundamental
+limitations that must be addressed. The most severe of these is that
+Cryptol is a fully general-recursive language, and may exhibit runtime
+errors directly via calls to the `error` primitive, or via partial
+operations (such as indexing a sequence out-of-bounds).  The internal
+language of Coq, by contrast, is a strongly-normalizing language of
+total functions.  As such, our translation is unable to extract all
+Cryptol programs.
+
+### Recursive programs
+
+The most severe of the currently limitations for our system
+is that the translation is unable to translate any recursive Cryptol
+program.  Doing this would require us to attempt to find some
+termination argument for the recursive function sufficient to satisfy
+Coq; for now, no attempt is made to do so. if you attempt to
+extract a recursive function, SAW will produce an error
+about a "malformed term" with `Prelude.fix` as the head symbol.
+
+Certain limited kinds of recursion are available via the
+`foldl` Cryptol primitive operation, which is translated directly
+into a fold operation in Coq. This is sufficient for many basic
+iterative algorithms.
+
+### Type coercions
+
+Another limitation of the translation system is that Cryptol uses SMT
+solvers during its typechecking process and uses the results of solver
+proofs to justify some of its typing judgments. When extracting these
+terms to Coq, type equality coercions must be generated. Currently, we
+do not have a good way to transport the reasoning done inside
+Cryptol's typechecker into Coq, so we just supply a very simple `Ltac`
+tactic to discharge these coercions (see `solveUnsafeAssert` in
+`CryptolPrimitivesForSAWCoreExtra.v`).  This tactic is able to
+discover simple coercions, but for anything nontrivial it may
+fail. The result will be a typechecking failure when compiling the
+generated code in Coq when the tactic fails. If you encounter this
+problem, it may be possible to enhance the `solveUnsafeAssert` tactic
+to cover your use case.
+
+### Error terms
+
+A final caveat that is worth mentioning is that Cryptol can sometimes
+produce runtime errors. These can arise from explicit calls to the
+`error` primitive, or from partially defined operations (e.g.,
+division by zero or sequence access out of bounds).  Such instances
+are translated to occurrences of an unrealized Coq axiom named `error`.
+In order to avoid introducing an inconsistent environment, the `error`
+axiom is restricted to apply only to inhabited types. All the types
+arising from Cryptol programs are inhabited, so this is no problem
+in principle. However, collecting and passing this information around
+on the Coq side is a little tricky.
+
+The main technical device we use here is the `Inhabited` type class;
+it simply asserts that a type has some distinguished inhabitant.  We
+provide instances for the base types and type constructors arising
+from Cryptol, so the necessary instances ought to be automatically
+constructed when needed.  However, polymorphic Cryptol functions add
+some complications, as type arguments must also come together with
+evidence that they are inhabited. The translation process takes care
+to add the necessary `Inhabited` arguments, so everything ought to
+work out. However, if Coq typechecking of generated code fails with
+errors about `Inhabited` class instances, it likely represents some
+problem with this aspect of the translation.
+
+
+[^5]: https://coq.inria.fr
+[^6]: https://github.com/mit-plv/fiat-crypto
diff --git a/doc/manual/manual.pdf b/doc/manual/manual.pdf