Skip to content

Commit

Permalink
don't include internal declarations, README
Browse files Browse the repository at this point in the history
  • Loading branch information
kim-em committed Nov 22, 2023
1 parent 40cfeb2 commit 86ba39a
Show file tree
Hide file tree
Showing 4 changed files with 49 additions and 4 deletions.
18 changes: 18 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,9 @@ We would like to enable sharing of data and easy access to the Lean libraries fo
: Export all goal / tactic pairs from the library, with additional context.
`--proofstep` outputs in `[GOAL]...[PROOFSTEP]...` format.

`comment_data`
: Export all declaration / type / doc-string tuples from the library, with additional context.

`tactic_benchmark`
: Run a tactic against every declaration in the library, tracking runtimes and successes.

Expand Down Expand Up @@ -195,6 +198,21 @@ Here:
There is also `scripts/training_data.sh`, which will run in parallel over all of Mathlib,
recording results in `out/training_data/`.

### `comment_data`

`lake exe comment_data Mathlib.Logic.Hydra` will output information about all doc-strings in a file.

The output is a json array of dictionaries with fields
* `declName`: the declaration name
* `declType`: the pretty-printed type of the declaration
* `docString`: the declaration doc-string, if it is present
* `decl`: the entire body of the declaration
* `context`: the file source up to before the declaration
(this currently does not include the imports)

There is also `scripts/comment_data.sh`, which will run in parallel over all of Mathlib,
recording results in `out/comment_data/`.

### `tactic_benchmark`

`lake exe tactic_benchmark --aesop Mathlib.Topology.ContinuousFunction.Ordered`
Expand Down
26 changes: 22 additions & 4 deletions TrainingData/Frontend.lean
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ The functions `compileModule : Name → IO (List CompilationStep)` and

set_option autoImplicit true

open Lean Elab Frontend
open Lean Elab Frontend Meta

namespace Lean.PersistentArray

Expand Down Expand Up @@ -71,6 +71,22 @@ partial def runStateRefT [Monad m] [MonadLiftT (ST ω) m] (L : MLList (StateRefT

end MLList

private def isInternal' (declName : Name) : Bool :=
declName.isInternal ||
match declName with
| .str _ s => "match_".isPrefixOf s || "proof_".isPrefixOf s
| _ => true

-- from Lean.Server.Completion
private def isBlackListed {m} [Monad m] [MonadEnv m] (declName : Name) : m Bool := do
if declName == ``sorryAx then return true
if declName matches .str _ "inj" then return true
if declName matches .str _ "noConfusionType" then return true
let env ← getEnv
pure $ isInternal' declName
|| isAuxRecursor env declName
|| isNoConfusion env declName
<||> isRec declName <||> isMatcher declName
namespace Lean.Elab.IO

/--
Expand Down Expand Up @@ -138,11 +154,13 @@ structure DeclInfo where

/-- Return info about each new declaration added during the processed command. -/
def newDecls (cmd : CompilationStep) : IO (List DeclInfo) := do
cmd.diff.mapM fun ci =>
return {
cmd.diff.filterMapM fun ci => cmd.runMetaMBefore do
if ← isBlackListed ci.name then
pure none
else pure <| some {
name := ci.name
type := ci.type
ppType := toString (← cmd.runMetaMBefore <| Meta.ppExpr ci.type)
ppType := toString (← Meta.ppExpr ci.type)
docString := ← findDocString? cmd.after ci.name
}

Expand Down
8 changes: 8 additions & 0 deletions make.sh
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@
DATE=`date "+%Y-%m-%d"`
mkdir -p out

lake build

lake exe declaration_types > out/$DATE-declaration_types
gzip -f out/$DATE-declaration_types

Expand Down Expand Up @@ -37,5 +39,11 @@ cd out
tar cvzf $DATE-proof_step.tgz training_data
cd ..

rm -rf out/comment_data
./scripts/comment_data.sh
cd out
tar cvzf $DATE-comment_data.tgz comment_data
cd ..

rm -rf out/tactic_benchmark
./scripts/tactic_benchmark.sh --aesop
1 change: 1 addition & 0 deletions scripts/comment_data.lean
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,7 @@ The output is a json array of dictionaries with fields
* `docString`: the declaration doc-string, if it is present
* `decl`: the entire body of the declaration
* `context`: the file source up to before the declaration
(this currently does not include the imports)
"

ARGS:
Expand Down

0 comments on commit 86ba39a

Please sign in to comment.