You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The Agda evaluator isn't always completely up to date with the Haskell one, and because of this we have a feature that allows us to mark some conformance tests as being expected to fail. This uses expectFail from Test.Tasty.ExpectedFailure. For example, some of the bitwise builtins use a special size measure for budgeting that the metatheory doesn't know about at the time of writing and when you run cabal test agda-conformance you get things like this
Test suite agda-conformance: RUNNING...
UPLC evaluation tests
evaluation
builtin
semantics
writeBits
case-32
case-32 (evaluation): OK (0.07s)
case-32 (budget): FAIL (expected) (0.09s)
[etc. etc. etc.]
(the problem being that there are golden files containing the expected budget and the Agda evaluator currently produces results that disagree with these). When you run the haskell-confomance tests these all pass.
However (as discovered by @Unisay), if you run cabal test agda-conformance --test-option --accept then the incorrect budget results are accepted and the golden files are updated accordingly. This is arguably a bug in Test.Tasty.ExpectedFailure.
Now Test.Testy.ExpectedFailure also provides ignoreTest and when you use that instead of expectFail the output changes to
...
writeBits
case-32
case-32 (evaluation): OK (0.06s)
case-32 (budget): IGNORED
Futhermore, if you use --accept then the golden files are not accepted, which is what we want. The disadvantage to doing this is that with expectFailure, when we fix the Agda evaluator to behave identically to the Haskell one, the tests which are expected to fail now start to pass, which causes an error and reminds us to remove the formerly problematic tests from the list of tests which are expected to fail; with ingoreTest they just continue to be ignored, so we might forget to update the list of problematic tests.
We should workout whether the behaviour with expectFail and --accept really is a bug and report it if so. In the meantime we should maybe use ignoreTest instead of expectFail.
The text was updated successfully, but these errors were encountered:
The Agda evaluator isn't always completely up to date with the Haskell one, and because of this we have a feature that allows us to mark some conformance tests as being expected to fail. This uses
expectFail
from Test.Tasty.ExpectedFailure. For example, some of the bitwise builtins use a special size measure for budgeting that the metatheory doesn't know about at the time of writing and when you runcabal test agda-conformance
you get things like this(the problem being that there are golden files containing the expected budget and the Agda evaluator currently produces results that disagree with these). When you run the
haskell-confomance
tests these all pass.However (as discovered by @Unisay), if you run
cabal test agda-conformance --test-option --accept
then the incorrect budget results are accepted and the golden files are updated accordingly. This is arguably a bug inTest.Tasty.ExpectedFailure
.Now
Test.Testy.ExpectedFailure
also providesignoreTest
and when you use that instead ofexpectFail
the output changes toFuthermore, if you use
--accept
then the golden files are not accepted, which is what we want. The disadvantage to doing this is that withexpectFailure
, when we fix the Agda evaluator to behave identically to the Haskell one, the tests which are expected to fail now start to pass, which causes an error and reminds us to remove the formerly problematic tests from the list of tests which are expected to fail; withingoreTest
they just continue to be ignored, so we might forget to update the list of problematic tests.We should workout whether the behaviour with
expectFail
and--accept
really is a bug and report it if so. In the meantime we should maybe useignoreTest
instead ofexpectFail
.The text was updated successfully, but these errors were encountered: