Make grammar creation deterministic #89
The docstrings say the parameter is deprecated, so I have removed it everywhere but external interfaces.
There was an `if` statement in `_create_grammar_str_from_dict()` to ensure `arrangement` would count as a start symbol. However, when `string_type` was set to `arrangement`, the line could never be run because the `if` statement was only executed when `string_type == message`. I modified the condition to run for either of these values of `string_type`.
The symbols are read into sets, which removes the original ordering present in the grammars. The order of symbols matters when parsing, so this would cause parsing to sometimes return different results or fail entirely on the same input. Re-sorting the symbols after the set operations prevents the failures and makes the output consistent across runs.
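As a rough illustration of the ordering problem described above (this is not the actual `daidepp` code; `merge_symbols`, `build_rule`, and the symbol names are hypothetical), the sketch below merges symbol groups through a set and then sorts the result before building a rule string. Iterating over a set of strings directly can give a different order on each interpreter run, because Python randomizes string hashes per process by default, so sorting is what makes the generated rule stable.

```python
def merge_symbols(*groups):
    """Union several symbol groups and return them in a stable order."""
    merged = set()
    for group in groups:
        merged |= set(group)
    # Iterating over `merged` directly may yield a different order on each
    # interpreter run, since str hashes are randomized per process.
    # Sorting gives the same order every time.
    return sorted(merged)


def build_rule(name, symbols):
    """Join symbols into an ordered-choice rule string (hypothetical format)."""
    return f"{name} = " + " / ".join(symbols)


# Hypothetical symbol groups; the overlap is deduplicated by the set union.
level_a = ["prp", "ccd", "fct"]
level_b = ["fct", "try", "prp"]

rule = build_rule("message", merge_symbols(level_a, level_b))
print(rule)  # message = ccd / fct / prp / try
```

Sorting alphabetically is only one possible choice of stable order. Where the order of alternatives affects which parse wins, the sort key would need to reflect the intended priority of the symbols.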
Thanks for merging! By the way, I did intend for this to be merged without squashing and wrote the commit history carefully for that. For future PRs (it's obviously too late for this one now), should I note that if it's what I want? I had expected it here because previous PRs weren't squashed. I'm just not really sure about the conventions for this repo.
Sorry about that! Lemme double check about whether that would conflict with our auto-versioning system. If it's fine I could revert the merge and we could re-merge without the squash if you'd like. I'll also update the PR template for people to specify whether they want to squash or not.
It's not a big deal. I don't think it's worth the effort to do it now. Thanks for the offer, though!
I was encountering a bug in our agent where the same DAIDE input was being parsed properly by `daidepp` in some cases but not in others. I assumed it was the fault of the agent, but I ended up creating a minimal reproduction using only `daidepp`:
The string `PRP(XDO((ENG AMY LVP) MTO WAL))` should always be parsed to an object of type `PRP`, but it was often being parsed to a `list` instead. Running the above script repeatedly will show the output can differ.
I ended up figuring out that `daidepp` was generating grammars non-deterministically across runs. Sometimes the grammar ended up leading to a different, incorrect parse, and I was even able to show that (with a low probability) parsing can fail entirely. I have modified this library to generate the grammar deterministically. I also fixed a related bug that prevented proper parsing of arrangements and made some smaller code changes to fix some problems I found during my work.
Notes
To repeatedly run the above Python script, I used the following Bash script:
To view the grammar being generated, I used the following patch:
It assumes that a directory named `grammars` already exists. This specific diff used 6c10f70 as its base.