I'm currently improving rule selection along with #41.
Rules are nested in up to three layers in the LanguageTool XML:
    <category ...>
        <rulegroup ...> <!-- can be omitted; there are also <rule>s at this level -->
            <rule>
            </rule>
        </rulegroup>
    </category>
At the moment the id is just a string of the form <group_name>|<rule_name>.<number>. I want to streamline this to make it easy to disable, e.g., an entire category.
API
This is the API I currently have in mind. There will be one struct for each rule level ID (CategoryID, GroupID and IndexID), with conversions between them.
Only CategoryIDs can be created directly. A CategoryID can then be joined to create IDs at the lower levels. The value returned by id() on a Rule will become an IndexID (currently a String).
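The struct definitions are not shown above, so here is a minimal sketch of what the three ID levels could look like. The field layout, the derives and the `parent()` accessors are assumptions made for illustration; only the names `CategoryID`, `GroupID`, `IndexID`, `new` and `join` come from the proposal. The sketch also assumes every rule sits inside a group, which sidesteps the case of a `<rule>` placed directly under a `<category>`.

```rust
/// Identifies a whole category, e.g. `typos`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct CategoryID(String);

/// Identifies a rule group inside a category (field layout is an assumption).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct GroupID {
    category: CategoryID,
    name: String,
}

/// Identifies a single rule by its index inside a group.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct IndexID {
    group: GroupID,
    index: usize,
}

impl CategoryID {
    /// The only public constructor; lower levels are reached via `join`.
    pub fn new(name: &str) -> Self {
        CategoryID(name.to_owned())
    }

    /// Descend one level: category -> group.
    pub fn join(&self, group: &str) -> GroupID {
        GroupID {
            category: self.clone(),
            name: group.to_owned(),
        }
    }
}

impl GroupID {
    /// Descend one level: group -> single rule.
    pub fn join(&self, index: usize) -> IndexID {
        IndexID {
            group: self.clone(),
            index,
        }
    }

    /// Hypothetical upward conversion: the category this group belongs to.
    pub fn parent(&self) -> &CategoryID {
        &self.category
    }
}

impl IndexID {
    /// Hypothetical upward conversion: the group this rule belongs to.
    pub fn parent(&self) -> &GroupID {
        &self.group
    }
}
```

With this shape, `CategoryID::new("typos").join("verb_apostrophe_s").join(3)` yields an `IndexID`, and there is no way to build a `GroupID` or `IndexID` without starting from a category.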
Selector
The structures above don't do any work on their own. For that, there is an IDSelector which, given an IndexID, determines whether it matches. It can also be disabled to invert the match.
There are corresponding conversions to create a selector from IDs at the various levels:
    impl From<GroupID> for IDSelector {} // same for others
Selectors can also be converted to and from strings with the representation <category>/<group>/<index>, e.g. "typos/verb_apostrophe_s/3" or "grammar". Selectors will be case-insensitive.
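To make the string representation concrete, here is a rough sketch of parsing, printing and matching. It is not the planned implementation: to stay self-contained it stores plain strings in a hypothetical enum instead of wrapping `CategoryID` / `GroupID` / `IndexID`, and the `is_match` name, the error type and the lowercase normalization are all assumptions.

```rust
use std::fmt;
use std::str::FromStr;

/// Simplified stand-in for the ID of a single rule.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IndexID {
    pub category: String,
    pub group: String,
    pub index: usize,
}

/// Selects a whole category, a whole group, or one single rule.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum IDSelector {
    Category(String),
    Group(String, String),
    Index(String, String, usize),
}

impl FromStr for IDSelector {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Lowercasing up front is what makes parsed selectors case-insensitive.
        let lower = s.to_lowercase();
        let parts: Vec<&str> = lower.split('/').collect();
        if parts.iter().any(|p| p.is_empty()) {
            return Err(format!("empty segment in selector {:?}", s));
        }
        match parts.as_slice() {
            [c] => Ok(IDSelector::Category(c.to_string())),
            [c, g] => Ok(IDSelector::Group(c.to_string(), g.to_string())),
            [c, g, i] => {
                let index = i
                    .parse::<usize>()
                    .map_err(|e| format!("bad index {:?}: {}", i, e))?;
                Ok(IDSelector::Index(c.to_string(), g.to_string(), index))
            }
            _ => Err(format!("too many segments in selector {:?}", s)),
        }
    }
}

impl fmt::Display for IDSelector {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            IDSelector::Category(c) => write!(f, "{}", c),
            IDSelector::Group(c, g) => write!(f, "{}/{}", c, g),
            IDSelector::Index(c, g, i) => write!(f, "{}/{}/{}", c, g, i),
        }
    }
}

impl IDSelector {
    /// True if `id` lies inside the category, group or rule this selector names.
    /// Assumes the selector was built via `parse`, so its parts are lowercase.
    pub fn is_match(&self, id: &IndexID) -> bool {
        let category = id.category.to_lowercase();
        let group = id.group.to_lowercase();
        match self {
            IDSelector::Category(c) => *c == category,
            IDSelector::Group(c, g) => *c == category && *g == group,
            IDSelector::Index(c, g, i) => {
                *c == category && *g == group && *i == id.index
            }
        }
    }
}
```

Normalizing to lowercase at parse time keeps matching cheap and makes the printed form canonical, at the cost of not round-tripping the original casing. A `TryFrom<&str>` impl that delegates to `from_str` would give the `try_into()` spelling.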
Usage
Rust
The new user-facing RulesOptions (passed via .new_with_options(..., options: &RulesOptions)) will initially look like this:
    struct RulesOptions {
        selectors: Vec<(IDSelector, bool)>,
    }
where selectors is a list of selectors which are applied in order. Selectors which are disabled by default for the language are implicitly prepended to the list. For example, to disable all rules in the typos category except verb_apostrophe_s:
    let rules = Rules::new_with_options(
        "rules.bin",
        RulesOptions {
            selectors: vec![
                (CategoryID::new("typos").into(), false),
                (CategoryID::new("typos").join("verb_apostrophe_s").into(), true),
            ],
        },
    );
alternatively:
    let rules = Rules::new_with_options(
        "rules.bin",
        RulesOptions {
            selectors: vec![
                ("typos".try_into().unwrap(), false),
                ("typos/verb_apostrophe_s".try_into().unwrap(), true),
            ],
        },
    );
or, conversely, to enable all typos and only disable verb_apostrophe_s:
    let rules = Rules::new_with_options(
        "rules.bin",
        RulesOptions {
            selectors: vec![
                ("typos".try_into().unwrap(), true),
                ("typos/verb_apostrophe_s".try_into().unwrap(), false),
            ],
        },
    );
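How "applied in order" resolves for a single rule could be sketched like this. The examples above suggest that a later selector overrides an earlier one, so the sketch takes the last matching entry; that reading, the `is_enabled` helper, the path-prefix stand-in for a selector and the explicit `default` for rules that no selector matches are all assumptions rather than part of the proposal.

```rust
/// Stand-in for a parsed selector: a lowercase path prefix such as
/// `["typos"]` or `["typos", "verb_apostrophe_s"]`.
type Selector = Vec<String>;

/// Hypothetical helper: split "category/group/index" into its parts.
fn selector(s: &str) -> Selector {
    s.to_lowercase().split('/').map(str::to_owned).collect()
}

/// A selector matches a rule if it is a case-insensitive prefix of the
/// rule's full `category/group/index` path.
fn matches(selector: &Selector, rule_path: &[&str]) -> bool {
    selector.len() <= rule_path.len()
        && selector
            .iter()
            .zip(rule_path)
            .all(|(a, b)| a.eq_ignore_ascii_case(b))
}

/// Walk the list back to front so that the last matching selector wins.
/// `default` is returned when no selector matches the rule at all.
fn is_enabled(selectors: &[(Selector, bool)], rule_path: &[&str], default: bool) -> bool {
    selectors
        .iter()
        .rev()
        .find(|(sel, _)| matches(sel, rule_path))
        .map(|(_, enabled)| *enabled)
        .unwrap_or(default)
}
```

Under this reading, `[("typos", false), ("typos/verb_apostrophe_s", true)]` leaves only verb_apostrophe_s enabled within typos, and swapping the two booleans disables only that group. The implicitly prepended defaults would sit at the front of the list, so any user-supplied selector can override them.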
There will also be a method which returns all the currently used selectors, including the ones which are implicitly prepended by default.
Python
In Python, only the string representation of the IDs will be visible to the user, and there will be a .selectors() method returning a list of (selector, enabled) tuples.
All of this will only be implemented for the Rules, not for the Tokenizer. Rules in the Tokenizer are applied hierarchically, so disabling one can affect the others.
Points of discussion
I am not so sure about the selectors terminology and about the names CategoryID, GroupID and IndexID.
I'm quite happy with the abstraction itself. It does add complexity, though, and it might not be as intuitive at first glance as a simple enable / disable list of IDs. On the other hand, it is much more expressive (e.g. it is impossible to express the simple scenarios from above with an enable / disable list).
As always, I appreciate discussion about this a lot :). Writing it down helped me a lot to define the API (and it actually changed significantly while writing this). More discussion helps even more.
Update: Selectors will not have an on / off state. Instead, the selectors argument will be a vector of tuples of (selector, enabled). Having an on / off state makes it unclear how finding rules by selector should work.