Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rule selectors #43

Closed
bminixhofer opened this issue Feb 23, 2021 · 1 comment · Fixed by #41
Closed

Rule selectors #43

bminixhofer opened this issue Feb 23, 2021 · 1 comment · Fixed by #41
Labels
api Additions to the public API.

Comments

@bminixhofer
Copy link
Owner

bminixhofer commented Feb 23, 2021

I'm currently improving rule selection along with #41.

Rules are nested in up to three layers in the LanguageTool XML:

<category ...>
    <rulegroup ...> <!-- can be omitted, there's also <rule>s at this level -->
        <rule>
        </rule>
    </rulegroup>
</category>

At the moment the id is just a string with <group_name>|<rule_name>.<number>. I want to streamline this to allow easily disabling e. g. an entire category.

API

This is the API I currently have in mind for this. There will be one struct for each rule level ID:

pub struct CategoryID;
pub struct GroupID;
pub struct IndexID;

with conversions between them:

impl CategoryID {
    pub fn new<S: Into<String>>(category: S) -> Self;
    pub fn join<S: Into<String>>(&self, group: S) -> GroupID;
}

impl GroupID {
    pub fn join(&self, index: usize) -> IndexID;
    pub fn parent(&self) -> &CategoryID;
}

impl IndexID {
    pub fn parent(&self) -> &GroupID;
}

It will only be possible to create CategoryIDs directly. It can then be joined to create IDs at different levels. The id() field of Rule will become an IndexID (currently a String).

Selector

The structures above won't do any work on their own. For that, there is a RuleSelector which, given an IndexID, determines if it matches. It can also be disabled to inverse the match:

pub enum IDSelector {
    Category(CategoryID),
    Group(GroupID),
    Index(IndexID),
}

impl IDSelector {
    pub fn is_match(&self, spec: &IndexID) -> bool;
}

and corresponding methods to create a selector from IDs at various levels:

impl From<GroupID> for IDSelector {};

// same for others

Selectors can also be cast to / from strings with the representation <category>/<group>/<index> e. g. "typos/verb_apostrophe_s/3" or "grammar". Selectors will be case insensitive.

Usage

Rust

The new user-facing RulesOptions (passed via .new_with_options(..., options: &RulesOptions)) will initially look like this:

struct RulesOptions {
    selectors: Vec<IDSelector>,
}

where selectors is a list of selectors which are applied in order, and selectors which are disabled by default for the language are implicitly prepended to the list. E. g. to disable all rules in the typos category but verb_apostrophe_s:

let rules = Rules::new_with_options("rules.bin", RulesOptions {
    selectors: vec![
        (CategoryID::new("typos").into(), false),
        (CategoryID::new("typos").join("verb_apostrophe_s").into(), true)
    ],
})

alternatively:

let rules = Rules::new_with_options("rules.bin", RulesOptions {
    selectors: vec![
        ("typos".try_into().unwrap(), false),
        ("typos/verb_apostrophe_s".try_into().unwrap(), true),
    ],
})

or, conversely, to enable all typos and only disable verb_apostrophe_s:

let rules = Rules::new_with_options("rules.bin", RulesOptions {
    selectors: vec![
        ("typos".try_into().unwrap(), true),
        ("typos/verb_apostrophe_s".try_into().unwrap(), false),
    ],
})

There will also be a method which returns all the currently used selectors, including the ones which are implicitly prepended by default:

impl Rules {
    pub fn selectors(&self) -> &[(IDSelector, bool)];
}

Python

In Python, only the string representation of the IDs will be visible to the user:

rules = Rules.load("en", selectors=[
    ("typos", False), 
    ("typos/verb_apostrophe_s", True)
])

and there will be a .selectors() method returning a list of tuples of selectors / enabled state.


All of this will only be implemented for the Rules, not for the Tokenizer. Rules in the Tokenizer are applied hierarchically so disabling one can have an effect on the others.

Points of discussion

  • I am not so sure about the selectors terminology and about the names CategoryID, GroupID and IndexID.
  • I'm quite happy with the abstraction itself. But it does add additional complexity and it might not be as intuitive at first glance as a simple enable / disable list of IDs. Although it is much more expressive (e. g. it is impossible to express the simple scenarios from above with an enable / disable list).

As always, I appreciate discussion about this a lot :). Writing it down helped me a lot to define the API (and it actually changed significantly while writing this), more discussion helps even more.

Update: Selectors will not have an on / off state. Instead, the selectors argument will be a vector of tuples of (selector, enabled). Having an on / off state makes it unclear how finding rules by selector should work.

@bminixhofer bminixhofer added the api Additions to the public API. label Feb 23, 2021
@bminixhofer bminixhofer mentioned this issue Feb 23, 2021
5 tasks
@bminixhofer bminixhofer linked a pull request Feb 24, 2021 that will close this issue
@bminixhofer
Copy link
Owner Author

API of this changed again moving away from the list of selectors. Details in the docs and release notes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Additions to the public API.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant