Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document rules: introduce "where" key which accepts an array of conditions #177

Merged
merged 3 commits into from
Aug 29, 2022

Conversation

jeremyroman
Copy link
Collaborator

This isn't quite full boolean algebra, but addresses #160. It's tempting to go for a full parsed expression syntax but that introduces a fairly significant complexity, escaping, etc.

@jeremyroman
Copy link
Collaborator Author

The extra [] needed around an individual URL pattern or selector (especially the latter, since CSS selectors already have a , operator) is a little unfortunate but still seems more consistent than making it optional -- but maybe it's useful for convenience?

If I also made that change and combined it with #169 it would reduce the example down to

{
  "prerender": [
    {"where": [
       {"href_matches": "/*"},
       {"not": {"href_matches": "/logout"}},
       {"not": {"selector_matches": ".no-prerender"}}
     ],
     "score": 0.1}
  ]
}

WDYT?

Also not explicit in the example is that URL patterns can be constructed either from a single string or from a structured object with the same syntax as URLPatternInit. Not mentioned in the explainer because it's not very illustrative.

@jeremyroman jeremyroman requested a review from domenic August 24, 2022 01:54
@jeremyroman
Copy link
Collaborator Author

And additionally if arrays are optional then the "where" case can also drop them if there's only one condition. Saves characters but also makes the structure less consistent (e.g., it would be harder to validate against a JSON schema).

@domenic
Copy link
Collaborator

domenic commented Aug 24, 2022

My main question is figuring out exactly what boolean algebra we're giving, both implicitly and explicitly.

From what I can see we have:

  • (Implicit) "or" via separate document rules
  • (Implicit) "and" (I think??) via multiple array elements under "where"
  • (Implicit) "or" via multiple array elements under each "href_matches" or "selector_matches"
  • (Explicit) "not" wrapper

I guess every formula in a boolean algebra can be put into disjunctive normal form so the minimum we could provide is: outer-or, inner-and, innermost-not. E.g., separate document rules, multiple array elements under "where", and either a version of "not" that can wrap a single "href_matches" / "selector_matches" rule, or "not_href_matches" + "not_selector_matches" counterparts.

That said, maybe we should not be including separate document rules in this analysis; maybe we want you to be able to express any formula within a single document rule (and thus not have to duplicate things like score or referrer policy). So from that perspective what you have so far works well (and corresopnds to conjunctive normal form).

Either way, it seems like we may be able to preempt any further expansion desires since everything is already expressible, so "In the future, a more complete boolean algebra could be added if needed" is maybe not necessary.

That leaves me with the following thoughts:

  • Is it sufficiently clear that "where" arrays mean "and" and "href_matches" / "selector_matches" arrays mean "or"? I'm unsure...

  • Do we need a fully general "not": {...} wrapper, or would making not atomic (e.g. via "not_href_matches" + "not_selector_matches") work better? The latter is more like a normal form, and simpler, but maybe that's not the right priority.

  • If we do a general "not": {...}, should we also allow "not": [{...}, {...}]? If so does the array mean "or" or "and"?

  • Your original question, generalized: Are we OK with syntaxes where a single element stands in for a single-element array, e.g. "href_matches": "X" as a stand-in for "href_matches": ["X"] or (per the previous item) "not": {...} as a stand-in for "not": [{...}]? (My answer: yes, this is OK, especially when designing leaf nodes.)

Also, an interesting link... https://stackoverflow.com/q/20737045.

Attempt at something inspired by https://jsonlogic.com/
{
  "where": [ // "or"
    {
      "and": [
        { "href_matches": "/*" },
        { "not": { "href_matches": "/logout" } },
        { "not": { "selector_matches": ".no-prerender" } },
      ]
    },
    { "selector_matches": ".definitely-prerender" }
  ]
}
{
  "where": [ // "and"
    {
      "or": [
        { "href_matches": "/good-stuff/*" },
        { "selector_matches": ".please-prerender" },
      ]
    },
    { "not": { "selector_matches": ".no-prerender" } }
  ]
}

The killer problem is just what the top-level "where" key means. It's not at all obvious whether it's "or" or "and", and anything that makes it explicit is ugly.

@domenic
Copy link
Collaborator

domenic commented Aug 24, 2022

I also noticed that CSS's :where(x, y) means "x OR y", which is kind of unfortunate if the intention for us is AND.

@jeremyroman
Copy link
Collaborator Author

I'm not super convinced by the argument that things can be converted to DNF or CNF because the normal form isn't always remotely ergonomic -- or efficient to compute, for that matter. Authors can arguably express any logic by adding/removing a class name and using CSS selectors to match on that.

CSS :where is a little funny because the , behavior there comes from how commas are treated in CSS selectors generally, rather than anything about :where itself.

One way to avoid the where ambiguity is to require it to be explicit any time it takes more than one, but it's cumbersome.

It did seem natural if not obvious to me that "where" would be implicitly "and" because you can always duplicate a rule, and in practice I would expect most rules to be of the form "X" or "X but not Y", which is more easily expressible with "and" than "or".

CSS selectors have builtin combinators so an array isn't actually necessary there but is there for convenience more than anything. URL patterns, though, I expect it to be common to say things like "everything is safe, except for /logout, /shred/* and /incinerate/*", which isn't expressible with a single pattern. And to say that concisely, you want "or".

So while they're not all the same I think this is the sense which I expect to be most frequently useful.

Some random alternatives to feed the brainstorm:

explicit all/any

{
  "where": {"all": [
    {"href_matches": "/*"},
    {"not": {"href_matches": "/logout"}},
    {"not": {"selector_matches": ".no-prerender"}}
  ]}
}

combining logical operators with adjacent nodes using string prefixes

Inlining the "not" into particular conditions was part of the starting point. Maybe it's not so bad if we simply define a number of common patterns for the keys that apply in a standardized way.

{
  "where_all": [
    {"href_matches": "/*"},
    {"not_href_matches": "/logout"},
    {"not_selector_matches": ".no-prerender"}
  ]
}

Honestly, "where_all" still seems cumbersome for simple uses and still raises the question of what happens if "where_all" and "where_any" are both specified.

expression syntax

We could just give up on using JSON as a way to represent conditions and just give a proper expression syntax.

{"where": "href_matches('/*') && !href_matches('/logout') && !selector_matches('.no-prerender')"}

This immediately creates annoyances around string escaping rules, though, in addition to interpreting this requiring a parser.

expression syntax, prepared statement style

{
  "where": [
    "href_matches($1) and not href_matches($2) and not selector_matches($3)",
    "/*", "/logout", ".no-prerender"
  ]
}

This extracts the strings similar to how SQL prepared statements work but ends up being very ugly.

expression syntax only for boolean algebra

{
  "where": {
    "expression": "is_local && !is_logout && !is_no_prerender",
    "$is_local": {"href_matches": "/*"},
    "$is_logout": {"href_matches": "/logout"},
    "$is_no_prerender": {"selector_matches": ".no-prerender"}
  }
}

This allows an expression syntax limited to identifiers, &&, || and parentheses (so requires parsing, but simpler parsing) and makes all of those identifiers correspond to condition objects.

@domenic
Copy link
Collaborator

domenic commented Aug 25, 2022

I generally agree with all your points; thanks for spelling them out! The only pushback is that "if" might be worth exploring instead of "where", to avoid anyone drawing the CSS analogy.

explicit all/any

This seems pretty OK, especially with that indentation style. I guess you could also feed it a single non-logical rule, e.g. "where": { "href_matches": "/*" } or "where": { "not": { "selector_matches": ".avoid-prerendering" } }, which is kind of nice.

combining logical operators with adjacent nodes using string prefixes

I agree that "where_all" / "where_any" is kind of weird and doesn't really solve the problem.

I find the simplicity of inlining "not_" into the terminal nodes compelling, but I'm not sure.

expression syntax only for boolean algebra

This is... not terrible? It's not really like anything I've seen before, but I think it could be pretty reasonable to use.


Overall I would be happy with any of:

  • What you have in the current PR, optionally with minor modifications like allowing single-elements to stand in for arrays, or "if" instead of where, or inlining the "not"s.

  • A more full JSON boolean algebra, in the style of your "explicit all/any". (In this case I think probably "not": {...} should stay generic.)

  • expression syntax only for boolean algebra, maybe??

I think the explainer should include brief writeups of whatever alternatives you don't choose, and try to get feedback on them ASAP (e.g. by pinging #160 which should notify Lea, and asking current or future speculationrules users).

@jeremyroman
Copy link
Collaborator Author

Alright, removed implicit-and for "where" because at least one other person told me it was unintuitive, made the logic a simple and familiar "not", "and" and "or", and allowed implicit single patterns for href and selector matches.

We can always extend with "I have a really complicated boolean function" features in the future if it proves necessary, which I hope it won't.

Copy link
Collaborator

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I'm pretty happy with where this landed, personally.

@jeremyroman jeremyroman merged commit 957e543 into WICG:main Aug 29, 2022
@jeremyroman jeremyroman deleted the adjust-document-rule-microsyntax branch August 29, 2022 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants