Watered-Down Regex Syntax #94

Open
7 tasks done
kristineds opened this issue Oct 21, 2015 · 22 comments


@kristineds

KB Article Creation Checklist
  • Write initial draft for this KB Article; label this issue draft and either questions or tutorials
  • Add required YAML configuration
  • Add Tags for this KB Article to the YAML config (see YAML Keys (Explained))
  • Edit and finalize draft for publishing (remove draft label, add draft-finalized label)
  • Assign Issue to yourself and create Markdown file (remove draft-finalized label, add pending)
  • Project Lead: Review and Publish KB Article (remove pending label, add published label)
Additional TODOs

KB Article Published @ zencache.com
📃 See: Watered-Down Regex Syntax

:octocat: View Markdown File | ✏️ Edit Markdown File


@raamdev
Contributor

raamdev commented Oct 22, 2015

@jaswsinc writes (on Slack)...

In the KB article we should provide sections within the article that cover each use case.

  • Custom URL Examples
  • URI/Referer/User-Agent Exclusion Examples
  • Sitemap URI Examples

@raamdev changed the title from "Watered Down Regex Syntax" to "Watered-down Regex Syntax" on Oct 24, 2015
@raamdev
Contributor

raamdev commented Nov 1, 2015

@jaswrks

jaswrks commented Nov 1, 2015

@kristineds This article should also contain the list of definitions somewhere at the top; i.e., what each special character is used for as a primer; just in case this article is found before you see the list of special characters that we have in the Dashboard.

@kristineds
Author

@raamdev @jaswsinc: KBA has been updated with the following:

  • Separate section for the Clear Cache "Specific URL" menu option
  • List of definitions for the wildcard characters
  • Introduction for regex

Would love to hear your feedback on this. :)

@raamdev
Contributor

raamdev commented Nov 7, 2015

@kristineds Looking great! Here's some feedback:

  • I would make the "list of the special or wildcard characters supported" an actual list, instead of putting that inside a codeblock. Make each line a separate list item and use the code format for the characters themselves, like this:
    • * = 0 or more characters that are NOT a slash /
  • Instead of "This syntax can be found in the following cases:", I would write a paragraph that explains there are several areas within ZenCache where the watered-down regex syntax is supported, and that this syntax allows you to get creative and specific when tweaking your ZenCache configuration. I would mention each area of the plugin that supports the syntax, and then say that specific usage examples for each area are described below.
  • At the beginning of each section that describes usage examples, I would mention the Dashboard path to the section, e.g., Dashboard → ZenCache → Plugin Options → Automatic Cache Clearing → Misc. Auto-Clear Options → Custom URLs.
  • Use H2 (##) instead of H3 (###) for each of the sections in this article.

@kristineds
Author

@raamdev Those points have been added on the KBA. Please review when you get a chance.

@jaswrks changed the title from "Watered-down Regex Syntax" to "Watered-Down Regex Syntax" on Nov 10, 2015
@jaswrks

jaswrks commented Nov 10, 2015

@kristineds I worked to help you improve the introductory paragraphs and heading structure. I made some changes above. Please review and make any additional adjustments that you'd like, then post a reply to let us know that you're ready for a final review.

Here are some additional suggestions/observations:

To clear all cache files for URIs under /blog/ (i.e., all Posts):

/blog/* should be /blog/** right?


Any custom URLs that are listed (to be cleared automatically) are taken almost literally. Therefore, in a Multisite Network you will need to list URLs that you want to exclude, not URIs that apply to every site in the network.

I would remove that and instead explain what ** does in that example.


home page URI, I can do that now!

I would remove the ", I can do that now!" part of this. Not needed in this article.


clear the cache for for your

Remove the extra for.


*domain.com/my-account*

A URL starts with http, followed by ://, and the slashes are not matched by *, so that should be **. A single * would match http: only, and then it would stop on // because it expected to find domain.com next. Make sense?
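
To make the difference between * and ** concrete, here is a minimal Python sketch. It is not ZenCache's actual implementation; it assumes only the two definitions given in this thread (* = 0 or more characters that are NOT a slash, ** = 0 or more characters of any kind) and it assumes the pattern must match the whole URL. The names `wd_to_regex` and `wd_match` are made up for illustration.

```python
import re

def wd_to_regex(pattern):
    """Translate a watered-down pattern into a compiled regex (illustrative only)."""
    parts = []
    # Split on ** before * so that ** is not read as two single wildcards.
    for token in re.split(r"(\*\*|\*)", pattern):
        if token == "**":
            parts.append(".*")      # 0 or more characters of any kind, slashes included
        elif token == "*":
            parts.append("[^/]*")   # 0 or more characters that are NOT a slash
        else:
            parts.append(re.escape(token))  # everything else is taken literally
    return re.compile("".join(parts))

def wd_match(pattern, subject):
    # Assumption: the pattern has to cover the whole subject string.
    return wd_to_regex(pattern).fullmatch(subject) is not None

# A single leading * stops at the // after the scheme, so this does not match.
print(wd_match("*domain.com/my-account*", "http://domain.com/my-account"))        # False
# A leading ** can cross the slashes in https://www.
print(wd_match("**domain.com/my-account*", "https://www.domain.com/my-account"))  # True
```

Under the same assumptions, a trailing * only reaches one level deep: `wd_match("http://www.domain.com/membership/*", "http://www.domain.com/membership/plans")` is true, while a URL ending in `/membership/plans/gold` would need ** instead.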

@kristineds
Author

@jaswsinc

I worked to help you improve the introductory paragraphs and heading structure.

Thank you! Looks so much better than before. 👍

@raamdev: I'm not 100% sure on the examples I wrote for Specific URLs though. Please review. :)

@raamdev
Contributor

raamdev commented Nov 13, 2015

@kristineds One thing to watch for when copy/pasting is accidentally copying "fancy quotes", which look different than regular quotes:

[Screenshot: 2015-11-12_21-18-01]

Notice how that second double-quote looks different from the first one? Please delete that and replace it with a regular one.


I'm not 100% sure on the examples I wrote for Specific URLs though. Please review. :)

This looks wrong to me:

If you want to clear the cache for your /my-account page, you could use a pattern like this:

http://domain.com/my-account**

This will match the following URLs:

https://www.domain.com/my-account
https://domain.com/my-account
http://www.domain.com/my-account

The ** matches "0 or more characters of any kind, including / slashes", going from left to right (i.e., it matches 0 or more characters of any kind, including slashes, to the right of the **).

I think you misunderstood what Jason said when he was replying to your *domain.com/my-account* example. He was saying that the single * at the beginning needs to be **, i.e., **domain.com/my-account* will match all of the following:

**https://www.**domain.com/my-account
**https://**domain.com/my-account
**http://www.**domain.com/my-account

The ** in **domain.com/my-account* matches all of the bold items in the URLs above.

@raamdev
Contributor

raamdev commented Nov 13, 2015

The ** matches "0 or more characters of any kind, including / slashes", going from left to right (i.e., it matches 0 or more characters of any kind, including slashes, to the right of the **).

Sorry, that is not correct. The ** matches 0 or more characters of any kind, including / slashes. It doesn't matter which direction.

@kristineds
Author

@raamdev I updated it to this:

If you want to clear the cache for your /my-account page, add ** at the beginning of the URL:

**domain.com/my-account*
This will match all of the bold items in the URLs below.

https://www.domain.com/my-account
https://domain.com/my-account
http://www.domain.com/my-account

To clear the cache for /membership/ and any pages or posts beneath it, one level deep, add * at the end of the URL:

http://www.domain.com/membership/*

@raamdev
Contributor

raamdev commented Nov 13, 2015

@kristineds Thank you. I'll take it from here. :-) Nice work on this article! I'm marking this approved as of 2015-11-12.

@raamdev
Contributor

raamdev commented Nov 13, 2015

@kristineds KB Article has been published here: http://zencache.com/kb-article/watered-down-regex-syntax/

My edits are here in case you'd like to take a look: f8973c0...50a10d0

@raamdev
Contributor

raamdev commented Nov 13, 2015

@jaswsinc In Sitemap URI Examples, the pattern http://*/path/to/something/ is followed by this note, which seems wrong (or at least confusing) to me:

NOTE: The scheme (i.e., http:// or https:// is ignored), and both schemes are cleared automatically.

Wouldn't we need to use **/path/to/something/ for that to be the case? Or is this just a matter of how the code handles Sitemap URIs in particular (i.e., that part of the pattern is ignored altogether?).

@raamdev raamdev reopened this Nov 13, 2015
@jaswrks

jaswrks commented Nov 13, 2015

@jaswsinc In Sitemap URI Examples, the pattern http://*/path/to/something/ is followed by this note, which seems wrong (or at least confusing) to me:

Yes, that is wrong. I think I gave Kristine bad information on this by mistake when I was showing examples of the new syntax. Sitemap "URIs" should be entered as "URIs" and not as URLs. The URIs that you list there are inclusion patterns, and they are parsed just like URI exclusion patterns are.

What I was trying to convey in that note about the scheme is that ZenCache always deals with both schemes automatically. It's actually a bit easier to understand this if you think about it in the context of a URI exclusion pattern, not a Sitemap URI inclusion. For instance, if you exclude:

^/members/**

That excludes this URI across all domains, and across all schemes. There is no need to enter the domain name, and no need to enter the scheme.
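
A rough Python sketch of why neither the domain nor the scheme matters for a URI exclusion: only the path portion of the request is compared against the pattern. This is an illustration, not ZenCache code. It assumes that a leading ^ anchors the pattern to the start of the URI and that a pattern without ^ may match anywhere in the URI; neither rule is spelled out in this thread. The function names are hypothetical.

```python
import re
from urllib.parse import urlsplit

def exclusion_to_regex(pattern):
    """Compile a URI exclusion pattern (illustrative; assumptions noted below)."""
    anchored = pattern.startswith("^")  # assumption: ^ means "start of the URI"
    if anchored:
        pattern = pattern[1:]
    body = "".join(
        ".*" if token == "**"           # any characters, slashes included
        else "[^/]*" if token == "*"    # any characters except a slash
        else re.escape(token)           # literal text
        for token in re.split(r"(\*\*|\*)", pattern)
    )
    return re.compile(("^" if anchored else "") + body)

def is_excluded(url, patterns):
    # Only the path is tested, so the scheme and the host never take part.
    uri = urlsplit(url).path
    return any(exclusion_to_regex(p).search(uri) for p in patterns)

patterns = ["^/members/**"]
print(is_excluded("http://example.com/members/profile/edit", patterns))  # True
print(is_excluded("https://other.example.org/members/", patterns))       # True
print(is_excluded("http://example.com/blog/members/", patterns))         # False
```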


However, full "URLs" do come into play in a list of Custom URLs that you want to clear, for instance when you provide a list of Custom URLs that should be cleared automatically. Likewise, if you want to clear a Specific URL from the admin bar, you type in the full URL. You can be specific about the domain in these cases.

http://example.com/members/

↑ Note: This also clears the cache for both schemes, even though you only gave http://.


↓ In other words, this is not necessary; it always happens automatically:

*://example.com/members/

In fact, if you use a wildcard for the scheme it will not work at this time.
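
One way to picture the "both schemes, automatically" behavior is that each URL you enter is expanded into an http and an https variant before anything is cleared. The Python sketch below only illustrates that idea; it is an assumption about the effect, not a description of how ZenCache does it internally, and `both_schemes` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit

def both_schemes(url):
    """Return the http and https variants of a URL (illustrative only)."""
    parts = urlsplit(url)
    # Swap only the scheme; host, path, and query are left untouched.
    return [urlunsplit(parts._replace(scheme=s)) for s in ("http", "https")]

print(both_schemes("http://example.com/members/"))
# ['http://example.com/members/', 'https://example.com/members/']
```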

@jaswrks

jaswrks commented Nov 13, 2015

There is still room for improvement in these routines. I am already seeing some potential conflicts between the intended functionality (i.e., what we have in the article) vs. what actually takes place. The article doesn't need to be updated (I don't think), but the codebase could use some minor tweaks.

Opening bug report here: wpsharks/comet-cache#611

@jaswrks

jaswrks commented Nov 13, 2015

@kristineds Nice work on this article. Thanks for helping us put this all together! 😃

@raamdev
Contributor

raamdev commented Nov 13, 2015

@jaswsinc writes...

Sitemap "URIs" should be entered as "URIs" and not as URLs.

So this statement about mapped domains in the Sitemap URIs section is also wrong, correct?

"If you want to clear URLs for a mapped domain, the URLs that you list should include the mapped domain. If you need to cover all domains, you can use a pattern like this:"

@jaswrks

jaswrks commented Nov 13, 2015

So this statement about mapped domains in the Sitemap URIs section is also wrong, correct?

Correct. URIs only, not full URLs.

Any domains mapped to the current site are cleared automatically.
In other words, if you enter (default value for that field):

/sitemap*.xml

Whenever the cache is cleared for Child Site A, the following are cleared automatically:

  • http://child-a.example.com/sitemap.xml
  • or: http://example.com[/base]/child-a/sitemap.xml
  • and/or: http://[MAPPED DOMAIN]/sitemap.xml (for each domain mapped to child site A)
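
The list above can be sketched in Python as "apply the URI pattern relative to every root that belongs to the child site". This is a simplified illustration, not the plugin's code: the site roots, the mapped domain `mapped.example.net`, and the function names are invented, the pattern is assumed to match the whole URI, and scheme handling is left out.

```python
import re

def uri_pattern_to_regex(pattern):
    # * = anything except a slash, ** = anything; the rest is literal.
    return re.compile("".join(
        ".*" if t == "**" else "[^/]*" if t == "*" else re.escape(t)
        for t in re.split(r"(\*\*|\*)", pattern)
    ))

def urls_to_clear(cached_urls, site_roots, uri_pattern):
    """Pick cached URLs whose URI, relative to one of the site's roots, matches."""
    rx = uri_pattern_to_regex(uri_pattern)
    cleared = []
    for url in cached_urls:
        for root in site_roots:
            if url.startswith(root) and rx.fullmatch(url[len(root):]):
                cleared.append(url)
                break
    return cleared

# Hypothetical roots for Child Site A: its subdomain plus one mapped domain.
site_roots = ["http://child-a.example.com", "http://mapped.example.net"]
cached = [
    "http://child-a.example.com/sitemap.xml",
    "http://mapped.example.net/sitemap-posts.xml",
    "http://child-a.example.com/blog/sitemap.xml",  # URI does not start with /sitemap
    "http://child-b.example.com/sitemap.xml",       # belongs to a different site
]
print(urls_to_clear(cached, site_roots, "/sitemap*.xml"))
# ['http://child-a.example.com/sitemap.xml', 'http://mapped.example.net/sitemap-posts.xml']
```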

@jaswrks

jaswrks commented Nov 13, 2015

Regarding this change in the latest release: the interpretation of * is changing. Therefore, the update routine will change existing instances of * into ** in order to preserve the functionality that was previously accomplished with *, which now requires **.

See: https://github.com/websharks/zencache-pro/blob/000000-dev/src/includes/classes/VsUpgrades.php#L282

If you can double-check me on those option updates, I'd appreciate it. I see that I excluded sitemaps from the upgrade routine. If I remember correctly, that was because /sitemap*.xml seems more appropriate than /sitemap**.xml in the new syntax.
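
For anyone double-checking, the intent of that upgrade can be sketched in Python as follows. This is not the code in VsUpgrades.php; it is a small illustration of the rule described above (a lone * becomes **, sitemap patterns are left alone). The option keys `exclude_uris` and `sitemap_patterns` are hypothetical names, not the plugin's real option keys.

```python
import re

# A * that is not already part of a ** sequence.
LONE_STAR = re.compile(r"(?<!\*)\*(?!\*)")

def upgrade_pattern(pattern):
    """Turn each lone * into ** so the old matching behavior is preserved."""
    return LONE_STAR.sub("**", pattern)

def upgrade_options(options, skip=("sitemap_patterns",)):
    """Upgrade every pattern list except the keys named in skip (hypothetical keys)."""
    upgraded = {}
    for key, patterns in options.items():
        if key in skip:
            upgraded[key] = list(patterns)  # sitemaps keep the single *
        else:
            upgraded[key] = [upgrade_pattern(p) for p in patterns]
    return upgraded

old = {"exclude_uris": ["/members/*"], "sitemap_patterns": ["/sitemap*.xml"]}
print(upgrade_options(old))
# {'exclude_uris': ['/members/**'], 'sitemap_patterns': ['/sitemap*.xml']}
```

Patterns that already contain ** are left unchanged by this sketch, so running it twice gives the same result as running it once.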

raamdev added a commit that referenced this issue Nov 13, 2015
@raamdev raamdev closed this as completed Nov 13, 2015
@raamdev
Contributor

raamdev commented Nov 13, 2015

@jaswsinc Thanks for all your feedback here. :-) I rewrote the XML Sitemap Patterns section of the KB Article (see b79424e).

@jaswrks

jaswrks commented Dec 21, 2015

I'm noting that this line in the KBA seems out of context.

In this example, adding the syntax ** will exclude all custom URIs that start with /blog/ and anything infinitely deep beneath that slug.

@jaswrks jaswrks reopened this Dec 21, 2015
@raamdev raamdev removed their assignment Sep 29, 2016