Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dev/mailing#95 Only track <a> urls in Flexmailer for HTML emails #20129

Conversation

larssandergreen
Copy link
Contributor

@larssandergreen larssandergreen commented Apr 23, 2021

Issue here.

Before

Link tracking was applied to all URLs inside tags, including tags like <link>.

After

Link tracking is applied only to URLs inside <a> tags. Regex simply looks for <a anything href=URL> instead of <anything href=URL> (edited to add "anything" for clarity).

In Flexmailer only, HTML only
@civibot
Copy link

civibot bot commented Apr 23, 2021

(Standard links)

@artfulrobot
Copy link
Contributor

@larssandergreen sometimes I think there should be a prize for the biggest metadata:changed-characters ratio!

Mind you, now we're looking for <a do you think we should also be checking for <A ? i.e. add an i flag to the end of the regex?

@homotechsual
Copy link
Contributor

homotechsual commented Apr 23, 2021

The href does not have to immediately follow the <a and <a title="Title" href="link">Text</a> is perfectly valid HTML which this should account for - has that been tested?

@larssandergreen
Copy link
Contributor Author

@artfulrobot If you use A in a tag and save the mailing, it is converted to lowercase (same for HREF, which wouldn't have worked before now if not). But good idea to add i here — belt and braces.

That does take the PR up to four characters though.

@larssandergreen
Copy link
Contributor Author

larssandergreen commented Apr 23, 2021

@MikeyMJCO Sorry, I oversimplified in my explanation and have now clarified. All I've changed is where the regex formerly looked for < to start matching the pattern, now it looks for <a (and I've now added an i to make this case insensitive). But just to be sure, I just tested with <a title="Title" href="link">Text</a>.

@eileenmcnaughton
Copy link
Contributor

@artfulrobot @MikeyMJCO do either of you have a feeling what the status is on this one?

@homotechsual
Copy link
Contributor

Approved from me - small but valuable change. Code is clear and a few things clarified by the author in-transit!

@eileenmcnaughton eileenmcnaughton added the merge ready PR will be merged after a few days if there are no objections label May 25, 2021
@eileenmcnaughton
Copy link
Contributor

I've added merge-ready just in case @artfulrobot wants to comment before we merge based on your review @MikeyMJCO

@totten
Copy link
Member

totten commented May 26, 2021

  • Agree tracking <link rel="stylesheet" href="styles.css"> would be superfluous. (In the current state of affairs, you'd expect stats on that should be similar to the pixel-tracking... so it's redundant.)
  • Dunno if CiviMail or Mosaico documents ever use <base href> tags... tracking base+relative URLs is probably a sensible thing, but it's ill-defined in the current schema. Adding the tracking-code to <base href> is probably harmful at the moment.
  • <area href> is more of a bona-fide/clickable hyperlink. OTOH, <area href> probably died off ten years ago? It seems more correct to track it, but I wouldn't block other fixes on that.

@totten
Copy link
Member

totten commented May 26, 2021

Shoulda read the regex... ;(\<a[^>]*href *= *")([^">]+)(");i', probably matches both <a ... href> and <area ... href>. 🙃

@eileenmcnaughton
Copy link
Contributor

@totten is that a thumbs up on merging?

@totten
Copy link
Member

totten commented May 26, 2021

Yeah, concept+approach seem fine. Haven't r-run. However, the test coverage in HtmlClickTrackerTest, ClickTrackerTest, and FlexMailerSystemTest::testUrlTracking() all seem relevant (and pass).

@artfulrobot
Copy link
Contributor

Let's merge this, although I think a minor improvement would be

;(\<a\s+[^>]*href\s*=\s*")([^\">]+)(");i

and likewise for the ' one.

  • Allow any whitespace (tabs, newlines, not just spaces) after href
  • Insist on at least one whitespace after tag name. This does rule out <area, but feels more correct; I'd suggest adding specific allowance for <area if we need to support that.

https://regex101.com/r/aYyL0S/1

@eileenmcnaughton
Copy link
Contributor

Ok - Ill merge this & someone can put up a follow up with @artfulrobot's suggestion as a follow up

@eileenmcnaughton eileenmcnaughton merged commit 881c8ad into civicrm:master May 26, 2021
@larssandergreen larssandergreen deleted the apply-mailing-tracking-only-to-a-urls branch May 26, 2021 14:24
@larssandergreen
Copy link
Contributor Author

Thanks all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
master merge ready PR will be merged after a few days if there are no objections
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants