Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds same glob patterns as exclusions for pageview goals #750

Merged
merged 9 commits into from
Mar 2, 2021

Conversation

Vigasaurus
Copy link
Contributor

@Vigasaurus Vigasaurus commented Feb 24, 2021

Changes

screenshot

Addresses #241

Tests

  • Automated tests have been added

Changelog

  • Entry has been added to changelog

Documentation

  • Docs have been updated

Although I must say, I sometimes feel bad with how many PRs I've forced you to review recently heh, this should be the last one for a bit though

@ukutaht
Copy link
Contributor

ukutaht commented Feb 24, 2021

Code looks good. This should be thoroughly tested for sure.

@Vigasaurus
Copy link
Contributor Author

Code looks good.

Huh, honestly a bit surprised there. I guess my Elixir has gotten pretty decent after all - this felt a bit hacky to me at first to be honest, but doing it any other way than a separate CH query-per-goal was causing significant issues with counting uniques correctly, so I settled for this setup. (This code is technically v3, heh)

This should be thoroughly tested for sure.

100%. I'll write significant tests for this, partially because my elixir string replacement/regex is weak anyways, and that I had multiple instances of the clickhouse match() function not working as I'd generally expect, so there are decent odds that the same may present itself again.

As a note here actually, do you know of an easy (or not easy, if needed, I suppose) way for me to have mix test set up the test CH db, but not proceed with any more of the tests? In other words, keep the DB initialized as it would be at test-time for me to run CLI queries on it.

@ukutaht
Copy link
Contributor

ukutaht commented Feb 24, 2021

Ah, I didn't actually notice the N+1 query here. Would be good to combine it all back into a single query in case there are a lot of goals. There seem to be loads of functions designed for searching multiple different regex patterns in CH: https://clickhouse.tech/docs/en/sql-reference/functions/string-search-functions/#multimatchanyhaystack-pattern1-pattern2-patternn

As a note here actually, do you know of an easy (or not easy, if needed, I suppose) way for me to have mix test set up the test CH db, but not proceed with any more of the tests?

You can try setting a IEx.pry() somewhere in your test code. It will stop the execution but keep the environment up without tearing it down. More info: https://hexdocs.pm/iex/IEx.html#pry/0

Then you can run arbitrary Elixir or connect to your CH with their client.

@Vigasaurus
Copy link
Contributor Author

Hmm some of those CH functions actually seem promising now that I think about it. The hard part here (and the problem with v1 and v2 of my code) were that I could really only group over the basic pathname effectively, which made unique user_id's be double counted per every basic pathname. I wonder if there's a way to group by something other than a basic column; I'll look into it.

@ukutaht
Copy link
Contributor

ukutaht commented Feb 24, 2021

Hmm some of those CH functions actually seem promising now that I think about it. The hard part here (and the problem with v1 and v2 of my code) were that I could really only group over the basic pathname effectively, which made unique user_id's be double counted per every basic pathname. I wonder if there's a way to group by something other than a basic column; I'll look into it.

Yes we'd need to find a way to group over the matching regex basically. I wonder if you could use multiMatchAnyIndex to get the index of the matching regex. Then group by the index. And after the query in elixir you could map the index of the match back to the regex/goal name that corresponds to it.

@Vigasaurus
Copy link
Contributor Author

Yeah that sounds good, I was thinking about pretty much the same, but with the actual regexes. Doing it just over the index might be better and cleaner though. Will poke around and see what I can get 👍

lib/plausible/stats/clickhouse.ex Outdated Show resolved Hide resolved
value && value != ""
defp validate_page_path(changeset) do
value = get_field(changeset, :page_path)
value && String.match?(value, ~r/^\/.*/)
Copy link
Contributor Author

@Vigasaurus Vigasaurus Feb 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to do better validation of pathnames here, and add good validation for preventing duplicate goals; but I didn't want to make a disgusting regex if you knew of an easy Elixir helper for validating URLs (this may not work I guess because of the globs), and I couldn't get the latter to work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK as is.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about it to extract this regex and put in the module attribute? And use a better name to describe the intentions of what this regex do...

@Vigasaurus
Copy link
Contributor Author

Alright, I think this should be ready for review now. I do again still need to do more testing across the board, but I figure if you see any issues off the bat I can handle them beforehand.

I did just add the one main test, adding more tests, or more specific tests would need a lot more detailed events added to test that out effectively. If you think those should be added, I can do that, I just didn't want to immediately add a bunch of events there for this one test alone unless necessary.

I'll do docs later.

@Vigasaurus Vigasaurus marked this pull request as ready for review February 26, 2021 10:20
value && value != ""
defp validate_page_path(changeset) do
value = get_field(changeset, :page_path)
value && String.match?(value, ~r/^\/.*/)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK as is.

lib/plausible/stats/clickhouse.ex Show resolved Hide resolved
@Vigasaurus Vigasaurus requested a review from ukutaht March 1, 2021 18:50
@ukutaht ukutaht merged commit b952053 into plausible:master Mar 2, 2021
@ukutaht
Copy link
Contributor

ukutaht commented Mar 2, 2021

Thanks @Vigasaurus !

@ukutaht
Copy link
Contributor

ukutaht commented Mar 2, 2021

The goals seem to work OK but it also needs to be implemented as a filter. Currently, when you click on the goal you land on a broken dashboard:

https://plausible.io/plausible.io?goal=Visit+%2Fblog%2F**

@metmarkosaric
Copy link
Contributor

metmarkosaric commented Mar 2, 2021

Yeah, when you click on the goal, all metrics show 0 and "no data yet", You can see it on our live demo for "Visit /blog/**".

@Vigasaurus: This feature could be very cool to add to our "Top Pages" report to make it easier to use. When you click on "details" in Top Pages you could get a search box at the top which allows you to type whatever you want to filter the pages by that. I could type "blog" which should only list pages that include "blog" in their paths.

And then we could add one header to that table that summarizes the numbers of the pages you searched for:

  • you could see percentage of all pageviews that the searched for pages stand for (say "blog" pages have 20% of total pageviews in the time period)
  • the total number of pageviews between the searched for pages (say "blog" pages have 2335 pageviews in total in the time period)

Will solve the thing we've been asked for recently in #774

@Vigasaurus
Copy link
Contributor Author

Vigasaurus commented Mar 2, 2021

Oh yeah huh I never even thought to test for it being used as a filter. Fixing that should be not that terrible though, maybe. I'll look into it.

@Vigasaurus Vigasaurus mentioned this pull request Mar 2, 2021
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants