-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Metric: visit count until conversion #584
Comments
Some notes in #1434 |
Note: the field is zero before 1.2-rc1 update, and 1 after by default. Maybe we can assume 1 visit for all previous 1.2 logs. |
I'd like to work on this ticket, but I've got a couple of questions:
|
Great ticket to pick up! Here is breakdown used by GA:
I suggest we use the same breakdown as it makes sense and is consistent.
|
Thanks for the info, and I'll take your suggestion and work on the other tickets as well. I'll update again when I have a working prototype. |
Attachment: Patch for this issue. Made for revision 5193. |
Some notes on my patch:
I didn't add them for any other language, though if there's a way for me to do it I'd be willing to. Let me know if anything should be changed/added. |
Your solution here sounds like maybe we can do it another way. Are you trying to do something new to Piwik? Maybe see the existing code for similar reports, which work fine, and your code can follow these?
Translators do the translation later ;) http://piwik.org/translations/
Please apply above feedback and then post a patch to this ticket and mention all the other tickets fixed in the same patch. Then I can look at it in detail, test it, see the changes which is all called a code review. Then you can do a few more changes or we can commit when it's ready and working. You can see the process in: http://piwik.org/participate/development-process/#toc-how-to-submit-a-patch Looking forward to seeing your work :) |
Replying to matt:
I used the one you suggested. (BTW, this patch is just for the visits to conversion report. I'll fix the other issues & add them to one ticket before I attach another patch.)
I might be trying to do something new. The reports I looked at had variable data in the 'label' column. When sorting by this column piwik will sort lexicographically, which makes sense for these reports. Sorting this new report in this way, however, makes it look odd (ie, with 15-25 visits right after 1 visits). To get around this, the label column in the new report holds an index and a filter is used to make sure the label is transformed into a string AFTER the table is sorted. Is there another report that has a special sort of ordering? If so, I could take a look at that.
I'll upload another patch as soon as I am able. |
$dataTable->queueFilter('ColumnCallbackReplace', array('label', 'Piwik_getDurationLabel'));
|
One more quick question, what's the policy on refactoring in a patch? If I find opportunities to reduce some code redundancy (in this case the beauty filter code used in VisitorInterest), would it be alright to make the changes in this patch? |
Definitely :) refactoring is very welcome and in fact highly recommended. We would point out duplicate code during the review and generally propose to refactor :) so even better to do it from the start. If you have any question let us know |
Attachment: Patch for issues #584, #536, #2031. Made for revision 5256. |
I've uploaded another patch, this one addresses tickets #584, #536, #2031. I've also uploaded a file (in the same archive as the patch) that describes the changes I made in each file. Hopefully, that will make it easier to review. I missed the other related issue, but I'll be working on that while I incorporate the changes from the review (if any). |
Great patch! Most of it is on the right track and looking great, here is complete code review:
While your solution is more "correct" in Piwik we must tradeoff performance with absolute correctness over a large date range. In all existing reports, appart from the "Unique Visitors" metric, we do sum visits across dates. In this case, we want to do the same, for consistency (we don't want to only process these 2 reports for all periods), but also for performance (takes cpu time and additional code). Let me know your thoughts and if you have any question! looking forward to next patch :) |
Thanks for taking the time to review my patch (especially so promptly :)). Replying to matt:
I checked again, and it is passed, but the parameter is named 'filter_only_display_idgoal'. Must've missed it :) There is something else, though. None of the other reports seem to actively use the goal id. They all seem to select one blob which contains all the data, then only display some of it (which explains the 'filter_only_display_idgoal' parameter). My patch, on the other hand, will create a record for every goal. Is the former approach desired rather than mine? Also, this isn't related to this issue, but I just noticed there might be some inaccuracy in the goal-specific reports. I created a goal where conversions are visits whose page title contains 'piwik' then generated some visits. The first row in the keywords report looked like this:
From the visitor log there were only two conversions with the keyword 'piwik bingbot'.
Since I select the count of visitors, row 0 (the row w/ visitors who visited at least once) is the super set of all the other rows, ie, visitors who visited at least twice are also visitors who visited at least once. If I change it to select visits, then I'll have to sum every row, though...
To clarify, do you mean to avoid summing anything and instead select a count(*) & group by visitor_count_visits? Or something else? My patch currently sums over the visitor_count_visits column when archiving for one day and adds data tables together when archiving periods. Also, would it be a good idea to keep the refactoring I did (I merged two similar functions)? Finally, maybe I'm not looking at this correctly, but I think the 'Visits that were the visitor's nth visit' statistic is the same thing as the '# of visitors who visit at least N times'. For example, if you've got data like:
then the visits that were the visitor's 1st visit (3) is the same as the # of visitors who visited at least 1 time (3). If I am looking at it correctly, is it still better to display visits as a metric or is it better to display visitors? |
In your case, one different blob per idgoal is a much better solution indeed. but I think it's better to reuse existing filter_only_display_idgoal parameter since it does the same thing. if you're keen, you can rename it globally to "filter_idgoal" for example?
I never heard of this bug, but if you are sure there is one, please create a new ticket with the screenshot of the bug or data dump from the log_visit table. Definitely something to investigate!
In #536 the Visitor who visited 1 time, but not "at least once" - are you talking about this report in #536 ?
Yes group by visitor_count_visits is definitely the way to go, and select count(distinct idvisitor)
Yes the one building the SQL for ranges? definitely
What is the report " # of visitors who visited at least N times"? I think the report "Visits that were the visitor nth visit" is enough isnt it? Cheers! |
Replying to matt:
I don't mind renaming it, though it might be a good idea to just rename it to idGoal? The fact that it is used to filter some larger data would be an implementation detail.
I'll take a look at it after I finish this patch. Maybe I can submit another patch with the new issue :).
Yes, see below for my reasoning.
The '# of visitors who visited at least N times' is essentially the same report as the one specified in #536, it just seemed to me that 'visits that were the visitor's nth visit' wouldn't be as clear to a piwik user. So I was just describing the data in the report in #536 a different way. I may have been overthinking the issue, though. I could change the report documentation to do some more explanation. Or, if it sounds like a good idea, I could change the report to show the '# of visitors who visited exactly N times'. I think I'd just have create and modify the 'at least N times' report. |
Replying to capedfuzz:
OK I agree. After doing the change, please check that menus still work (click on goal 1, goal 2, other menu, come back to goal 3, etc.) - I think I remember sometimes seeing several idGoal in the URL etc. but it might have been fixed. OK to rename to "idGoal" :)
I agree it is slightly confusing, but I think it's best to keep one report for this, as it is simple to process and use existing code.
You're not overthinking in fact I think it's a useful report. In this case there is performance overhead and new core archiving logic, because it adds complexity (if you confirm) then I think it's best not to include it. This would be a great candidate for a plugin though! Cheers! |
Replying to matt:
I don't think it would add complexity, but I haven't written it yet. I think what I'll do is make modifications to my patch while displaying the report in #536, then I'll try and replace that report and see how much of a change it is. It may be easier to see the difference when the code is written :). |
Got another question, I've rewritten the VisitorInterest.php archiving code (for the new report) to GROUP BY visitor_count_visits, but my code is getting complex & hard to read (since the summarizing logic has to be in PHP). If I were to modify the archiving code to sum over ranges but get the data for every report in one SELECT (similar to the way my patch gets the data for the new goal reports), would that deal with the performance issue? In other words, if I do a sum, but get the data for the VisitorInterest_timeGap, VisitorInterest_pageGap, VisitorInterest_visitorsByVisits reports with one SELECT instead of three, will there still be a performance issue? |
It sounds like a good idea to get all of them at once if that's possible indeed! My main feedback was to not add much archiving code since all existing logic is already there to sum reports over ranges etc. so the code can be reused. I think that the new plugin should be quite small amount of code, appart from the new refactored methods you introduce (that will be refactored from the existing code). Let me know if that makes sense! |
Attachment: New patch for issues #584, #536, #2031. Made for revision 5270. |
I uploaded a new patch with the changes specified. Some notes on this patch:
Looking forward to your review! |
capedfuzz, great work thank you! First of all sorry for the delay in answering we wanted to release 1.6 which is now done. We will def commmit your patch for the next release :) Feedback
again thank you for this very nice work and looking forward to see it in trunk :-) |
Almost done w/ the next patch, but I've noticed a problem w/ the php tracking client. The client will not pass the _idvc query string variable which is used to set the visitor_count_visits column. This throws off the integration test results. Is this something the php tracking client should support? If so, would it be acceptable to simply add a function to the tracking client that lets you specify the specific value to use, or is there possibly a way to store this for each visitor? |
Actually, ignore my last comment, I found a solution. |
Replying to capedfuzz:
PHP should probably not support this directly (unless we support ALL cookies of piwik.js which is off topic), but you can use ->setDebugStringAppend('&_idvc=X') as a way to force the value. |
Attachment: Patch for issues #584, #536, #2031. Made for revision 5368. |
I uploaded a new patch with the changes specified. There are some new integration tests, all of which pass. Some notes about this patch: I noticed that I didn't add the appropriate report metadata for the new goals reports, so I did in this report. I made some changes to tests/integration/Main.test.php that you may want to review. There was a regression in the test that created 'test_csvExport_xp1_inner1_trans-de_ _CustomVariables.getCustomVariables_day.csv'. I did some digging, and the difference seems to be because of the extra Piwik_DataTable instances that get created for the new reports. I'm pretty sure this is the only difference, so I included it in the archive. This is a binary file, so its not part of the patch. The patch was made w/ revision 5368. |
(In [5378]) Fixes #584, #536, #2031 - Kuddos to Benaka akka capedfuzz for this great patch!!! I did a few minor modifications in wording and metadata output
Notes
|
(In [5381]) Refs #584
|
Milestone 1.8.x Piwik 1.8.x deleted |
We could provide a new report: visit count until conversion
Example
You have 1000 conversions
- 500 conversions happened on the first visit
- 200 conversions happened on the second visit
- 60 conversions happened on the third to fifth visit
- etc.
The text was updated successfully, but these errors were encountered: