Faster & Reliable Tracking: piwik.php asynchronous tracking import, by replaying piwik.php access logs every N minutes #3632
Comments
Thanks to Thomas Seifert, who provides hosted Piwik, for this patch! |
Attachment: |
Thanks for the patch! Code review / to-dos:
|
Rather than doing the job in the Recorder class, I'd put it in the Parser class, since it's really a parsing concern. Basically, the parser creates a Hit object that has all the properties the Recorder needs. I'll make changes so that the Hit object has an 'args' property where you can override anything you want.
This is not very Pythonesque. Don't modify the dictionary as you iterate over it; instead, create a new one:
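As a minimal sketch of that advice (the function name, the sample keys, and the `strip()` cleanup are hypothetical, not taken from the patch), a dict comprehension builds the new dictionary and leaves the original untouched. In Python 3, adding or removing keys while looping over a dict raises a RuntimeError, which is one more reason to build a fresh one:

```python
def normalize_args(args):
    """Return a new dict with cleaned values; `args` itself is not modified."""
    # Build a fresh dict instead of assigning into `args` while looping over it.
    return {key: value.strip() for key, value in args.items()}


# Hypothetical sample arguments, for illustration only.
raw = {"action_name": " Startseite ", "url": "http://example.org/ "}
clean = normalize_args(raw)
```

Here `clean` holds the stripped values while `raw` keeps its original contents, so the caller can still inspect what came out of the log line.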
Why is the encode/decode required? What format is the string in initially? |
I know. As I already told Matt, my language of choice is PHP, and in Python I'm just a beginner, so you should really take a close look at it :). I'd love to see what you make of it.
Well, the string comes directly from the log file, and without that encode/decode I got garbled characters for page titles with umlauts in German. Also, you really should remove the return True at the beginning of check_http_error. I added it to get rid of some missing imported lines, but judging from the code around it, it can only be wrong. One additional enhancement I have in mind would be to skip requests that are not piwik.php requests when replay is enabled. |
Could you attach a log file with those umlauts? |
Attachment: Anonymized log file with German data + umlauts |
I've just committed a small change that will let you put your code in the Parser, as I suggested: you now have an 'args' attribute in the Hit object that you can use to override your args. Regarding the Unicode thing, I'm not 100% sure this is the best way to handle it, but this is really not an easy one, as there are so many parameters to think about: the log file encoding, the HTTP encoding, etc. Are we sure we always have UTF-8 at this point? Does it depend on Piwik? Anyway, if it works that way, go for it ;) Send me your next diff when it's ready so I can do a proper review before it gets committed. |
I wasn't aware of that request / proposal, and I've already opened a topic on the forum. |
Any update in the last 2 months? I am also looking into this, and it would be great to have the latest status of this ticket. Thanks! |
see pull request #28 |
Reopening, pending:
|
Changeset [changeset:7e93e75012153e5f79d3bac98d9662e43b9df21d] refs this ticket. |
See new FAQ Scaling Piwik Tracking |
…ality Forcing all recorders and recorders max payload to 1, to prevent random behavior (eg. in Live.getLastVisitsDetails, the pageIdAction may be random order if recorders import data in random thread order)
…visitor) visits the website on two different days. Visitors on second day is marked as "new" because window_look_back_for_visitor is not set.
…ow look back (forceLargeWindowLookBackForVisitor=1) => The visitor is now marked as "returning" as expected
…indowLookBackForVisitor=1 in tests when replaying tracking logs
This is a performance improvement ticket.
Summary:
Power users will be able to set up Piwik so that MySQL is not required for tracking.
A script will run every N minutes (for example N=1 or N=60) and import the web server access logs into Piwik.
This script will use our log analytics tool to import the piwik.php requests.
It won't be as "real time" as before since loading logs is asynchronous, but it could be set to run every minute for near-real-time data.
This will make Piwik tracking decoupled from MySQL, more resilient, faster, and easier to scale.
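To illustrate the replay idea, here is a rough sketch of the filtering step: pick the piwik.php requests out of an access log and recover their tracking parameters. It is not the log analytics tool's actual code. The function name, the regular expression, and the sample log lines are assumptions, and it assumes a Common/Combined Log Format request field with UTF-8 percent-encoded query strings.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Assumed request field layout: "GET /piwik.php?... HTTP/1.1"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+"')


def tracking_args(log_line):
    """Return the query args of a piwik.php hit, or None for any other line."""
    match = REQUEST_RE.search(log_line)
    if match is None:
        return None
    parts = urlsplit(match.group("target"))
    # Skip everything that is not a tracker request.
    if not parts.path.endswith("/piwik.php"):
        return None
    # parse_qs returns a list per parameter and decodes percent-escapes as
    # UTF-8 by default; keep the first value of each parameter.
    return {key: values[0] for key, values in parse_qs(parts.query).items()}


# Made-up sample lines, for illustration only.
lines = [
    '203.0.113.7 - - [10/Jan/2013:10:00:01 +0000] '
    '"GET /piwik.php?idsite=1&rec=1&action_name=%C3%9Cbersicht HTTP/1.1" 200 43',
    '203.0.113.7 - - [10/Jan/2013:10:00:02 +0000] '
    '"GET /index.html HTTP/1.1" 200 512',
]
hits = [args for args in map(tracking_args, lines) if args is not None]
```

Only the first sample line survives the filter, and its page title comes back with the umlaut intact. A scheduled job running every N minutes would feed lines like these to the importer. Where the query string is not UTF-8, the `encoding` argument of `parse_qs` would need to change.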