Keep (some) raw data to regenerate extracted values in log_visit #8955

Open
ThaDafinser opened this issue Oct 9, 2015 · 5 comments
Labels
Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc.

Comments

@ThaDafinser
Contributor

The log_visit table currently contains a lot of processed visitor data:

  • resolution
  • operating system
  • language
  • latitude/longitude
  • provider
  • device
  • ....

All of this data has one thing in common: it is extracted from the $_SERVER (or similar) request data, and the original data is then lost.

I think it should be possible to keep the raw data so that the processed values can be regenerated later.

Why? Take the device as an example. Its value is filled in by the wonderful device-detector, which keeps getting better. But after a device-detector upgrade I'm not able to fill in the missing values, because the original value is lost.

I think of a very simple solution:

  • save all data serialized in a single column in the log_visit table (LONGTEXT)
  • give the user a command-line command to regenerate the derived values from the serialized data after an upgrade of DeviceDetector or the GeoIP database.
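
A minimal sketch of the proposed store-then-regenerate flow, written in Python for illustration only (Matomo itself is PHP). It assumes a log_visit row is a plain dict and uses JSON instead of PHP `serialize()`; the column name `device_type` and the `detect` callable (standing in for DeviceDetector) are hypothetical:

```python
import json
from typing import Callable

# Hypothetical detector signature: raw user agent in, derived device value out.
Detector = Callable[[str], str]


def store_visit(server: dict[str, str], detect: Detector) -> dict[str, str]:
    """Build a hypothetical log_visit row: derived value plus the raw input."""
    user_agent = server.get("HTTP_USER_AGENT", "")
    return {
        "device_type": detect(user_agent),  # hypothetical derived column
        "raw_data": json.dumps(server, sort_keys=True),  # the proposed LONGTEXT column
    }


def regenerate(rows: list[dict[str, str]], detect: Detector) -> int:
    """Re-derive device_type from raw_data, e.g. after a detector upgrade.

    Returns how many rows changed.
    """
    changed = 0
    for row in rows:
        server = json.loads(row["raw_data"])
        new_value = detect(server.get("HTTP_USER_AGENT", ""))
        if new_value != row["device_type"]:
            row["device_type"] = new_value
            changed += 1
    return changed
```

Rows stored while an older detector could not classify a device can be refreshed by calling `regenerate(rows, new_detect)` once a better detector is available; rows whose value does not change are left untouched.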

Drawback: each entry takes a few more (kilo)bytes...

Thoughts?

@ThaDafinser ThaDafinser changed the title Keep (some) raw data Keep (some) raw data to regenerate extracted values in log_visit Oct 9, 2015
@tsteur
Member

tsteur commented Oct 12, 2015

👍 👍 I don't think the number of bytes is a big deal nowadays, and if it is, we can still disable it, clear it, set up log deletion, etc.

@ThaDafinser
Contributor Author

The only thing I remember from doing this some years ago in my own small log/analysis table is that the $_SERVER variable can get really huge when you include everything.

So it should be limited to the currently useful parts.
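
One way to limit it, sketched in Python under stated assumptions: the whitelist `KEEP_KEYS` and the length cap `MAX_VALUE_LENGTH` below are hypothetical, chosen only to show the idea of keeping the currently useful fields and bounding the size of each row:

```python
# Hypothetical whitelist: only the request fields that currently feed derived columns.
KEEP_KEYS = ("HTTP_USER_AGENT", "HTTP_ACCEPT_LANGUAGE")
# Hypothetical cap on each stored value, to bound the size per row.
MAX_VALUE_LENGTH = 500


def limit_raw_data(server: dict[str, str]) -> dict[str, str]:
    """Keep only whitelisted keys and truncate overly long values."""
    return {
        key: server[key][:MAX_VALUE_LENGTH]
        for key in KEEP_KEYS
        if key in server
    }
```

A whitelist, rather than a blacklist, also means that new or unexpected headers (cookies, auth tokens) are never stored by accident.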

@mattab
Member

mattab commented Dec 23, 2015

Maybe we could start with a reduced scope and store the raw user agent value in the visit, assuming the user agent is the most useful field.

We could not store the whole of $_SERVER, as we need to make sure privacy is respected and that fields such as the IP address are properly sanitised.
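
A sketch of sanitising the IP before the raw data is stored, using Python's standard `ipaddress` module. The `IP_KEYS` list and the `sanitize` step are hypothetical; zeroing the trailing bytes is similar in spirit to IP anonymization but is not claimed to be Matomo's exact scheme:

```python
import ipaddress

# Hypothetical list of raw fields that carry the visitor IP.
IP_KEYS = ("REMOTE_ADDR",)


def mask_ip(ip: str, masked_bytes: int = 2) -> str:
    """Zero the last `masked_bytes` bytes of an IPv4 or IPv6 address."""
    addr = ipaddress.ip_address(ip)
    prefix = addr.max_prefixlen - 8 * masked_bytes
    if not 0 <= prefix <= addr.max_prefixlen:
        raise ValueError(f"cannot mask {masked_bytes} bytes of {ip}")
    # strict=False lets host bits be set; network_address then has them zeroed.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def sanitize(server: dict[str, str], masked_bytes: int = 2) -> dict[str, str]:
    """Return a copy of the raw data with IP fields masked before storage."""
    cleaned = dict(server)
    for key in IP_KEYS:
        if key in cleaned:
            cleaned[key] = mask_ip(cleaned[key], masked_bytes)
    return cleaned
```

Masking happens before serialization, so the unmasked address never reaches the stored raw data.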

@mattab mattab added this to the 3.0.0 milestone Dec 23, 2015
@mattab mattab added the Enhancement For new feature suggestions that enhance Matomo's capabilities or add a new report, new API etc. label Dec 23, 2015
@hpvd

hpvd commented Feb 2, 2016

+1 for keeping raw data!

In the same direction as keeping raw data, there are also some other very interesting topics:

Possibility to give visits a type like "standard", "deleted", "bot" etc. #9205

Do not delete bots but make them filterable afterwards (simple switch include or ignore them) #9067

centralized list to store visits to ignore: bots, deleted visits, spam etc. #9184

(...and storage is becoming cheaper and faster every day, while the visitor count (data production) on websites tracked with Piwik is not growing at the same speed)

@ThaDafinser
Contributor Author

I added a really simple plugin that adds a column with the serialized HTTP headers:
https://github.com/ThaDafinser/Piwik-KeepVisitorHttpRawData/blob/master/Columns/KeepVisitorHttpRawData.php#L36-L51

_NOTE_
It does not currently respect the privacy settings, and there is no job yet to reparse the headers after an update.

It's just here for now, to get a feel for how much memory is needed.

@mattab mattab modified the milestones: Mid term, 3.0.0 May 27, 2016
@innocraft-automation innocraft-automation removed this from the Backlog (Help wanted) milestone Sep 13, 2024

5 participants