Specify the audit log CSV format #234

lognaturel · 2019-05-29T00:44:23Z

This spec intends to provide sufficient information for downstream tools to process client audit logs and for additional clients to create compatible logs.

I considered linking to https://tools.ietf.org/html/rfc4180 but it prescribes CRLF linebreaks and is more verbose than I think we need. I also ended up omitting "each line should contain the same number of fields throughout the file" because it seems easy enough for downstream tools to end a row on linebreak. Should it be included?

I verified this by running bundle exec jekyll serve and looking at the generated site in Chrome.

Perhaps I should consolidate the templates for sub specs now that we have two. Or maybe we wait for one more? 😬

CC @grzesiek2010

MartijnR · 2019-06-07T17:42:33Z

Thank you!

Perhaps I should consolidate the templates for sub specs now that we have two. Or maybe we wait for one more? 😬

Unless, you are really eager, I the 3rd sub-spec would be a good time! :)

lognaturel · 2019-06-12T22:39:55Z

I came to an unfortunate realization at getodk/collect#3126. The way the XPath paths are written out by Collect is idiosyncratic: group names are always followed by a position predicate ([1]) except in the case of terminal groups with the field-list appearance. That is, /audit-group-multiplicity/g1[1]/g2[1]/g3[1]/q3 instead of the more expected /audit-group-multiplicity/g1/g2/g3/q3 for a question in three levels of groups.

Those paths are not incorrect so it doesn't matter for analysis tools that actually speak XPath. We do know that folks are writing their own analysis tools or doing quick visual checks, though, and for them it's important that the exact node names be consistent, I think. And ideally, that would include consistency between clients.

I see a couple of options:

document that position predicates are always included for container form control (groups and repeats) except for terminal groups with the field-list appearance
change what Collect does to not have those position predicates. This would break some things but it's certainly easier to do now than later.

@MartijnR, do you have a strong preference one way or the other? A different option I'm missing?

MartijnR · 2019-06-12T22:55:38Z

except for terminal groups with the field-list appearance

That sounds intriguing and surprising. Maybe I don't understand.

But in any case, my preference would be to not use position predicates except for repeated groups. For repeated groups it might be helpful to always include positions (even if only 1 instance)

MartijnR · 2019-06-12T22:58:17Z

Actually, this brings up another point that we had to resolve in OpenClinica's fork (though not related to logs for them). You can remove a repeat instance (e.g. the second, and then later create a new second instance, or the already existing third instance becomes the second). To deal with this (in audit log data) we added an immutable ordinal attribute to repeats, as well as a last-used-ordinal (to keep track of this in case the last repeat instance is removed). Both are in http://enketo.org/xforms namespace.

lognaturel · 2019-06-12T23:29:54Z

That sounds intriguing and surprising. Maybe I don't understand.

Indeed. Imagine that you have a method that outputs paths with position predicates for every node and then that you have a client that does navigation with page flips. It looks like the position predicates were stripped off nodes that correspond to pages -- questions or field lists. I'm kicking myself for somehow not noticing.

my preference would be to not use position predicates except for repeated groups.

Yes, I agree.

For repeated groups it might be helpful to always include positions.

Agreed.

You can remove a repeat instance (e.g. the second, and then later create a new second instance, or the already existing third instance becomes the second).

I think we had considered the shifting repeat positions not worth a special case but it's true that it's now much more difficult to do things like calculate how much time was spent in a specific repeat instance or track answer changes in a specific repeat instance.

To deal with this (in audit log data) we added an ordinal attribute to repeats.

Can you say a bit more about this? Do you mean Enketo core keeps counters of repeat instances over the lifetime of a record and includes an ID from that count in the log?

MartijnR · 2019-06-13T15:28:49Z

It's like this:

<repeat enk:last-used-ordinal="4" enk:ordinal="1">
   ...
</repeat>
<repeat enk:ordinal="3">
   ...
</repeat>

The ordinals are 1-based and are stored in the model/submission, so loading a submitted record for editing maintains the state.

In the above example 4 <repeat> elements were created and the 2nd and 4th were deleted. A new repeat would get ordinal 5.

P.S. OC has a higher-than-normal need for this, because each new value or new repeat is immediately submitted to the server as a fragment (and not the final complete record). I wonder if that is also meant to be trackable in the audit logs (all the intermediate states before submission).

lognaturel · 2019-07-22T22:32:15Z

For repeated groups it might be helpful to always include positions (even if only 1 instance)

@MartijnR and I discussed this briefly in Slack and agreed to always include positions for repeats. This avoids the case where the first repeat instance is initially logged as /data/repeat but then if edits are made to it after more repeat instances are added, it's logged as /data/repeat[1].

Filed #248

Specify the audit log CSV format

8268a57

lognaturel requested review from MartijnR and ggalmazor May 29, 2019 00:44

This was referenced May 29, 2019

Remove spaces between commas in CSVs getodk/collect#3111

Merged

Escape newlines and quotes for CSV output getodk/collect#3113

Merged

MartijnR approved these changes Jun 7, 2019

View reviewed changes

yanokwa merged commit aaea6b8 into getodk:gh-pages Jun 24, 2019

lognaturel deleted the audit-details branch June 24, 2019 22:35

yanokwa mentioned this pull request Jun 24, 2019

Match formatting of other specs #236

Merged

lognaturel mentioned this pull request Jun 24, 2019

Missing commits in release notes getodk/release-notes-generator#7

Closed

lognaturel mentioned this pull request Jul 1, 2019

Node position inclusion in audit log path is inconsistent for field lists getodk/collect#3126

Closed

lognaturel mentioned this pull request Jul 23, 2019

Proposal: add-on section describing format for csv log #102

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Specify the audit log CSV format #234

Specify the audit log CSV format #234

lognaturel commented May 29, 2019 •

edited

Loading

MartijnR commented Jun 7, 2019

lognaturel commented Jun 12, 2019

MartijnR commented Jun 12, 2019 •

edited

Loading

MartijnR commented Jun 12, 2019 •

edited

Loading

lognaturel commented Jun 12, 2019

MartijnR commented Jun 13, 2019

lognaturel commented Jul 22, 2019

Specify the audit log CSV format #234

Specify the audit log CSV format #234

Conversation

lognaturel commented May 29, 2019 • edited Loading

MartijnR commented Jun 7, 2019

lognaturel commented Jun 12, 2019

MartijnR commented Jun 12, 2019 • edited Loading

MartijnR commented Jun 12, 2019 • edited Loading

lognaturel commented Jun 12, 2019

MartijnR commented Jun 13, 2019

lognaturel commented Jul 22, 2019

lognaturel commented May 29, 2019 •

edited

Loading

MartijnR commented Jun 12, 2019 •

edited

Loading

MartijnR commented Jun 12, 2019 •

edited

Loading