Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify the audit log CSV format #234

Merged
merged 1 commit into from
Jun 24, 2019
Merged

Conversation

lognaturel
Copy link
Member

@lognaturel lognaturel commented May 29, 2019

This spec intends to provide sufficient information for downstream tools to process client audit logs and for additional clients to create compatible logs.

I considered linking to https://tools.ietf.org/html/rfc4180 but it prescribes CRLF linebreaks and is more verbose than I think we need. I also ended up omitting "each line should contain the same number of fields throughout the file" because it seems easy enough for downstream tools to end a row on linebreak. Should it be included?

I verified this by running bundle exec jekyll serve and looking at the generated site in Chrome.

Perhaps I should consolidate the templates for sub specs now that we have two. Or maybe we wait for one more? 😬

CC @grzesiek2010

@MartijnR
Copy link
Contributor

MartijnR commented Jun 7, 2019

Thank you!

Perhaps I should consolidate the templates for sub specs now that we have two. Or maybe we wait for one more? 😬

Unless, you are really eager, I the 3rd sub-spec would be a good time! :)

@lognaturel
Copy link
Member Author

I came to an unfortunate realization at getodk/collect#3126. The way the XPath paths are written out by Collect is idiosyncratic: group names are always followed by a position predicate ([1]) except in the case of terminal groups with the field-list appearance. That is, /audit-group-multiplicity/g1[1]/g2[1]/g3[1]/q3 instead of the more expected /audit-group-multiplicity/g1/g2/g3/q3 for a question in three levels of groups.

Those paths are not incorrect so it doesn't matter for analysis tools that actually speak XPath. We do know that folks are writing their own analysis tools or doing quick visual checks, though, and for them it's important that the exact node names be consistent, I think. And ideally, that would include consistency between clients.

I see a couple of options:

  • document that position predicates are always included for container form control (groups and repeats) except for terminal groups with the field-list appearance
  • change what Collect does to not have those position predicates. This would break some things but it's certainly easier to do now than later.

@MartijnR, do you have a strong preference one way or the other? A different option I'm missing?

@MartijnR
Copy link
Contributor

MartijnR commented Jun 12, 2019

except for terminal groups with the field-list appearance

That sounds intriguing and surprising. Maybe I don't understand.

But in any case, my preference would be to not use position predicates except for repeated groups. For repeated groups it might be helpful to always include positions (even if only 1 instance)

@MartijnR
Copy link
Contributor

MartijnR commented Jun 12, 2019

Actually, this brings up another point that we had to resolve in OpenClinica's fork (though not related to logs for them). You can remove a repeat instance (e.g. the second, and then later create a new second instance, or the already existing third instance becomes the second). To deal with this (in audit log data) we added an immutable ordinal attribute to repeats, as well as a last-used-ordinal (to keep track of this in case the last repeat instance is removed). Both are in http://enketo.org/xforms namespace.

@lognaturel
Copy link
Member Author

That sounds intriguing and surprising. Maybe I don't understand.

Indeed. Imagine that you have a method that outputs paths with position predicates for every node and then that you have a client that does navigation with page flips. It looks like the position predicates were stripped off nodes that correspond to pages -- questions or field lists. I'm kicking myself for somehow not noticing.

my preference would be to not use position predicates except for repeated groups.

Yes, I agree.

For repeated groups it might be helpful to always include positions.

Agreed.

You can remove a repeat instance (e.g. the second, and then later create a new second instance, or the already existing third instance becomes the second).

I think we had considered the shifting repeat positions not worth a special case but it's true that it's now much more difficult to do things like calculate how much time was spent in a specific repeat instance or track answer changes in a specific repeat instance.

To deal with this (in audit log data) we added an ordinal attribute to repeats.

Can you say a bit more about this? Do you mean Enketo core keeps counters of repeat instances over the lifetime of a record and includes an ID from that count in the log?

@MartijnR
Copy link
Contributor

It's like this:

<repeat enk:last-used-ordinal="4" enk:ordinal="1">
   ...
</repeat>
<repeat enk:ordinal="3">
   ...
</repeat>

The ordinals are 1-based and are stored in the model/submission, so loading a submitted record for editing maintains the state.

In the above example 4 <repeat> elements were created and the 2nd and 4th were deleted. A new repeat would get ordinal 5.

P.S. OC has a higher-than-normal need for this, because each new value or new repeat is immediately submitted to the server as a fragment (and not the final complete record). I wonder if that is also meant to be trackable in the audit logs (all the intermediate states before submission).

@lognaturel
Copy link
Member Author

For repeated groups it might be helpful to always include positions (even if only 1 instance)

@MartijnR and I discussed this briefly in Slack and agreed to always include positions for repeats. This avoids the case where the first repeat instance is initially logged as /data/repeat but then if edits are made to it after more repeat instances are added, it's logged as /data/repeat[1].

Filed #248

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants