Node position inclusion in audit log path is inconsistent for field lists #3126

lognaturel · 2019-06-03T20:13:31Z

Software and hardware versions

Collect v1.22

Problem description

With the addition of change tracking for the audit, there are now events logged for questions in field lists that have value changes. When the field list is first logged, the field list node path looks something like /data/my-field-list. However, when there are changes to values in the field list, their paths include an index after the group and look like /data/my-field-list[1]/my-field.

That's because position predicates are trimmed off terminal nodes but not any others.

In general, it's unexpected to have indexes in the case of anything other than repeats since there can't be two nodes of the same name. It's particularly visible in the case of field lists because of the inconsistency.

Steps to reproduce the problem

Fill a form with groups and track changes turned on such as https://docs.google.com/spreadsheets/d/1CItY6bdAeEYPVx8CxB_e4jKF2IkZfrDQ4X3tac94TYQ/edit#gid=0

Notice that all groups have position predicates that look like [1] after them. For example: /audit-group-multiplicity/g1[1]/g2[1]/g3[1]/q3. However, /audit-group-multiplicity/g1/g2/g3/q3 would always be sufficient to identify the proper node and is easier for humans to deal with. That's presumably why the final position predicate was trimmed.

The thing that is bothersome is /audit-group-multiplicity/fl1 followed by /audit-group-multiplicity/fl1[1]/q6.

event	node	start	end	old-value	new-value
form start		1559590967376
question	/audit-group-multiplicity/g1[1]/q1	1559590967378	1559590968994		Vui
question	/audit-group-multiplicity/g1[1]/g2[1]/q2	1559590968998	1559590970568		Vj
question	/audit-group-multiplicity/g1[1]/g2[1]/g3[1]/q3	1559590970570	1559590971800		Bi
add repeat	/audit-group-multiplicity/r1[1]	1559590971802	1559590972985
question	/audit-group-multiplicity/r1[1]/q4	1559590972994	1559590974275		Gi
add repeat	/audit-group-multiplicity/r1[1]/r2[1]	1559590974276	1559590975623
question	/audit-group-multiplicity/r1[1]/r2[1]/q5	1559590975626	1559590977085		Bio
add repeat	/audit-group-multiplicity/r1[1]/r2[2]	1559590977087	1559590978264
question	/audit-group-multiplicity/r1[1]/r2[2]/q5	1559590978266	1559590979469		Gi
add repeat	/audit-group-multiplicity/r1[1]/r2[3]	1559590979471	1559590980858
add repeat	/audit-group-multiplicity/r1[2]	1559590980859	1559590981691
question	/audit-group-multiplicity/r1[2]/q4	1559590981694	1559590983047		Giyi
add repeat	/audit-group-multiplicity/r1[2]/r2[1]	1559590983048	1559590985743
question	/audit-group-multiplicity/r1[2]/r2[1]/q5	1559590985745	1559590987104		Bk
add repeat	/audit-group-multiplicity/r1[2]/r2[2]	1559590987106	1559590988892
add repeat	/audit-group-multiplicity/r1[3]	1559590988893	1559590990349
question	/audit-group-multiplicity/r1[3]/q4	1559590990351	1559590991701		O
add repeat	/audit-group-multiplicity/r1[3]/r2[1]	1559590991702	1559590993146
add repeat	/audit-group-multiplicity/r1[4]	1559590993148	1559590994690
group questions	/audit-group-multiplicity/fl1	1559590994691	1559590998470
question	/audit-group-multiplicity/fl1[1]/q6	1559590994693	1559590998470		Fu
question	/audit-group-multiplicity/fl1[1]/q7	1559590994694	1559590998470		Fj
end screen		1559590998471	1559590999618
group questions	/audit-group-multiplicity/fl1	1559590999619	1559591005209
question	/audit-group-multiplicity/fl1[1]/q6	1559590999621	1559591005209	Fu	Fp
question	/audit-group-multiplicity/fl1[1]/q7	1559590999622	1559591005209	Fj	Fg
end screen		1559591005210	1559591006270
form save		1559591006270
form exit		1559591006270
form finalize		1559591006271

Expected behavior

I'm not sure what, if anything, we should do about this. It has been this way since the audit feature was released. For someone actually using the XPath path to link an audit row to form contents, the two paths are equivalent so it doesn't matter.

For users doing visual inspection, it's just visual noise they can overlook and probably not a big deal.

But for folks who have written scripts to process the audit, changes to the node format would break their tools. Even though the field list inconsistency is bothersome, it's probably easy enough to explain.

I'm inclined not to do anything about this but wanted to think through it and point it out while the release is still fresh.

What do others think? @grzesiek2010 @yanokwa

The text was updated successfully, but these errors were encountered:

yanokwa · 2019-06-04T03:46:40Z

I'd bias to cleaning this up so there's less visual noise. Yes. we break downstream tools, but

we don't know any that exist
we can put this in a changelog to warn
it's a pretty odd script that does stuff with that info

grzesiek2010 · 2019-06-04T11:17:31Z

I agree with @yanokwa and think we can fix it. It shouldn't be very risky, it was added just in the latest version.

lognaturel · 2019-06-04T18:29:49Z

it's a pretty odd script that does stuff with that info

An analysis tool that wants to link the audit results to the question definition needs to do so. Also consider that if we make a change now, folks who have ongoing data collection will have audit files with inconsistent names for the same question. e.g. if you do some kind of aggregation over /audit-group-multiplicity/g1[1]/g2[1]/q2 and all of a sudden it becomes /audit-group-multiplicity/g1/g2/q2, you may have a problem. That could be as simple as an Excel filter over the Briefcase export that lets you see how much time was spent in q2 over several enumerators.

it was added just in the latest version.

Only logging a question that is in a field list group is new. That would be /audit-group-multiplicity/fl1[1]/q6 in my example above. However, I don't think we can change it just in that context. Currently, we can explain that all nodes after the root node have an index predicate except for the terminal node. If we remove the index predicate for field lists (e.g. /audit-group-multiplicity/fl1/q6) then we'd have to say all nodes after the root node have an index predicate except for field lists and the terminal node which feels more awkward for not much gain because any nested groups (e.g. /audit-group-multiplicity/g1[1]/g2[1]/g3[1]/q3) would still have unnecessary index predicates. Does that make sense?

lognaturel · 2019-07-01T22:54:39Z

Unfortunately I think we have to address this. Martijn was being generous when he said "that sounds intriguing and surprising" about the [1] index predicates being included with all groups here and I do think it's important we have a consistent spec. The sooner we do it, the better.

@yanokwa, is there a reasonable way to communicate this change to users? Or do you think we just do it?

I seem to remember that analytics for audit logging have been removed. I suggest that we add them back in since it's an area that has been actively evolving and having analytics helps us make better decisions.

yanokwa · 2019-07-05T02:45:02Z

I think we should proceed with this change because the longer we wait, the more painful it is. I wish we had a better way of communicating this change to folks, but we don't. The breaking change only affects a small number of people and it's on the analysis side of things. If folks run into problems, they can file a support issue and we can help fix their scripts. Also, we'll do a beta and we've been reminding folks to try betas to make sure they see breaking changes.

grzesiek2010 · 2019-07-11T11:57:51Z

I investigated the issue yesterday and my conclusion is that it's not possible to handle it on our (Collect) side...
Let's say we have a hierarchy like:

repeat_group/regular_group/question1
the result reference would be like:

repeat_group[1]/regular_group[1]/question1[1]
repeat_group[2]/regular_group[1]/question1[1]
repeat_group[3]/regular_group[1]/question1[1]

apart from removing the last [1] what we do, we are not able to go through such string and tell which node position should be removed because it's not a repeatable group.

If we really want to fix the issue we would need to do that in Javarosa adding those indexes [x] only for repeatable groups. I don't know why they are added in all cases where it's always [1]
and doesn't have much sense.

@lognaturel Any thoughts? You think we should fix it in Javarosa?

lognaturel · 2019-07-11T16:49:57Z

Yes, I agree that it needs to be done on the JR side. As a first step, it would be good to do a quick exploration to see whether a method that does what is desired already exists. I don't think so but it's worth a check. And if not, my initial sense is that TreeReference is the place to add this method. It would check the node type and only write out the index predicate (multiplicity) after a repeat. I think it should be separate from the existing toString but I think you should look for a design that makes sense to you and describe why.

It would likely also be possible to do in FormIndex or as a static method in FormController so it's worth spending a little time to think through tradeoffs.

getodk-bot · 2019-07-22T04:08:24Z

Hello @grzesiek2010, you claimed this issue to work on it, but this issue and any referenced pull requests haven't been updated for 10 days. Are you still working on this issue?

If so, please update this issue by leaving a comment on this issue to let me know that you're still working on it. Otherwise, I'll automatically remove you from this issue in 5 days.

If you've decided to work on something else, simply comment @opendatakit-bot unclaim so that someone else can claim it and continue from where you left off.

Thank you for your valuable contributions to Open Data Kit!

lognaturel mentioned this issue Jun 12, 2019

Specify the audit log CSV format getodk/xforms-spec#234

Merged

lognaturel added this to the v1.23 milestone Jul 5, 2019

grzesiek2010 self-assigned this Jul 9, 2019

getodk-bot added the in progress label Jul 9, 2019

This was referenced Jul 22, 2019

Include multiplicities only for repeatable groups in TreeReference.toString() method getodk/javarosa#469

Closed

Do not trim indexes #3227

Closed

lognaturel mentioned this issue Jul 23, 2019

In audit log, only include position predicates for repeats #3231

Merged

3 tasks

yanokwa closed this as completed in #3231 Jul 24, 2019

lognaturel mentioned this issue Jul 31, 2019

Add method to get XPath path in primary instance with index predicate only for nodes bound to repeats getodk/javarosa#472

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Node position inclusion in audit log path is inconsistent for field lists #3126

Node position inclusion in audit log path is inconsistent for field lists #3126

lognaturel commented Jun 3, 2019 •

edited

Loading

yanokwa commented Jun 4, 2019

grzesiek2010 commented Jun 4, 2019

lognaturel commented Jun 4, 2019 •

edited

Loading

lognaturel commented Jul 1, 2019

yanokwa commented Jul 5, 2019 •

edited

Loading

grzesiek2010 commented Jul 11, 2019 •

edited

Loading

lognaturel commented Jul 11, 2019

getodk-bot commented Jul 22, 2019

Node position inclusion in audit log path is inconsistent for field lists #3126

Node position inclusion in audit log path is inconsistent for field lists #3126

Comments

lognaturel commented Jun 3, 2019 • edited Loading

Software and hardware versions

Problem description

Steps to reproduce the problem

Expected behavior

yanokwa commented Jun 4, 2019

grzesiek2010 commented Jun 4, 2019

lognaturel commented Jun 4, 2019 • edited Loading

lognaturel commented Jul 1, 2019

yanokwa commented Jul 5, 2019 • edited Loading

grzesiek2010 commented Jul 11, 2019 • edited Loading

lognaturel commented Jul 11, 2019

getodk-bot commented Jul 22, 2019

lognaturel commented Jun 3, 2019 •

edited

Loading

lognaturel commented Jun 4, 2019 •

edited

Loading

yanokwa commented Jul 5, 2019 •

edited

Loading

grzesiek2010 commented Jul 11, 2019 •

edited

Loading