Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node position inclusion in audit log path is inconsistent for field lists #3126

Closed
lognaturel opened this issue Jun 3, 2019 · 8 comments · Fixed by #3231
Closed

Node position inclusion in audit log path is inconsistent for field lists #3126

lognaturel opened this issue Jun 3, 2019 · 8 comments · Fixed by #3231
Assignees
Milestone

Comments

@lognaturel
Copy link
Member

lognaturel commented Jun 3, 2019

Software and hardware versions

Collect v1.22

Problem description

With the addition of change tracking for the audit, there are now events logged for questions in field lists that have value changes. When the field list is first logged, the field list node path looks something like /data/my-field-list. However, when there are changes to values in the field list, their paths include an index after the group and look like /data/my-field-list[1]/my-field.

That's because position predicates are trimmed off terminal nodes but not any others.

In general, it's unexpected to have indexes in the case of anything other than repeats since there can't be two nodes of the same name. It's particularly visible in the case of field lists because of the inconsistency.

Steps to reproduce the problem

Fill a form with groups and track changes turned on such as https://docs.google.com/spreadsheets/d/1CItY6bdAeEYPVx8CxB_e4jKF2IkZfrDQ4X3tac94TYQ/edit#gid=0

Notice that all groups have position predicates that look like [1] after them. For example: /audit-group-multiplicity/g1[1]/g2[1]/g3[1]/q3. However, /audit-group-multiplicity/g1/g2/g3/q3 would always be sufficient to identify the proper node and is easier for humans to deal with. That's presumably why the final position predicate was trimmed.

The thing that is bothersome is /audit-group-multiplicity/fl1 followed by /audit-group-multiplicity/fl1[1]/q6.

event node start end old-value new-value
form start 1559590967376
question /audit-group-multiplicity/g1[1]/q1 1559590967378 1559590968994 Vui
question /audit-group-multiplicity/g1[1]/g2[1]/q2 1559590968998 1559590970568 Vj
question /audit-group-multiplicity/g1[1]/g2[1]/g3[1]/q3 1559590970570 1559590971800 Bi
add repeat /audit-group-multiplicity/r1[1] 1559590971802 1559590972985
question /audit-group-multiplicity/r1[1]/q4 1559590972994 1559590974275 Gi
add repeat /audit-group-multiplicity/r1[1]/r2[1] 1559590974276 1559590975623
question /audit-group-multiplicity/r1[1]/r2[1]/q5 1559590975626 1559590977085 Bio
add repeat /audit-group-multiplicity/r1[1]/r2[2] 1559590977087 1559590978264
question /audit-group-multiplicity/r1[1]/r2[2]/q5 1559590978266 1559590979469 Gi
add repeat /audit-group-multiplicity/r1[1]/r2[3] 1559590979471 1559590980858
add repeat /audit-group-multiplicity/r1[2] 1559590980859 1559590981691
question /audit-group-multiplicity/r1[2]/q4 1559590981694 1559590983047 Giyi
add repeat /audit-group-multiplicity/r1[2]/r2[1] 1559590983048 1559590985743
question /audit-group-multiplicity/r1[2]/r2[1]/q5 1559590985745 1559590987104 Bk
add repeat /audit-group-multiplicity/r1[2]/r2[2] 1559590987106 1559590988892
add repeat /audit-group-multiplicity/r1[3] 1559590988893 1559590990349
question /audit-group-multiplicity/r1[3]/q4 1559590990351 1559590991701 O
add repeat /audit-group-multiplicity/r1[3]/r2[1] 1559590991702 1559590993146
add repeat /audit-group-multiplicity/r1[4] 1559590993148 1559590994690
group questions /audit-group-multiplicity/fl1 1559590994691 1559590998470
question /audit-group-multiplicity/fl1[1]/q6 1559590994693 1559590998470 Fu
question /audit-group-multiplicity/fl1[1]/q7 1559590994694 1559590998470 Fj
end screen 1559590998471 1559590999618
group questions /audit-group-multiplicity/fl1 1559590999619 1559591005209
question /audit-group-multiplicity/fl1[1]/q6 1559590999621 1559591005209 Fu Fp
question /audit-group-multiplicity/fl1[1]/q7 1559590999622 1559591005209 Fj Fg
end screen 1559591005210 1559591006270
form save 1559591006270
form exit 1559591006270
form finalize 1559591006271

Expected behavior

I'm not sure what, if anything, we should do about this. It has been this way since the audit feature was released. For someone actually using the XPath path to link an audit row to form contents, the two paths are equivalent so it doesn't matter.

For users doing visual inspection, it's just visual noise they can overlook and probably not a big deal.

But for folks who have written scripts to process the audit, changes to the node format would break their tools. Even though the field list inconsistency is bothersome, it's probably easy enough to explain.

I'm inclined not to do anything about this but wanted to think through it and point it out while the release is still fresh.

What do others think? @grzesiek2010 @yanokwa

@yanokwa
Copy link
Member

yanokwa commented Jun 4, 2019

I'd bias to cleaning this up so there's less visual noise. Yes. we break downstream tools, but

  1. we don't know any that exist
  2. we can put this in a changelog to warn
  3. it's a pretty odd script that does stuff with that info

@grzesiek2010
Copy link
Member

I agree with @yanokwa and think we can fix it. It shouldn't be very risky, it was added just in the latest version.

@lognaturel
Copy link
Member Author

lognaturel commented Jun 4, 2019

it's a pretty odd script that does stuff with that info

An analysis tool that wants to link the audit results to the question definition needs to do so. Also consider that if we make a change now, folks who have ongoing data collection will have audit files with inconsistent names for the same question. e.g. if you do some kind of aggregation over /audit-group-multiplicity/g1[1]/g2[1]/q2 and all of a sudden it becomes /audit-group-multiplicity/g1/g2/q2, you may have a problem. That could be as simple as an Excel filter over the Briefcase export that lets you see how much time was spent in q2 over several enumerators.

it was added just in the latest version.

Only logging a question that is in a field list group is new. That would be /audit-group-multiplicity/fl1[1]/q6 in my example above. However, I don't think we can change it just in that context. Currently, we can explain that all nodes after the root node have an index predicate except for the terminal node. If we remove the index predicate for field lists (e.g. /audit-group-multiplicity/fl1/q6) then we'd have to say all nodes after the root node have an index predicate except for field lists and the terminal node which feels more awkward for not much gain because any nested groups (e.g. /audit-group-multiplicity/g1[1]/g2[1]/g3[1]/q3) would still have unnecessary index predicates. Does that make sense?

@lognaturel
Copy link
Member Author

Unfortunately I think we have to address this. Martijn was being generous when he said "that sounds intriguing and surprising" about the [1] index predicates being included with all groups here and I do think it's important we have a consistent spec. The sooner we do it, the better.

@yanokwa, is there a reasonable way to communicate this change to users? Or do you think we just do it?

I seem to remember that analytics for audit logging have been removed. I suggest that we add them back in since it's an area that has been actively evolving and having analytics helps us make better decisions.

@yanokwa
Copy link
Member

yanokwa commented Jul 5, 2019

I think we should proceed with this change because the longer we wait, the more painful it is. I wish we had a better way of communicating this change to folks, but we don't. The breaking change only affects a small number of people and it's on the analysis side of things. If folks run into problems, they can file a support issue and we can help fix their scripts. Also, we'll do a beta and we've been reminding folks to try betas to make sure they see breaking changes.

@lognaturel lognaturel added this to the v1.23 milestone Jul 5, 2019
@grzesiek2010 grzesiek2010 self-assigned this Jul 9, 2019
@grzesiek2010
Copy link
Member

grzesiek2010 commented Jul 11, 2019

I investigated the issue yesterday and my conclusion is that it's not possible to handle it on our (Collect) side...
Let's say we have a hierarchy like:

repeat_group/regular_group/question1
the result reference would be like:

repeat_group[1]/regular_group[1]/question1[1]
repeat_group[2]/regular_group[1]/question1[1]
repeat_group[3]/regular_group[1]/question1[1]

apart from removing the last [1] what we do, we are not able to go through such string and tell which node position should be removed because it's not a repeatable group.

If we really want to fix the issue we would need to do that in Javarosa adding those indexes [x] only for repeatable groups. I don't know why they are added in all cases where it's always [1]
and doesn't have much sense.

@lognaturel Any thoughts? You think we should fix it in Javarosa?

@lognaturel
Copy link
Member Author

Yes, I agree that it needs to be done on the JR side. As a first step, it would be good to do a quick exploration to see whether a method that does what is desired already exists. I don't think so but it's worth a check. And if not, my initial sense is that TreeReference is the place to add this method. It would check the node type and only write out the index predicate (multiplicity) after a repeat. I think it should be separate from the existing toString but I think you should look for a design that makes sense to you and describe why.

It would likely also be possible to do in FormIndex or as a static method in FormController so it's worth spending a little time to think through tradeoffs.

@getodk-bot
Copy link
Member

Hello @grzesiek2010, you claimed this issue to work on it, but this issue and any referenced pull requests haven't been updated for 10 days. Are you still working on this issue?

If so, please update this issue by leaving a comment on this issue to let me know that you're still working on it. Otherwise, I'll automatically remove you from this issue in 5 days.

If you've decided to work on something else, simply comment @opendatakit-bot unclaim so that someone else can claim it and continue from where you left off.

Thank you for your valuable contributions to Open Data Kit!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment