Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculation using node from nested repeat has absolute reference #453

Open
MartijnR opened this issue Jun 22, 2020 · 7 comments
Open

Calculation using node from nested repeat has absolute reference #453

MartijnR opened this issue Jun 22, 2020 · 7 comments
Labels

Comments

@MartijnR
Copy link
Contributor

MartijnR commented Jun 22, 2020

type name label calculation
begin_repeat plot_registration Plot registration  
geoshape plot_shape Go around the plot  
begin_repeat species_registration Species registration  
integer species_nb Species number  
end_repeat      
note total_nb_of_trees_allocated_info Total  number of trees allocated to be planted sum(${species_nb})
end_repeat      

The output for the calculation is (which sums all species_nb question in the form) :

calculate="sum( /data/plot_registration/species_registration/species_nb )"`

In this case it would be safe to assume the users wants to sum only the species_nb inside an instance of the higher level each plot_registration repeat.

Hence the output should be:

calculate="sum( ../species_registration/species_nb )"`
@MartijnR MartijnR added the bug label Jun 22, 2020
@lognaturel
Copy link
Contributor

lognaturel commented Jun 22, 2020

In this case it would be safe to assume the users wants to sum only the species_nb inside an instance of the higher level each plot_registration repeat.

When you say "in this case", you mean with this structure regardless of the usage, right? And so then if a user actually wanted to get the sum of all species_nb across all plot_registrations they could explicitly write sum(../../plot_registration/species_registration/species_nb) in their XLSForm.

Let me try to generalize a little bit. Let's say there's an expression with a ${} reference. That expression is in the context of some node a ('calculate', 'relevant', whatever). There's a repeat repeat which is a sibling of a. The reference refers to a node which is in repeat. If a is in a repeat itself, then the ${} reference should be expanded to something relative. Does this match what you're thinking? Is it the case that if a is at the top level of the form (not in a repeat), then the reference should not be relative? Could it be? I think this might be hard to implement given that there could also be any number of groups nesting in there.

@MartijnR
Copy link
Contributor Author

with this structure regardless of the usage, right?

Yes, indeed. It seems that would nearly always be the intention, so best to make that the rule (and require the user to use a manual absolute path (or the different relative path you mentioned, or //species_nb) to circumvent).

Does this match what you're thinking? Is it the case that if a is at the top level of the form (not in a repeat), then the reference should not be relative? Could it be? I think this might be hard to implement given that there could also be any number of groups nesting in there.

Thanks for trying figure out the rules for this. In terms of implementation are we trying to somewhat minimize the number of relative paths in the output?

If so, then I wonder if for any ${node} reference from question A should be a (shortest-route) relative path if any of the repeat ancestors of both ${node} and A are the same (i.e. if the highest level ancestor repeat is the same).

If not, everything could be a (shortest-route) relative path, always? Would that actually simplify pyxform?

@lognaturel
Copy link
Contributor

are we trying to somewhat minimize the number of relative paths in the output?

I don't know? 😄 I think it would look strange to have relative paths from top-level nodes but maybe it doesn't matter?

Your shortest-route option would mean we might end up with something like <bind nodeset="/data/b" calculate="../a">, right? Are there any implications to that? I don't think I've ever seen it but seems it should be fine. It's more work for clients but I think it would be a trivial amount.

@MartijnR
Copy link
Contributor Author

It feels quite radical to make everything relative, but rationally, I guess it should be fine. It might even have a minor performance improvement since the context node has to be looked up anyway, and searches from there will often only be a few steps up/down/sideways.

Maybe we could let it be directed by whoever implements this? If it simplifies the code to make every ${node} reference relative, go for it and otherwise do:

any ${node} reference from question A should be a (shortest-route) relative path if any of the repeat ancestors of both ${node} and A are the same (i.e. if the highest level ancestor repeat is the same)

@lognaturel
Copy link
Contributor

@sadiqkhoja and I just talked through a form with exactly this shape and I was sure pyxform was going to generate the relative reference. I still agree that it's the more expected behavior and that the rule outlined at #453 (comment) feels right.

@lindsay-stevens when you get a chance could you please see what you think the level of effort might be (like maybe spend an hour or two on it and back out if it's looking like it will take much longer)? Nested repeats are not super common so if it's a huge undertaking it may not be worth it. If it's not as complicated as I thought when I looked at it, it would be a nice improvement for those who write forms like this.

@lognaturel lognaturel added this to the Next milestone Nov 1, 2024
@lindsay-stevens
Copy link
Contributor

@lognaturel as of pyxform 2.2.0 the behaviour is still the same. This XLSForm:

| survey |
|        | type         | name | label | calculation |
|        | begin repeat | a    | A     |             |
|        | text         | b    | B     |             |
|        | begin repeat | c    | C     |             |
|        | integer      | d    | D     |             |
|        | end repeat   |      |       |             |
|        | note         | e    | E     | sum(${d})   |
|        | end repeat   |      |       |             |

Produces a match for:

/h:html/h:head/x:model/x:bind[@nodeset = '/test_name/a/e' and @calculate = 'sum( /test_name/a/c/d )']

Maybe there is a quick fix for this but I am not super familiar with this part of the pipeline. It seems a bit confusing, I mean from Survey._relative_path there seems to be a lot of tree iteration, string splitting/joining etc, to subsequently lookup items from an XPath. There is a decent amount of tests though in test_repeat.py to help de-risk changes. Also these magic numbers in Survey._is_return_relative_path seemed strange like maybe there's only 2 levels of nesting allowed (added via #484)?

indexed_repeat_relative_path_args_index = [0, 1, 3, 5]

So it would take me a little while to fully understand how this part of the pipeline works now. Then I would probably want to try and untangle the iteration (for performance) and/or replace with the newer parser (for consistency). Or work out the correct relative XPath from the Survey object tree directly, rather than e.g. parsing an XPath that came from a placeholder/reference substitution (because in order to do that substitution the XPath is calculated from the objects).

@lognaturel
Copy link
Contributor

lognaturel commented Nov 4, 2024

It seems a bit confusing

Agreed.

indexed_repeat_relative_path_args_index = [0, 1, 3, 5]

This is related to indexed-repeat which can be used in place of XPath expressions to reference repeats -- https://getodk.github.io/xforms-spec/#fn:indexed-repeat It's similar to pulldata in that it replaces the verbosity and scary characters of XPath with arguments you need to remember the order of. In this context, it looks like only the arguments that represent references are being replaced with relative paths. I haven't looked at this implementation in a long time but my immediate reaction is that I find this surprising. I guess maybe the index expressions don't usually need to have any special treatment because they look like position(..) for example. I wouldn't be surprised if that array could be removed.

I see there's at least one bug related to indexed-repeat and relative vs. absolute paths: #569

Thanks for taking a look! Let me take this out of the Next milestone for now. I think cleaning up this area in the not too distant future would be good but let's prioritize some of the other nice quality of life improvements we have in the milestone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants