Support external secondary XML instances #275
Comments
@MartijnR We want to make sure issues that are filed are not just a proposed solution, but also a good explanation of what the core problem is and why it matters. This way, we can have a grounded discussion about the implementation. So with that in mind, why do people need this feature and what does it enable? Example scenarios from Enketo and CommCare would help! |
The core problem for ODK Collect would be the inability to add external data that can be used anywhere in any XPath expression. Currently, only a few specific use cases are supported (with CSV files) and those involve a fair bit of magic (i.e. the file is magically available, as there is no reference to it in the XForm), are inflexible, and unnecessarily deviate from XForms. This feature would explode the potential uses of external data by allowing one to query an external data file with regular XPath predicates. For example, if the external data file is a list of camps and households (from a previous survey) and you'd like to:
- find out how many households there are in a particular camp
- find the sum of all household members in a particular camp
There are no limitations on how this can be used, and there is no magic. An external secondary instance is the same as an internal secondary instance from the XPath evaluator's point of view, which is why it's so powerful. CommCare is using external data to extend XForms with XML documents that have their own specifications, such as session variables (to replace pre-load items), cases, fixtures, ledgers. Some info here. Pinging @ctsims in case he would like to add info. In addition to the immediate power gained, this can be seen as a first step to build future features upon, such as:
1. CSV support, with the same query power
2. After previous item: fast itemsets for multi-select questions
3. XPath functions to facilitate easier querying (that simply translate into complex XPath queries)
4. Domain-specific extensions (custom ports)
|
Hi Martijn,
Confirming that this has been a super important part of our build-out, and
incredibly flexible overall.
We've been able to largely use the instance() interface for the best of
both worlds. We implement custom tree elements for DB-backed lookups that
are efficient across thousands of elements, but the resulting nodeset
output can still be treated as any other path, i.e.:
https://github.com/dimagi/commcare-core/blob/master/src/main/java/org/commcare/cases/instance/CaseInstanceTreeElement.java
if we do
count(instance('casedb')/casedb/case[@case_type = 'FOO'][complex_property * 30 < today()])
the app is able to detect that it can do the (@case_type = 'FOO') part with
a storage lookup, while not restricting the user from continuing to use
complex predicate expressions for the second filter. Your CSV example is a
good one, it would be super trivial to have an instance that is doing a CSV
lookup in the background, but still behaves like an XML document in the
engine.
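To make the split between the indexed predicate and the free-form one concrete, here is a minimal Python sketch. It is not CommCare's implementation; the `CaseStore` class, its `query` method, and the sample cases are hypothetical, and the second filter only imitates `complex_property * 30 < today()` by treating today's date as a day count since 1970-01-01.

```python
from collections import defaultdict
from datetime import date


class CaseStore:
    """Hypothetical store that indexes cases by their case_type."""

    def __init__(self, cases):
        self._by_type = defaultdict(list)
        for case in cases:
            self._by_type[case["case_type"]].append(case)

    def query(self, case_type, residual=lambda case: True):
        # First predicate: answered by the index, no scan over other types.
        candidates = self._by_type.get(case_type, [])
        # Remaining predicates: arbitrary logic, evaluated row by row.
        return [case for case in candidates if residual(case)]


# Sample data, invented for illustration.
cases = [
    {"case_type": "FOO", "complex_property": 100},
    {"case_type": "FOO", "complex_property": 1000},
    {"case_type": "BAR", "complex_property": 1},
]

# Assumption: dates compare as days since the epoch, fixed here for repeatability.
today_days = (date(2016, 12, 6) - date(1970, 1, 1)).days

store = CaseStore(cases)
matches = store.query("FOO", lambda c: c["complex_property"] * 30 < today_days)
match_count = len(matches)
```

The caller still writes one query with two conditions; only the store knows that the first one was cheap.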
The trade-offs have mostly broken down to:
1. It's super hard to infer/reason about what will and won't be slow, so
users can quite trivially turn an O(1) operation into an O(n^2) operation
with minor changes.
2. We are still having to frequently do performance passes on super
basic raw XPath evaluation units. String comparisons are expensive no
matter how you cut it, and it's hard to cut some of them out from high-n
comparisons
3. Nesting XPath predicates introduces an odd dynamic occasionally where
only two levels of context are available (original, and current), which can
make certain complex nested lookups surprisingly hard to express.
#2 and #3 have been very tenable for us to manage overall, and #1 has been
fine for our basic users (since most people don't need to think about it),
but has been a bit tough to deal with for very large projects where the
difference between O(N) and O(N^2) doesn't become distinct until N gets
pretty big and the design decision is hard to take back.
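As a rough illustration of how a small change in a query turns linear work into quadratic work, the Python sketch below counts, for every household, how many households share its camp. The data layout and both function names are hypothetical; the naive version stands in for a per-row predicate that rescans the whole instance, the grouped version for a lookup that was precomputed once.

```python
from collections import Counter


def neighbours_naive(households):
    """For each row, rescan every row: n * n comparisons."""
    comparisons = 0
    result = []
    for household in households:
        same_camp = 0
        for other in households:
            comparisons += 1
            if other["camp"] == household["camp"]:
                same_camp += 1
        result.append(same_camp)
    return result, comparisons


def neighbours_grouped(households):
    """Count each camp once, then do one dictionary lookup per row."""
    per_camp = Counter(h["camp"] for h in households)
    return [per_camp[h["camp"]] for h in households]


# Sample data, invented for illustration.
households = [{"camp": "Gerihun"}, {"camp": "Gereida"}, {"camp": "Gerihun"}]

naive, comparisons = neighbours_naive(households)
grouped = neighbours_grouped(households)
```

Both functions return the same answer, so nothing in the output hints at the difference in cost until the list gets large.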
-Clayton
On Tue, Dec 6, 2016 at 3:33 PM, Martijn van de Rijdt < ***@***.***> wrote:
The core problem for ODK Collect would be the inability to add external
data that can be used anywhere in any XPath expression. Currently, only a
few specific use cases are supported (with CSV files) and those involve a
fair bit of magic (ie. the file is magically available, as there is no
reference to it in the XForm), are inflexible, and unnecessarily deviate
from XForms.
This feature would explode the potential uses of external data by allowing
one to query an external data file with regular XPath predicates. For
example, if the external data file is a list of camps and households (from
a previous survey) and you'd like to:
- find out how many households there are in a particular camp
***@***.***=“Gerihun”]/household)
- what the sum of all household members is in a particular camp
***@***.***=“Gereida”]/household/members)
There are no limitations on how this can be used, and there is no magic.
An external secondary instance is the same as an internal secondary
instance from the XPath evaluator's point of view, which is why it's so
powerful.
CommCare is using external data to extend XForms with XML documents that
have their own specifications, such as session variables (to replace
pre-load items), cases, fixtures, ledgers. Some info here. Pinging @ctsims
<https://github.com/ctsims> in case he would like to add info.
In addition to the immediate power gained, this can be seen as a first step to
build future features upon, such as:
1. CSV support, with the same query power
2. After previous item: fast itemsets for multi-select questions
3. XPath functions to facilitate easier querying (that simply
translate into complex XPath queries)
4. Domain-specific extensions (custom ports)
|
Closed in getodk/xforms-spec#50 and with @dcbriccetti's ongoing work at getodk/javarosa#16! |
Like CommCare and Enketo.
Instead of an internal secondary instance:
You could use a (dynamic or static) external secondary instance published in the XForms manifest:
This instance can be queried just like an 'internal' secondary instance in any XPath expression (constraints, calculations, relevants, requireds, itemsets).
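A minimal Python sketch of the idea, not taken from JavaRosa, Enketo, or CommCare. It assumes a model fragment with namespaces omitted, a hypothetical `camps.xml` layout, a `load_instances` helper, and an `ATTACHED_FILES` dict standing in for the files listed in the manifest. The `jr://file/` form of `src` is used on the assumption that it matches the ODK convention. Python's ElementTree supports only a subset of XPath, so the counting and summing happen in Python rather than with `count()` and `sum()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical XForm model fragment: one internal and one external instance.
MODEL = """
<model>
  <instance id="colors">
    <root><item>red</item><item>blue</item></root>
  </instance>
  <instance id="camps" src="jr://file/camps.xml"/>
</model>
"""

# Stand-in for the attached files a manifest would point to (invented data).
ATTACHED_FILES = {
    "jr://file/camps.xml": """
<root>
  <camp name="Gerihun">
    <household><members>4</members></household>
    <household><members>6</members></household>
  </camp>
  <camp name="Gereida">
    <household><members>3</members></household>
    <household><members>5</members></household>
    <household><members>2</members></household>
  </camp>
</root>
""",
}


def load_instances(model_xml, files):
    """Map each instance id to a root element, whatever its origin."""
    instances = {}
    for inst in ET.fromstring(model_xml.strip()).findall("instance"):
        src = inst.get("src")
        if src is None:
            root = inst[0]  # internal: the data is the first child element
        else:
            root = ET.fromstring(files[src].strip())  # external: parse the file
        instances[inst.get("id")] = root
    return instances


instances = load_instances(MODEL, ATTACHED_FILES)
camps = instances["camps"]

# How many households are there in a particular camp?
household_count = len(camps.findall("camp[@name='Gerihun']/household"))

# What is the sum of all household members in a particular camp?
member_total = sum(
    int(m.text) for m in camps.findall("camp[@name='Gereida']/household/members")
)
```

Once loaded, the query code cannot tell which instance came from a file and which was inline, which is the property the proposal relies on.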
Test XForm
Test External data files
(As a follow-up, we could extend this feature to also support CSV files, but that is better covered in a separate issue.)
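One way such a CSV follow-up could work, sketched in Python under stated assumptions: each CSV row becomes an `item` element under a `root` element, and each column becomes a child element. The element names, the `csv_to_instance` helper, and the sample data are hypothetical, and the sketch assumes every column header is a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_instance(csv_text):
    """Build an XML tree from CSV text so it can be queried like an instance."""
    root = ET.Element("root")
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(root, "item")
        for column, value in row.items():
            ET.SubElement(item, column).text = value
    return root


# Sample data, invented for illustration.
CSV_TEXT = "camp,members\nGerihun,4\nGerihun,6\nGereida,3\n"

households = csv_to_instance(CSV_TEXT)

# The same predicate style used on XML instances now works on the CSV rows.
gerihun = households.findall("item[camp='Gerihun']")
gerihun_members = [item.find("members").text for item in gerihun]
```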