-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-enable support for external secondary CSV instances without breaking pulldata()/search() #417
Comments
I'm starting to wonder whether it might not make sense to hold back on fixing this with a workaround until we can do it "properly", i.e. by making Or am I putting the cart before the horse? Is the requirement to get this working urgent enough to warrant an interim solution? |
@lognaturel ☝️❓ |
I'd say it's not an urgent requirement. It's annoying to have a feature that's documented without being actually implemented -- in this case specifying a CSV secondary instance that can be XPath-queried -- but it's probably not something a ton of people are waiting for. I think there's something to learn from exploring options here that may be applicable to a broader solution but there's certainly something to be said for first doing some high-level design work first. |
In the mean time, here are some more details on reproduction and what should actually happen:
I haven't done this myself but once the issue with the connector is resolved on the Collect side of things, the form should work. However, the secondary CSV instance will be loaded in memory which should NOT be the case because there is no call on external_csv.zip is an example of a form that should use the new external secondary CSV instance feature and load the contents of all three csvs into memory because the form body refers to all three of the instances. |
@musamusa, I think you were interested in the information above. If you're still looking into it, are you now able to reproduce the issue? |
@lognaturel Yes, I was able to reproduce the issue after the call. The steps above also added some more explanations. |
I started looking into this again today, and i just want to clarify something that's occurred to me: So far we've been talking the about the problem that Pulldata CSV files are unnecessarily loaded into memory because they're incorrectly treated as if they're an External Secondary Instance (ESI). But isn't there also second, parallel problem whereby genuine ESI CSVs will be automatically read into database tables because they're incorrectly treated as if they're a Pulldata file? Creating an unnecessary database table is, I guess, less of a performance overhead than holding the nodes in memory, but it still seems suboptimal. |
As you say, @adamvert, the performance hit for building the databases is much lower. Additionally, once JR can decide which instances to process, it should be straightforward for Collect to find out whether it needs to do anything. |
+1 in regards to native JavaRosa |
#397 added support for parsing a CSV as a secondary instance. It was then disabled in #416 because it broke search/pulldata in Collect.
We will need to figure out a way to add this support back in without interfering with the existing implementations. These may eventually be deprecated but since they solve a real problem and are in broad use, there needs to be an acceptable, suitably performant alternative in place first. Even if/when they are officially marked for deprecation, I imagine they will stay in for quite a while.
@ggalmazor rightfully pointed out here that the Collect implementation of search/pulldata doesn't use the instance declared in the form.
pyxform
adds that instance declaration whenpulldata
is used (XLSForm/pyxform#43). This is because clients that do things in a spec-compliant way (e.g. Enketo), need to know what instance to load. Collect just reads any CSV attached with the form into a database ("magic").Since the
pulldata
function can be thought of as an abstraction on top of XPath queries (kind of likeindexed-repeat
), it's easy for clients that do load external secondary CSV instances to implement, hence the instance declaration added by pyxform so they can support it. For that reason, I think it will likely become an official part of the spec.search
is strange all around, starting from it being an appearance.Collect search/pulldata are very performant. It's important that the extra overhead of loading the CSV instance doesn't occur if the user intends to use those implementations. That is, if the form has the following instance declaration:
<instance id="foo" src="jr://file-csv/foo.csv" />
, the CSV should only actually be loaded if the form definition includesinstance("foo")
. I'm not immediately seeing a way to do that since secondary instances are parsed before the main instance. Maybe secondary instances could be "lazy loaded" the first time a reference to them is seen in the main instance instead of being parsed upfront?There's an additional issue uncovered by #397:
jr://file
is a substring ofjr://file-csv
and the connector matching is done by looking for a prefix. I think this could be fixed by looking for the connector followed by a/
or something like that.The text was updated successfully, but these errors were encountered: