Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-enable support for external secondary CSV instances #448

Closed

Conversation

musamusa
Copy link

Closes #417

What has been done to verify that this works as intended?

Test was added to determine when an external secondary instance id is referenced in a pulldata function that the instance is parsed as Internal form instance (FormInstance not ExternalDataInstance) without loading the external file into memory.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

Enabling external secondary CSV instance without an update to FormLoaderTask will result in forms ESI without pulldata() reference throwing FileNotFoundException

Do we need any specific form for testing your changes? If so, please attach one.

No, the code was tested using Sample-Preloading form.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

No

@musamusa musamusa changed the title Enable csv esi issue 417 Re-enable support for external secondary CSV instances Jun 26, 2019
Copy link
Member

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please describe exactly what you did to verify that this works as intended? Did you run it in Collect? Fundamentally, I don't think the SecondaryInstanceAnalyzer can have any effect because it runs after the primary instance has already been loaded. I wrote some other comments as I read through the code but that needs to be addressed before anything else is.

From the PR description:

Test was added to determine when an external secondary instance id is referenced in a pulldata function that the instance is parsed as Internal form instance (FormInstance not ExternalDataInstance) without loading the external file into memory.

pulldata is not explicitly involved in any of this so I don't understand this description. Also, I don't understand what you're checking for -- if a CSV is only used by pulldata, it should NOT be loaded into memory. It should not be parsed as an internal form instance either, it should just be entirely ignored by JavaRosa.

// Disable jr://file-csv/ support by explicitly only supporting jr://file/
// until https://github.com/opendatakit/javarosa/issues/417 is addressed
if (instanceSrc != null && instanceSrc.toLowerCase().startsWith("jr://file/")) {
// only build when ESI Id is not referenced in pulldata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SecondaryInstanceAnalyzer doesn't do anything relating to pulldata. A more accurate comment might be something like "Only load an external secondary instance file if its ID is used as a parameter to the instance function"

// a list of SIs that need to be built in memory
private final List<String> inMemorySecondaryInstances;

private final Pattern INSTANCE_FUNCTION_PATTERN = Pattern.compile("instance\\s*\\(\\s*'([^\\s]{1,64})'\\s*");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regular expression should have some corresponding tests to document its intended behavior. A comment would help too (white space is allowed after the function name, before the argument, etc). In particular, there should be an explanation for the 64 character limit.

Also note that double quotes are allowed for strings as well as single quotes.

@@ -130,6 +132,19 @@ public void parsesPreloadForm() throws IOException {
assertEquals("Sample Form - Preloading", formDef.getTitle());
}

@Test
public void parsesExternalSecondaryInstanceAsFormInstance() throws IOException {
Path form = r("Sample-Preloading.xml");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be ideal to have a test that just isolates the case that is trying to be handled here. For example, a very narrow one could define an external CSV instance that is not referred to in an instance() function. A slightly broader test form would have one that is and one that isn't so both of those cases can be verified.


while (elements.hasMoreElements()) {
DataInstance instance = elements.nextElement();
assertTrue(instance instanceof FormInstance);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the opposite of what is desired but perhaps some comments would help clarify. In that form, an instance is declared with <instance id="hhplotdetails" src="jr://file-csv/hhplotdetails.csv">. However, it's never referred to as instance('hhplotdetails') so the file should not be loaded as an in-memory instance.

// until https://github.com/opendatakit/javarosa/issues/417 is addressed
if (instanceSrc != null && instanceSrc.toLowerCase().startsWith("jr://file/")) {
// only build when ESI Id is not referenced in pulldata
if (instanceSrc != null && instanceSrc.toLowerCase().startsWith("jr://file") && secondaryInstanceAnalyzer.shouldSecondaryInstanceBeParsed(instanceId)) {
Copy link
Member

@lognaturel lognaturel Jun 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will always be false because the primary instance is loaded after the secondary instances are, right? Doesn't analyzeElement always happen after this has already been done?

@lognaturel
Copy link
Member

lognaturel commented Jun 28, 2019

Here's what I had in mind -- master...lognaturel:issue-417. This uses an existing pass over the Element tree, hooks into the parser code so no regular expressions are needed, and only looks for instance function calls when we're known to be in a function call. I don't love that XPathFuncExpr makes a static XFormParser call but given that it also calls other static methods, I think it's ok. Let me know what you think of that approach and if it's looking ok I'll add tests and get it done.

Note that my initial description of a lazy kind of loading isn't necessary because there are multiple passes over the full document (which is an issue for performance but not something we need to address in this fix).

@musamusa
Copy link
Author

Let me know what you think of that approach and if it's looking ok I'll add tests and get it done.

@lognaturel Yes, your implementation worked exactly like mine but your approach is cleaner and without the regular expressions will perform better. I'll go ahead and close mine.

@musamusa musamusa closed this Jun 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Re-enable support for external secondary CSV instances without breaking pulldata()/search()
2 participants