-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-enable support for external secondary CSV instances #448
Conversation
Re-enable support for external secondary CSV instances
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please describe exactly what you did to verify that this works as intended? Did you run it in Collect? Fundamentally, I don't think the SecondaryInstanceAnalyzer
can have any effect because it runs after the primary instance has already been loaded. I wrote some other comments as I read through the code but that needs to be addressed before anything else is.
From the PR description:
Test was added to determine when an external secondary instance id is referenced in a pulldata function that the instance is parsed as Internal form instance (FormInstance not ExternalDataInstance) without loading the external file into memory.
pulldata
is not explicitly involved in any of this so I don't understand this description. Also, I don't understand what you're checking for -- if a CSV is only used by pulldata
, it should NOT be loaded into memory. It should not be parsed as an internal form instance either, it should just be entirely ignored by JavaRosa.
// Disable jr://file-csv/ support by explicitly only supporting jr://file/ | ||
// until https://github.com/opendatakit/javarosa/issues/417 is addressed | ||
if (instanceSrc != null && instanceSrc.toLowerCase().startsWith("jr://file/")) { | ||
// only build when ESI Id is not referenced in pulldata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SecondaryInstanceAnalyzer
doesn't do anything relating to pulldata
. A more accurate comment might be something like "Only load an external secondary instance file if its ID is used as a parameter to the instance
function"
// a list of SIs that need to be built in memory | ||
private final List<String> inMemorySecondaryInstances; | ||
|
||
private final Pattern INSTANCE_FUNCTION_PATTERN = Pattern.compile("instance\\s*\\(\\s*'([^\\s]{1,64})'\\s*"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regular expression should have some corresponding tests to document its intended behavior. A comment would help too (white space is allowed after the function name, before the argument, etc). In particular, there should be an explanation for the 64 character limit.
Also note that double quotes are allowed for strings as well as single quotes.
@@ -130,6 +132,19 @@ public void parsesPreloadForm() throws IOException { | |||
assertEquals("Sample Form - Preloading", formDef.getTitle()); | |||
} | |||
|
|||
@Test | |||
public void parsesExternalSecondaryInstanceAsFormInstance() throws IOException { | |||
Path form = r("Sample-Preloading.xml"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be ideal to have a test that just isolates the case that is trying to be handled here. For example, a very narrow one could define an external CSV instance that is not referred to in an instance()
function. A slightly broader test form would have one that is and one that isn't so both of those cases can be verified.
|
||
while (elements.hasMoreElements()) { | ||
DataInstance instance = elements.nextElement(); | ||
assertTrue(instance instanceof FormInstance); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like the opposite of what is desired but perhaps some comments would help clarify. In that form, an instance is declared with <instance id="hhplotdetails" src="jr://file-csv/hhplotdetails.csv">
. However, it's never referred to as instance('hhplotdetails')
so the file should not be loaded as an in-memory instance.
// until https://github.com/opendatakit/javarosa/issues/417 is addressed | ||
if (instanceSrc != null && instanceSrc.toLowerCase().startsWith("jr://file/")) { | ||
// only build when ESI Id is not referenced in pulldata | ||
if (instanceSrc != null && instanceSrc.toLowerCase().startsWith("jr://file") && secondaryInstanceAnalyzer.shouldSecondaryInstanceBeParsed(instanceId)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will always be false because the primary instance is loaded after the secondary instances are, right? Doesn't analyzeElement
always happen after this has already been done?
Here's what I had in mind -- master...lognaturel:issue-417. This uses an existing pass over the Note that my initial description of a lazy kind of loading isn't necessary because there are multiple passes over the full document (which is an issue for performance but not something we need to address in this fix). |
@lognaturel Yes, your implementation worked exactly like mine but your approach is cleaner and without the regular expressions will perform better. I'll go ahead and close mine. |
Closes #417
What has been done to verify that this works as intended?
Test was added to determine when an external secondary instance
id
is referenced in apulldata
function that the instance is parsed as Internal form instance (FormInstance
notExternalDataInstance
) without loading the external file into memory.How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?
Enabling external secondary CSV instance without an update to FormLoaderTask will result in forms ESI without
pulldata()
reference throwingFileNotFoundException
Do we need any specific form for testing your changes? If so, please attach one.
No, the code was tested using Sample-Preloading form.
Does this change require updates to documentation? If so, please file an issue here and include the link below.
No