Proposal: external CSV data support #88

Open
MartijnR opened this issue Feb 6, 2017 · 9 comments

@MartijnR
Contributor

MartijnR commented Feb 6, 2017

Now that we have a way to add external XML data, I'd like to propose extending this with a way to add external CSV data. This proposal aims to meet two requirements:

  1. The CSV data URI is referred to in an XForm to 'load' it (to avoid magic).
  2. The data can be queried with the full power of XPath (to avoid requiring a secondary language to use CSV data).

If we don't use XPath, I think we'd end up inventing a CSV query language to replace it.

The first requirement could be met by adding a jr://file-csv connector in the same style as jr://file, jr://image, etc.:

<instance id="households" src="jr://file-csv/households.csv"/>

The second requirement could be met by defining a fixed transformation from CSV to XML. I propose using the same transformation that pyxform already applies to the choices sheet when it creates a secondary XML instance for an <itemset>, as follows:

name    label     rooms
0001    Johnson   2
0034    Doe       5

The CSV above (presented as a table) would be transformed into the following XML, which becomes the content of <instance id="...">:

<root>
    <item>
        <name>0001</name>
        <label>Johnson</label>
        <rooms>2</rooms>
    </item>
    <item>
        <name>0034</name>
        <label>Doe</label>
        <rooms>5</rooms>
    </item>
</root>
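
The transformation above is mechanical enough to sketch in a few lines. The following is a minimal Python sketch, not pyxform's actual code; it assumes a comma-separated file whose first row holds the column names, and csv_to_instance is a hypothetical helper name. Values are kept as text, so codes such as 0001 keep their leading zeros.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_instance(csv_text: str) -> ET.Element:
    """Build the <root>/<item> tree from CSV text (hypothetical helper).

    Assumes the first row holds column names; each column becomes a
    child element of <item>, and every value is stored as text.
    """
    root = ET.Element("root")
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(root, "item")
        for column, value in row.items():
            ET.SubElement(item, column).text = value
    return root


households = "name,label,rooms\n0001,Johnson,2\n0034,Doe,5\n"
root = csv_to_instance(households)
print(ET.tostring(root, encoding="unicode"))
```

Because the mapping is fixed, a form author can predict the XML shape of any CSV file from its header row alone.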

P.S. Whether these instances are handled internally as actual XML documents or virtually, e.g. as a database table or document (as CommCare does), is up to the client and not part of the spec.
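
To illustrate the "virtual" option, here is a minimal Python sketch, assuming a client that loads the CSV into an in-memory SQLite table and answers a key lookup with SQL instead of walking an XML tree. It is not a description of how CommCare or any other client works; load_csv_as_table is a hypothetical helper, and column names are assumed to contain no double quotes.

```python
import csv
import io
import sqlite3


def load_csv_as_table(conn: sqlite3.Connection, table: str, csv_text: str) -> None:
    """Store CSV rows as a SQL table (hypothetical helper).

    All columns are declared TEXT so codes like "0001" keep their zeros.
    Assumes header names contain no double quotes.
    """
    reader = csv.reader(io.StringIO(csv_text))
    columns = next(reader)
    column_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE "{table}" ({column_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


conn = sqlite3.connect(":memory:")
load_csv_as_table(conn, "households", "name,label,rooms\n0001,Johnson,2\n0034,Doe,5\n")

# Equivalent of the XPath instance('households')/root/item[name = '0034']/label
(label,) = conn.execute(
    'SELECT "label" FROM "households" WHERE "name" = ?', ("0034",)
).fetchone()
print(label)
```

The form only ever sees the XML shape; how a client stores the rows behind it stays an implementation detail.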

@MartijnR
Contributor Author

MartijnR commented Mar 6, 2017

I'd like to expand on this by also providing a way to add translations to external data as follows:

name    label::English    label::Français    rooms
0001    Johnson           Le Johnson         2
0034    Doe               Le Doe             5

The CSV above (presented as a table) would be transformed into the following XML, which becomes the content of <instance id="...">:

<root>
    <item>
        <name>0001</name>
        <label lang="English">Johnson</label>
        <label lang="Français">Le Johnson</label>
        <rooms>2</rooms>
    </item>
    <item>
        <name>0034</name>
        <label lang="English">Doe</label>
        <label lang="Français">Le Doe</label>
        <rooms>5</rooms>
    </item>
</root>
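
The translation rule only changes how a header is turned into an element: a header of the form column::Language becomes <column lang="Language">. Below is a minimal Python sketch of that rule, assuming "::" never appears in a plain column name; csv_to_instance is a hypothetical helper, not pyxform code.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_instance(csv_text: str) -> ET.Element:
    """Like the plain transformation, but "column::Language" headers
    become <column lang="Language"> elements (hypothetical helper)."""
    root = ET.Element("root")
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(root, "item")
        for header, value in row.items():
            # "label::English" -> ("label", "::", "English"); "rooms" -> ("rooms", "", "")
            name, sep, lang = header.partition("::")
            child = ET.SubElement(item, name)
            if sep:
                child.set("lang", lang)
            child.text = value
    return root


households = (
    "name,label::English,label::Français,rooms\n"
    "0001,Johnson,Le Johnson,2\n"
    "0034,Doe,Le Doe,5\n"
)
root = csv_to_instance(households)
# Select the French label of household 0034 with an attribute predicate.
print(root.find("item[name='0034']/label[@lang='Français']").text)
```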

@MartijnR
Contributor Author

With full XPath available for CSV data, we could support very complex queries, e.g.

@lognaturel
Member

Yes, that would be very good indeed. A priori, the proposed syntax and approach seem good. Is it really necessary to have a different connector? Why not also use jr://file and let clients figure out how to process the file based on its type?

@dcbriccetti and @mdudzinski, you've been thinking about external secondary instances in the context of the JR implementation. What do you think about this extension, and about the proposal to add a jr://file-csv connector?

@dcbriccetti

I haven’t read this issue yet, but I’ll mention now, for what it’s worth, that the preload data sample form linked to from here uses file-csv.

@lognaturel
Member

@dcbriccetti Can you say a little more about what you mean? JavaRosa does have a way to query arbitrary side-loaded CSVs, as described in the page you linked to, but it doesn't allow complex queries, and the filename isn't included in the form, which isn't very XForm-ish. That's why we're exploring an alternative approach that would be more in line with the rest of the specification.

On the JavaRosa side there could be some overlap in the implementation, though that will be a separate conversation. I'm not seeing file-csv anywhere in the JavaRosa source code at the moment, so I'm not sure what you're referring to.

@MartijnR does the Dimagi specification use the jr://file-csv connector? I'm not finding it immediately.

@lognaturel
Member

lognaturel commented Aug 21, 2017

I'm sorry, @dcbriccetti, I totally forgot that the preload implementation does indeed already use the jr://file-csv connector. You're absolutely right. So that seems like it's definitely the right way to go and I apologize for confusing things. @MartijnR, do you have a quick recollection of the intention there? Is jr://file meant to be used only for XML? Was there a particular reason to introduce different connectors for different file types? Was it to match jr://audio, jr://images, etc?

Now that I see jr://file-csv already exists, the potentially contentious part of this proposal is enabling csv files to be queried with XPath expressions. Currently, pulldata is used to pull values out of CSVs in a very simple way. @MartijnR described the problems with that here and the recent forum post that @MartijnR links to above shows that there is some user demand for more complex querying (though combining keys is a simple approach that can help in many contexts). The rest of that original thread about this is also insightful.
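
The difference between the two kinds of querying can be sketched in Python with ElementTree's limited XPath subset, run over the households instance from the proposal. The first helper is a single-key lookup, the kind of query pulldata is used for; the second selects on another condition, which XPath over a CSV instance would allow. The XForms expressions in the comments are illustrative, and lookup and labels_with_more_rooms_than are hypothetical helpers. ElementTree cannot compare numbers inside predicates, so the second query filters in Python.

```python
import xml.etree.ElementTree as ET

# The households instance from the proposal, inlined for brevity.
root = ET.fromstring(
    "<root>"
    "<item><name>0001</name><label>Johnson</label><rooms>2</rooms></item>"
    "<item><name>0034</name><label>Doe</label><rooms>5</rooms></item>"
    "</root>"
)


def lookup(key: str):
    """Single-key lookup.
    XPath: instance('households')/root/item[name = '0034']/label"""
    item = root.find(f"item[name='{key}']")
    return None if item is None else item.findtext("label")


def labels_with_more_rooms_than(n: int) -> list:
    """Selection on a non-key condition.
    XPath: instance('households')/root/item[rooms > 3]/label"""
    return [
        item.findtext("label")
        for item in root.findall("item")
        if int(item.findtext("rooms")) > n
    ]


print(lookup("0034"))
print(labels_with_more_rooms_than(3))
```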

Based on all this, I'm in favor of making this an official part of the specification. I think @MartijnR has made a strong case for it and it does make sense at an ecosystem-level to have a way to interact with CSV external instances that is consistent with XML external instances.

I believe this addition only affects clients implementing this spec, since jr://file-csv is already accepted and works in pyxform. And looking through old issues suggests Kobo (@dorey) and Ona (@ukanga) already have some awareness of this approach and are on board.

Implementation-wise, XPath querying of external CSV instances won't be available in Collect immediately, but that's OK. We can make sure it is eventually put on a roadmap and clearly document what can and can't be queried through XPath. pulldata meets most users' needs, is simpler to use, and would continue to exist for now.

I think it would be terrific to move to a slightly more consistent process for approving these kinds of changes as we started discussing here. But this has been ongoing for a long time so I propose that we keep this conversation here and ask for a final sanity check from @clint-tseng, @dcbriccetti, @yanokwa and @dorey. Unless they see any show-stopping problems or think of someone else who should be involved, I think we can move ahead.

@MartijnR
Contributor Author

MartijnR commented Aug 25, 2017

Thanks for the feedback. Yes, the file-csv connector was meant to be consistent with audio, images, and video, and (more generally) to remove the type-detection burden from the client (even though detecting the type would be no problem).
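
The trade-off behind a dedicated connector can be sketched as follows: with jr://file-csv the client learns the format from the src attribute alone, instead of trusting a file extension or inspecting file contents. This is a minimal Python sketch, not JavaRosa or Enketo code; instance_format is a hypothetical helper, and mapping jr://file to XML is an assumption based on this thread.

```python
from urllib.parse import urlsplit

# Assumed mapping for illustration: jr://file for XML, jr://file-csv for CSV.
CONNECTOR_FORMATS = {"file": "xml", "file-csv": "csv"}


def instance_format(src: str) -> tuple[str, str]:
    """Return (format, file name) for an external instance src (hypothetical)."""
    parts = urlsplit(src)  # "jr://file-csv/a.csv" -> scheme "jr", netloc "file-csv"
    if parts.scheme != "jr" or parts.netloc not in CONNECTOR_FORMATS:
        raise ValueError(f"unsupported external instance src: {src}")
    return CONNECTOR_FORMATS[parts.netloc], parts.path.lstrip("/")


print(instance_format("jr://file-csv/households.csv"))
print(instance_format("jr://file/households.xml"))
```

Dispatching on the connector keeps the decision explicit in the form, in the same way jr://audio and jr://images already tell the client what kind of media to expect.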

@breakbusyloop

Is this issue still unresolved? I'm exploring the Collect side of this feature, and its behavior seems to indicate that the corresponding XForms functionality is complete.

@lognaturel
Member

Thank you, @OpenDataNerd. I had asked for some final feedback but that was years ago and client implementations have moved forward. At this point it should just be written up for the spec.
