Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved CSV Import/Export #63

Closed
Phillipus opened this issue Mar 31, 2014 · 32 comments
Closed

Improved CSV Import/Export #63

Phillipus opened this issue Mar 31, 2014 · 32 comments

Comments

@Phillipus
Copy link
Member

See https://groups.google.com/forum/#!topic/archi-users/Dujr_gUhFY8

This means that we need 3 files per export/import:

  • elements with following attributes
    • Element ID
    • Name
    • Type (Actor, Application Component...)
      *Documentation (newline have to be managed in a way or another)
  • relations with following attributes
    • Relation ID
    • Name
    • Type (assignment, used by...)
    • Source element ID
    • Target element ID
    • Documentation (newline have to be managed in a way or another)
  • properties with following attributes
    • Element or relation ID
    • Property Name
    • Property Value

Remarks:

  • We should follow some standard like this one
  • Files should include headers
  • IDs should be kept in model file so we have to define some basic rules to follow (e.g. accepts letters and numbers only without spaces). It should be easy to create them in tools like spreadsheets using formula (e.g. append Type to Name and remove spaces)
  • Fields have to be delimited with double-quote and potential double-quotes in field content have to replaced by a pair of double-quotes. This should be enough to protect newlines in Description fields, but I suppose we should include and option to remove or convert them so tools not accepting newlines even if double-quoted qhould work.
  • There's a known issue with some softwares which always remove leading zeros. An option should allow to use known the workaround.
  • The model itself should be seen as an element whose Type is ArchimateModel, so that we can export/import its description and properties.
  • Even if this makes it no more real CSV, being able to define another field separator should be great (e.g. in French localized Excel, CSV uses semicolon)

Export should either create 3 files in a folder or a zip containing them. Import should either ask explicitly for each file or take a folder and assume filenames to be (eg.) elements.csv, relations.csv, properties.csv

Of course, my assumption is that we export/import only model, not views... So this leads me to the following requirement: we should have to option to import on an already existing model so that elements that exist are updated and new ones added. Deletion could also be manage (but enable only through an option): element which exist in model but not in import file are deleted.

(Jean-Baptiste Sarrodie)

@Phillipus
Copy link
Member Author

I've started work on this. I have the Export to CSV working.

Most users will be happy with the "," comma delimiter. I can also add the ";" semicolon. I would prefer to limit the delimiters (!) to just these two, or is there another one? Any suggestions for a GUI? I guess It will have to be an extended dialog/wizard like the Export Image one.

@jbsarrodie
Copy link
Member

I think comma and semicolon are enough and represent more than 90% of usage.

For the GUI I have to think a little bit about it but we'll need a wizard.

@Phillipus
Copy link
Member Author

I have a system for file selection.

  1. User chooses a folder
  2. Three files are created - elements-ab45ed7b.csv, relations-ab45ed7b.csv and properties-ab45ed7b.csv

Where "ab45ed7b" is the id of the model. This allows the user to export more than one model to csv in the same folder. It also links the three files.

@jbsarrodie
Copy link
Member

Not sure having id as suffix is a user-friendly thing.

Maybe an option (turned off by default) to add a suffix, alongside with a text field (then if checked but empty, use model ID)...

[ ] Add suffix      [________________________________]

Or just the text field, and then if empty don't put suffix at all (get rid of -)

@Phillipus
Copy link
Member Author

I think some kind of suffix is a good idea. Otherwise the user will have to keep selecting a different folder for each model export. It could be the case that the user wishes to export a few models, and would like to keep them in the same folder. I know in my tests I was manually adding a suffix to the three files in order to group them together. Another option is a time-stamp suffix.

@Phillipus
Copy link
Member Author

Doing this in two stages, first is Export. Let's get this agreed before moving on to Import.

Created (temporary) branch for Export to CSV and tests:

https://github.com/Phillipus/archi/tree/csv-export

@jbsarrodie
Copy link
Member

I've just tested and it works like a charm. I only have 2 remarks:

  • I wonder if prefix would be better than suffix. This would ease file grouping if multiple models are exported on the same folder (sort order based on filename would group all 3 files per model).
  • Regarding delimiter, after some reflection, I think that we should either stick to comma only or allow user to enter delimiter in an input text field (limited to 1 character, different from ", defaults to comma of course).

Anyway, really good job: I was able to import file in LibreOffice without issues (even with descriptions containing newlines).

@Phillipus
Copy link
Member Author

  • Changed to use a prefix instead of suffix.
  • Added option for tab delimiter. comma, semicolon and tab should cover all sensible use cases. Any other character would lead to madness.

@Phillipus
Copy link
Member Author

Re: Import CSV. We need to be clear what the format should be. I don't want to end up writing a CSV parser for all known cases under the sun. ;-)

@jbsarrodie
Copy link
Member

Latest export seems perfect.

Regarding CSV import, for me there's no real question: the format should be exactly the same as the one used for export so that exporting and then importing leads to the same model content (without views of course).

@Phillipus
Copy link
Member Author

I mean things like:

  • Only use the comma delimiter
  • All fields are double quoted (")
  • Newlines are allowed

When writing this kind of code, one has to be very defensive - what if this, what if that - and you find that it's never ending.

If you know what you are expecting and reject it if it does not conform to the "rules" it becomes easier.

@jbsarrodie
Copy link
Member

Ok, I understand and agree.

In this case I would suggest to conform to the RFC, so:

  • only comma delimiter are allowed
  • newlines are allowed
  • (if possible) double quotes are optional and thus only needed if the string contains comma or newline

I would also suggest to:

  • add an option to skip first row if it contains headers. I could be possible to find a way to autodetect headers with some heuristics (element/relation type of first row unknown means headers...)
  • do not try to "invert" the Excel workaround. Not needed as Excel will not include cell formula but cell value.

@Phillipus
Copy link
Member Author

"csv-export" branch has been merged into the "develop" branch.

@Phillipus
Copy link
Member Author

Parsing a CSV file on Import is harder than writing an export routine, so it may be worth taking a look at this - http://commons.apache.org/proper/commons-csv/

@Phillipus
Copy link
Member Author

I have implemented Import from CSV files as a new model. See the "csv-import" branch. Select "Import CSV..." from file menu. Select the file that has the name "elements" in it (could be "elements.csv" or "prefix-elements.csv"). If there is a matching *relations.csv and *properties.csv file these will be parsed as well.

we should have to option to import on an already existing model so that elements that exist are updated and new ones added.
Deletion could also be manage (but enable only through an option): element which exist in model but not in import file are deleted.

This is not so easy. Undo/Redo commands will have to be created and managed for every attribute change, insertion and deletion. Then there is checking the integrity of IDs, elements and relations and their matching diagram objects to make sure nothing is orphaned.

@jbsarrodie
Copy link
Member

I will try it today.

Regarding import on a already existing model, I know it's not easy and this raise several questions. A usual this was more a mid/long term target.

In fact, this leads me to potential generic comparison/merge feature that could be implemented with EMF Diff/Merge (when I will master it, which is not for today).

@jbsarrodie
Copy link
Member

I've just tested and here are some remarks:

  • I works great !
  • created model is not marked unsaved so if I close it I don't get warned.

@Phillipus
Copy link
Member Author

created model is not marked unsaved so if I close it I don't get warned.

No, because it's like when you create a "new" model. There are no undoable actions yet to mark it as "dirty". Dirty is calculated from at least one command on the command stack. In order to mark it as dirty then all the elements would have to be added as one Command and executed. It's really like doing a "new model" or like "open model". But that's a problem if you just want to save it...hmm...it gets ever more complicated. :-(

@jbsarrodie
Copy link
Member

That's not a big deal, as if I close it just after import, then I can easily re-import it again.

@Phillipus
Copy link
Member Author

New commit in "csv-import" branch.

I've taken a new approach to this because I wanted to solve the following issues:

  • import csv data into an existing model
  • created new model is not marked unsaved so if I close it I don't get warned

So...

  • The "Import -> CSV Data Into Model" menu item is now only enabled when a model has the focus (tree or diagram). Opening the CSV file(s) will add the data (elements, relations, properties) to the model. "Undo" will undo this action.
  • If you want to import the CSV data into a new model, you first create a new (blank) model and then import. "Undo" will undo this action.
  • Importing the same CSV files more than once will not work as the importer looks for duplicate identifiers.
  • If the importer finds an existing element or relation with the same ID as one in CSV file it will not create a new element.

This approach makes more sense for an "Import" action. The previous way of doing it was more like an "Open As.." action.

@jbsarrodie
Copy link
Member

Hi,

I did some tests and...

  • it works great when launched from an empty model
  • it works great when launched from a non empty model and when the csv files don't contain already existing elements
  • it fails when launched from a non empty model and when the csv files contain at least on already existing element

I checked the code and I think the CSVParseException should be caught in several methods into CSVImporter so that an already existing element doesn't stop the import (maybe a choice through an option: "strict/relaxed" ?).

Here the use case could be:

  • Goal: update some elements or properties outside Archi
  • Steps:
    1. Open an already existing model
    2. Export it to CSV files
    3. Edit some files with a spreadsheet to add elements, properties...
    4. Import the CSV into the model and choose to ignore warnings about already existing elements/properties
    5. New elements/properties are added to the model.

Obviously this raise some questions:

  • as is this is not a real update, but an "add if it doesn't exist". But the real goal would be to also update name/description accordingly
  • elements and relations can easily be updated as they have an id, but this is not so easy for properties because they don't and there could be multiple properties with same key on an element. So this would require an option to decide whether to add or update properties (update first property matching key).

Remark: when this will work, we'll then have a solution to sync to/from a CMDB (or any other referential able to export/import CSV)...

@Phillipus
Copy link
Member Author

I checked the code and I think the CSVParseException should be caught in several methods into CSVImporter so that an already existing element doesn't stop the import (maybe a choice through an option: "strict/relaxed" ?).

I wanted the user to know about this in case it was a mistake on their part, and not silently fail without them knowing.

There's a danger of overloading the functionality on this. We don't want to turn it into a poor man's batch editor. I would like it to be seen as a way of getting content into a model rather than in-out editing and merging...

@Phillipus
Copy link
Member Author

My goals:

  • Avoid too many dialog options that complicate the issue
  • Let the user know about duplicates and malformed CSV so they can fix the issue, or are aware that they might be importing the same file twice
  • Not to let this functionality become something more than what it is. If there's a need for batch update (like the Find/Replace feature) let's look at solving that in a better way

You are right with the problems of updating Properties.

@jbsarrodie
Copy link
Member

I understand and share your goals, but I still think that offering a "relaxed" option (so update/insert instead of insert only) would allow great use cases.

For example, I have a big model which contains almost all my servers / application (and business processes in the near futur). I frequently have to check it against my CMDB and this is very difficult. There are also some basic information that I don't want to key in manually (#CPU/RAM, #users...).

With a "relaxed" mode, it would be easier for me to update properties. I agree this can be seen as a poors man batch editor, but the goal is still to insert information (properties in my case), not to do a Find/Replace.

From a UI perspective, I agree we should avoid too many dialog options. Would it be possible instead to have two menu entries:

  • Import > CSV Data Into Model (insert only)"
  • Import > CSV Data Into Model (insert/update)"

Of course, there's maybe another, better, way of offering this feature to end-user...

@Phillipus
Copy link
Member Author

If I had to choose, I would go for an import dialog box. Two menu items? JB, I thought you had good taste! ;-)

It is possible to do a default update for an element/relation without needing an option:

  1. If element by id does not exist in model create it. Properties are created too.
  2. If element by id does exist, do not create a new version but compare name and documentation. Update if different (EObjectFeatureCommand does the comparison for old/new values for us).
  3. Not sure how to handle the comparison of Properties because, as you say, it's possible to have more than one key.

If update is supported, now I know what you will ask next....ability to read in just the properties files or just the relations file without the elements file.

@jbsarrodie
Copy link
Member

"Two menu items? JB, I thought you had good taste! ;-)"

Yes I have, I opened a good Bordeaux yesterday ;-)

"Not sure how to handle the comparison of Properties because, as you say, it's possible to have more than one key."

Obvioulsy the only way is to create a new property only if it does not exist, and update the first match in all other cases. That's the best we can do and is more aligned with what a user expect from an update feature. With a good documentation that would be perfect. You can reuse the method created when working on metadata (don't remember the name).

"If update is supported, now I know what you will ask next....ability to read in just the properties files or just the relations file without the elements file."

No, I'm still able to create some dummy files with just headers in them ;-)

@Phillipus
Copy link
Member Author

I've created a new "csv-import" branch, tidied and consolidated. It supports updated element/relation names and documentation on import. Property values are updated if the property key exists. If more than one property exists with the same key then only the first one is updated.

@jbsarrodie
Copy link
Member

I did some tests yesterday and faced on big issue: I got an error message about one relation and then import failed.

To reproduce, just open Archisurance, export it and then try to import it in a new, empty, model.

@Phillipus
Copy link
Member Author

Yes, so did I. That's why I removed the illegal relationship from the Archisurance example in 2c71cad

The importer checks for illegal 2.1 relationships and reports them. Otherwise there's a danger that people will create illegal Archimate CSV files.

@jbsarrodie
Copy link
Member

In fact I did it the way around : I first tested with one of my (hand edited) model, so I thought the issue was linked to so manual edits. I then check with Archisurance and though it was fully archimate 2.1 compliant.

I will retry (and play with it) tonight

@jbsarrodie
Copy link
Member

I've just (re)tested: it works great !

@Phillipus
Copy link
Member Author

Clsoing. Archi 3.0. ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants