Add OpenXML nodes to import/export data #11599

mmisol · 2021-04-08T17:24:19Z

Purpose

Two new nodes are added that allow to import and export data from an
Excel spreadsheet. These nodes use the OpenXML SDK, so they do not
make use of the Excel application, thus can be used without Office
installed locally.

These nodes should behave pretty similarly to the existing Excel nodes.
In order to prove that, the tests for the existing nodes have been
copied and adapted for the new nodes.

At the function signature level, there are the following differences:

OpenXMLImportExcel

Provides inputs for 'startRow' and 'startColumn' (optional)

OpenXMLExportExcel

Inputs for 'startRow' and 'startColumn' are optional

Screenshots

Library search

Nodes on canvas

Declarations

Check these if you believe they are true

The codebase is in a better state after this PR
Is documented according to the standards
The level of testing this PR includes is appropriate
User facing strings, if any, are extracted into *.resx files
All tests pass using the self-service CI.
Snapshot of UI changes, if any.
Changes to the API follow Semantic Versioning and are documented in the API Changes document.
This PR modifies some build requirements and the readme is updated

FYIs

@Amoursol

Two new nodes are added that allow to import and export data from an Excel spreadsheet. These nodes uses the OpenXML SDK, so they do not make use of the Excel application, thus can be used without Office installed locally. These nodes should behave pretty similarly to the existing Excel nodes. In order to prove that, the tests for the existing nodes have been copied and adapted for the new nodes. At the function signature level, there are the following differences: ImportOpenXML - Provides inputs for 'startRow' and 'startColumn' (optional) ExportOpenXML - Inputs for 'startRow' and 'startColumn' are optional

src/Libraries/DSOffice/DSOffice.csproj

mjkkirschner · 2021-04-08T19:27:41Z

@mmisol I have not reviewed in depth yet, but I have some high level questions about approach that I think may help me review better.

Are the files produced by these nodes openable by excel without user intervention or training?
If 1 is true - why not replace the internal implementation of the existing nodes with open.xml functionality and remove the dependency on excel (of course we need to keep the unused binaries around for a while).
It would also mean the tests would not need to be duplicated, just kept as is I believe.
What about obsoleting the excel nodes and adding migrations to the new ones?
why the need for the windows specific binaries?
Have you reviewed adsk products for this new dependency, is it shipped in revit, civil etc?
updating our license files, about box etc. What is the license terms of this dep?

IMO a big benefit of this work, in addition to performance gains will be the removal of a source of conflicts and non working excel interop for users. I think, given the familiarity with execl, if those nodes are still present, most users will choose them.

Are there sufficient differences in behavior that you thought adding the new nodes was the best option? What are those differences?

mmisol · 2021-04-08T19:41:29Z

Hi @mjkkirschner

Regarding compatibility, I don't think we can make the guarantee these nodes work exactly like the existing ones. True, I was able to make the tests pass, but given the differences between the two approaches I wouldn't be surprised that something we have overlooked works differently. That's also why I don't think we should be so quick to obsolete the existing nodes. Also, there is plenty of functionality the obsolete Excel nodes cover that I cannot, like opening the Excel application or dealing with workbooks, worksheets, etc. as objects.

The windows-specific binaries seem to appear when targeting .NET only. As I read, if we targeted .NET standard another dependency would be used instead.

I have not reviewed if the dependency on DocumentFormat.OpenXML is safe to be made private. To be honest I don't know the exact list of products/plugins we should check or how to check them.

License and about box, that is something I can do.

QilongTang · 2021-04-15T08:54:54Z

src/Libraries/DSOffice/OpenXmlHelper.cs

+            // While SpreadsheetDocument.Open handles this, the error message it throws is not localized.
+            if (!File.Exists(filePath))
+            {
+                throw new ArgumentException(string.Format(Properties.Resources.WorkbookNotFound, filePath));


QilongTang · 2021-04-15T12:29:53Z

src/Libraries/DSOffice/OpenXmlHelper.cs

+                    {
+                        rowData.Add(null);
+                        currentColumnIndex++;
+                    }


Curious what would happen if we do not do this step?

There are some tests that exercise this. If you have a sparse spreadsheet with lots of empty cells Dynamo is expected to still return empty values for those cells. The following would be an example, where null is the value to return but the cell is actually empty:

1 null null null 2 null null null 3

Notice that row.Elements<Cell> would not return empty cells, so in order to return 3 elements for the first two rows, that final loop adds the remaining values as null.

Thanks @mmisol for the detailed explanation!

QilongTang · 2021-04-15T12:55:35Z

src/Libraries/DSOffice/OpenXmlHelper.cs

+            var letters = new Stack<char>();
+            do
+            {
+                letters.Push((char)('A' + ((columnIndex - 1) % 26)));


QilongTang · 2021-04-15T14:04:38Z

src/Libraries/DSOffice/OpenXmlHelper.cs

+            }
+            else
+            {
+                var document = SpreadsheetDocument.Create(filePath, SpreadsheetDocumentType.Workbook);


Curious if this function would be interrupted with exception that user only have read but not write access to the target path

I think either of the code paths of this function should fail in that case, because the above path opens the file to write it.

@mmisol OK. I am not sure if you have a local env to test this. I can make a note to QA this case if you want?

I will give this a try but yes, it would be a good thing to try by part of the QA team.

The message I get is Access to the path '<PATH>' is denied. which must come from Windows.

@mmisol Thanks, as long as Dynamo does not crash and we display this as a warning or so. Appreciate the checking!

QilongTang · 2021-04-15T15:16:29Z

src/Libraries/DSOffice/OpenXmlHelper.cs

+        }
+
+        /// <summary>
+        /// Adds default styles to the stylesheet so that Office does not identify the spreadsheet as corrupt.


Is there any reference to the changes in this function? Do you foresee MS change how they identify spreadsheet as corrupt?

That was something I noticed in practice when the styles were absent, even if they were not needed. This code is the result of reviewing the styles of a newly created spreadsheet in Excel and adding some of those elements until Excel would accept the file created by Dynamo. Unfortunately, there is no documentation for this AFAIK, so the process was pretty hard.

@mmisol I can only imagine... Well, that is a rough start and I hope MS dont change these criteria often, otherwise we are in big trouble..

QilongTang · 2021-04-15T17:52:31Z

test/Libraries/DynamoMSOfficeTests/OpenXmlTests.cs

+        }
+
+        [Test]
+        public void CanWriteNullValuesToExcel1()


Can you comment on this test the diff with the test above or use longer name to make it self explain? From reading the code, we are making sure not single value but null in the middle of array can be written?

I think I can, let me take a look at it to make sure. The "undocumented" aspect is inherited from the existing tests which I copied, but let me see if I can improve that here.

I renamed the tests a little bit to remark the difference. The previous test writes null as data, while this one includes null as a value in a list.

@mmisol Great! I will leave some review period today for team and come back tomorrow to check more comments, if not expect our team to merge and bang on these this sprint!

QilongTang

After some reading of this PR, @mmisol really raised my confidence level and a wonderful piece of work! I cant wait to play with these two nodes. @reddyashish @zeusongit Feel free to take a look as well.

QilongTang · 2021-04-15T18:01:12Z

@mmisol @mjkkirschner Thank you guys for the about box discussion, I will make a note to update internal dependency tracking docs.

QilongTang · 2021-04-15T18:15:05Z

Also @mmisol if I remember correctly, OpenXML can handle other types of docs other than excel, like word and ppt as well. If that is the case, I believe node names Data.ImportOpenXml and Data.ExportOpenXml might be still ambiguous? How about Data.ImportExcelInOpenXml and Data.ExportToExcelInOpenXml or something else makes more sense? The reason I ask is later this year, our team may introduce nodes for other file types.

mmisol · 2021-04-15T18:25:42Z

Thanks for the kind words @QilongTang ! I must admit it was more work than I expected initially but hopefully users will appreciate this. It should also ease processing Excel files on the server which can enable some new type of applications.

Regarding the node names, it's true that there may be some ambiguity, but since the function names are under Data. I think that should point that the underlying file is a spreadsheet. Anyway, if I had to choose a longer name I would maybe go for ImportOpenXmlSpreadsheet. I can change it but may be something to discuss with @Amoursol .

Also, I copied the icons, but it's maybe a good idea to use slightly different ones. What do you think?

QilongTang · 2021-04-16T06:29:03Z

@mmisol Yes, I have discussed the icon thing with @Amoursol and he prefer to use a different color than Green to keep same visual but indicate slight difference. We will need help from UX on touching color, I will raise that at standup.

Amoursol · 2021-04-16T14:55:03Z

Hello @mmisol - Jingyi, Elizabeth and I have discussed. Could you please name the nodes the following?

Data.OpenXMLImportExcel
Data.OpenXMLExportExcel

This honors the layout of the library with the data structure and is as clear as possible in as short a name as possible.

mmisol · 2021-04-16T19:14:39Z

@Amoursol @QilongTang I renamed the nodes and updated the icons. The PR description was also updated to reflect those changes.

Let me know if this looks good and if that's the case I'll merge after CI has passed.

QilongTang · 2021-04-17T12:29:59Z

@mmisol This PR looks good, the failing test Dynamo.Tests.Configuration.PreferenceSettingsTests.TestSettingsSerialization is due to a different PR.. I am rerunning the self CI

QilongTang · 2021-04-18T07:37:12Z

The latest self CI passed although not updating the status check here seems. This PR looks good to me and we would require some intensive testing this sprint.

mmisol added 2 commits April 8, 2021 12:43

Clean up

2bd13a7

mmisol requested review from mjkkirschner and QilongTang April 8, 2021 17:24

mjkkirschner reviewed Apr 8, 2021

View reviewed changes

src/Libraries/DSOffice/DSOffice.csproj Outdated Show resolved Hide resolved

mjkkirschner reviewed Apr 8, 2021

View reviewed changes

src/Libraries/DSOffice/DSOffice.csproj Show resolved Hide resolved

mmisol added 2 commits April 8, 2021 16:24

Add open xml to license files

ef4e2a6

Remove reference to System.Drawing

25a45b9

QilongTang reviewed Apr 15, 2021

View reviewed changes

QilongTang approved these changes Apr 15, 2021

View reviewed changes

Give more meaningful name to tests

143ffc2

Rename nodes and update icons

955fdfc

mmisol merged commit 44931b2 into DynamoDS:master Apr 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add OpenXML nodes to import/export data #11599

Add OpenXML nodes to import/export data #11599

mmisol commented Apr 8, 2021 •

edited

Loading

mjkkirschner commented Apr 8, 2021 •

edited

Loading

mmisol commented Apr 8, 2021

QilongTang Apr 15, 2021

QilongTang Apr 15, 2021

mmisol Apr 15, 2021 •

edited

Loading

QilongTang Apr 15, 2021

QilongTang Apr 15, 2021

QilongTang Apr 15, 2021

mmisol Apr 15, 2021

QilongTang Apr 15, 2021

mmisol Apr 15, 2021

mmisol Apr 15, 2021

QilongTang Apr 15, 2021

QilongTang Apr 15, 2021

mmisol Apr 15, 2021

QilongTang Apr 15, 2021

QilongTang Apr 15, 2021

mmisol Apr 15, 2021

mmisol Apr 15, 2021

QilongTang Apr 15, 2021

QilongTang left a comment

QilongTang commented Apr 15, 2021

QilongTang commented Apr 15, 2021 •

edited

Loading

mmisol commented Apr 15, 2021

QilongTang commented Apr 16, 2021

Amoursol commented Apr 16, 2021

mmisol commented Apr 16, 2021

QilongTang commented Apr 17, 2021

QilongTang commented Apr 18, 2021

Add OpenXML nodes to import/export data #11599

Add OpenXML nodes to import/export data #11599

Conversation

mmisol commented Apr 8, 2021 • edited Loading

Purpose

Screenshots

Declarations

FYIs

mjkkirschner commented Apr 8, 2021 • edited Loading

mmisol commented Apr 8, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mmisol Apr 15, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

QilongTang left a comment

Choose a reason for hiding this comment

QilongTang commented Apr 15, 2021

QilongTang commented Apr 15, 2021 • edited Loading

mmisol commented Apr 15, 2021

QilongTang commented Apr 16, 2021

Amoursol commented Apr 16, 2021

mmisol commented Apr 16, 2021

QilongTang commented Apr 17, 2021

QilongTang commented Apr 18, 2021

mmisol commented Apr 8, 2021 •

edited

Loading

mjkkirschner commented Apr 8, 2021 •

edited

Loading

mmisol Apr 15, 2021 •

edited

Loading

QilongTang commented Apr 15, 2021 •

edited

Loading