Add OpenXML nodes to import/export data #11599
Two new nodes are added that import and export data from an Excel spreadsheet. These nodes use the OpenXML SDK, so they do not depend on the Excel application and can be used without Office installed locally.
These nodes should behave much like the existing Excel nodes. To support that claim, the tests for the existing nodes have been copied and adapted for the new nodes.
At the function signature level, there are the following differences:
ImportOpenXML - Provides inputs for 'startRow' and 'startColumn' (optional)
ExportOpenXML - Inputs for 'startRow' and 'startColumn' are optional
@mmisol I have not reviewed in depth yet, but I have some high level questions about approach that I think may help me review better.
IMO a big benefit of this work, in addition to performance gains, will be the removal of a source of conflicts and non-working Excel interop for users. I think, given the familiarity with Excel, if those nodes are still present, most users will choose them. Are there sufficient differences in behavior that you thought adding the new nodes was the best option? What are those differences?
Regarding compatibility, I don't think we can guarantee that these nodes work exactly like the existing ones. True, I was able to make the tests pass, but given the differences between the two approaches I wouldn't be surprised if something we have overlooked works differently. That's also why I don't think we should be so quick to obsolete the existing nodes. Also, there is plenty of functionality that the Excel nodes we would obsolete cover and that I cannot replicate, like opening the Excel application or dealing with workbooks, worksheets, etc. as objects.
The Windows-specific binaries seem to appear when targeting .NET only. From what I read, if we targeted .NET Standard, another dependency would be used instead.
I have not reviewed whether the dependency on DocumentFormat.OpenXML is safe to be made private. To be honest, I don't know the exact list of products/plugins we should check or how to check them.
License and about box: that is something I can do.
// While SpreadsheetDocument.Open handles this, the error message it throws is not localized.
if (!File.Exists(filePath))
{
    throw new ArgumentException(string.Format(Properties.Resources.WorkbookNotFound, filePath));
👍🏻
{
    rowData.Add(null);
    currentColumnIndex++;
}
Curious what would happen if we do not do this step?
There are some tests that exercise this. If you have a sparse spreadsheet with lots of empty cells, Dynamo is expected to still return empty values for those cells. The following would be an example, where null is the value to return but the cell is actually empty:
1 null null
null 2 null
null null 3
Notice that row.Elements<Cell> would not return empty cells, so in order to return 3 elements for the first two rows, that final loop adds the remaining values as null.
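The padding described above can be sketched as follows. This is a minimal, self-contained illustration and not the code from the PR: the `SparseRow.Pad` helper and its `(Column, Value)` tuples are hypothetical stand-ins for the cells that row.Elements<Cell> yields, and the row width is assumed to be known in advance.

```csharp
using System;
using System.Collections.Generic;

public static class SparseRow
{
    // Hypothetical helper: expands the non-empty cells of one row into a dense list.
    // 'cells' holds (1-based column index, value) pairs in ascending column order,
    // mimicking a reader that skips empty cells. 'width' is the expected column count.
    public static List<object> Pad(IEnumerable<(int Column, object Value)> cells, int width)
    {
        var rowData = new List<object>();
        var currentColumnIndex = 1;

        foreach (var (column, value) in cells)
        {
            // Fill the gap before this cell with nulls.
            while (currentColumnIndex < column)
            {
                rowData.Add(null);
                currentColumnIndex++;
            }
            rowData.Add(value);
            currentColumnIndex++;
        }

        // Final loop: trailing empty cells also become nulls.
        while (currentColumnIndex <= width)
        {
            rowData.Add(null);
            currentColumnIndex++;
        }
        return rowData;
    }
}
```

For the second row of the example, a single cell at column 2 with a width of 3 yields null, 2, null. Without the final loop, the first two rows would come back with fewer than 3 elements.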
Thanks @mmisol for the detailed explanation!
var letters = new Stack<char>();
do
{
    letters.Push((char)('A' + ((columnIndex - 1) % 26)));
👍🏻
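For readers following the excerpt above, here is a hedged sketch of how a 1-based column index can be turned into spreadsheet column letters with a stack. Only the first lines of the loop are visible in the diff; the division step, the loop condition, and the `ColumnNames.FromIndex` name are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ColumnNames
{
    // Converts a 1-based column index to its spreadsheet letters (1 -> "A", 27 -> "AA").
    public static string FromIndex(int columnIndex)
    {
        if (columnIndex < 1)
            throw new ArgumentOutOfRangeException(nameof(columnIndex));

        var letters = new Stack<char>();
        do
        {
            // Least significant letter first; the stack reverses the order on output.
            letters.Push((char)('A' + ((columnIndex - 1) % 26)));
            // Assumed step (not shown in the excerpt): move on to the next, more significant letter.
            columnIndex = (columnIndex - 1) / 26;
        } while (columnIndex > 0);

        return new string(letters.ToArray());
    }
}
```

Subtracting 1 before both the modulo and the division is what makes this work: column letters form a base-26 system without a zero digit, so 26 maps to "Z" and 27 to "AA".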
}
else
{
    var document = SpreadsheetDocument.Create(filePath, SpreadsheetDocumentType.Workbook);
Curious whether this function would be interrupted by an exception if the user has only read access, not write access, to the target path.
I think either of the code paths of this function should fail in that case, because the above path opens the file to write it.
@mmisol OK. I am not sure if you have a local env to test this. I can make a note to QA this case if you want?
I will give this a try but yes, it would be a good thing for the QA team to try as well.
The message I get is "Access to the path '<PATH>' is denied.", which must come from Windows.
@mmisol Thanks, as long as Dynamo does not crash and we display this as a warning or so. Appreciate the checking!
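One way to turn such a failure into a warning instead of a crash is to catch the exception at the point where the file is opened for writing. The sketch below is only an illustration under that assumption: `WriteAccess.TryOpenForWrite` is a hypothetical helper, not part of Dynamo or the OpenXML SDK, and it uses a plain FileStream rather than SpreadsheetDocument to stay self-contained.

```csharp
using System;
using System.IO;

public static class WriteAccess
{
    // Hypothetical helper: tries to open 'filePath' for writing and reports
    // the failure as a message instead of letting the exception propagate.
    // Note: FileMode.OpenOrCreate creates the file if it does not exist yet.
    public static bool TryOpenForWrite(string filePath, out string warning)
    {
        try
        {
            using (new FileStream(filePath, FileMode.OpenOrCreate, FileAccess.ReadWrite))
            {
                warning = null;
                return true;
            }
        }
        catch (UnauthorizedAccessException e)
        {
            // Raised when the file is read-only or the user lacks write permission.
            warning = e.Message;
            return false;
        }
        catch (IOException e)
        {
            // Covers other I/O failures, e.g. the file being locked by another process.
            warning = e.Message;
            return false;
        }
    }
}
```

The caller can then show the returned message to the user, which keeps the operating system's wording while avoiding an unhandled exception.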
}

/// <summary>
/// Adds default styles to the stylesheet so that Office does not identify the spreadsheet as corrupt.
Is there any reference for the changes in this function? Do you foresee MS changing how they identify a spreadsheet as corrupt?
That was something I noticed in practice when the styles were absent, even if they were not needed. This code is the result of reviewing the styles of a newly created spreadsheet in Excel and adding some of those elements until Excel would accept the file created by Dynamo. Unfortunately, there is no documentation for this AFAIK, so the process was pretty hard.
@mmisol I can only imagine... Well, that is a rough start, and I hope MS doesn't change these criteria often, otherwise we are in big trouble.
}

[Test]
public void CanWriteNullValuesToExcel1()
Can you add a comment on this test explaining how it differs from the test above, or use a longer name to make it self-explanatory? From reading the code, are we making sure that a null in the middle of an array, rather than a single null value, can be written?
I think I can, let me take a look at it to make sure. The "undocumented" aspect is inherited from the existing tests which I copied, but let me see if I can improve that here.
I renamed the tests a little bit to highlight the difference. The previous test writes null as data, while this one includes null as a value in a list.
@mmisol Great! I will leave some review time today for the team and come back tomorrow to check for more comments. If there are none, expect our team to merge and bang on these this sprint!
After some reading of this PR, @mmisol really raised my confidence level. This is a wonderful piece of work! I can't wait to play with these two nodes. @reddyashish @zeusongit Feel free to take a look as well.
@mmisol @mjkkirschner Thank you guys for the about box discussion, I will make a note to update internal dependency tracking docs.
Also @mmisol, if I remember correctly, OpenXML can handle document types other than Excel, like Word and PowerPoint as well. If that is the case, I believe node names
Thanks for the kind words @QilongTang! I must admit it was more work than I initially expected, but hopefully users will appreciate this. It should also ease processing Excel files on the server, which can enable some new types of applications. Regarding the node names, it's true that there may be some ambiguity, but since the function names are under
Also, I copied the icons, but it may be a good idea to use slightly different ones. What do you think?
Hello @mmisol - Jingyi, Elizabeth and I have discussed. Could you please name the nodes as follows?
This honors the layout of the library with the data structure and is as clear as possible in as short a name as possible.
@Amoursol @QilongTang I renamed the nodes and updated the icons. The PR description was also updated to reflect those changes. Let me know if this looks good and if that's the case I'll merge after CI has passed.
@mmisol This PR looks good, the failing test
The latest self CI run passed, although it does not seem to have updated the status check here. This PR looks good to me, and we will need some intensive testing this sprint.
Purpose
Two new nodes are added that import and export data from an
Excel spreadsheet. These nodes use the OpenXML SDK, so they do not
depend on the Excel application and can be used without Office
installed locally.
These nodes should behave pretty similarly to the existing Excel nodes.
In order to prove that, the tests for the existing nodes have been
copied and adapted for the new nodes.
At the function signature level, there are the following differences:
OpenXMLImportExcel
OpenXMLExportExcel
Screenshots
Library search
Nodes on canvas
Declarations
Check these if you believe they are true
*.resx files
FYIs
@Amoursol