Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle XML namespaces in worksheets #101

Merged

Conversation

bschmeck
Copy link
Contributor

@bschmeck bschmeck commented Nov 4, 2021

We've run into an issue with parsing an XLSX when the nodes are namespaced (e.g. <x:row>).

This PR addresses that issue by using the local_name method when looking for row, c, v and t nodes. The name method includes the namespace, e.g. x:row, but local_name will strip the namespace prefix, allowing the existing comparison logic to work.

This PR addresses that issue by identifying the namespace prefix (if there is one) while SAX parsing the sheet and looking for nodes whose name includes the prefix.

Additionally, when the shared strings dictionary is built, this PR identifies the namespace prefix (if there is one) and includes the namespace in the CSS query used to parse the dictionary. An alternative approach would be to call remove_namespaces! on the document, but that seems a bit heavy handed.

Instead of using local_name, which throws away the namespace prefix, identify
the configured namespace prefix (if there is one) and use that when looking for
nodes in the SAX parsing loop.
@bschmeck
Copy link
Contributor Author

bschmeck commented Nov 4, 2021

After thinking about it more, I decided that it makes more sense to use the approach taken for the shared strings dictionary when parsing the sheet's rows as well. Using local_name is akin to calling remove_namespaces! which runs the risk of parsing nodes that we shouldn't (nodes named row, c, v or t but in a different namespace).

Making the row parsing logic namespace aware seems like the better solution.

@pythonicrubyist pythonicrubyist merged commit 494ed05 into pythonicrubyist:master Nov 29, 2022
pythonicrubyist added a commit that referenced this pull request Nov 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants