Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

saveWorkbook produces invalid XML when manipulating libreoffice based XLSX files #403

Closed
retostauffer opened this issue Feb 11, 2023 · 2 comments
Labels

Comments

@retostauffer
Copy link
Contributor

The issue:

I am using loadWorkbook(), writeData() and saveWorkbook() to
modify an existing XLSX file, originally created with libreoffice on linux.

The first time (manipulating the XLSX file once with openxlsx) works well, the
second time the XLSX gets corrupted. openxlsx::read.xlsx is still able to read
the data, however, libreoffice only shows empty sheets, and readxl::read_excel
refused to read the data (Error: unexpected end of data).

I may have found the problem (unclosed XML tag) as shown in the last part of this issue. Note that this error does not occur when I start with an XLSX file created (saved) by/via MS Excel.

System information:

  • Tested with 4.2.2 on Windows and Ubuntu linux
  • openxlsx version `4.2.5.2' (current CRAN release)
  • openxlsx version 4.2.5.9000 (main branch of this repos)

Minimal:

I have put together a small minimal which uses a file test_libreoffice.xlsx,
a simple XLSX file with two sheets. This XLSX file has been created using
LibreOffice 7.3.7.2 30(Build:2) on an Ubuntu Ubuntu 22.04.1 LTS.

The R script below does the following:

  1. Tests if `"test_libreoffice.xlsx" is readable
  2. Manipulates "test_libreoffice.xlsx", stores modified version as "test1.xlsx"
  3. Tests readability of "test1.xlsx"
  4. Manipulates "test1.xlsx", stores modified version as "test2.xlsx"
  5. Tests readability of "test2.xlsx"

Testing readability (test_read()) tries to read the first sheet with both,
openxlsx and readxl (testing). On both, my linux system as well as on
my Windows 10 it fails in (5) where readxl::read_excel() is no longer to
read the data. At this point, libreoffice will also only show empty sheets (no data).

library("readxl")
library("openxlsx")

# Reading sheet 1 with both openxlsx as well as readxl
test_read <- function(file) {
    cat("Testing readability of file \"", file, "\"\n")
    print(read.xlsx(file, sheet = 1))
    print(dim(read_excel(file, sheet = 1)))
}

# Small function which overwrites the content of the first sheet
modify_file <- function(src, dst) {
    # New (static) data set
    x  <- data.frame(a = 1:5, b = LETTERS[5:1], z = 5:1 + 100)
    # Loading, manipulating, and saving workbook
    wb <- loadWorkbook(src)
    writeData(wb, x, sheet = 1, startRow = 2, startCol = 1, colNames = FALSE)
    check <- saveWorkbook(wb, file = dst, overwrite = TRUE, returnValue = TRUE)
    stopifnot("problems saving workbook" = check)
}

# (1) Testing if original file is readable
test_read("test_libreoffice.xlsx")

# (2) Modify the file a first time, store it as test1.xlsx
modify_file("test_libreoffice.xlsx", "test1.xlsx")
# (3) Testing if still readable
test_read("test1.xlsx")

# (4) Modify the file a second time, store it as test2.xlsx
modify_file("test1.xlsx", "test2.xlsx")
# (5) Testing if still readable
test_read("test2.xlsx") # <- here is where the error occurs

Test files:

I am attaching test1.xlsx and test2.xlsx as created by the minimal above on my machine.

Cause of the problem:

I have been looking into the XLSX files to se where the two files "test1.xlsx" (which
works as expected) and "test2.xlsx" (which fails) differ and found a problem in the
XML definition of xl/worksheets/sheet1.xml (as well as xl/worksheets/sheet2.xml);
a <sheetPr> tag that is not properly closed. Namely in this part:

# XML contains
<sheetPr><sheetPr filterMode="false"><pageSetUpPr fitToPage="false"/></sheetPr>
# I assume it must be this (works)                                             ! ------ !
<sheetPr><sheetPr filterMode="false"><pageSetUpPr fitToPage="false"/></sheetPr></sheetPr>

I've manually corrected this XML (in "test2.xlsx") which seem to resolve the problem.
I can't attach the XML files here but they are located in xl/worksheets/ in both the file "test1.xlsx" and "test2.xlsx" (simply unzip).

I don't think this issue is related to #133 (#74, #81), hope opening this one is OK.

Copy link

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale label Feb 17, 2024
Copy link

github-actions bot commented Mar 2, 2024

This issue was closed because it has been stalled for 7 days with no activity.

@github-actions github-actions bot closed this as completed Mar 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant