Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in strsplit(sample_name_long, split = ";") : non-character argument #104

Closed
dylanbeaudette opened this issue Feb 14, 2024 · 15 comments · Fixed by #105
Closed

Error in strsplit(sample_name_long, split = ";") : non-character argument #104

dylanbeaudette opened this issue Feb 14, 2024 · 15 comments · Fixed by #105

Comments

@dylanbeaudette
Copy link

Hi,

I'm having a strange issue with reading OPUS files and requesting only the data. This used to work, about a year ago:

read_opus('e:/mir/MIR-training/HI-test/spectra/103283XS01.0', data_only = TRUE)

The error message looks like this:

Error in strsplit(sample_name_long, split = ";") : non-character argument
In addition: Warning messages:
1: In strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", tz = etc_tz) :
  unknown timezone 'Etc/NA'
2: In as.POSIXct.POSIXlt(strptime(time_hour, format = "%Y/%m/%d %H:%M:%S",  :
  unknown timezone 'Etc/NA'
3: In as.POSIXlt.POSIXct(x) : unknown timezone 'Etc/NA'

Reading the file works as expected with data_only = FALSE.

@dylanbeaudette
Copy link
Author

I've tried again with an example OPUS file, same error.

> f <- opus_file()
> read_opus(f, data_only = TRUE)
Error in strsplit(sample_name_long, split = ";") : non-character argument
In addition: Warning messages:
1: In strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", tz = etc_tz) :
  unknown timezone 'Etc/NA'
2: In as.POSIXct.POSIXlt(strptime(time_hour, format = "%Y/%m/%d %H:%M:%S",  :
  unknown timezone 'Etc/NA'
3: In as.POSIXlt.POSIXct(x) : unknown timezone 'Etc/NA'

@esteveze
Copy link

I have the same issue:
data_list <- read_opus_single (dsn = "C:/Users/Documents/Data/POM characterization/FTIR/Data/DByDate/15_2_2024/L12_Y24.0")
Warning messages:
1: In strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", tz = etc_tz) :
unknown timezone 'Etc/GMT+1 '
2: In as.POSIXct.POSIXlt(strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", :
unknown timezone 'Etc/GMT+1 '
3: In as.POSIXlt.POSIXct(x) : unknown timezone 'Etc/GMT+1 '

@ltalluto
Copy link

For the warning in strptime, we tracked the problem down to unexpected whitespace in the timezone; in get_meta_timestamp our timezone is coming out "(GMT+1)\t" with a spare tab character. We solved it for the moment by updating a regular expression to strip whitespace from the end of the timezone (line 29 in extract_metadata.R):

tz <- gsub(pattern = "\\(|\\)|\\s+$", "", x = time_hour_tz[3L])

I'm reluctant to post this in a PR because I'm unsure where this tab character is coming from, and whether this is better handled somewhere in the parsing functions.

@philipp-baumann
Copy link
Member

Hi,

I'm having a strange issue with reading OPUS files and requesting only the data. This used to work, about a year ago:

read_opus('e:/mir/MIR-training/HI-test/spectra/103283XS01.0', data_only = TRUE)

The error message looks like this:

Error in strsplit(sample_name_long, split = ";") : non-character argument
In addition: Warning messages:
1: In strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", tz = etc_tz) :
  unknown timezone 'Etc/NA'
2: In as.POSIXct.POSIXlt(strptime(time_hour, format = "%Y/%m/%d %H:%M:%S",  :
  unknown timezone 'Etc/NA'
3: In as.POSIXlt.POSIXct(x) : unknown timezone 'Etc/NA'

Reading the file works as expected with data_only = FALSE.

Hi Dylan, thanks for bringing this issue up! I'll checkout a solution that fixes the locale setups we know, possibly this week.

@philipp-baumann
Copy link
Member

For the warning in strptime, we tracked the problem down to unexpected whitespace in the timezone; in get_meta_timestamp our timezone is coming out "(GMT+1)\t" with a spare tab character. We solved it for the moment by updating a regular expression to strip whitespace from the end of the timezone (line 29 in extract_metadata.R):

tz <- gsub(pattern = "\\(|\\)|\\s+$", "", x = time_hour_tz[3L])

I'm reluctant to post this in a PR because I'm unsure where this tab character is coming from, and whether this is better handled somewhere in the parsing functions.

Thanks for a first ad-hoc solution, i'll investigate a bit further and see whether we can use a fix that generalizes.

@ltalluto
Copy link

Great, we can provide a sample file if needed.

@philipp-baumann
Copy link
Member

philipp-baumann commented Feb 28, 2024

Great, we can provide a sample file if needed.

please, very happy if you do so. baumann dash philipp at protonmail dot com or here. would as well be great if we can add this file to the test suite to check for that datetime behavior (let's see how this is linked to OPUS/software/language/timezone settings).

@dylanbeaudette
Copy link
Author

Thanks everyone. I'll also send an example file!

@philipp-baumann
Copy link
Member

philipp-baumann commented Feb 29, 2024

This should be fixed now :-) Would be great if you can briefly check out the PR and test with your example file, too. Also, would be good to have an extra test file. Good to not test CRAN size limit, but it can't hurt to solidify testing with reverse engineering ;-) In that case, we would also provide institution and device details in the package.

More infos what was done in PR.

@mtalluto regex for parsing for extra tab/space was useful, and it provides stable to get rid of the warning.

Overall, we were missing to parse "sample" and "history" blocks when data_only = TRUE. Therefore, sample name and timezone infos from the OPUS files weren't included. It is good to do so I think.

@philipp-baumann
Copy link
Member

cc @ThomasKnecht , would be great if you can give a short review of the code (if time) :-)

@philipp-baumann
Copy link
Member

Thank you all for catching this bug 💯

@philipp-baumann
Copy link
Member

@mtalluto regarding the fix of the warning; the correct time info we could only find in the history. It is a messy file (Windows origin plus proprietary Bruker), so that's the best we can do to parse the text lines.

@ltalluto
Copy link

Tested from my end, our file now loads with no warnings. I sent you the file via email, feel free to include with the tests.

@dylanbeaudette
Copy link
Author

Thanks!

@philipp-baumann
Copy link
Member

Thanks!

welcome, thanks also for the file. And if you have any suggestions or comments in the future, happy to discuss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants