Error in strsplit(sample_name_long, split = ";") : non-character argument #104

dylanbeaudette · 2024-02-14T19:22:52Z

Hi,

I'm having a strange issue with reading OPUS files and requesting only the data. This used to work, about a year ago:

read_opus('e:/mir/MIR-training/HI-test/spectra/103283XS01.0', data_only = TRUE)

The error message looks like this:

Error in strsplit(sample_name_long, split = ";") : non-character argument
In addition: Warning messages:
1: In strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", tz = etc_tz) :
  unknown timezone 'Etc/NA'
2: In as.POSIXct.POSIXlt(strptime(time_hour, format = "%Y/%m/%d %H:%M:%S",  :
  unknown timezone 'Etc/NA'
3: In as.POSIXlt.POSIXct(x) : unknown timezone 'Etc/NA'

Reading the file works as expected with data_only = FALSE.

The text was updated successfully, but these errors were encountered:

dylanbeaudette · 2024-02-14T19:35:49Z

I've tried again with an example OPUS file, same error.

> f <- opus_file()
> read_opus(f, data_only = TRUE)
Error in strsplit(sample_name_long, split = ";") : non-character argument
In addition: Warning messages:
1: In strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", tz = etc_tz) :
  unknown timezone 'Etc/NA'
2: In as.POSIXct.POSIXlt(strptime(time_hour, format = "%Y/%m/%d %H:%M:%S",  :
  unknown timezone 'Etc/NA'
3: In as.POSIXlt.POSIXct(x) : unknown timezone 'Etc/NA'

esteveze · 2024-02-27T11:10:46Z

I have the same issue:
data_list <- read_opus_single (dsn = "C:/Users/Documents/Data/POM characterization/FTIR/Data/DByDate/15_2_2024/L12_Y24.0")
Warning messages:
1: In strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", tz = etc_tz) :
unknown timezone 'Etc/GMT+1 '
2: In as.POSIXct.POSIXlt(strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", :
unknown timezone 'Etc/GMT+1 '
3: In as.POSIXlt.POSIXct(x) : unknown timezone 'Etc/GMT+1 '

ltalluto · 2024-02-27T13:01:40Z

For the warning in strptime, we tracked the problem down to unexpected whitespace in the timezone; in get_meta_timestamp our timezone is coming out "(GMT+1)\t" with a spare tab character. We solved it for the moment by updating a regular expression to strip whitespace from the end of the timezone (line 29 in extract_metadata.R):

tz <- gsub(pattern = "\\(|\\)|\\s+$", "", x = time_hour_tz[3L])

I'm reluctant to post this in a PR because I'm unsure where this tab character is coming from, and whether this is better handled somewhere in the parsing functions.

philipp-baumann · 2024-02-27T22:31:06Z

Hi,

I'm having a strange issue with reading OPUS files and requesting only the data. This used to work, about a year ago:
read_opus('e:/mir/MIR-training/HI-test/spectra/103283XS01.0', data_only = TRUE)
The error message looks like this:
Error in strsplit(sample_name_long, split = ";") : non-character argument
In addition: Warning messages:
1: In strptime(time_hour, format = "%Y/%m/%d %H:%M:%S", tz = etc_tz) :
  unknown timezone 'Etc/NA'
2: In as.POSIXct.POSIXlt(strptime(time_hour, format = "%Y/%m/%d %H:%M:%S",  :
  unknown timezone 'Etc/NA'
3: In as.POSIXlt.POSIXct(x) : unknown timezone 'Etc/NA'
Reading the file works as expected with data_only = FALSE.

Hi Dylan, thanks for bringing this issue up! I'll checkout a solution that fixes the locale setups we know, possibly this week.

philipp-baumann · 2024-02-27T22:37:08Z

For the warning in strptime, we tracked the problem down to unexpected whitespace in the timezone; in get_meta_timestamp our timezone is coming out "(GMT+1)\t" with a spare tab character. We solved it for the moment by updating a regular expression to strip whitespace from the end of the timezone (line 29 in extract_metadata.R):
tz <- gsub(pattern = "\$|\$|\\s+$", "", x = time_hour_tz[3L])
I'm reluctant to post this in a PR because I'm unsure where this tab character is coming from, and whether this is better handled somewhere in the parsing functions.

Thanks for a first ad-hoc solution, i'll investigate a bit further and see whether we can use a fix that generalizes.

ltalluto · 2024-02-28T09:10:24Z

Great, we can provide a sample file if needed.

philipp-baumann · 2024-02-28T16:52:33Z

Great, we can provide a sample file if needed.

please, very happy if you do so. baumann dash philipp at protonmail dot com or here. would as well be great if we can add this file to the test suite to check for that datetime behavior (let's see how this is linked to OPUS/software/language/timezone settings).

dylanbeaudette · 2024-02-28T16:57:50Z

Thanks everyone. I'll also send an example file!

philipp-baumann · 2024-02-29T13:42:14Z

This should be fixed now :-) Would be great if you can briefly check out the PR and test with your example file, too. Also, would be good to have an extra test file. Good to not test CRAN size limit, but it can't hurt to solidify testing with reverse engineering ;-) In that case, we would also provide institution and device details in the package.

More infos what was done in PR.

@mtalluto regex for parsing for extra tab/space was useful, and it provides stable to get rid of the warning.

Overall, we were missing to parse "sample" and "history" blocks when data_only = TRUE. Therefore, sample name and timezone infos from the OPUS files weren't included. It is good to do so I think.

philipp-baumann · 2024-02-29T13:42:53Z

cc @ThomasKnecht , would be great if you can give a short review of the code (if time) :-)

philipp-baumann · 2024-02-29T13:46:22Z

Thank you all for catching this bug 💯

philipp-baumann · 2024-02-29T14:01:09Z

@mtalluto regarding the fix of the warning; the correct time info we could only find in the history. It is a messy file (Windows origin plus proprietary Bruker), so that's the best we can do to parse the text lines.

ltalluto · 2024-02-29T14:51:48Z

Tested from my end, our file now loads with no warnings. I sent you the file via email, feel free to include with the tests.

dylanbeaudette · 2024-03-12T15:47:48Z

Thanks!

philipp-baumann · 2024-03-12T21:12:05Z

Thanks!

welcome, thanks also for the file. And if you have any suggestions or comments in the future, happy to discuss.

philipp-baumann mentioned this issue Feb 29, 2024

fix metadata parsing read_opus(dsn, data_only = TRUE) #105

Merged

philipp-baumann closed this as completed in #105 Mar 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error in strsplit(sample_name_long, split = ";") : non-character argument #104

Error in strsplit(sample_name_long, split = ";") : non-character argument #104

dylanbeaudette commented Feb 14, 2024

dylanbeaudette commented Feb 14, 2024

esteveze commented Feb 27, 2024

ltalluto commented Feb 27, 2024

philipp-baumann commented Feb 27, 2024

philipp-baumann commented Feb 27, 2024

ltalluto commented Feb 28, 2024

philipp-baumann commented Feb 28, 2024 •

edited

Loading

dylanbeaudette commented Feb 28, 2024

philipp-baumann commented Feb 29, 2024 •

edited

Loading

philipp-baumann commented Feb 29, 2024

philipp-baumann commented Feb 29, 2024

philipp-baumann commented Feb 29, 2024

ltalluto commented Feb 29, 2024

dylanbeaudette commented Mar 12, 2024

philipp-baumann commented Mar 12, 2024

Error in strsplit(sample_name_long, split = ";") : non-character argument #104

Error in strsplit(sample_name_long, split = ";") : non-character argument #104

Comments

dylanbeaudette commented Feb 14, 2024

dylanbeaudette commented Feb 14, 2024

esteveze commented Feb 27, 2024

ltalluto commented Feb 27, 2024

philipp-baumann commented Feb 27, 2024

philipp-baumann commented Feb 27, 2024

ltalluto commented Feb 28, 2024

philipp-baumann commented Feb 28, 2024 • edited Loading

dylanbeaudette commented Feb 28, 2024

philipp-baumann commented Feb 29, 2024 • edited Loading

philipp-baumann commented Feb 29, 2024

philipp-baumann commented Feb 29, 2024

philipp-baumann commented Feb 29, 2024

ltalluto commented Feb 29, 2024

dylanbeaudette commented Mar 12, 2024

philipp-baumann commented Mar 12, 2024

philipp-baumann commented Feb 28, 2024 •

edited

Loading

philipp-baumann commented Feb 29, 2024 •

edited

Loading