Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert strings to UTF-8 before generating YAML #91

Closed

Conversation

kevinushey
Copy link

This fixes a crash on Windows where attempts to convert latin1 strings could cause a crash.

@kevinushey kevinushey changed the title convert strings to UTF-8 before converting convert strings to UTF-8 before generating YAML Sep 18, 2020
@spgarbet
Copy link
Member

This would help with #105 and #90 but needs to cover all entry points. I'm going to work on extending this.

@spgarbet
Copy link
Member

Ideally all entry points would use the same function. The as.utf8 is a good fan-in point. The issue now is making sure all entry points use it. R defaults to "unknown" encoding, if it's specified as "latin1" the conversion is straightforward.

However, what should one assume when the encoding is "unknown"?

I had to add the new util.R file to makefile and this results in the following breakage:

`RUNIT TEST PROTOCOL -- Mon Sep 13 11:52:13 2021


Number of test functions: 167
Number of errors: 1
Number of failures: 5

1 Test Suite :
yaml tests - 167 test functions, 1 error, 5 failures
FAILURE in test_custom_tag_for_data_frame: Error in checkEquals(expected, result) : 1 string mismatch

FAILURE in test_custom_tag_for_named_list: Error in checkEquals(expected, result) : 1 string mismatch

FAILURE in test_custom_tag_for_omap: Error in checkEquals(expected, result) : 1 string mismatch

FAILURE in test_custom_tag_for_unnamed_list: Error in checkEquals(expected, result) : 1 string mismatch

ERROR in test_data_frame_is_converted: Error in y[[5]] : subscript out of bounds
FAILURE in test_data_frame_keys_are_escaped_properly_when_row_major: Error in checkEquals("- 'n': 1\n- 'n': 2\n- 'n': 3\n", result) :
1 string mismatch

Error in eval(ei, envir) : TEST FAILED!
Calls: source -> withVisible -> eval -> eval
Execution halted
make: *** [Makefile:106: test] Error 1`

@spgarbet
Copy link
Member

I think the problem in as.utf8 is it drops attributes of an object which are used in conversion.

@spgarbet
Copy link
Member

I've put everything into a single ticket. There are conflicting needs I'm still trying to figure out the best manner to address.

@spgarbet spgarbet closed this Feb 14, 2022
@spgarbet
Copy link
Member

@/kevinushey I'm trying to fold in your pull request, the problem is there's a lot of possible R types (outside list/string) and the as.yaml doesn't cover them all. I have looked at the alternative to put the conversion in the C code at the emitter stage, but enc2utf8 is actually do_enc2 in the R library and that's marked as internal only as best I can tell.

@kevinushey
Copy link
Author

kevinushey commented Feb 18, 2022

On the C side, you can use Rf_translateCharUTF8() to convert a SEXP into a UTF-8 string. Or, if necessary, you can manually use Rf_eval() and construct the appropriate R call to enc2utf8() from the C side, but that's a bit clunkier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants