Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of non-UTF-8 string length for pango layout #394

Merged
merged 1 commit into from
May 18, 2020

Conversation

tomyun
Copy link
Contributor

@tomyun tomyun commented Apr 18, 2020

Fix for #393 as suggested by @ScottPJones.

Specific test cases for non-UTF-8 string would require import some external deps like LegacyStrings.jl or Strs.jl thus has not been made into this commit. The originally intended fix for UTF-8 unicode strings could be still picked up by a test case implemented in #393.

@codecov-io
Copy link

codecov-io commented Apr 18, 2020

Codecov Report

Merging #394 into master will not change coverage.
The diff coverage is 0.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #394   +/-   ##
=======================================
  Coverage   41.75%   41.75%           
=======================================
  Files          18       18           
  Lines        3365     3365           
=======================================
  Hits         1405     1405           
  Misses       1960     1960           
Impacted Files Coverage Δ
src/pango.jl 33.87% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7a6a0a4...2ea9dc5. Read the comment docs.

@bjarthur
Copy link
Member

just to make sure i understand, we could also just simply input -1 instead of sizeof(...), right? see https://developer.gnome.org/pango/stable/pango-Layout-Objects.html#pango-layout-set-markup

also, are all the other uses of codeunits in Compose.jl okay? i think convert(String,...) might be needed in these cases too to support UTF16/32 IIUC. perhaps that's for another PR.

$ grep -r codeunits src
src/fontfallback.jl:        input = codeunits(textline)
src/pgf_backend.jl:    input = codeunits(escape_tex_chars(text))
src/pango.jl:    textarray = codeunits(text)
src/pango.jl:        input = codeunits(unsafe_string(c_stripped_text[]))

@tomyun
Copy link
Contributor Author

tomyun commented Apr 19, 2020

  1. I'm not sure if Julia string byte array is null-terminated by default.

  2. For other uses of codeunits, we may need more testing. However, note that UTF16/32String from LegacyStrings.jl and UTF16/32Str from Strs.jl don't seem to be fully compatible with string-related functions we use.

julia> s = "aaβαbb"

julia> using LegacyStrings
julia> s1 = utf32(s)

julia> using Strs
julia> s2 = UTF32Str(s)
julia> codeunits(s)
8-element Base.CodeUnits{UInt8,String}:
 0x61
 0x61
 0xce
 0xb2
 0xce
 0xb1
 0x62
 0x62

julia> codeunits(s1)
7-element Base.CodeUnits{UInt32,UTF32String}:
Error showing value of type Base.CodeUnits{UInt32,UTF32String}:
ERROR: MethodError: no method matching codeunit(::UTF32String, ::Int64)

julia> codeunits(s2)
6-element Base.CodeUnits{UInt32,UTF32Str}:
 0x00000061
 0x00000061
 0x000003b2
 0x000003b1
 0x00000062
 0x00000062
julia> match(r"α", s)
RegexMatch("α")

julia> match(r"α", s1)
ERROR: ArgumentError: regex matching is only available for the String type; use String(s) to convert

julia> match(r"α", s2)
ERROR: type Regex has no field extra

@bjarthur
Copy link
Member

ok. i'm fine with this then. julia doesn't support utf16/32 well so no need to go to great effort in Compose.

@ScottPJones does this look right to you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants