fix #46 (make sure symbol-like codepoints have nonzero width) #47

Merged 2 commits on Jun 26, 2015

Conversation

stevengj
Member

This uses width 1 for symbols that are missing from Unifont but have categories indicating that they have nonzero width. See #46.

It also corrects for a few apparent bugs in Unifont's widths (https://savannah.gnu.org/bugs/index.php?45395).
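As a rough illustration of the fallback described above, here is a minimal Ruby sketch. It is not the code from this PR: the helper name `char_width`, the `unifont_widths` lookup table, and the choice of which general categories count as "nonzero width" (letters, numbers, punctuation, symbols) are all assumptions made for the example.

```ruby
# Hypothetical sketch, not the actual generator code.
# unifont_widths: Hash mapping codepoint => width taken from the font data.
# category: the Unicode general category string, e.g. "So" or "Mn".
def char_width(codepoint, category, unifont_widths)
  width = unifont_widths[codepoint]
  # The font has an entry for this codepoint: trust it.
  return width unless width.nil?

  # Missing from the font. Assumption: categories starting with
  # L (letter), N (number), P (punctuation) or S (symbol) are visible
  # characters and get width 1; everything else falls back to 0.
  category =~ /\A[LNPS]/ ? 1 : 0
end
```

With an empty table, `char_width(0x1F600, "So", {})` falls through to the category check, while a combining mark such as category `"Mn"` stays at zero width.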

@ScottPJones
Contributor

👍 LGTM

@stevengj
Member Author

@jiahao, is this the same failure you were seeing in #45?

@jiahao
Collaborator

jiahao commented Jun 25, 2015

It's very similar. The MD5 hashes agree for both the cached and original versions of UnicodeData.txt - 3a83069e69e2a9101dc4749593cd3268. I cannot reproduce it locally.

@jiahao
Collaborator

jiahao commented Jun 25, 2015

My ruby version is ruby 2.1.2p95 (2014-05-08) [x86_64-linux-gnu]

@jiahao
Collaborator

jiahao commented Jun 25, 2015

Looks like we are seeing Ruby version-specific behavior. I get a different utf8proc_data.c output on ruby 2.0.0p481 (2014-05-08 revision 45883) [universal.x86_64-darwin14] with the same UnicodeData.txt. Our Travis build uses ruby-1.9.3-p551.

@stevengj
Member Author

I'm using Ruby 2.0.0p481.

Perhaps the culprit is the last few lines of data_generator.rb:

$stdout << "const utf8proc_int32_t utf8proc_combinations[] = {\n  "
i = 0
comb1st_indicies.keys.each_index do |a|
  comb2nd_indicies.keys.each_index do |b|
    i += 1
    if i == 8
      i = 0
      $stdout << "\n  "
    end
    $stdout << ( comb_array[a][b] or -1 ) << ", "
  end
end
$stdout << "};\n\n"

It looks like the output order could depend on the order of the keys in a hash table. Probably we should just sort them.
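The "just sort them" idea can be sketched generically as follows. This is only an illustration with a hypothetical helper `emit_sorted` and made-up data, not a patch to data_generator.rb: it shows that walking a hash in sorted key order gives the same output regardless of the order in which entries were inserted.

```ruby
# Hypothetical helper: render a codepoint => index table in sorted key order,
# so the result does not depend on how the hash was built.
def emit_sorted(table)
  table.keys.sort.map { |key| "#{key}=#{table[key]}" }.join(", ")
end

# Same contents, different insertion order (made-up example data).
first_build  = { 0x0041 => 0, 0x0045 => 1, 0x004F => 2 }
second_build = { 0x004F => 2, 0x0041 => 0, 0x0045 => 1 }

puts emit_sorted(first_build)
puts emit_sorted(second_build)
```

Both calls print the same line, whereas iterating `first_build.keys` and `second_build.keys` directly would visit the entries in different orders.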

@ScottPJones
Contributor

Maybe we should just rewrite that in a better language? 😀 I happen to know a very nice one!

@stevengj
Member Author

Nope, that wasn't it.

@jiahao
Collaborator

jiahao commented Jun 26, 2015

I don't think the current Unicode data file is sorted.

@stevengj
Member Author

I regenerated the unicode_data.c file and it didn't change for me...

@stevengj stevengj merged commit eefdaed into master Jun 26, 2015
@stevengj
Member Author

Okay, whatever @jiahao did seems to have worked.

@stevengj stevengj deleted the more_widths branch June 27, 2015 14:06