Support non-ASCII characters in function arguments #2584

Merged: 37 commits merged on Aug 21, 2023

Changes from 13 commits

Commits:
75778bf
Support non-ASCII characters
seisman Jun 24, 2023
e840c99
Support all ISOlatin1 characters
seisman Jun 24, 2023
1b6d940
Support more ISOLatin1+ characters
seisman Jun 25, 2023
1b88cd6
fix
seisman Jun 25, 2023
311b976
Update pygmt/helpers/utils.py
seisman Jun 26, 2023
672413b
Refactor to make it more readable
seisman Jun 26, 2023
2dcc288
Need to remove single quote
seisman Jun 26, 2023
c9b8254
[ci skip] Use a better reference for ASCII table
seisman Jun 26, 2023
e2947fa
Support Symbols charset
seisman Jun 26, 2023
1a84634
Support ZapfDingbats charset
seisman Jun 26, 2023
636ace0
Refactor and add more doctests
seisman Jun 26, 2023
ffabaee
Fix a symbol which is incorrectly copied from PDF
seisman Jun 27, 2023
e1b43b2
Replace octal codes with non-ASCII character in two examples
seisman Jun 27, 2023
7ea78da
Fix a typo in doctest
seisman Jun 27, 2023
487f2d8
Merge branch 'main' into non-ascii-support
seisman Jun 30, 2023
288486c
Fix some characters
seisman Jun 30, 2023
97a223d
Fix symbol characters
seisman Jun 30, 2023
4ff2e56
Add one more reference
seisman Jun 30, 2023
388109f
Add two more references
seisman Jun 30, 2023
2270d9d
Update ZapfDingbats charset
seisman Jul 1, 2023
117b6e5
Make it clear that Symbol/ZapfDingbats are from Adobe
seisman Jul 2, 2023
1798ccc
Update for ISOLatin1+ charset
seisman Jul 2, 2023
e4bedb9
Fix registered sign, copyright sign and trade mark sign
seisman Jul 2, 2023
5ef84f9
Add more notes
seisman Jul 2, 2023
490535c
Add a test for non-ascii support
seisman Jul 2, 2023
418adc5
Update the dvc file
seisman Jul 2, 2023
4b66b8a
Fix styling issue
seisman Jul 2, 2023
37b0c6a
Add docstrings
seisman Jul 2, 2023
3362b31
Update examples/gallery/embellishments/colorbar.py
seisman Jul 22, 2023
695c59b
Merge branch 'main' into non-ascii-support
seisman Aug 2, 2023
bbc223b
Remove an unused pylint directive
seisman Aug 2, 2023
35306bf
Fix a typo in doctest
seisman Aug 2, 2023
e56359e
Merge branch 'main' into non-ascii-support
seisman Aug 5, 2023
633b2f9
Merge branch 'main' into non-ascii-support
seisman Aug 16, 2023
cdb4ab1
Merge branch 'main' into non-ascii-support
seisman Aug 17, 2023
646465c
Update pygmt/helpers/utils.py
seisman Aug 21, 2023
9ce0217
Merge branch 'main' into non-ascii-support
seisman Aug 21, 2023
2 changes: 1 addition & 1 deletion examples/gallery/embellishments/colorbar.py
@@ -40,7 +40,7 @@
     # with a length/width (+w) of 4 cm by 0.5 cm, and plotted horizontally (+h)
     position="g0.3/8.7+w4c/0.5c+h",
     box=True,
-    frame=["x+lTemperature", r"y+l\260C"],
+    frame=["x+lTemperature", r"y+l°C"],
     scale=100,
 )

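Both `frame` labels should produce the same degree sign. In ISO-8859-1, `°` is code point 176, which is octal 260, so replacing the literal character with a backslash and its octal code gives back the hand-written escape. This is a minimal standard-library check, and the variable names are illustrative only:

```python
# The degree sign is U+00B0, a single byte (0xB0) in ISO-8859-1
degree = "°"
code = degree.encode("iso-8859-1")[0]  # integer byte value
octal = format(code, "o")              # the same value written in octal

old_label = r"y+l\260C"           # octal escape written by hand (raw string)
new_label = "y+l" + degree + "C"  # literal character, as in the updated example

# Swapping the character for "\" + its octal code reproduces the old label
assert new_label.replace(degree, "\\" + octal) == old_label
print(code, octal)  # 176 260
```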
5 changes: 2 additions & 3 deletions examples/gallery/symbols/text_symbols.py
@@ -36,9 +36,8 @@
 # plot a lowercase "s" of size 3.5c and use the "Times-Italic" font,
 # color fill is set to "gold"
 fig.plot(x=5.5, y=1.5, style="l3.5c+ts+fTimes-Italic", fill="gold", pen=pen)
-# plot the pi symbol (\160 is octal code for pi) of size 3.5c, for this use
-# the "Symbol" font, the outline color of the symbol is set to
+# plot the pi symbol of size 3.5c, the outline color of the symbol is set to
 # "darkorange", the color fill is set to "magenta4"
-fig.plot(x=7, y=1.5, style="l3.5c+t\160+fSymbol,darkorange", fill="magenta4", pen=pen)
+fig.plot(x=7, y=1.5, style="l3.5c+tπ+fdarkorange", fill="magenta4", pen=pen)

 fig.show()
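The removed comment called `\160` "the octal code for pi". In a non-raw Python string literal, however, `\160` is Python's own octal escape. The old `style` string therefore already contained a plain `p` by the time GMT saw it, and GMT drew that `p` in the Symbol font, where code 0o160 is the Greek pi. The new example passes `π` directly and presumably relies on `non_ascii_to_octal` to emit the Symbol-font escape. This sketch checks only the Python side:

```python
# "\160" in a normal string literal is an octal escape for chr(0o160) == "p"
old_style = "l3.5c+t\160+fSymbol,darkorange"
assert old_style == "l3.5c+tp+fSymbol,darkorange"

# A raw string keeps the four characters backslash, "1", "6", "0" instead
assert r"\160" == "\\160"
assert len(r"\160") == 4

# The new style string carries the real Unicode character U+03C0 (pi)
new_style = "l3.5c+tπ+fdarkorange"
assert "\u03c0" in new_style
```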
81 changes: 80 additions & 1 deletion pygmt/helpers/utils.py
@@ -4,6 +4,7 @@
 import os
 import pathlib
 import shutil
+import string
 import subprocess
 import sys
 import time
@@ -91,6 +92,84 @@ def data_kind(data, x=None, y=None, z=None, required_z=False):
     return kind


+def non_ascii_to_octal(argstr):
+    r"""
+    Translate non-ASCII characters to their corresponding octal codes.
+
+    Currently, only the ISOLatin1+, Adobe Symbol, and Adobe ZapfDingbats
+    character sets are supported.
+
+    References:
+
+    - https://docs.generic-mapping-tools.org/latest/cookbook/octal-codes.html
+    - https://www.ascii-code.com/ISO-8859-1
+    - https://www.adobe.com/jp/print/postscript/pdfs/PLRM.pdf
+
+    Parameters
+    ----------
+    argstr : str
+        The string to be translated.
+
+    Returns
+    -------
+    str
+        The translated string.
+
+    Examples
+    --------
+    >>> non_ascii_to_octal("•‰“”±°ÿ")
+    '\\31\\214\\216\\217\\261\\260\\377'
+    >>> non_ascii_to_octal("αζ∆Ω∑∏∇")
+    '@~\\141@~@~\\172@~@~\\104@~Ω@~\\345@~@~\\325@~@~\\321@~'
+    >>> non_ascii_to_octal("✁❞❡➾")
+    '@%34%\\41@%%@%34%\\176@%%@%34%\\241@%%@%34%\\376@%%'
+    >>> non_ascii_to_octal("ABC ±120° DEF α ♥")
+    'ABC \\261120\\260 DEF @~\\141@~ @%34%\\252@%%'
+    """
+    # pylint: disable=line-too-long
+
+    # Dictionary mapping non-ASCII characters to octal codes
+    mapping = {}
+
+    # Symbol charset: \041-\176 and \240-\376
+    mapping.update(
+        {
+            c: "@~\\" + format(i, "o") + "@~"
+            for c, i in zip(
+                "!∀#∃%&∋()∗+,−./0123456789:;<=>?≅ΑΒΧ∆ΕΦΓΗΙθΚΛΜΝΟΠΘΡΣΤΥςΩΞΨΖ[∴]⊥_αβχδεφγηιφκλμνοπθρστυϖωξψζ{|}∼€Υ′≤⁄∞ƒ♣♦♥♠↔←↑→↓°±″≥×∝∂•÷≠≡≈…↵אIR℘⊗⊕∅∩∪⊃⊇⊄⊂⊆∈∉∠∇∏√⋅¬∧∨⇔⇐⇑⇒⇓◊〈∑ 〉∫⌠⌡",
+                [*range(33, 127), *range(160, 255)],
+            )
+        }
+    )
+
+    # ZapfDingbats charset: \041-\176 and \241-\376
+    mapping.update(
+        {
+            c: "@%34%\\" + format(i, "o") + "@%%"
+            for c, i in zip(
+                "✁✂✃✄☎✆✇✈✉☛☞✌✍✎✏✐✑✒✓✔✕✖✗✘✙✚✛✜✝✞✟✠✡✢✣✤✥✦✧★✩✪✫✬✭✮✯✰✱✲✳✴✵✶✷✸✹✺✻✼✽✾✿❀❁❂❃❄❅❆❇❈❉❊❋●❍■❏❐❑❒▲▼◆❖◗❘❙❚❛❜❝❞❡❢❣❤❥❦❧♣♦♥♠①②③④⑤⑥⑦⑧⑨⑩❶❷❸❹❺❻❼❽❾❿➀➁➂➃➄➅➆➇➈➉➊➋➌➍➎➏➐➑➒➓➔→↔↕➘➙➚➛➜➝➞➟➠➡➢➣➤➥➦➧➨➩➪➫➬➭➮➯ ➱➲➳➴➵➶➷➸➹➺➻➼➽➾",
+                [*range(33, 127), *range(161, 255)],
+            )
+        }
+    )
+
+    # ISOLatin1+ charset: \031-\037 and \177-\237
+    mapping.update(
+        {
+            c: "\\" + format(i, "o")
+            for c, i in zip(
+                "•…™—–fižšŒ†‡Ł⁄‹Š›œŸŽł‰„“”ı`´ˆ˜¯˘˙¨‚˚¸'˝˛ˇ",
+                [*range(25, 32), *range(127, 160)],
+            )
+        }
+    )
+    # ISOLatin1+ charset: \240-\377
+    mapping.update({chr(i): "\\" + format(i, "o") for i in range(160, 256)})
seisman (Member, Author) commented on Jun 27, 2023:

Actually, Python supports many different encodings, so it should be possible to implement this feature without manually maintaining the big dictionary, just as I already do at line 166 for the ISOLatin1+ characters \240 to \377. But I don't know enough about Python encodings yet.

seisman (Member, Author) commented:

Update: this can be done using:

for i in range(160, 256):
    print(i, chr(i), chr(i).encode("iso-8859-1").decode("iso-8859-5"))

seisman (Member, Author) commented on Jul 1, 2024:

Here is an improved version:

import codecs

for code in [*range(0o040, 0o200), *range(0o240, 0o400)]:
    # Decode the single byte `code`; undecodable bytes become U+FFFD
    char = codecs.decode(bytes([code]), "iso8859-5", errors="replace")


+    # Remove any printable ASCII characters from the mapping
+    mapping = {k: v for k, v in mapping.items() if k not in string.printable}
+    return argstr.translate(str.maketrans(mapping))
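In the review comments, seisman suggests generating the table from Python's codecs instead of maintaining it by hand. A sketch of that idea follows. It assumes single-byte codecs, and the helper name `codec_to_octal_table` is hypothetical. The helper decodes every byte in the \040-\177 and \240-\377 ranges, skips plain ASCII and undecodable bytes, and records the octal escape for each remaining character. For ISO-8859-1 it reproduces the `chr(i)` mapping used for \240-\377 in this PR.

```python
import codecs


def codec_to_octal_table(encoding: str) -> dict[str, str]:
    """Map the non-ASCII characters of a single-byte codec to octal escapes.

    Hypothetical helper: scans bytes 0o040-0o177 and 0o240-0o377, as in the
    review comment, and stores a backslash plus the octal byte value.
    """
    table = {}
    for code in [*range(0o040, 0o200), *range(0o240, 0o400)]:
        char = codecs.decode(bytes([code]), encoding, errors="replace")
        # Skip bytes the codec cannot decode and plain ASCII characters
        if char == "\ufffd" or char.isascii():
            continue
        table[char] = "\\" + format(code, "o")
    return table


latin1 = codec_to_octal_table("iso8859-1")
cyrillic = codec_to_octal_table("iso8859-5")
print(latin1["°"], cyrillic["\u0410"])  # both come from byte 0xB0: \260 \260
```

The resulting dictionary can be passed to `str.maketrans` as in the PR. Codes generated from another encoding, such as ISO-8859-5, would presumably render correctly only if GMT is configured for the matching character set through its `PS_CHAR_ENCODING` setting. That is an assumption, and this PR does not implement it.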


 def build_arg_string(kwdict, confdict=None, infile=None, outfile=None):
     r"""
     Convert keyword dictionaries and input/output files into a GMT argument
@@ -213,7 +292,7 @@ def build_arg_string(kwdict, confdict=None, infile=None, outfile=None):
         gmt_args = [str(infile)] + gmt_args
     if outfile:
         gmt_args.append("->" + str(outfile))
-    return " ".join(gmt_args)
+    return non_ascii_to_octal(" ".join(gmt_args))


 def is_nonstr_iter(value):
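With this change, `build_arg_string` passes the joined argument string through `non_ascii_to_octal`, so any argument containing non-ASCII characters reaches GMT as octal escapes. The reduced sketch below mirrors the function's three-table design. The helper name `non_ascii_to_octal_sketch` is hypothetical. The Symbol and ZapfDingbats tables contain only the characters whose codes appear in the doctests above, while the real tables are much larger.

```python
import string

# Hypothetical reduced tables: only characters whose codes appear in the
# PR's doctests, not the complete Adobe Symbol / ZapfDingbats charsets.
SYMBOL_CODES = {"α": 0o141, "ζ": 0o172}
ZAPFDINGBATS_CODES = {"♥": 0o252}


def non_ascii_to_octal_sketch(argstr: str) -> str:
    """Translate a few non-ASCII characters into GMT octal escapes."""
    mapping = {}
    # Symbol font: "@~" toggles the Symbol font on and off around the code
    mapping.update(
        {c: "@~\\" + format(i, "o") + "@~" for c, i in SYMBOL_CODES.items()}
    )
    # ZapfDingbats: "@%34%" selects font number 34, "@%%" restores the font
    mapping.update(
        {c: "@%34%\\" + format(i, "o") + "@%%" for c, i in ZAPFDINGBATS_CODES.items()}
    )
    # ISOLatin1+ \240-\377: the octal code equals the Unicode code point
    mapping.update({chr(i): "\\" + format(i, "o") for i in range(0o240, 0o400)})
    # Never rewrite printable ASCII characters
    mapping = {k: v for k, v in mapping.items() if k not in string.printable}
    return argstr.translate(str.maketrans(mapping))


# Illustrative arguments, joined first as build_arg_string does
args = ["-Bx+lTemperature", "-By+l°C"]
print(non_ascii_to_octal_sketch(" ".join(args)))  # -Bx+lTemperature -By+l\260C
```

`str.translate` works one character at a time, so ASCII text such as `ABC` or `120` passes through untouched. In the full tables, the final `string.printable` filter also matters: characters like `!`, `#`, and the digits appear in the Symbol charset string, and the filter keeps them from being rewritten.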