DOC: Generate the charset tables dynamically from codes #3409

Merged
merged 9 commits into from
Sep 1, 2024
176 changes: 70 additions & 106 deletions doc/techref/encodings.md
@@ -1,3 +1,34 @@
---
file_format: mystnb
---

```{code-cell}
---
tags: [remove-input]
---
from IPython.display import display, Markdown
from pygmt.encodings import charset


def get_charset_mdtable(name):
Review comment (Member Author): This function is modified from the original script at #3206 (comment)

    """
    Create a markdown table for a charset.
    """
    mappings = charset[name]

    text = "| Octal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |\n"
    text += "|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|\n"
    for i in range(0o00, 0o400, 8):
        chars = [mappings.get(j) for j in range(i, i + 8)]
        if all(v is None for v in chars):  # All characters in this row are undefined
            continue
        row = f"\\{i:03o}"[:-1] + "x"
        chars = [f"&#x{ord(char):04x};" for char in chars]
        text += f"| **{row}** | {' | '.join(chars)} |\n"
    text += "\n"
    return Markdown(text)
```
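
To see what kind of table the helper builds without running the docs build, here is a stand-alone sketch that follows the same steps but returns a plain string and needs neither `pygmt` nor IPython. The names `charset_table`, `toy_charset`, and `REPLACEMENT` are made up for illustration. The fallback to U+FFFD for a code missing from a partly filled row is an assumption of this sketch, not something the PR's function does.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, the REPLACEMENT CHARACTER


def charset_table(mappings):
    """Return a markdown table (as a string) for an {octal code: character} dict."""
    lines = [
        "| Octal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |",
        "|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|",
    ]
    for start in range(0o000, 0o400, 8):
        chars = [mappings.get(code) for code in range(start, start + 8)]
        if all(char is None for char in chars):
            continue  # no character is defined anywhere in this row
        label = f"\\{start:03o}"[:-1] + "x"  # e.g. 0o100 becomes \10x
        # Assumption: a code missing from a partly filled row is shown as U+FFFD.
        cells = [f"&#x{ord(char or REPLACEMENT):04x};" for char in chars]
        lines.append(f"| **{label}** | {' | '.join(cells)} |")
    return "\n".join(lines) + "\n"


toy_charset = {0o101: "A", 0o102: "B"}  # hypothetical two-character charset
table = charset_table(toy_charset)
print(table)
```

Each cell is written as an HTML numeric character reference (`&#x0041;` for `A`), so characters such as `|` cannot break the markdown table syntax.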

# Supported Encodings and Non-ASCII Characters

GMT supports a number of encodings and each encoding contains a set of ASCII and
@@ -6,125 +37,58 @@ in arguments and text strings. When using non-ASCII characters in PyGMT, the eas
is to copy and paste the character from the encoding tables below.

**Note**: The special character � (REPLACEMENT CHARACTER) is used to indicate
that the character is not defined in the encoding.
that the character is undefined in the encoding.
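
As a rough illustration of the note above, the undefined codes of an encoding can be listed by looking for U+FFFD in a code-to-character mapping. The mapping below is typed in by hand from the `\21x` row of the ZapfDingbats table. It is not read from `pygmt.encodings`, and `undefined_codes` is a hypothetical helper.

```python
REPLACEMENT = "\ufffd"  # U+FFFD, the REPLACEMENT CHARACTER

# Codes \210 to \217 of the ZapfDingbats table; the last two are undefined.
row_21x = dict(zip(range(0o210, 0o220), "❰❱❲❳❴❵\ufffd\ufffd"))


def undefined_codes(mappings):
    """Return the octal codes (as strings) that map to U+FFFD."""
    return [
        f"\\{code:03o}"
        for code, char in sorted(mappings.items())
        if char == REPLACEMENT
    ]


print(undefined_codes(row_21x))
```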

## Adobe ISOLatin1+ Encoding

| octal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| **\03x** | � | • | … | ™ | — | – | ﬁ | ž |
| **\04x** |   | ! | " | # | $ | % | & | ’ |
| **\05x** | ( | ) | * | + | , | - | . | / |
| **\06x** | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| **\07x** | 8 | 9 | : | &#x003b; | < | = | > | ? |
| **\10x** | @ | A | B | C | D | E | F | G |
| **\11x** | H | I | J | K | L | M | N | O |
| **\12x** | P | Q | R | S | T | U | V | W |
| **\13x** | X | Y | Z | [ | \ | ] | ^ | _ |
| **\14x** | ‘ | a | b | c | d | e | f | g |
| **\15x** | h | i | j | k | l | m | n | o |
| **\16x** | p | q | r | s | t | u | v | w |
| **\17x** | x | y | z | { | \| | } | ~ | š |
| **\20x** | Œ | † | ‡ | Ł | ⁄ | ‹ | Š | › |
| **\21x** | œ | Ÿ | Ž | ł | ‰ | „ | “ | ” |
| **\22x** | ı | ` | ´ | ^ | ˜ | ¯ | ˘ | ˙ |
| **\23x** | ¨ | ‚ | ˚ | ¸ | ' | ˝ | ˛ | ˇ |
| **\24x** | � | ¡ | ¢ | £ | ¤ | ¥ | ¦ | § |
| **\25x** | ¨ | © | ª | « | ¬ | ­ | ® | ¯ |
| **\26x** | ° | ± | ² | ³ | ´ | µ | ¶ | · |
| **\27x** | ¸ | ¹ | º | » | ¼ | ½ | ¾ | ¿ |
| **\30x** | À | Á | Â | Ã | Ä | Å | Æ | Ç |
| **\31x** | È | É | Ê | Ë | Ì | Í | Î | Ï |
| **\32x** | Ð | Ñ | Ò | Ó | Ô | Õ | Ö | × |
| **\33x** | Ø | Ù | Ú | Û | Ü | Ý | Þ | ß |
| **\34x** | à | á | â | ã | ä | å | æ | ç |
| **\35x** | è | é | ê | ë | ì | í | î | ï |
| **\36x** | ð | ñ | ò | ó | ô | õ | ö | ÷ |
| **\37x** | ø | ù | ú | û | ü | ý | þ | ÿ |
```{code-cell}
---
tags: [remove-input]
---
display(get_charset_mdtable("ISOLatin1+"))
```

## Adobe Symbol Encoding

| octal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| **\04x** |   | ! | ∀ | # | ∃ | % | & | ∋ |
| **\05x** | ( | ) | ∗ | + | , | − | . | / |
| **\06x** | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| **\07x** | 8 | 9 | : | &#x003b; | < | = | > | ? |
| **\10x** | ≅ | Α | Β | Χ | ∆ | Ε | Φ | Γ |
| **\11x** | Η | Ι | ϑ | Κ | Λ | Μ | Ν | Ο |
| **\12x** | Π | Θ | Ρ | Σ | Τ | Υ | ς | Ω |
| **\13x** | Ξ | Ψ | Ζ | [ | ∴ | ] | ⊥ | _ |
| **\14x** |  | α | β | χ | δ | ε | φ | γ |
| **\15x** | η | ι | ϕ | κ | λ | μ | ν | ο |
| **\16x** | π | θ | ρ | σ | τ | υ | ϖ | ω |
| **\17x** | ξ | ψ | ζ | { | \| | } | ∼ | � |
| **\24x** | € | ϒ | ′ | ≤ | ∕ | ∞ | ƒ | ♣ |
| **\25x** | ♦ | ♥ | ♠ | ↔ | ← | ↑ | → | ↓ |
| **\26x** | ° | ± | ″ | ≥ | × | ∝ | ∂ | • |
| **\27x** | ÷ | ≠ | ≡ | ≈ | … | ⏐ | ⎯ | ↵ |
| **\30x** | ℵ | ℑ | ℜ | ℘ | ⊗ | ⊕ | ∅ | ∩ |
| **\31x** | ∪ | ⊃ | ⊇ | ⊄ | ⊂ | ⊆ | ∈ | ∉ |
| **\32x** | ∠ | ∇ | ® | © | ™ | ∏ | √ | ⋅ |
| **\33x** | ¬ | ∧ | ∨ | ⇔ | ⇐ | ⇑ | ⇒ | ⇓ |
| **\34x** | ◊ | 〈 | ® | © | ™ | ∑ | ⎛ | ⎜ |
| **\35x** | ⎝ | ⎡ | ⎢ | ⎣ | ⎧ | ⎨ | ⎩ | ⎪ |
| **\36x** | � | 〉 | ∫ | ⌠ | ⎮ | ⌡ | ⎞ | ⎟ |
| **\37x** | ⎠ | ⎤ | ⎥ | ⎦ | ⎫ | ⎬ | ⎭ | � |

**Note**: The octal code `\140` represents the RADICAL EXTENDER character, which is not available in
the Unicode character set.
```{code-cell}
---
tags: [remove-input]
---
display(get_charset_mdtable("Symbol"))
```

**Note**: The octal code `\140` represents the RADICAL EXTENDER character, which is not
available in the Unicode character set.
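
For readers unsure how an octal code such as `\140` relates to the tables, this small sketch converts a code into its row label and column. `table_position` is a hypothetical helper, not part of PyGMT. It assumes the layout used in the tables here: eight codes per row, with the last octal digit selecting the column.

```python
def table_position(octal_code):
    """Map an octal code like r"\140" to its (row label, column) in the tables."""
    code = int(octal_code.lstrip("\\"), 8)  # r"\140" -> 96
    row, column = divmod(code, 8)  # eight characters per table row
    return f"\\{row:02o}x", column


print(table_position(r"\140"))  # row \14x, column 0
print(table_position(r"\266"))  # row \26x, column 6
```

In the Symbol table, row `\26x`, column 6 holds ∂, so the code for that character is `\266`.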

## Adobe ZapfDingbats Encoding

| octal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| **\04x** |   | ✁ | ✂ | ✃ | ✄ | ☎ | ✆ | ✇ |
| **\05x** | ✈ | ✉ | ☛ | ☞ | ✌ | ✍ | ✎ | ✏ |
| **\06x** | ✐ | ✑ | ✒ | ✓ | ✔ | ✕ | ✖ | ✗ |
| **\07x** | ✘ | ✙ | ✚ | ✛ | ✜ | ✝ | ✞ | ✟ |
| **\10x** | ✠ | ✡ | ✢ | ✣ | ✤ | ✥ | ✦ | ✧ |
| **\11x** | ★ | ✩ | ✪ | ✫ | ✬ | ✭ | ✮ | ✯ |
| **\12x** | ✰ | ✱ | ✲ | ✳ | ✴ | ✵ | ✶ | ✷ |
| **\13x** | ✸ | ✹ | ✺ | ✻ | ✼ | ✽ | ✾ | ✿ |
| **\14x** | ❀ | ❁ | ❂ | ❃ | ❄ | ❅ | ❆ | ❇ |
| **\15x** | ❈ | ❉ | ❊ | ❋ | ● | ❍ | ■ | ❏ |
| **\16x** | ❐ | ❑ | ❒ | ▲ | ▼ | ◆ | ❖ | ◗ |
| **\17x** | ❘ | ❙ | ❚ | ❛ | ❜ | ❝ | ❞ | � |
| **\20x** | ❨ | ❩ | ❪ | ❫ | ❬ | ❭ | ❮ | ❯ |
| **\21x** | ❰ | ❱ | ❲ | ❳ | ❴ | ❵ | � | � |
| **\24x** | � | ❡ | ❢ | ❣ | ❤ | ❥ | ❦ | ❧ |
| **\25x** | ♣ | ♦ | ♥ | ♠ | ① | ② | ③ | ④ |
| **\26x** | ⑤ | ⑥ | ⑦ | ⑧ | ⑨ | ⑩ | ❶ | ❷ |
| **\27x** | ❸ | ❹ | ❺ | ❻ | ❼ | ❽ | ❾ | ❿ |
| **\30x** | ➀ | ➁ | ➂ | ➃ | ➄ | ➅ | ➆ | ➇ |
| **\31x** | ➈ | ➉ | ➊ | ➋ | ➌ | ➍ | ➎ | ➏ |
| **\32x** | ➐ | ➑ | ➒ | ➓ | ➔ | → | ↔ | ↕ |
| **\33x** | ➘ | ➙ | ➚ | ➛ | ➜ | ➝ | ➞ | ➟ |
| **\34x** | ➠ | ➡ | ➢ | ➣ | ➤ | ➥ | ➦ | ➧ |
| **\35x** | ➨ | ➩ | ➪ | ➫ | ➬ | ➭ | ➮ | ➯ |
| **\36x** | � | ➱ | ➲ | ➳ | ➴ | ➵ | ➶ | ➷ |
| **\37x** | ➸ | ➹ | ➺ | ➻ | ➼ | ➽ | ➾ | � |
```{code-cell}
---
tags: [remove-input]
---
display(get_charset_mdtable("ZapfDingbats"))
```

## ISO/IEC 8859

PyGMT also supports the ISO/IEC 8859 standard for 8-bit character encodings. Refer to
<https://en.wikipedia.org/wiki/ISO/IEC_8859> for descriptions of the different parts of
the standard.
[ISO/IEC 8859](https://en.wikipedia.org/wiki/ISO/IEC_8859) for descriptions of the
different parts of the standard.

For a list of the characters in each part of the standard, refer to the following links:

- <https://en.wikipedia.org/wiki/ISO/IEC_8859-1>
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-2>
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-3>
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-4>
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-5>
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-6>
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-7>
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-8>
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-9>
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-10>
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-11>
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-13>
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-14>
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-15>
- <https://en.wikipedia.org/wiki/ISO/IEC_8859-16>
- [ISO/IEC 8859-1](https://en.wikipedia.org/wiki/ISO/IEC_8859-1)
- [ISO/IEC 8859-2](https://en.wikipedia.org/wiki/ISO/IEC_8859-2)
- [ISO/IEC 8859-3](https://en.wikipedia.org/wiki/ISO/IEC_8859-3)
- [ISO/IEC 8859-4](https://en.wikipedia.org/wiki/ISO/IEC_8859-4)
- [ISO/IEC 8859-5](https://en.wikipedia.org/wiki/ISO/IEC_8859-5)
- [ISO/IEC 8859-6](https://en.wikipedia.org/wiki/ISO/IEC_8859-6)
- [ISO/IEC 8859-7](https://en.wikipedia.org/wiki/ISO/IEC_8859-7)
- [ISO/IEC 8859-8](https://en.wikipedia.org/wiki/ISO/IEC_8859-8)
- [ISO/IEC 8859-9](https://en.wikipedia.org/wiki/ISO/IEC_8859-9)
- [ISO/IEC 8859-10](https://en.wikipedia.org/wiki/ISO/IEC_8859-10)
- [ISO/IEC 8859-11](https://en.wikipedia.org/wiki/ISO/IEC_8859-11)
- [ISO/IEC 8859-13](https://en.wikipedia.org/wiki/ISO/IEC_8859-13)
- [ISO/IEC 8859-14](https://en.wikipedia.org/wiki/ISO/IEC_8859-14)
- [ISO/IEC 8859-15](https://en.wikipedia.org/wiki/ISO/IEC_8859-15)
- [ISO/IEC 8859-16](https://en.wikipedia.org/wiki/ISO/IEC_8859-16)
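
The parts of the standard assign different characters to the same code. A quick way to see this, independent of PyGMT, is to decode a single byte with Python's built-in ISO/IEC 8859 codecs. This sketch uses only the standard library and says nothing about how PyGMT or GMT handle these encodings internally. `decode_octal` is a hypothetical helper name.

```python
def decode_octal(code, part):
    """Decode one byte, given as an integer code, with ISO/IEC 8859-<part>."""
    return bytes([code]).decode(f"iso-8859-{part}")


print(decode_octal(0o244, 1))   # currency sign in ISO/IEC 8859-1
print(decode_octal(0o244, 15))  # euro sign in ISO/IEC 8859-15
print(decode_octal(0o341, 7))   # Greek small alpha in ISO/IEC 8859-7
```

The first line agrees with the Adobe ISOLatin1+ table, where row `\24x`, column 4 is also ¤.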