The Problem
The HP48 calculators have a text encoding that is based on the Latin-1 character set (a.k.a. ISO 8859-1) with the exception of 34 of the control characters. These characters are 0x1F and 0x7F to 0x9F. Instead of leaving these characters as the normal Latin-1 control codes, HP re-purposed these mostly unused control codes for 34 characters better suited for display on a high-end calculator. Problems appear when the re-purposed characters are present in HP48 text or file names that are being used on a different computing platform (e.g., transferring a file from an HP48 to a PC). This sometimes results in garbage data, bugs, and crashes in software that doesn’t attempt to handle these special characters.
Unicode has become the ubiquitous standard since the time the HP48 was originally created. Unicode supports over 1 million possible characters. This means that it is now possible to convert HP48 text to characters that much of the world now uses.
However, with so many similar-looking characters to choose from, the issue sometimes becomes which one to use. For example, the number 0 and the letter O look somewhat similar, depending on what font is being used.
Solution
To convert an HP48 character to a Unicode character, use the following mapping table:
HP48 Decimal | HP48 Hex | HP48 I/O Char* | Unicode Name | Unicode Char | Unicode Hex | UTF-8 |
31 | 1F | | Ellipsis | … | 2026 | E2 80 A6 |
127 | 7F | | Medium Shade | ▒ | 2592 | E2 96 92 |
128 | 80 | \<) | Angle | ∠ | 2220 | E2 88 A0 |
129 | 81 | \x- | Latin Small Letter a with Macron | ā | 0101 | C4 81 |
130 | 82 | \.V | Nabla | ∇ | 2207 | E2 88 87 |
131 | 83 | \v/ | Square Root | √ | 221A | E2 88 9A |
132 | 84 | \.S | Integral | ∫ | 222B | E2 88 AB |
133 | 85 | \GS | Greek Capital Letter Sigma | Σ | 03A3 | CE A3 |
134 | 86 | \|> | Black Right-Pointing Triangle | ▶ | 25B6 | E2 96 B6 |
135 | 87 | \pi | Greek Small Letter Pi | π | 03C0 | CF 80 |
136 | 88 | \.d | Partial Differential | ∂ | 2202 | E2 88 82 |
137 | 89 | \<= | Less-Than or Equal To | ≤ | 2264 | E2 89 A4 |
138 | 8A | \>= | Greater-Than or Equal To | ≥ | 2265 | E2 89 A5 |
139 | 8B | \=/ | Not Equal To | ≠ | 2260 | E2 89 A0 |
140 | 8C | \Ga | Greek Small Letter Alpha | α | 03B1 | CE B1 |
141 | 8D | \-> | Rightwards Arrow | → | 2192 | E2 86 92 |
142 | 8E | \<- | Leftwards Arrow | ← | 2190 | E2 86 90 |
143 | 8F | \|v | Downwards Arrow | ↓ | 2193 | E2 86 93 |
144 | 90 | \|^ | Upwards Arrow | ↑ | 2191 | E2 86 91 |
145 | 91 | \Gg | Greek Small Letter Gamma | γ | 03B3 | CE B3 |
146 | 92 | \Gd | Greek Small Letter Delta | δ | 03B4 | CE B4 |
147 | 93 | \Ge | Greek Small Letter Epsilon | ε | 03B5 | CE B5 |
148 | 94 | \Gn | Greek Small Letter Eta | η | 03B7 | CE B7 |
149 | 95 | \Gh | Greek Small Letter Theta | θ | 03B8 | CE B8 |
150 | 96 | \Gl | Greek Small Letter Lamda | λ | 03BB | CE BB |
151 | 97 | \Gr | Greek Small Letter Rho | ρ | 03C1 | CF 81 |
152 | 98 | \Gs | Greek Small Letter Sigma | σ | 03C3 | CF 83 |
153 | 99 | \Gt | Greek Small Letter Tau | τ | 03C4 | CF 84 |
154 | 9A | \Gw | Greek Small Letter Omega | ω | 03C9 | CF 89 |
155 | 9B | \GD | Greek Capital Letter Delta | Δ | 0394 | CE 94 |
156 | 9C | \PI | Greek Capital Letter Pi | Π | 03A0 | CE A0 |
157 | 9D | \GW | Greek Capital Letter Omega | Ω | 03A9 | CE A9 |
158 | 9E | \[] | Black Square | ■ | 25A0 | E2 96 A0 |
159 | 9F | \oo | Infinity | ∞ | 221E | E2 88 9E |
* not all I/O Characters are listed here.
All remaining HP48 characters can be directly mapped to Unicode. For example, an HP48 ‘A’ is 0x41 and in Unicode is 0041. This applies to the ranges 0x00 to 0x1E, 0x20 to 0x7E, and 0xA0 to 0xFF.
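The table and the direct-mapping rule above can be sketched in Python roughly as follows. The names `HP48_TO_UNICODE` and `hp48_to_unicode` are hypothetical and chosen only for illustration. The sketch relies on the fact that decoding bytes as Latin-1 gives each byte the Unicode code point of the same value, so only the 34 re-purposed codes need to be translated afterwards.

```python
# Hypothetical names; a minimal sketch of the mapping table above.
# Keys are HP48 character codes, values are Unicode code points.
HP48_TO_UNICODE = {
    0x1F: 0x2026,  # ellipsis
    0x7F: 0x2592,  # medium shade
    0x80: 0x2220,  # angle
    0x81: 0x0101,  # a with macron (stands in for x-bar)
    0x82: 0x2207,  # nabla
    0x83: 0x221A,  # square root
    0x84: 0x222B,  # integral
    0x85: 0x03A3,  # capital sigma
    0x86: 0x25B6,  # black right-pointing triangle
    0x87: 0x03C0,  # small pi
    0x88: 0x2202,  # partial differential
    0x89: 0x2264,  # less-than or equal to
    0x8A: 0x2265,  # greater-than or equal to
    0x8B: 0x2260,  # not equal to
    0x8C: 0x03B1,  # small alpha
    0x8D: 0x2192,  # rightwards arrow
    0x8E: 0x2190,  # leftwards arrow
    0x8F: 0x2193,  # downwards arrow
    0x90: 0x2191,  # upwards arrow
    0x91: 0x03B3,  # small gamma
    0x92: 0x03B4,  # small delta
    0x93: 0x03B5,  # small epsilon
    0x94: 0x03B7,  # small eta
    0x95: 0x03B8,  # small theta
    0x96: 0x03BB,  # small lamda
    0x97: 0x03C1,  # small rho
    0x98: 0x03C3,  # small sigma
    0x99: 0x03C4,  # small tau
    0x9A: 0x03C9,  # small omega
    0x9B: 0x0394,  # capital delta
    0x9C: 0x03A0,  # capital pi
    0x9D: 0x03A9,  # capital omega
    0x9E: 0x25A0,  # black square
    0x9F: 0x221E,  # infinity
}


def hp48_to_unicode(data: bytes) -> str:
    """Convert raw HP48 text bytes to a Python (Unicode) string."""
    # Latin-1 decoding maps every byte to the code point of the same value;
    # str.translate then swaps out only the 34 re-purposed codes.
    return data.decode("latin-1").translate(HP48_TO_UNICODE)


print(hp48_to_unicode(bytes([0x87, 0x8B, 0x33])))  # pi, not-equal, '3'
```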
If you are using UTF-8, then it is necessary to encode each Unicode character into a 1-, 2-, or 3-byte sequence. Details are available at http://en.wikipedia.org/wiki/Utf-8.
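As a quick check of the UTF-8 column, most languages can produce these byte sequences directly. The snippet below is a small Python sketch; the helper name `utf8_hex` is made up for this example.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of a character as upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


print(utf8_hex("A"))       # 41        (1 byte)
print(utf8_hex("\u03c0"))  # CF 80     (2 bytes, Greek small letter pi)
print(utf8_hex("\u2220"))  # E2 88 A0  (3 bytes, angle)
```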
Rationale
- Character 0x80 (angle)
- Instead of using ∠ 2220 for character 0x80, others have incorrectly used ∟ 221F. This is the Right Angle character and is not intended for any generic angle. Also, it does not visually match the HP48.
- While ∡ 2221 is visually an even better match, this character often does not render properly on various computer platforms and software. In short, some users will just see empty boxes.
- Character 0x81 (x-bar)
- In theory, Unicode allows two characters to be visually combined if the 2nd character is a “combining character”. This would allow for the display of x̄ by using x followed by the “combining macron” character, which would be 0078 followed by 0304. However, there are two problems with this.
- Combining these two characters often renders poorly or not at all and will leave the user confused. Of four sample renderings, the first two were rendering failures while the last two were simply difficult to read at the default settings. For additional examples of how x-bar is inconsistently rendered based on font, see http://www.kreativekorp.com/charset/encoding.php?file=hp-48.kte&char=81.
- Using two characters to represent one HP48 character breaks the pattern of having a simple one-to-one mapping. Some HP48 developers will likely have bugs in their code when converting back from Unicode to HP48 characters.
- Instead, ā 0101 is used. It is a single Unicode character, so it is easy for HP48 developers to deal with, leading to fewer bugs. Also, x-bar is used in statistics as the notation for average, and ā looks like an ‘a’ for average.
- Character 0x82 (nabla)
- The character ∇ 2207 was chosen over other triangles since this is the Nabla character, which is used in mathematics. Details can be read at http://en.wikipedia.org/wiki/Nabla_symbol.
- Characters 0x8D through 0x90 (arrows)
- In Unicode, there are a large number of characters that represent arrows. However, 2190 through 2193 were chosen because these are just simple arrow characters and don’t carry any additional implied meaning. Also, this set of arrow characters supports all four directions, whereas some of the other sets do not. Lastly, some of the alternative arrow characters are not rendered consistently on some computing platforms.
- Characters 0x85, 0x8C, 0x9B, 0x9C, 0x9D (various Greek symbols)
- These are Greek symbols that could have alternatively been represented by various mathematical or electrical Unicode characters. However there are several reasons for preferring the Greek symbols:
- We can gain insight into the original HP48 developers’ intentions by looking at how they translated these characters when using ASCII transfer mode 2 or 3 over a serial link. These characters were translated into \GS, \Ga, \GD, \PI, and \GW respectively. If we assume that “G” stands for Greek, then we can assume these translations mean Greek Capital Sigma, Greek lower alpha, Greek Capital Delta, Capital Pi, and Greek Capital Omega (a lower omega looks like a ‘w’). This pattern holds for all the other translated Greek letters as well, except for \pi, which is clearly lower pi.
- Using all Greek symbols results in a visually clean look. In contrast, when symbols from math, electronics, and Greek symbols are mixed together, they often look sloppy because they don’t line up, have different line weights, and different drawing styles.
- Character 0x9E (box)
- Instead of using ■ 25A0, which is the Black Square, others have incorrectly used ▬ 25AC, which is the Black Rectangle. This does not visually match.
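The one-to-one argument made for character 0x81 can be illustrated with a short Python sketch. It assumes a hypothetical reverse table, `UNICODE_TO_HP48`, containing only a small excerpt of the full mapping, and a hypothetical helper `unicode_to_hp48`. It shows that x followed by the combining macron is two code points, that Unicode normalization does not merge them (there is no precomposed x-with-macron character), and that a simple per-character reverse conversion therefore cannot handle the pair.

```python
import unicodedata

# Assumption: a tiny excerpt of the reverse table, for illustration only.
UNICODE_TO_HP48 = {"\u0101": 0x81, "\u2220": 0x80, "\u25a0": 0x9E}

# HP48 codes whose glyphs differ from Latin-1 (0x1F and 0x7F to 0x9F).
REPURPOSED = {0x1F} | set(range(0x7F, 0xA0))


def unicode_to_hp48(text: str) -> bytes:
    """Map each Unicode code point back to exactly one HP48 byte."""
    out = bytearray()
    for ch in text:
        if ch in UNICODE_TO_HP48:
            out.append(UNICODE_TO_HP48[ch])
        elif ord(ch) <= 0xFF and ord(ch) not in REPURPOSED:
            out.append(ord(ch))  # same value in Latin-1 and on the HP48
        else:
            raise ValueError(f"no HP48 equivalent for U+{ord(ch):04X}")
    return bytes(out)


combining = "x\u0304"  # 'x' followed by COMBINING MACRON
print(len(combining))                                             # 2
print(unicodedata.normalize("NFC", combining) == combining)       # True
print(unicode_to_hp48("\u0101"))                                  # b'\x81'

try:
    unicode_to_hp48(combining)
except ValueError as err:
    print("combining sequence rejected:", err)
```

A converter could special-case the two-code-point sequence, but that is exactly the extra logic the single-character choice avoids.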
Other HP48 to Unicode Mappings
- http://www.kreativekorp.com/charset/encoding.php?file=hp-48.kte – Differs from above on HP48 characters 0x81 and 0x85. Characters 0x1F and 0x7F are not dealt with.
- http://www.kostis.net/charsets/hp48.htm – Differs from above on HP48 characters 0x80, 0x85, and 0x9E. Characters 0x1F, 0x7F, and 0x81 are not dealt with.
Note: efforts are being made (or will be made) to rectify the differences.
Other Resources
- Unicode Standard: http://unicode.org/
- Unicode Character Name Index: http://www.unicode.org/charts/charindex.html
- HP48 ASCII Transfer mode translations: http://holyjoe.net/hp/tiotable.htm
- Newsgroup Post: https://groups.google.com/d/topic/comp.sys.hp48/hek271hUD-E/discussion
- Matching Tables: