Base-93 binary/ASCII encoding
Take Vos
2019-09-30
Base-93 is a new binary/ASCII encoding, similar to base-64.
It has the following features:
- Error detection on each chunk of 13 characters, which warns the user
about copy-and-paste errors.
- A chunk of 13 characters encodes 10 bytes of data plus a 5-bit CRC,
for an encoding efficiency of about 6.15 bits per character
(80 data bits / 13 characters).
- A base-93 message is delimited so that it can coexist
with text in a document. The delimitation also allows multiple
base-93 messages to exist in a single document.
Message
A base-93 encoded message starts with the four characters ~b93
and ends with the tilde character ~.
The content of a message is a concatenation of base-93 numbers.
Each number is 13 digits long, except for the last number, which may
be shorter when it encodes fewer than 10 bytes.
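The delimiting rules can be illustrated with a minimal Python sketch that finds every message in a document and splits its content into numbers. The function name find_messages is hypothetical. The sketch assumes messages are never nested, and it borrows the digit rules from the "Base-93 Digit" section: characters outside '!'..'}' (such as line breaks) are skipped, and non-ASCII characters are rejected.

```python
import re

# '~' never occurs inside a message, so the content runs up to the next '~'.
MESSAGE_RE = re.compile(r"~b93([^~]*)~")


def find_messages(document: str) -> list[list[str]]:
    """Return, for each base-93 message in `document`, its list of numbers."""
    messages = []
    for match in MESSAGE_RE.finditer(document):
        content = match.group(1)
        if any(ord(c) >= 128 for c in content):
            raise ValueError("non-ASCII character inside a base-93 message")
        # Keep only digit characters '!'..'}'; spaces and line breaks are ignored.
        digits = "".join(c for c in content if "!" <= c <= "}")
        # Every number is 13 digits long except possibly the last one.
        messages.append([digits[i:i + 13] for i in range(0, len(digits), 13)])
    return messages
```

Because the terminating tilde cannot appear inside a message, a single non-greedy scan per message is enough, and any number of messages can be interleaved with ordinary text.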
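Lines in the message should be limited to 76 characters, to make it easier
to copy and paste between applications. Each line break should fall inside
a base-93 number, which makes it possible to detect missing lines in the
message: a missing line leaves a number assembled from the digits of two
different numbers, which its CRC is likely to reject.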
The end-of-message tilde ~ should not be placed on a line by itself;
instead, the last line may be extended to 77 characters to hold it.
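The line-layout rules can be sketched in Python as follows. wrap_message is a hypothetical helper that takes the digits of an already-encoded message. Two assumptions are made that the text does not spell out: the "~b93" prefix counts towards the length of the first line, and a line is shortened to 75 characters whenever a 76-character line would end exactly on a boundary between two 13-digit numbers.

```python
PREFIX = "~b93"
MAX_LINE = 76
NUMBER_DIGITS = 13


def wrap_message(digits: str) -> list[str]:
    """Split an encoded message into lines of at most 76 characters.

    Assumptions: the "~b93" prefix counts towards the first line, and a line
    is shortened by one character when a break would otherwise fall exactly
    between two 13-digit numbers. The closing '~' is appended to the last
    line, so that line may reach 77 characters.
    """
    text = PREFIX + digits
    lines = []
    start = 0
    while len(text) - start > MAX_LINE:
        end = start + MAX_LINE
        # Offset of the break within the digits (the prefix is not a number).
        if (end - len(PREFIX)) % NUMBER_DIGITS == 0:
            end -= 1  # move the break inside the preceding number
        lines.append(text[start:end])
        start = end
    lines.append(text[start:] + "~")
    return lines
```

Since 76 is not a multiple of 13, most breaks already land inside a number; the adjustment only triggers occasionally. Appending the tilde to the last line, rather than starting a new one, is what can produce the 77-character line the text allows.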
Base-93 Number
A base-93 number is 2 to 13 digits long. The digits are ordered
from most significant on the left to least significant on the right.
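The following table shows how many digits are needed to encode a given
number of whole bytes together with a 5-bit CRC. "Bits needed" is
8 × bytes + 5; "capacity" is the number of whole bits the digits can hold,
floor(digits × log2(93)), which must be at least the bits needed. Numbers
of 3 or 8 digits therefore never occur.
bytes | bits needed | digits | capacity (bits) |
1     | 13          | 2      | 13              |
2     | 21          | 4      | 26              |
3     | 29          | 5      | 32              |
4     | 37          | 6      | 39              |
5     | 45          | 7      | 45              |
6     | 53          | 9      | 58              |
7     | 61          | 10     | 65              |
8     | 69          | 11     | 71              |
9     | 77          | 12     | 78              |
10    | 85          | 13     | 85              |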
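The digit counts in the table follow from requiring 93^digits ≥ 2^(8 × bytes + 5). A short Python sketch (function names hypothetical) recomputes them with exact integer arithmetic, which avoids floating-point rounding in the 13-digit case, where the margin is small (13 × log2(93) ≈ 85.01).

```python
def digits_needed(num_bytes: int) -> int:
    """Fewest base-93 digits that can hold num_bytes bytes plus a 5-bit CRC."""
    bits = 8 * num_bytes + 5
    digits = 1
    while 93 ** digits < 2 ** bits:
        digits += 1
    return digits


def capacity_bits(digits: int) -> int:
    """Whole bits that fit in `digits` digits: floor(digits * log2(93))."""
    # For a positive integer x, x.bit_length() - 1 == floor(log2(x)), exactly.
    return (93 ** digits).bit_length() - 1


for n in range(1, 11):
    d = digits_needed(n)
    print(n, 8 * n + 5, d, capacity_bits(d))
```

The loop prints one row per chunk size, in the same column order as the table above.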
Base-93 Digit
A base-93 digit is represented by one of the 93 printable ASCII characters
from '!' (33) to '}' (125). The value of the digit is the ASCII value of that
character minus the ASCII value of the exclamation point '!', giving digit
values 0 to 92.
A tilde '~' character marks the end of a message and is not part of a base-93
number. Any character with a value of 128 or above inside the message is an
error. Any other character, such as a space or a line break, is ignored.
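A sketch of the digit mapping in Python; digit_char and digit_value are hypothetical names. It treats non-ASCII characters (values of 128 and above) as an error and returns None for characters that are ignored; recognizing the terminating '~' is left to the caller.

```python
from typing import Optional

FIRST = ord("!")  # digit value 0
LAST = ord("}")   # digit value 92


def digit_char(value: int) -> str:
    """Return the character for a digit value 0..92."""
    if not 0 <= value <= 92:
        raise ValueError(f"digit value out of range: {value}")
    return chr(FIRST + value)


def digit_value(char: str) -> Optional[int]:
    """Return the value of a digit character, or None if it is ignored."""
    code = ord(char)
    if code >= 128:
        raise ValueError(f"non-ASCII character in message: {char!r}")
    if FIRST <= code <= LAST:
        return code - FIRST
    # Space, line breaks, other control characters and DEL are ignored.
    # '~' also lands here; detecting the end of the message is left to the caller.
    return None
```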
Data chunk
A base-93 number encodes the lower 85 bits of an unsigned integer.
The unsigned integer represents a data chunk of 1 to 10 data bytes
together with a 5-bit CRC.
The 5-bit CRC is located in the least significant bits of the
unsigned integer. It is followed by each byte of the data chunk: the first
byte in bits 13:5, the second byte in bits 21:13, and so on. Bit ranges are
written high:low with the high bound exclusive, so 13:5 means bits 5 to 12.
All other bits in the unsigned integer should be set to zero.
The polynomial x^5 + x^2 + 1 (0b100101)
is used for the CRC check over bits 85:5 of the unsigned integer.
85   77   69   61   53   45   37   29   21   13   5   0
+----+----+----+----+----+----+----+----+----+----+---+
| D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |CRC|
+----+----+----+----+----+----+----+----+----+----+---+
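The layout and CRC description leave some details open, notably the initial CRC value and the order in which the 80 bits of 85:5 are fed to the CRC. The Python sketch below picks one plausible interpretation and labels it as an assumption: the CRC is the plain polynomial remainder of the 80-bit field times x^5 modulo x^5 + x^2 + 1 (MSB-first, initial value 0, no final XOR). The names crc5, encode_chunk and decode_chunk are hypothetical; under a different CRC convention the digits produced would differ from those of other implementations.

```python
POLY = 0b100101  # x^5 + x^2 + 1

# Digits per chunk length, taken from the table in the "Base-93 Number" section.
DIGITS_FOR_BYTES = {1: 2, 2: 4, 3: 5, 4: 6, 5: 7, 6: 9, 7: 10, 8: 11, 9: 12, 10: 13}
BYTES_FOR_DIGITS = {d: n for n, d in DIGITS_FOR_BYTES.items()}


def crc5(data: int) -> int:
    """Assumed CRC: remainder of data * x^5 modulo POLY.

    Equivalent to an MSB-first CRC with initial value 0 and no final XOR,
    so unused (zero) high bytes do not change the result.
    """
    r = data << 5
    for bit in range(r.bit_length() - 1, 4, -1):
        if (r >> bit) & 1:
            r ^= POLY << (bit - 5)
    return r


def encode_chunk(chunk: bytes) -> str:
    """Encode 1 to 10 bytes as a single base-93 number."""
    if not 1 <= len(chunk) <= 10:
        raise ValueError("a data chunk holds 1 to 10 bytes")
    # The first byte ends up in bits 13:5, the second in bits 21:13, ...
    data = int.from_bytes(chunk, "little")
    value = (data << 5) | crc5(data)
    digits = []
    for _ in range(DIGITS_FOR_BYTES[len(chunk)]):
        value, d = divmod(value, 93)
        digits.append(chr(d + ord("!")))
    # Digits were produced least significant first; emit most significant first.
    return "".join(reversed(digits))


def decode_chunk(number: str) -> bytes:
    """Decode a single base-93 number, checking the unused bits and the CRC."""
    num_bytes = BYTES_FOR_DIGITS.get(len(number))
    if num_bytes is None:
        raise ValueError(f"no chunk is encoded with {len(number)} digits")
    value = 0
    for c in number:
        if not "!" <= c <= "}":
            raise ValueError(f"not a base-93 digit: {c!r}")
        value = value * 93 + (ord(c) - ord("!"))
    data, crc = value >> 5, value & 0x1F
    if data >> (8 * num_bytes):
        raise ValueError("bits above the data chunk are not zero")
    if crc5(data) != crc:
        raise ValueError("CRC mismatch, possibly a copy-and-paste error")
    return data.to_bytes(num_bytes, "little")
```

With this choice of CRC, leading zero bits contribute nothing, so computing the CRC over only the occupied bytes gives the same result as computing it over the full bits 85:5 field.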