| This code belongs to the family of
2-dimensional codes; it can encode up to 2335 characters
on a very small surface. The encoding is done in two stages:
first the data is converted to 8-bit "codewords" (high
level encoding), then these are converted to small black and
white squares (low level encoding). In addition, an error
correction system is included; it makes it possible to reconstruct
badly printed, erased, blurred or torn-off data. In the
rest of this document, the word "codeword" is
abbreviated to CW.
|The general structure|
|Low level encoding|
|In this image, we can see that CWs no. 2, 5 and 6 have a regular shape. CWs no. 1, 3 and 4 are truncated, and the remainder of each of these CWs is carried over to the other side of the symbol. Here is the complete placement for the 8 x 8 matrix:|
|Note in this image that bit 8 of each CW lies under the parallel 45-degree diagonal lines. The corner and border conditions are very intricate and differ for each matrix size; fortunately, the Datamatrix standard gives an algorithm for carrying out the placement.|
|High level encoding.|
|The high level encoding supports 6 compaction modes; ASCII mode is divided into 3 sub-modes:|
|The default character encodation method is ASCII. Some special CWs allow switching between the encoding methods.|
|If the symbol is not full, pad CWs are required. After the last data CW, the 254 CW indicates the end of the data or the return to the ASCII method. The first padding CW is 129, and the following padding CWs are computed with the 253-state algorithm.|
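The padding rule can be sketched as follows. The text above does not spell out the 253-state algorithm, so the formula used here (a pseudo-random offset derived from the 1-based position of the CW in the data stream) is an assumption taken from the usual statement of the Datamatrix standard and should be checked against it. The function names are hypothetical, and the data is assumed to end in ASCII mode, so no 254 CW is inserted.

```python
def pad_codeword(position: int) -> int:
    """Randomized pad CW for a 1-based position in the data CW stream.

    Assumed 253-state formula: 129 plus a pseudo-random offset,
    wrapped back into the range 1..254.
    """
    pseudo_random = (149 * position) % 253 + 1
    value = 129 + pseudo_random
    return value if value <= 254 else value - 254


def add_padding(codewords: list[int], capacity: int) -> list[int]:
    """Fill a data CW list up to `capacity` (number of data CWs in the symbol)."""
    padded = list(codewords)
    if len(padded) < capacity:
        padded.append(129)  # the first pad CW is always the plain value 129
    while len(padded) < capacity:
        # following pads depend on their own 1-based position
        padded.append(pad_codeword(len(padded) + 1))
    return padded


print(add_padding([66], 3))
```

Here a single data CW is padded to a capacity of 3 data CWs, chosen only for illustration.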
|The ASCII mode. This mode has 3 ways
to encode characters:
● ASCII character in the range 0 to 127
CW = "ASCII value" + 1
● Extended ASCII character in the range 128 to 255
A first CW with the value 235, then a second CW with the value "ASCII value" - 127
● Pair of digits 00, 01, 02 ..... 99
CW = "numerical value of the pair of digits" + 130
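The three rules above can be sketched in a few lines. This is a minimal illustration, not a complete encoder: the function name is hypothetical, and pairing digits greedily from left to right is an assumption about how the pairs are chosen.

```python
def encode_ascii(data: bytes) -> list[int]:
    """Encode bytes with the three ASCII-mode rules (greedy digit pairing)."""
    codewords = []
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if len(pair) == 2 and pair.isdigit():
            # pair of digits 00..99 -> one CW
            codewords.append(int(pair) + 130)
            i += 2
        elif data[i] < 128:
            # plain ASCII character -> ASCII value + 1
            codewords.append(data[i] + 1)
            i += 1
        else:
            # extended ASCII -> CW 235, then ASCII value - 127
            codewords.append(235)
            codewords.append(data[i] - 127)
            i += 1
    return codewords


print(encode_ascii(b"123456"))
print(encode_ascii(b"A"))
```

Six digits fit in three CWs, which is where the compaction of numeric data comes from.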
|C40, TEXT and X12 modes|
| C40 and TEXT modes are
similar : only uppercase and lowercase characters are
In these modes 3 data characters are compacted into 2 CWs. In C40 and TEXT modes, 3 shift characters indicate that the next character belongs to another character set.
The 16-bit value of a CW pair is computed as follows:
Value = C1 * 1600 + C2 * 40 + C3 + 1, where C1, C2 and C3 are the values of the 3 characters to compact.
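As a small sketch of this formula, the 16-bit value can be split into a high byte and a low byte to obtain the two CWs. The function names and the character values 14, 22 and 26 are only illustrative; the split into high and low bytes is an assumption following from "16-bit value of a CW pair".

```python
def pack_triplet(c1: int, c2: int, c3: int) -> tuple[int, int]:
    """Pack three character values (0..39) into a pair of CWs."""
    value = c1 * 1600 + c2 * 40 + c3 + 1
    return value // 256, value % 256  # high byte, low byte


def unpack_pair(cw1: int, cw2: int) -> tuple[int, int, int]:
    """Inverse operation: recover the three character values."""
    value = cw1 * 256 + cw2 - 1
    return value // 1600, (value % 1600) // 40, value % 40


print(pack_triplet(14, 22, 26))
print(unpack_pair(91, 11))
```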
The 254 CW indicates a return to the ASCII method, except when this mode fills the symbol completely.
In C40 and TEXT modes, a pad character with value 0 can be appended to the last 2 characters in order to form a CW pair.
If only one character remains to be encoded in C40 or TEXT mode, or 2 characters in X12 mode, it (they) must be encoded with the ASCII method. However, if a single free CW remains in the symbol before the error correction CWs, that CW is assumed to be encoded with the ASCII method, without using the 254 CW.
The "Upper Shift" character makes it possible to encode extended ASCII characters.
|Extended characters are encoded as follows:
● Generate the code "1" to switch to set 2, then the code 30, which is the "upper shift" code.
● Subtract 128 from the ASCII value of the character to encode; this gives a non-extended character.
● Encode this character normally, changing the set if necessary.
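The three steps above can be sketched for C40 mode as follows. The character sets are not listed in this text, so the helper `basic_set_value` is a hypothetical stand-in that assumes the usual C40 basic set (space = 3, digits = 4 to 13, uppercase letters = 14 to 39) and deliberately rejects anything that would need a further set change.

```python
SHIFT_2 = 1        # code that switches to set 2 for the next character
UPPER_SHIFT = 30   # "upper shift" code inside set 2


def basic_set_value(ascii_value: int) -> int:
    """Hypothetical helper: value of a character in the assumed C40 basic set."""
    if ascii_value == 32:
        return 3
    if 48 <= ascii_value <= 57:      # '0'..'9'
        return ascii_value - 48 + 4
    if 65 <= ascii_value <= 90:      # 'A'..'Z'
        return ascii_value - 65 + 14
    raise ValueError("character needs another set; not covered by this sketch")


def extended_values(ascii_value: int) -> list[int]:
    """Character values for one extended ASCII character (128..255)."""
    if not 128 <= ascii_value <= 255:
        raise ValueError("not an extended ASCII character")
    reduced = ascii_value - 128      # the corresponding non-extended character
    return [SHIFT_2, UPPER_SHIFT, basic_set_value(reduced)]


print(extended_values(193))  # 193 - 128 = 65, i.e. 'A'
```

The three resulting values happen to fill exactly one CW pair; in general they are simply fed into the stream of character values to compact.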
In EDIFACT mode 4 data characters are compacted into 3 CWs. Each EDIFACT character is coded on 6 bits, which are the last 6 bits of its ASCII value.
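A minimal sketch of this packing: the low 6 bits of each of the 4 characters are concatenated into 24 bits, which are then cut into 3 bytes. The function name is hypothetical, and no check is made that the characters actually belong to the EDIFACT character set.

```python
def pack_edifact(chars: bytes) -> list[int]:
    """Pack exactly 4 characters into 3 CWs, 6 bits per character."""
    if len(chars) != 4:
        raise ValueError("exactly 4 characters are needed")
    bits = 0
    for c in chars:
        bits = (bits << 6) | (c & 0x3F)  # keep the last 6 bits of the ASCII value
    return [(bits >> 16) & 0xFF, (bits >> 8) & 0xFF, bits & 0xFF]


print(pack_edifact(b"DATA"))
```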
|"Base 256" mode.|
|This mode can encode any byte.
After the 231 CW, which switches to "base 256" mode, there is a length field. This field is made up of 1 or 2 bytes.
Let N be the number of data bytes to encode:
If N < 250, a single byte is used; its value is N (from 0 to 249).
If N >= 250, two bytes are used; the value of the first one is (N \ 250) + 249, where \ is integer division (values from 250 to 255), and the value of the second one is N MOD 250 (values from 0 to 249).
If the data runs to the end of the symbol, the value of the byte is 0.
Moreover, each CW (including the length field) must be computed with the 255-state algorithm.
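The length field and the randomization can be sketched as follows. The 255-state formula is not given in this text; the one used here (a pseudo-random offset derived from the 1-based position of the CW in the symbol's data stream) is an assumption based on the usual statement of the Datamatrix standard and should be verified against it. All function names are hypothetical, and the special length value 0 for data that runs to the end of the symbol is not handled.

```python
def length_field(n: int) -> list[int]:
    """Length field for N data bytes: 1 byte if N < 250, otherwise 2 bytes."""
    if n < 250:
        return [n]
    return [n // 250 + 249, n % 250]


def randomize_255(value: int, position: int) -> int:
    """Assumed 255-state algorithm for the CW at a 1-based position."""
    pseudo_random = (149 * position) % 255 + 1
    total = value + pseudo_random
    return total if total <= 255 else total - 256


def encode_base256(data: bytes, first_position: int = 1) -> list[int]:
    """CW 231 at `first_position`, then the randomized length field and data."""
    fields = length_field(len(data)) + list(data)
    out = [231]  # the switch CW itself is not randomized
    for offset, value in enumerate(fields, start=1):
        out.append(randomize_255(value, first_position + offset))
    return out


print(length_field(1555))
print(encode_base256(b"\x00"))
```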
|Error detection and correction.|