Datamatrix Introduction
 
About DataMatrix
   Datamatrix belongs to the family of two-dimensional codes. It can encode up to 2335 alphanumeric characters on a very small surface. Encoding is done in two stages: first the data are converted into 8-bit "codewords" (high level encoding), then these are converted into small black and white squares (low level encoding). In addition, an error correction system makes it possible to recover data that are badly printed, erased, blurred or torn. In the rest of this text, the word "codeword" is shortened to CW.
 
The general structure
  • The symbol is a square or rectangular array made of rows and columns. Each cell is a small square, black for a bit set to 1 and white for a bit set to 0. The size of this square is called the module.
  • The colors can be inverted: white on black.
  • The Extended Channel Interpretation (ECI) protocol provides a method to specify particular interpretations of byte values or to identify a particular code page.
    The default ECI code is 000003, which designates the Latin alphabet ISO 8859-1.
  • There are two Datamatrix standards: ECC 000-140 and ECC 200. Only ECC 200 should be used for a new project, and this study is dedicated to ECC 200 only.
  • A symbol consists of one or several data regions. Each region is surrounded by a one-module-wide perimeter.
  • Whatever the number of regions, there is one, and only one, mapping matrix. The size of the matrix is "data region size" x "number of regions", the perimeters not being part of the matrix.
    Example for the 36x36 symbol: "16x16" x "2x2" ---> the matrix size is 32x32.
    Second example, for the 16x48 symbol: "14x22" x "1x2" ---> the matrix size is 14x44.
  • The numbers of rows and columns (including the perimeter) are always even (odd for ECC 000-140!).
  • If necessary, a mechanism makes it possible to distribute the data over several symbols (up to 16).
  • The error correction mechanism is based on Reed-Solomon codes.
  • For square symbols of 48 x 48 and smaller, the Reed-Solomon CWs are appended after the data; for the larger square symbols they are interleaved: the data are divided into blocks.
  • Each symbol size has its own number of Reed-Solomon CWs.
  • The total number of CWs per symbol equals the number of cells in the matrix divided by 8 (without the decimal part).
  • The 8 bits of each CW are placed in the region in order from left to right and top to bottom; some CWs are split in order to fill the matrix.
  • A quiet zone of at least 1 module is required on all 4 sides.
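The mapping matrix rule and the codeword count above can be checked with a few lines of Python. This is a minimal sketch: the function names are ours, and the number of regions (rows x columns of regions) has to be read from the table of symbol sizes.

```python
def mapping_matrix(rows: int, cols: int, region_rows: int, region_cols: int) -> tuple:
    """Size of the single mapping matrix: "data region size" x "number of regions"."""
    # Each region is surrounded by a one-module perimeter, so its data area
    # is 2 modules smaller than the region in each direction.
    data_rows = rows // region_rows - 2
    data_cols = cols // region_cols - 2
    return data_rows * region_rows, data_cols * region_cols


def total_codewords(rows: int, cols: int, region_rows: int, region_cols: int) -> int:
    """Total number of CWs (data + correction): matrix cells divided by 8, truncated."""
    m_rows, m_cols = mapping_matrix(rows, cols, region_rows, region_cols)
    return (m_rows * m_cols) // 8


print(mapping_matrix(36, 36, 2, 2))   # (32, 32)
print(mapping_matrix(16, 48, 1, 2))   # (14, 44)
print(total_codewords(10, 10, 1, 1))  # 8
```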
Low level encoding
  • From here on we use the operators: + --> addition, x --> multiplication, \ --> integer division, MOD --> remainder of the integer division.
  • There are 24 square symbol sizes and 6 rectangular symbol sizes. The following table gives the basic values for each symbol size.

Symbol size       Data regions      Reed-Solomon CWs    Number of blocks
(rows x columns)  (rows x columns)

Square symbols
10x10             1x1               5                   1
12x12             1x1               7                   1
14x14             1x1               10                  1
16x16             1x1               12                  1
18x18             1x1               14                  1
20x20             1x1               18                  1
22x22             1x1               20                  1
24x24             1x1               24                  1
26x26             1x1               28                  1
32x32             2x2               36                  1
36x36             2x2               42                  1
40x40             2x2               48                  1
44x44             2x2               56                  1
48x48             2x2               68                  1
52x52             2x2               2 x 42              2
64x64             4x4               2 x 56              2
72x72             4x4               4 x 36              4
80x80             4x4               4 x 48              4
88x88             4x4               4 x 56              4
96x96             4x4               4 x 68              4
104x104           4x4               6 x 56              6
120x120           6x6               6 x 68              6
132x132           6x6               8 x 62              8
144x144           6x6               10 x 62             10

Rectangular symbols
8x18              1x1               7                   1
8x32              1x2               11                  1
12x26             1x1               14                  1
12x36             1x2               18                  1
16x36             1x2               24                  1
16x48             1x2               28                  1
1

  • Each region has a one-module-wide perimeter. The left and lower sides are entirely black; the right and top sides are made of alternating black and white squares.
  • Each CW is placed in the matrix (if there are several regions, they are assembled to form a single matrix) along 45 degree parallel diagonal lines, and the top-left corner is always as shown below.
In this image, we can see that CWs no. 2, 5 and 6 have a regular shape. CWs no. 1, 3 and 4 are truncated, and the rest of these CWs is carried over to the opposite side of the matrix. Here is the complete placement of the 8 x 8 matrix:
On this image, you can see that bit 8 of each CW lies under the 45 degree parallel diagonal lines. Corner and border conditions are very intricate and differ for each matrix size; fortunately, the Datamatrix standard gives us an algorithm to perform the placement.
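Below is a Python transcription of that placement algorithm as we recall it from the ECC 200 standard (ISO/IEC 16022); treat it as a sketch and check it against the specification. It works on the mapping matrix (perimeters removed). The function name place_codewords and the (CW number, bit number) cell representation are our own choices; CWs are numbered from 1 and bits from 1 to 8 as in the figures. When the matrix has 4 cells left over in its bottom-right corner, they receive a fixed pattern (dark on the diagonal).

```python
def place_codewords(nrow: int, ncol: int) -> list:
    """Map CW bits onto an nrow x ncol mapping matrix (perimeters removed).

    Each cell ends up holding (cw, bit), or True/False (dark/light) for the
    fixed pattern of the unused bottom-right corner.
    """
    grid = [[None] * ncol for _ in range(nrow)]

    def module(row, col, cw, bit):
        # Modules falling outside the matrix wrap around to the opposite side.
        if row < 0:
            row += nrow
            col += 4 - ((nrow + 4) % 8)
        if col < 0:
            col += ncol
            row += 4 - ((ncol + 4) % 8)
        grid[row][col] = (cw, bit)

    def utah(row, col, cw):
        # Regular 8-module shape; (row, col) is the position of bit 8.
        offsets = [(-2, -2), (-2, -1), (-1, -2), (-1, -1),
                   (-1, 0), (0, -2), (0, -1), (0, 0)]
        for bit, (dr, dc) in enumerate(offsets, start=1):
            module(row + dr, col + dc, cw, bit)

    def corner(cells, cw):
        # Special corner shapes list the positions of bits 1 to 8 explicitly.
        for bit, (r, c) in enumerate(cells, start=1):
            module(r, c, cw, bit)

    def free(row, col):
        return 0 <= row < nrow and 0 <= col < ncol and grid[row][col] is None

    n, m = nrow, ncol
    cw, row, col = 1, 4, 0
    while True:
        if row == n and col == 0:
            corner([(n-1, 0), (n-1, 1), (n-1, 2), (0, m-2), (0, m-1), (1, m-1), (2, m-1), (3, m-1)], cw)
            cw += 1
        if row == n - 2 and col == 0 and m % 4:
            corner([(n-3, 0), (n-2, 0), (n-1, 0), (0, m-4), (0, m-3), (0, m-2), (0, m-1), (1, m-1)], cw)
            cw += 1
        if row == n - 2 and col == 0 and m % 8 == 4:
            corner([(n-3, 0), (n-2, 0), (n-1, 0), (0, m-2), (0, m-1), (1, m-1), (2, m-1), (3, m-1)], cw)
            cw += 1
        if row == n + 4 and col == 2 and m % 8 == 0:
            corner([(n-1, 0), (n-1, m-1), (0, m-3), (0, m-2), (0, m-1), (1, m-3), (1, m-2), (1, m-1)], cw)
            cw += 1
        while True:  # sweep up and to the right along a 45 degree diagonal
            if free(row, col):
                utah(row, col, cw)
                cw += 1
            row, col = row - 2, col + 2
            if not (row >= 0 and col < m):
                break
        row, col = row + 1, col + 3
        while True:  # then down and to the left along the next diagonal
            if free(row, col):
                utah(row, col, cw)
                cw += 1
            row, col = row + 2, col - 2
            if not (row < n and col >= 0):
                break
        row, col = row + 3, col + 1
        if not (row < n or col < m):
            break

    if grid[n - 1][m - 1] is None:
        # Unused bottom-right 2x2 corner: dark modules on the diagonal.
        grid[n - 1][m - 1] = grid[n - 2][m - 2] = True
        for r, c in ((n - 1, m - 2), (n - 2, m - 1)):
            if grid[r][c] is None:
                grid[r][c] = False
    return grid


grid = place_codewords(8, 8)
print(grid[4][0])  # (1, 8)
```

For the 8 x 8 matrix, this puts bit 8 of CW 1 at row 4, column 0 with its first bits wrapped to the right edge, and CW 2 as a regular shape in the top-left corner.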
 
High level encoding
High level encoding supports 6 compaction modes; the ASCII mode is divided into 3 sub-modes:

Compaction mode   Data to encode                 Compaction rate
ASCII             ASCII characters 0 to 127      1 character per CW
ASCII extended    ASCII characters 128 to 255    0.5 character per CW
ASCII numeric     ASCII digits                   2 digits per CW
C40               Upper-case alphanumeric        1.5 characters per CW
TEXT              Lower-case alphanumeric        1.5 characters per CW
X12               ANSI X12                       1.5 characters per CW
EDIFACT           ASCII characters 32 to 94      1.33 characters per CW
BASE 256          Bytes 0 to 255                 1 byte per CW

The default character encodation method is ASCII. Some special CWs make it possible to switch between the encoding methods:

Codeword     Data or function
1 to 128     ASCII data
129          Padding
130 to 229   Pair of digits: 00 to 99
230          Switch to C40 method
231          Switch to Base 256 method
232          FNC1 character
233          Structured append (data spread over several symbols)
234          Reader programming
235          Shift to extended ASCII for one character
236          05 Macro
237          06 Macro
238          Switch to ANSI X12 method
239          Switch to TEXT method
240          Switch to EDIFACT method
241          Extended Channel Interpretation character
254          In C40, TEXT or X12 mode: switch back to ASCII method (for example before the pad CWs)
If the symbol is not full, pad CWs are required after the last data CW. If a C40, TEXT or X12 segment is still open at that point, a 254 CW first switches back to ASCII. The first pad CW is 129; the following pad CWs are computed with the 253-state randomizing algorithm.
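The padding rule can be sketched as follows. The 253-state formula used here is our understanding of ISO/IEC 16022 and is not spelled out above: for a pad CW at 1-based position P in the data stream, pseudo-random = ((149 x P) MOD 253) + 1, and the pad value is 129 + pseudo-random, minus 254 if that exceeds 254. The function name pad_codewords is hypothetical.

```python
def pad_codewords(data_cws: list, capacity: int) -> list:
    """Fill the data part of a symbol up to `capacity` CWs with pad CWs."""
    if len(data_cws) > capacity:
        raise ValueError("too many data CWs for this symbol")
    cws = list(data_cws)
    if len(cws) < capacity:
        cws.append(129)  # the first pad CW is not randomized
    while len(cws) < capacity:
        position = len(cws) + 1  # 1-based position of this pad CW in the data stream
        pseudo_random = (149 * position) % 253 + 1
        value = 129 + pseudo_random
        cws.append(value if value <= 254 else value - 254)
    return cws


print(pad_codewords([142, 164, 186], 5))  # [142, 164, 186, 129, 115]
```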
 
The ASCII mode
This mode has 3 ways to encode a character:
  • ASCII character in the range 0 to 127
    CW = "ASCII value" + 1
  • Extended ASCII character in the range 128 to 255
    A first CW with the value 235, then a second CW with the value "ASCII value" - 127
  • Pair of digits 00, 01, 02 ... 99
    CW = "Pair of digits numerical value" + 130
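A minimal Python sketch of these three ASCII rules. It simply pairs digits whenever two follow each other, which is a greedy choice of ours rather than an optimization required by the standard; the function name encode_ascii is hypothetical.

```python
def encode_ascii(text: str) -> list:
    """ASCII-mode CWs for characters with code points 0 to 255."""
    def is_digit(ch: str) -> bool:
        return "0" <= ch <= "9"

    cws = []
    i = 0
    while i < len(text):
        if is_digit(text[i]) and i + 1 < len(text) and is_digit(text[i + 1]):
            cws.append(int(text[i:i + 2]) + 130)  # pair of digits
            i += 2
            continue
        code = ord(text[i])
        if code < 128:
            cws.append(code + 1)  # plain ASCII character
        elif code < 256:
            cws += [235, code - 127]  # shift to extended ASCII, then the character
        else:
            raise ValueError(f"{text[i]!r} is outside the 0-255 range")
        i += 1
    return cws


print(encode_ascii("123456"))  # [142, 164, 186]
```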
C40, TEXT and X12 modes
  • C40 and TEXT modes are similar: only upper-case and lower-case characters are swapped.
  • In these modes, 3 data characters are compacted into 2 CWs. In C40 and TEXT modes, 3 shift characters select another character set for the next character.
  • The 16-bit value of a CW pair is computed as follows:
    Value = C1 x 1600 + C2 x 40 + C3 + 1, where C1, C2 and C3 are the values of the 3 characters to compact. The first CW is Value \ 256 and the second CW is Value MOD 256.
  • The 254 CW indicates a return to the ASCII method, except when this mode fills the symbol completely.
  • In C40 and TEXT modes, a pad character of value 0 (Shift 1) can be added after the last 2 characters in order to form a CW pair.
  • If only one character remains to be encoded in C40 or TEXT mode, or 1 or 2 characters in X12 mode, it (they) must be encoded with the ASCII method. However, if a single free CW remains in the symbol before the error correction CWs, that CW is assumed to be encoded with the ASCII method without using the 254 CW.
  • The "Upper Shift" character makes it possible to encode extended ASCII characters.
 
Extended characters are encoded as follows:
  • Generate the value 1 to switch to set 2, then the value 30, which is the "Upper Shift" code.
  • Subtract 128 from the ASCII value of the character to encode; this gives a non-extended character.
  • Encode this character normally, changing the set if necessary.
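The triplet formula can be tried with the sketch below. It only handles the C40 basic set, whose values (space = 3, digits 0 to 9 = 4 to 13, letters A to Z = 14 to 39) are taken from the standard as we recall it, not from this page; shift sets, Upper Shift and the final 254 CW are left out. All function names are hypothetical.

```python
def c40_value(ch: str) -> int:
    """Value of a character in the C40 basic set (space, digits, upper-case letters)."""
    if ch == " ":
        return 3
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 4
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 14
    raise ValueError(f"{ch!r} needs a shift set, not handled in this sketch")


def pack_triplet(c1: int, c2: int, c3: int) -> list:
    """Compact 3 C40/TEXT/X12 values into a pair of CWs."""
    value = c1 * 1600 + c2 * 40 + c3 + 1
    return [value // 256, value % 256]


def encode_c40(text: str) -> list:
    """Switch to C40 (230) and encode text made only of basic-set characters."""
    values = [c40_value(ch) for ch in text]
    if len(values) % 3 == 1:
        raise ValueError("a lone last character must be encoded in ASCII instead")
    if len(values) % 3 == 2:
        values.append(0)  # pad value 0 completes the last triplet
    cws = [230]
    for i in range(0, len(values), 3):
        cws += pack_triplet(*values[i:i + 3])
    return cws


print(encode_c40("AIM"))  # [230, 91, 11]
```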
EDIFACT mode
In this mode, 4 data characters are compacted into 3 CWs. Each EDIFACT character is coded with 6 bits, which are the last 6 bits of its ASCII value.

EDIFACT value   ASCII value   Comment
0 to 30         64 to 94      EDIFACT value = ASCII value - 64
31              -             End of data, return to ASCII mode
32 to 63        32 to 63      EDIFACT value = ASCII value
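Packing 4 EDIFACT values into 3 CWs is plain bit concatenation. In this sketch we assume the first character occupies the most significant 6 bits; the 240 switch CW, the value 31 that ends the segment and the handling of an incomplete last group are not covered. The function names are hypothetical.

```python
def edifact_value(ch: str) -> int:
    """6-bit EDIFACT value: the last 6 bits of the ASCII value (32 to 94 only)."""
    code = ord(ch)
    if not 32 <= code <= 94:
        raise ValueError(f"{ch!r} cannot be encoded in EDIFACT")
    return code & 0x3F


def pack_edifact(text: str) -> list:
    """Pack groups of 4 EDIFACT characters into 3 CWs (24 bits, first character highest)."""
    if len(text) % 4:
        raise ValueError("this sketch only handles complete groups of 4 characters")
    cws = []
    for i in range(0, len(text), 4):
        a, b, c, d = (edifact_value(ch) for ch in text[i:i + 4])
        bits = (a << 18) | (b << 12) | (c << 6) | d
        cws += [bits >> 16, (bits >> 8) & 0xFF, bits & 0xFF]
    return cws


print(pack_edifact("ABCD"))  # [4, 32, 196]
```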

"Base 256" mode.
This mode can encode any byte.
After the 231 CW, which switches to Base 256 mode, comes a length field made of 1 or 2 bytes.
Let N be the number of data bytes to encode:
  • If N < 250, a single byte is used; its value is N (from 0 to 249).
  • If N >= 250, two bytes are used: the value of the first one is (N \ 250) + 249 (values from 250 to 255) and the value of the second one is N MOD 250 (values from 0 to 249).
  • If the N bytes complete the filling of the symbol, the length field is a single byte of value 0.
Moreover, each CW of the segment (including the length field) must be randomized with the 255-state algorithm.
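The length field and the 255-state algorithm can be sketched as follows. The randomizing formula is our understanding of ISO/IEC 16022 and does not appear above: for a CW at 1-based position P in the data stream, pseudo-random = ((149 x P) MOD 255) + 1, and the result is value + pseudo-random, minus 256 if that exceeds 255. The 231 CW itself is not randomized. Function and parameter names are hypothetical.

```python
def length_field(n: int, fills_symbol: bool = False) -> list:
    """Base 256 length field for n data bytes (1 or 2 bytes)."""
    if fills_symbol:
        return [0]  # the data run to the end of the symbol
    if n < 250:
        return [n]
    if n // 250 + 249 > 255:
        raise ValueError("too many bytes for a 2-byte length field")
    return [n // 250 + 249, n % 250]


def randomize_255(value: int, position: int) -> int:
    """255-state algorithm; position is the 1-based position of the CW in the data stream."""
    pseudo_random = (149 * position) % 255 + 1
    temp = value + pseudo_random
    return temp if temp <= 255 else temp - 256


def encode_base256(data: bytes, latch_position: int, fills_symbol: bool = False) -> list:
    """231 CW, then the randomized length field and data bytes.

    latch_position is the 1-based position of the 231 CW in the data stream.
    """
    raw = length_field(len(data), fills_symbol) + list(data)
    return [231] + [randomize_255(v, latch_position + k) for k, v in enumerate(raw, start=1)]


print(encode_base256(b"A", 1))  # [231, 45, 2]
```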
Error detection and correction
  • The correction system is based on Reed-Solomon codes, which delight math students and terrify everyone else...
  • The number of correction CWs depends on the matrix size, or more exactly on the block size.
  • Reed-Solomon codes are based on a polynomial whose degree is the number of error correction CWs used. For example, with the 8 x 8 matrix we use a polynomial like this: $x^5 + a x^4 + b x^3 + c x^2 + d x + e$. The numbers a, b, c, d and e are the factors of the polynomial.
  • For information, the polynomial is $(x - 2)(x - 2^2)(x - 2^3)\cdots(x - 2^k)$, where k is the number of correction CWs. It is expanded with Galois field arithmetic on each factor...
    There are 16 Reed-Solomon block sizes (see the table above): 5, 7, 10, 11, 12, 14, 18, 20, 24, 28, 36, 42, 48, 56, 62, 68. The factors of these 16 polynomials have been pre-computed; see the factors file.
  • Rather than describing the algorithm used to compute the correction CWs, I prefer to give it to you in Basic.
    Let k be the number of correction CWs, a the factors array (a(j) is the factor of x^j), m the number of data CWs (of one block, for interleaved symbols), d the data CWs array and c the correction CWs array. We'll use a temporary variable t.
    c and t are initialized to 0. And let's go with the math fiddle:
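      For i = 0 To m - 1
        ' Feedback term: the next data CW XOR the last register
        t = (d(i) Xor c(k - 1))
        For j = k - 1 To 0 Step -1
          If t = 0 Then
            c(j) = 0
          Else
            c(j) = Mult(t, a(j))
          End If
          ' Shift: c(j - 1) still holds its value from the previous data CW
          If j > 0 Then c(j) = c(j - 1) Xor c(j)
        Next
      Next
      Mult is the multiplication in the Galois field GF(256) built on the prime polynomial $x^8 + x^5 + x^3 + x^2 + 1$ (301). The correction CWs are appended after the data CWs in the order c(k - 1), c(k - 2), ..., c(0).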
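For readers who prefer Python, here is a port of the Basic routine, together with the GF(256) multiplication (prime polynomial 301) and the computation of the factors from the product of the (x - 2^i) terms. The names gf_mul, generator_factors and correction_cws are ours. With the data CWs 142, 164, 186 (the string "123456" in ASCII mode) and k = 5, as for a 10x10 symbol, it returns 114, 25, 5, 88, 102. For interleaved symbols, it would be applied to each block separately.

```python
PRIME = 301  # x^8 + x^5 + x^3 + x^2 + 1, the prime polynomial of GF(256) for ECC 200


def gf_mul(a: int, b: int) -> int:
    """Multiplication in GF(256): carry-less product reduced modulo PRIME."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= PRIME
    return result


def generator_factors(k: int) -> list:
    """Factors a(0)..a(k-1) of (x - 2)(x - 2^2)...(x - 2^k); a[j] multiplies x^j."""
    poly = [1]  # poly[j] is the coefficient of x^j
    root = 1
    for _ in range(k):
        root = gf_mul(root, 2)
        nxt = [0] * (len(poly) + 1)
        for j, coef in enumerate(poly):
            nxt[j + 1] ^= coef            # coef * x
            nxt[j] ^= gf_mul(coef, root)  # coef * root (minus is XOR in GF(256))
        poly = nxt
    return poly[:-1]  # drop the leading 1 of x^k


def correction_cws(data: list, k: int) -> list:
    """Port of the Basic routine, for one block of data CWs."""
    a = generator_factors(k)
    c = [0] * k
    for d in data:
        t = d ^ c[k - 1]
        for j in range(k - 1, -1, -1):
            c[j] = gf_mul(t, a[j])  # gf_mul returns 0 when t == 0
            if j > 0:
                c[j] ^= c[j - 1]
    return c[::-1]  # c(k - 1) is the first correction CW


print(generator_factors(5))                # [228, 48, 15, 111, 62]
print(correction_cws([142, 164, 186], 5))  # [114, 25, 5, 88, 102]
```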