Processing binary data

Mikhail Yakshin edited this page Aug 5, 2016 · 3 revisions

Sometimes the data you're working with is not only packed into a structure, but also encoded, obfuscated, encrypted, or compressed. To parse such data, you first have to remove that layer of encryption, obfuscation, or compression. Kaitai Struct calls this "processing" and supports it through a set of process directives. These can be applied to raw byte buffers or user-typed fields in the following way:

seq:
  - id: buf1
    size: 0x1000
    process: zlib

This declares a field named buf1. When parsing this structure, KS reads exactly 0x1000 bytes from the source stream and then applies zlib processing, i.e. it decompresses the zlib-compressed stream. Afterwards, accessing buf1 returns the decompressed data (most likely longer than 0x1000 bytes), and accessing the _raw_buf1 property returns the raw (still compressed) data, exactly 0x1000 bytes long.
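As a rough illustration of the read-then-process order described above, here is a minimal Python sketch. It is not the code that Kaitai Struct generates: the helper name read_processed and the sample payload are made up for this example, and the buffer size is taken from the sample data rather than being 0x1000.

```python
import io
import zlib


def read_processed(stream, size):
    """Read exactly `size` bytes, then zlib-decompress them.

    Returns (raw, processed), mirroring the _raw_buf1 / buf1 pair.
    Hypothetical helper for illustration only.
    """
    raw = stream.read(size)
    if len(raw) != size:
        raise EOFError(f"wanted {size} bytes, got {len(raw)}")
    return raw, zlib.decompress(raw)


# Build a sample stream: a compressed buffer followed by unrelated bytes.
payload = b"A" * 100
compressed = zlib.compress(payload)
stream = io.BytesIO(compressed + b"trailing")

raw_buf1, buf1 = read_processed(stream, len(compressed))
rest = stream.read()  # bytes after the fixed-size buffer are left untouched
```

The point of the sketch is that the size applies to the raw bytes in the stream, not to the decompressed result.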

The following processing directives are available in Kaitai Struct.

xor(key)

Applies a bitwise XOR (bitwise exclusive "or", written as ^ in most C-like languages) to every byte of the stream. The length of the output is exactly the same as the length of the input. There is one mandatory argument: the key to use for the XOR operation. It can be:

  • a single byte value: in this case, this value is XORed with every byte of the input stream
  • an array of bytes: in this case, the first byte of the input is XORed with the first byte of the key, the second byte of the input with the second byte of the key, etc. If the key is shorter than the input, the key is reused, starting again from its first byte.

For example, given a 3-byte key [b0, b1, b2] and input [x0, x1, x2, x3, x4, ...], the output will be:

[x0 ^ b0, x1 ^ b1, x2 ^ b2,
 x3 ^ b0, x4 ^ b1, ...]

Examples:

  • process: xor(0xaa): XORs every byte with 0xaa
  • process: xor([7, 42]): XORs every odd (1st, 3rd, 5th, ...) byte with 7, and every even (2nd, 4th, 6th, ...) byte with 42; the key is passed as a single array argument
  • process: xor(key_buf): XORs bytes using a key stored in a field named key_buf
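The key-reuse rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the Kaitai Struct runtime itself; the function name process_xor is an assumption made for this example.

```python
from itertools import cycle


def process_xor(data: bytes, key) -> bytes:
    """XOR every byte of `data` with a repeating key.

    `key` may be a single byte value (int) or a sequence of byte values.
    Hypothetical helper for illustration only.
    """
    if isinstance(key, int):
        key = [key]
    if len(key) == 0:
        raise ValueError("key must not be empty")
    # cycle() restarts the key from its first byte when it runs out.
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


single = process_xor(b"\x00\xff", 0xAA)
double = process_xor(bytes([1, 2, 3, 4, 5]), [7, 42])
```

Because XOR is its own inverse, applying process_xor twice with the same key returns the original bytes, which is why the same directive serves for both encoding and decoding.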

rol(key), ror(key)

Performs a circular shift on a buffer, rotating each byte individually by key bits to the left (rol) or to the right (ror).

Examples:

  • process: rol(5): rotates every byte 5 bits left, so every bit combination b0-b1-b2-b3-b4-b5-b6-b7 (written from the most significant bit to the least significant) becomes b5-b6-b7-b0-b1-b2-b3-b4
  • process: ror(some_val): rotates every byte right by the number of bits given by some_val, which may be either a previously parsed field or a value calculated on the fly
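A per-byte rotation can be written with two shifts and a mask. The following Python sketch assumes the rotation is applied to each 8-bit byte independently, as described above; the names rol8, ror8, and process_rol are made up for this example and are not part of any Kaitai Struct API.

```python
def rol8(b: int, n: int) -> int:
    """Rotate the 8-bit value `b` left by `n` bits."""
    n %= 8
    return ((b << n) | (b >> (8 - n))) & 0xFF


def ror8(b: int, n: int) -> int:
    """Rotate the 8-bit value `b` right by `n` bits."""
    n %= 8
    return ((b >> n) | (b << (8 - n))) & 0xFF


def process_rol(data: bytes, n: int) -> bytes:
    """Apply rol8 to every byte of a buffer (hypothetical helper)."""
    return bytes(rol8(b, n) for b in data)


# 0b10110001 rotated left by 5: the low three bits move to the top.
rotated = rol8(0b10110001, 5)
```

Rotating right by the same amount undoes a left rotation, so data stored with one of the two directives can be recovered with the other.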

zlib

Applies zlib decompression to the input buffer, expecting it to be a full-fledged zlib stream, i.e. one that starts with the regular 2-byte zlib header. Decompression parameters are chosen automatically from that header. Typical zlib header values:

  • 78 01 — no compression or low compression
  • 78 9C — default compression
  • 78 DA — best compression

The length of the output buffer is usually larger than the length of the input. This processing method might throw an exception if the given data is not a valid zlib stream.
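To show what a header check and a failed decompression can look like, here is a small Python sketch using the standard zlib module. The helper names looks_like_zlib and process_zlib are assumptions made for this example. The header test follows the zlib format rules (compression method 8 in the low nibble of the first byte, and the two header bytes read as a big-endian number being a multiple of 31). Which exception a particular Kaitai Struct runtime raises is not covered here; the sketch only shows Python's own zlib.error.

```python
import zlib


def looks_like_zlib(buf: bytes) -> bool:
    """Cheap plausibility check of the 2-byte zlib header."""
    if len(buf) < 2:
        return False
    cmf, flg = buf[0], buf[1]
    # Low nibble 8 means "deflate"; header must be divisible by 31.
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0


def process_zlib(buf: bytes) -> bytes:
    """Decompress a full zlib stream, reporting bad input as ValueError."""
    try:
        return zlib.decompress(buf)
    except zlib.error as exc:
        raise ValueError(f"not a valid zlib stream: {exc}") from exc


original = b"x" * 1000
packed = zlib.compress(original)
unpacked = process_zlib(packed)
```

A passing header check does not guarantee that the rest of the stream is valid, so the decompression call still needs its own error handling.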
