Summary
Rec. ITU-T T.801 | ISO/IEC 15444-2 specifies the JPX file format, which can store images encoded using a variety of image codecs. The Fragment List (flst) box is an optional feature of JPX that allows image data and metadata to be fragmented within the same file, across multiple files, or across multiple URLs on the internet. This feature, as currently described in the specification, is inherently vulnerable.
This feature becomes dangerous when a server accepts JPEG 2000 images from untrusted users and displays processed images back to them. In such cases, an attacker can exfiltrate local and remote files reachable by the server processing the image.
In this report, we provide multiple proofs of concept for exfiltrating local files.
This image contains parts of the contents of /proc/self/exe. We used Kakadu's kdu_expand binary to generate this image.
Severity
High - allows an attacker to exfiltrate local and remote files reachable by a server if the server allows the attacker to upload a specially crafted image that is displayed back to the attacker.
Proof of Concept
This proof-of-concept is based on Fragment Table boxes and uses Kakadu's kdu_expand program, which decodes JPX files containing JPEG 2000 images as specified in Rec. ITU-T T.800 | ISO/IEC 15444-1. Other implementations of the JPX file format using other image codecs might be vulnerable.
Dimension-Based File-Read
The size of the image (height, width and number of colors) is provided by the SIZ Segment Marker. The marker also determines the height and width of the tiles. Its layout is shown below:
The most important fields are:
- Xsiz: Width of the decompressed image as a 4-byte field
- Ysiz: Height of the decompressed image as a 4-byte field
- XTsiz: Width of one tile as a 4-byte field
- YTsiz: Height of one tile as a 4-byte field
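To make the field positions concrete, here is a minimal Python sketch that reads the fixed-size part of a SIZ marker segment. The field order and widths follow the SIZ layout in Rec. ITU-T T.800 as we understand it; the function name parse_siz is hypothetical, and the per-component fields that follow Csiz are ignored.

```python
import struct

SIZ_MARKER = 0xFF51  # SIZ marker code

def parse_siz(segment: bytes) -> dict:
    """Parse the fixed-size part of a SIZ marker segment (marker included).

    Assumed layout (big-endian): marker, Lsiz, Rsiz as 2-byte fields,
    eight 4-byte fields from Xsiz to YTOsiz, then the 2-byte Csiz.
    """
    names = ("marker", "Lsiz", "Rsiz", "Xsiz", "Ysiz", "XOsiz", "YOsiz",
             "XTsiz", "YTsiz", "XTOsiz", "YTOsiz", "Csiz")
    fields = dict(zip(names, struct.unpack_from(">HHH8IH", segment, 0)))
    if fields["marker"] != SIZ_MARKER:
        raise ValueError("not a SIZ marker segment")
    return fields
```

Xsiz starts 6 bytes into the segment and Ysiz 10 bytes in, so replacing the last byte of either field changes only the low 8 bits of the reported width or height.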
We decided to inject bytes from a local file into the SIZ segment marker, since we assume a setup where an attacker can see the image’s output dimensions.
Since neither the width nor the height of an image can be zero, and both are 32-bit unsigned integers, we encoded each as (0x00, 0x00, 0x01, X), where X represents the injected byte. Each field therefore ends up as 0x100 plus the value of the byte read, giving a range from 0x100 to 0x1FF.
We can inject a byte into both the Xsiz and Ysiz fields, where Xsiz contains the first byte and Ysiz the second byte to leak. This allows an attacker to leak 2 bytes of a local file with a single image.
Since an attacker can control the offset of the bytes within the local file, they can repeatedly upload the file with increasing offsets to leak the full file contents.
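The arithmetic behind this encoding can be sketched in a few lines of Python. This is an illustration of the decoding step only, under the assumption that the output image reports its width and height unchanged; the function names are hypothetical and nothing here builds or parses a real JPX file.

```python
def siz_field_bytes(leaked: int) -> bytes:
    """Big-endian 4-byte Xsiz/Ysiz value whose low byte is the injected byte."""
    return bytes([0x00, 0x00, 0x01, leaked])

def decode_dimensions(width: int, height: int) -> bytes:
    """Recover the two injected bytes from the observed output dimensions."""
    for value in (width, height):
        if not 0x100 <= value <= 0x1FF:
            raise ValueError("dimension outside the expected 0x100..0x1FF range")
    return bytes([width - 0x100, height - 0x100])

def reassemble(observations) -> bytes:
    """Concatenate two-byte results from (width, height) pairs ordered by offset."""
    return b"".join(decode_dimensions(w, h) for w, h in observations)
```

For example, an output image of 370 by 367 pixels (0x172 by 0x16F) corresponds to the bytes 0x72 and 0x6F, which are the ASCII characters "r" and "o".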
Dynamic Tile Read
To leak more data in a single image, we needed to either find a single segment marker with a large array of data that could be reliably decoded from the output image, or find a segment marker that could be repeated in the image to have some noticeable effect.
One tempting target is the tile data itself. What if we created an image with a single large tile, and then used a large number of leaked bytes in the compressed pixel data within this tile?
Unfortunately, the compression applied to tiles made this impossible. The tile data is encoded using a variable-length encoding technique, and does not permit the 0xFF byte in the stream. Additionally, the compression used in JPEG 2000 is incredibly complex, using wavelet transforms with dynamic coefficients. The length and meaning of the tile data changed based on the bytes that we leaked, and changed in such a way that almost guaranteed parsing errors.
We set out looking for a segment marker that could be repeated with a useful effect and settled on the Comment (COM) marker. The COM marker includes an arbitrary number of bytes of unstructured data. However, the COM marker was not saved to the output image, otherwise we could have easily leaked data in the Ccomi bytes. In spite of this, the COM marker was valid in any part of the codestream and had variable length, which made it quite useful for us.
The COM marker has the following fields:
We devised a mechanism to leak a single byte per tile with a jump table based on leaked bytes. We used three types of codestream marker to craft this primitive: Start-of-tile (SOT), Comment (COM), and Start-of-Data (SOD).
For each tile, we included a COM marker with variable length depending on a leaked byte (di). We then included a table of 256 entries, each with tile data and another COM marker. In each entry, we included tile data encoding a different pixel color. Each of these entries was exactly 256 bytes long, meaning depending on the value of the leaked byte, a different entry would be "chosen". An example of this encoding is shown below:
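The selection step can be modelled with a toy byte stream. The sketch below is not a JPEG 2000 codestream: it assumes the leaked byte lands in the high byte of a 16-bit big-endian length field that counts its own two bytes, which is our reading of the description rather than something the report states, and it replaces each table entry with a 256-byte record whose first byte stands in for the pixel color. All names are hypothetical.

```python
ENTRY_SIZE = 256  # each table entry is exactly 256 bytes long

def build_toy_stream(leaked: int) -> bytes:
    """Length field followed by a 256-entry table, one entry per byte value."""
    # High byte: leaked value. Low byte: 2, because the length counts itself.
    length_field = bytes([leaked, 0x02])
    table = b"".join(
        bytes([color]) + bytes(ENTRY_SIZE - 1)  # first byte marks the "color"
        for color in range(256)
    )
    return length_field + table

def decode_toy_stream(stream: bytes) -> int:
    """Skip the variable-length segment and read the color of the entry hit."""
    length = int.from_bytes(stream[:2], "big")  # leaked * 256 + 2
    return stream[length]  # first byte of entry number `leaked`
```

Because the skip distance grows by exactly one entry size for each increment of the leaked byte, the parser always resumes at the start of an entry, and the color it decodes there identifies the byte value.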
Using this mechanism, we managed to leak 20 KB of data in a single image. The following screenshot shows the output of an image converted using Kakadu:
Further Analysis
Please download the attached Proof-of-Concept file: https://github.com/google/security-research/blob/master/pocs/kakadu/passwd.jp2.zip
Then, take a JPEG 2000 decoder of your choosing that supports fragmentation and decode the image so that it can be viewed, e.g. by converting it to a JPG or PNG. This file reads the contents of /etc/passwd. You can also verify this by observing the process and which files are opened via strace, e.g.:
strace -f PROGRAM /tmp/passwd.jp2 2>&1 | grep '/etc/passwd'
In theory, any image decoder that implements this part of the standard is vulnerable, but if you are running into issues reproducing this, please note that we used kdu_expand from Kakadu to convert this image to a BMP file.
Mitigation
JPX readers should, by default, prevent accessing URLs found in a Fragment List box that are external to the JPX file.
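One way a reader might enforce this default is to inspect every Fragment List box before following it. The sketch below assumes the flst payload layout as we understand it from Rec. ITU-T T.801: a 2-byte fragment count, then for each fragment an 8-byte offset, a 4-byte length, and a 2-byte data reference index, where index 0 means the fragment lives in the same file. The function name is hypothetical and the layout should be checked against the specification before relying on it.

```python
import struct

FRAGMENT_ENTRY = ">QIH"  # offset (8 bytes), length (4 bytes), data reference (2 bytes)
FRAGMENT_ENTRY_SIZE = struct.calcsize(FRAGMENT_ENTRY)  # 14 bytes

def fragments_stay_in_file(flst_payload: bytes) -> bool:
    """Return True only if every fragment refers to the containing file."""
    (count,) = struct.unpack_from(">H", flst_payload, 0)
    if len(flst_payload) < 2 + FRAGMENT_ENTRY_SIZE * count:
        raise ValueError("truncated Fragment List payload")
    for i in range(count):
        _offset, _length, data_ref = struct.unpack_from(
            FRAGMENT_ENTRY, flst_payload, 2 + FRAGMENT_ENTRY_SIZE * i
        )
        if data_ref != 0:  # non-zero index points at an external URL entry
            return False
    return True
```

A decoder taking this approach would refuse to decode, or skip the fragment, whenever the function returns False, unless the caller has explicitly opted in to external references.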
Timeline
Date reported: 10/24/2023
Date fixed: 11/13/2023
Date disclosed: 12/15/2023