Contribution: Validating ICC Profile #229

Open
awoods opened this issue May 24, 2024 · 8 comments

Comments

@awoods

awoods commented May 24, 2024

We have several thousand images that have invalid curv tags. We understand the issue in the embedded ICC profile data in our JP2 images and would like to update Jpylyzer to check for these specific ICC profile errors. Recognizing that embedded ICC profiles are not strictly a part of the JPEG 2000 specification, would you be open to including such an addition/contribution to Jpylyzer? If so, have you already given thought to the design of how you would prefer validation of such embedded profiles to be implemented?

Thanks!

@bitsgalore
Member

bitsgalore commented May 27, 2024

Hi Andrew,

Thanks for reaching out about this. Even though ICC profiles aren't part of the JPEG 2000 standard, I think the ability to validate them could still be a useful addition to Jpylyzer (even as an option). However, from your description the extent of the ICC profile validation you're proposing is not entirely clear to me.

I just had a quick look at the latest ICC filespec, where I see:

  • 31 tag type definitions (of which the "curveType" one you mention is only one);
  • 51 registered tag definitions.

Full ICC profile validation would cover all of these (or at minimum the required tags). At first glance implementing this from scratch looks like quite a substantial task. I'm also not sure if doing this directly in Jpylyzer would be the best approach. ICC profiles are widely used in other image formats as well, so for optimum reusability it might be better to address this in a dedicated ICC profile validation tool/library (which could then be imported by other software tools, including Jpylyzer).
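As a rough sketch of where such a dedicated library might start (stdlib-only Python; `TagEntry` and `read_tag_table` are hypothetical names, not taken from any existing tool): in ICC.1 the 128-byte profile header is followed by a big-endian uInt32 tag count at byte 128 and then one 12-byte entry (signature, offset, size) per tag. Every tag check, curveType included, has to go through this table first.

```python
import struct
from typing import List, NamedTuple


class TagEntry(NamedTuple):
    signature: str  # four-character tag signature, e.g. 'rTRC'
    offset: int     # start of the tag data, counted from the start of the profile
    size: int       # declared size of the tag data element in bytes


def read_tag_table(profile: bytes) -> List[TagEntry]:
    """Return the tag table entries of an ICC profile (hypothetical helper)."""
    # Bytes 36-39 of the header hold the 'acsp' profile file signature.
    if len(profile) < 132 or profile[36:40] != b"acsp":
        raise ValueError("not an ICC profile: missing 'acsp' signature")
    (tag_count,) = struct.unpack_from(">I", profile, 128)
    if 132 + 12 * tag_count > len(profile):
        raise ValueError("tag table extends past the end of the profile")
    entries = []
    for i in range(tag_count):
        # Each entry: 4-byte signature, uInt32 offset, uInt32 size (big-endian).
        sig, offset, size = struct.unpack_from(">4sII", profile, 132 + 12 * i)
        entries.append(TagEntry(sig.decode("latin-1"), offset, size))
    return entries
```

A validator built on this would then dispatch on each entry's signature and on the type signature found at `offset` (for example `curv` for curveType).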

However, you mention you'd like to "update Jpylyzer to check for these specific ICC profile errors". This suggests your contribution would only cover specific error(s) from your own images (the "curveType" tag type).

If this is the case, the "validation" would only cover one specific aspect of ICC profiles, which might not be very relevant to other Jpylyzer users. But it's not entirely clear to me if this is what you're proposing here.

Could you provide some more details on the scope and extent of your proposed contribution?

@bitsgalore
Member

bitsgalore commented May 27, 2024

Possibly relevant in this context: the ImageCms module, which is part of Pillow.

The code is based on LittleCMS. From the docs it seems to do some validation of ICC profiles, but a cursory look doesn't bring up any details.

@awoods
Author

awoods commented May 27, 2024

Thanks for the response, Johan, and for the pointers to potentially relevant libraries.

Regarding our scope and extent, we have over 50 million JP2 images that need to be processed for delivery. Across all of those images, we have encountered a wide range of JP2 errors. However, the invalid curv tag issue is the most prevalent. Although we would like to verify as much of the embedded ICC profile as possible, we intend to initially focus on this one specific error.

I agree that validating one specific aspect of the ICC profile may be of limited value to the broader community of Jpylyzer users. The question becomes, would it be useful as a starting point for more extensive ICC profile validation?

I also agree with your suggestion:

it might be better to address this in a dedicated ICC profile validation tool/library (which could then be imported by other software tools, including Jpylyzer)

If it makes sense to you, we may start with an independent Python module for this ICC profile validation... and we can subsequently explore importing that code into Jpylyzer.

@bitsgalore
Member

If it makes sense to you, we may start with an independent Python module for this ICC profile validation... and we can subsequently explore importing that code into Jpylyzer.

Yes, this makes perfect sense to me.

I agree that validating one specific aspect of the ICC profile may be of limited value to the broader community of Jpylyzer users. The question becomes, would it be useful as a starting point for more extensive ICC profile validation?

One idea that just occurred to me is that you could start this independent Python module as a simple proof-of-concept that initially only does this specific thing. Then I can have a look at how to integrate it into Jpylyzer. This way we can make sure that the integration works from the get-go. Actual integration into the Jpylyzer code base would still happen later, once the module is more fleshed out, but at least this would reduce the risk of surprises later on.

@awoods
Author

awoods commented May 27, 2024

Perfect. I will be in touch as development proceeds.

@kimpham54

Hi @bitsgalore, over the past few months I worked with Andrew on a Python module that addresses this issue: it identifies and optionally corrects invalid curv tags. The project is here: https://github.com/harvard-lts/jp2_remediator. If it's useful, I would be happy to discuss and work with you on potentially integrating this into jpylyzer. Thanks!

@kimpham54

Hi @bitsgalore, I can summarize here what the module does and propose a few options for integration with jpylyzer:

  1. takes an input JP2 file, a directory of JP2s, or an AWS bucket of JP2s
  2. reads the bytes of the file(s)
  3. checks whether the JP2 is valid using jpylyzer
  4. if valid, checks for the 'colr' (Colour Specification) box, which can hold an embedded ICC profile
  5. if present, gets the METH value (which indicates whether the box contains an ICC profile)
  6. then looks for the TRC tags rTRC/gTRC/bTRC
  7. gets the tag signature (e.g. 'rTRC'), the tag offset (where the data for rTRC starts, in our case the curv tag data) and the size of the tag data element (trc_tag_size in the module)
  8. gets the tag data (curv data): curve type signature, reserved value, count value and the actual curve values; together these make up the curve field length (curv_trc_field_length in the module)
  9. if trc_tag_size == curv_trc_field_length, does nothing
  10. if trc_tag_size != curv_trc_field_length, sets the tag size to the field length (trc_tag_size = curv_trc_field_length) and creates a new JP2 file
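Steps 6 to 10 can be sketched in stdlib-only Python roughly as follows. This is an illustration based on the description above, not the jp2_remediator code: it assumes the ICC profile bytes have already been extracted from the 'colr' box (steps 1 to 5), and `curv_field_length` and `check_trc_tags` are hypothetical names. A curveType element consists of the 4-byte 'curv' signature, 4 reserved bytes, a uInt32 count and then count uInt16 curve values, so its field length is 12 + 2 * count bytes.

```python
import struct

TRC_SIGNATURES = (b"rTRC", b"gTRC", b"bTRC")


def curv_field_length(profile: bytes, offset: int):
    """Field length of the curveType element at offset, or None if it is not 'curv'."""
    if profile[offset:offset + 4] != b"curv":
        return None  # e.g. a parametricCurveType ('para') TRC; not handled here
    (count,) = struct.unpack_from(">I", profile, offset + 8)
    # signature (4) + reserved (4) + count (4) + count uInt16 curve values
    return 12 + 2 * count


def check_trc_tags(profile: bytes, fix: bool = False):
    """Compare trc_tag_size with curv_trc_field_length for rTRC/gTRC/bTRC.

    Returns (results, profile_bytes). results maps each TRC signature to
    (trc_tag_size, curv_trc_field_length). With fix=True, mismatching size
    fields in the tag table are overwritten in a copy of the profile.
    """
    fixed = bytearray(profile)
    results = {}
    (tag_count,) = struct.unpack_from(">I", profile, 128)
    for i in range(tag_count):
        entry_pos = 132 + 12 * i  # entries follow the 128-byte header and the tag count
        sig, offset, trc_tag_size = struct.unpack_from(">4sII", profile, entry_pos)
        if sig not in TRC_SIGNATURES:
            continue
        field_length = curv_field_length(profile, offset)
        if field_length is None:
            continue
        results[sig.decode("ascii")] = (trc_tag_size, field_length)
        if fix and trc_tag_size != field_length:
            # The size is the last 4 bytes of the 12-byte tag table entry.
            struct.pack_into(">I", fixed, entry_pos + 8, field_length)
    return results, bytes(fixed)
```

Only the size field in the tag table is overwritten, so the profile keeps the same length; writing the corrected profile back into a new JP2 file (step 10) is left out of this sketch.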

As discussed in this thread, in its current form this module covers only one specific aspect of validation, but it could be extended for broader validation.

Ideas for integration:

  • keep it as a separate module and strip out the methods that fix the issue and create new files; it could then be imported into the jpylyzer library, possibly enabled through a flag for additional validation options
  • when imported, extra lines could be added to the report, for instance in boxvalidator.py, which currently reports:
            <colourSpecificationBox>
                <methIsValid>True</methIsValid>
                <precIsValid>True</precIsValid>
                <approxIsValid>True</approxIsValid>
                <iccSizeIsValid>True</iccSizeIsValid>
                <iccPermittedProfileClass>True</iccPermittedProfileClass>
                <iccNoLUTBasedProfile>True</iccNoLUTBasedProfile>
            </colourSpecificationBox>

This would add further validation tests, such as 'iccrTRCcountValueIsValid', 'countIsGamma' and 'curveValuesIsValid':

            <colourSpecificationBox>
                <methIsValid>True</methIsValid>
                <precIsValid>True</precIsValid>
                <approxIsValid>True</approxIsValid>
                <iccSizeIsValid>True</iccSizeIsValid>
                <iccPermittedProfileClass>True</iccPermittedProfileClass>
                <iccNoLUTBasedProfile>True</iccNoLUTBasedProfile>
                <iccrTRCcountValueIsValid>False</iccrTRCcountValueIsValid>
                <countIsGamma>True</countIsGamma>
                <curveValuesIsValid>False</curveValuesIsValid>
            </colourSpecificationBox>
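If the module's results end up as extra child elements of colourSpecificationBox, they could be attached with Python's xml.etree.ElementTree along these lines. `add_icc_results` is a hypothetical helper for illustration, not code from jpylyzer's boxvalidator.py; building the elements programmatically also guarantees that each one is properly closed.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def add_icc_results(colour_box: ET.Element, results: Dict[str, bool]) -> None:
    """Append one True/False child element per (hypothetical) ICC test."""
    for test_name, passed in results.items():
        ET.SubElement(colour_box, test_name).text = str(passed)


colour_box = ET.Element("colourSpecificationBox")
ET.SubElement(colour_box, "methIsValid").text = "True"
add_icc_results(colour_box, {
    "iccrTRCcountValueIsValid": False,
    "countIsGamma": True,
    "curveValuesIsValid": False,
})
print(ET.tostring(colour_box, encoding="unicode"))
```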

@bitsgalore
Member

bitsgalore commented Dec 16, 2024

Hi @kimpham54, thanks for the update and the additional explanation. After a first look at the code, I do have some questions.

The main thing that's not quite clear to me is exactly what the three validation tests presented here are based on, and how they follow from ICC.1:2022. The code does contain 7 references to ICC.1:2022, but these only point to the byte positions of the respective fields. This is helpful, but it's not clear to me how the subsequent tests then follow from the filespec.

As an example, in the case of the countIsGamma test, I see this checks whether the "count" value of the "curveType" tag equals 1. This tells a reader application that "the curve value shall be interpreted as a gamma value". But ICC.1:2022 actually does allow other "count" values (see p. 48, bottom paragraph)! So for general ICC profile validation this test would be overly restrictive. This makes me wonder where exactly this restriction comes from. What is the validation context? E.g., is this related to:

  1. JP2's "restricted ICC" method, which only allows a restricted subset of the full ICC features set (most importantly this rules out lookup-table based ICC profiles; no idea if this implies any restrictions on the "count" value).

  2. Something that is specific to the Harvard situation.

  3. Something else?

If 1., this isn't really an ICC validation test, but rather a check on a requirement that follows from the restrictions imposed by JP2's "restricted ICC" method (similar to the existing iccPermittedProfileClass and iccNoLUTBasedProfile tests). If 2., the test might not be relevant to other Jpylyzer users. But based on the information here I can't really tell.
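To make the point about the count value concrete: in a curveType element (signature 'curv', 4 reserved bytes, uInt32 count, then count uInt16 entries), ICC.1 treats count 0 as an identity response, count 1 as a single gamma value encoded as a u8Fixed8Number, and larger counts as a sampled curve. The minimal stdlib-only sketch below (`interpret_curv` and `make_curv` are hypothetical names) shows that all three forms are legitimate, whereas a countIsGamma-style test only accepts the second.

```python
import struct


def interpret_curv(element: bytes):
    """Classify a curveType element by its count value (hypothetical helper)."""
    if element[:4] != b"curv":
        raise ValueError("not a curveType element")
    (count,) = struct.unpack_from(">I", element, 8)
    if count == 0:
        return ("identity", None)
    if len(element) < 12 + 2 * count:
        raise ValueError("element is shorter than its count value implies")
    values = struct.unpack_from(f">{count}H", element, 12)
    if count == 1:
        # u8Fixed8Number: 8 integer bits and 8 fraction bits
        return ("gamma", values[0] / 256)
    return ("table", list(values))


def make_curv(values):
    """Build a curveType element from a list of uInt16 entries (for testing)."""
    n = len(values)
    return b"curv" + bytes(4) + struct.pack(f">I{n}H", n, *values)


print(interpret_curv(make_curv([])))                  # ('identity', None)
print(interpret_curv(make_curv([0x0233])))            # ('gamma', 2.19921875)
print(interpret_curv(make_curv([0, 32768, 65535])))   # ('table', [0, 32768, 65535])
```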

In case of the two other tests (iccrTRCcountValueIsValid and curveValuesIsValid), I don't quite see how they follow from the filespec at all. This might just reflect my own unfamiliarity with the ICC spec, but seeing that the countIsGamma test doesn't follow from the filespec either, this does make me wonder whether these are also specific to either JP2's "restricted ICC" or the Harvard situation.

In an earlier post in this thread, @awoods suggested that the ICC validation test could be useful as a starting point for more extensive ICC profile validation. If this is the long-term objective, it's important that the scope of this validation is clear from the start:

  1. General ICC profile validation - this is independent of any restrictions of the image format in which a profile is embedded. Tests like countIsGamma would have no place here, or,
  2. conformance testing against JP2's "restricted ICC" subset. In this case tests like countIsGamma might make sense (assuming this test is indeed related to the JP2 restrictions, but I don't know if this is the case), but the resulting code would be less useful outside a JP2 context.

Let me know what your thoughts are, and how you'd like to proceed with this. If I somehow missed out on something obvious myself please let me know as well! Thanks!
