Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IMAGING-339] Basic WebP Support #254

Merged
merged 3 commits into from
Oct 9, 2023
Merged

[IMAGING-339] Basic WebP Support #254

merged 3 commits into from
Oct 9, 2023

Conversation

Glavo
Copy link
Contributor

@Glavo Glavo commented Dec 14, 2022

This PR is intended to provide basic support for the WebP format.

Task List:

  • Add WebP support for Imaging.guessFormat;
  • Read all of webp chunks;
  • Implementing ImageParser::getImageSize for webp files;
  • Implementing ImageParser::getMetadata for webp files;
  • Implementing ImageParser::getImageInfo for webp files.

Non-Goals:

  • Read image from webp file;
  • Write to webp files.

Although I hope to implement complete WebP support, it is very difficult for me at present.

Therefore, the main goal of this PR is to read metadata from webp files.

A more complete implementation may require me to have a deeper understanding of vp8 format, or let others complete it.

Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 looks promising @Glavo ! And pure Java, I thought there would be some dependency here, but much better if we can implement it in Imaging without bringing other jars. Feel free to ping me once it's ready and I can take a look and try it out. Thanks!

@codecov-commenter
Copy link

codecov-commenter commented Dec 14, 2022

Codecov Report

Attention: Patch coverage is 82.40741% with 57 lines in your changes missing coverage. Please review.

Project coverage is 71.18%. Comparing base (c021adf) to head (0d8d2a6).
Report is 405 commits behind head on master.

Files with missing lines Patch % Lines
.../commons/imaging/formats/webp/WebPImageParser.java 80.74% 12 Missing and 14 partials ⚠️
...mons/imaging/formats/webp/chunks/WebPChunkVp8.java 77.14% 4 Missing and 4 partials ⚠️
.../main/java/org/apache/commons/imaging/Imaging.java 64.70% 2 Missing and 4 partials ⚠️
...ons/imaging/formats/webp/chunks/WebPChunkVp8x.java 83.33% 2 Missing and 3 partials ⚠️
...ons/imaging/formats/webp/chunks/WebPChunkVp8l.java 83.33% 2 Missing and 2 partials ⚠️
...ons/imaging/formats/webp/chunks/WebPChunkAlph.java 0.00% 2 Missing ⚠️
...ons/imaging/formats/webp/chunks/WebPChunkIccp.java 0.00% 2 Missing ⚠️
...pache/commons/imaging/internal/SafeOperations.java 60.00% 1 Missing and 1 partial ⚠️
...ommons/imaging/formats/webp/WebPImageMetadata.java 87.50% 0 Missing and 1 partial ⚠️
...commons/imaging/formats/webp/chunks/WebPChunk.java 95.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #254      +/-   ##
============================================
+ Coverage     70.89%   71.18%   +0.29%     
- Complexity     3432     3521      +89     
============================================
  Files           334      351      +17     
  Lines         16968    17292     +324     
  Branches       2607     2668      +61     
============================================
+ Hits          12030    12310     +280     
- Misses         3919     3930      +11     
- Partials       1019     1052      +33     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Glavo Glavo marked this pull request as ready for review December 14, 2022 20:39
@Glavo Glavo changed the title [WIP] Basic WebP Support Basic WebP Support Dec 14, 2022
@Glavo
Copy link
Contributor Author

Glavo commented Dec 14, 2022

It seems that there are some strict checkstyle rules here. I'll try to resolve these warnings.

@Glavo
Copy link
Contributor Author

Glavo commented Dec 14, 2022

@kinow By adding exclusion rules, I no longer get warnings when running mvn -V --no-transfer-progress locally. Can you trigger the CI again?

Basic support for WebP has been completed. Now I can successfully read the basic information of the lossy/lossless compressed webp file.

I also implemented dumpFile for WebP files. Now it can print the contents of webp chunks in a similar way to the webpinfo tool.

It is difficult for me to implement the function of reading/writing image content. I try to read the source code of libwebp, but it is not easy for me. So I can't make more contributions to WebP support in the near future.

@kinow
Copy link
Member

kinow commented Dec 14, 2022

@kinow By adding exclusion rules, I no longer get warnings when running mvn -V --no-transfer-progress locally. Can you trigger the CI again?

Done!

Basic support for WebP has been completed. Now I can successfully read the basic information of the lossy/lossless compressed webp file.

I also implemented dumpFile for WebP files. Now it can print the contents of webp chunks in a similar way to the webpinfo tool.

Thakn you!

It is difficult for me to implement the function of reading/writing image content. I try to read the source code of libwebp, but it is not easy for me. So I can't make more contributions to WebP support in the near future.

I never read the format docs, so no idea how hard it would be. But one option would be to include it in our upcoming 1.0beta release (one beta before the final release) and document what it does and what it doesn't. It's better to be able to at least read metadata/information, then not being able to read WebP at all.

Thank you @Glavo !

@Glavo
Copy link
Contributor Author

Glavo commented Dec 14, 2022

@kinow Will the next version be 1.0 beta? Do you have any ideas about when to make AWT as optional dependency? I think it is better to solve this problem in the alpha phase, because this change may break the user's code.

@kinow
Copy link
Member

kinow commented Dec 14, 2022

@kinow Will the next version be 1.0 beta?

Yes. We could release an 1.0-alpha3 if the API is not stable enough. I was actually thinking in releasing 1.0 final, but there was some internal discussion about it, and it was decided to release at least one beta version after these alpha releases, to a) make sure we have a good public API (i.e. err on having less than what's needed, not more, so that we can maintain it for a while) and b) it has enough features for users.

Do you have any suggestion on alpha/beta/final?

Do you have any ideas about when to make AWT as optional dependency? I think it is better to solve this problem in the alpha phase, because this change may break the user's code.

I think it is more a matter of deciding the how first. If it is not too complicated, we can include it in the next release.

I can't recall what classes we used from AWT (Dimension maybe?).

We had a user that contributed with a lot of suggestions and by testing pull requests, and his application was being deployed on Android. I think he was the first one to request it, from what I recall, as for the time being he had to do some patching on his code in order to use Imaging.

If you have any suggestions, feel free to let us know, please 🙂

BTW, it would be great if you could create a JIRA issue for this issue too, as we need one to track changes in our changes.xml (that's updated by committers, no need to worry about that).

Thanks heaps!

EDIT: BufferedImage too from awt, d'oh

Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Glavo , I did a quick review, and most files look great (I even marked some as "Reviewed" on GitHub).

You have already seen that we have some strict checkstyle/spotbugs rules. These are more or less similar across all ASF Commons projects, as well as the idea to include only what's necessary in the public API, add javadocs to all public code, and to internal code if possible too, etc.

I left some comments about that and about the text in the exceptions, and the exceptions types. But overall it looks great!

Thanks a lot!
-Bruno

@Glavo
Copy link
Contributor Author

Glavo commented Dec 15, 2022

Do you have any suggestion on alpha/beta/final?

In my understanding, the alpha phase allows large API changes, but after entering the beta phase, the API tends to be stable, and there can be some minor damage, but usually it should not change as much as the alpha phase.

I can't recall what classes we used from AWT (Dimension maybe?).

Yes, if we provide an alternative to Dimension, we can use commons-imaging in more environments without AWT.

However, even if this problem is solved, I think the commons-imaging API still relies too much on AWT.

I don't think so many BufferedImage based APIs should be provided, nor should the implementation of each format be based on AWT/Swing.

Maybe we can provide a general abstraction to hide these details. The implementation of the format should only use a common interface, and behind the interface may be AWT/Swing, JavaFX, Android, or even other file formats writter.

This is my rough idea. Do you have any opinion on it?

@kinow
Copy link
Member

kinow commented Dec 15, 2022

In my understanding, the alpha phase allows large API changes, but after entering the beta phase, the API tends to be stable, and there can be some minor damage, but usually it should not change as much as the alpha phase.

Correct. That's why it's important to decide on whether it will be an alpha3 or beta1. I didn't have plans to add anything to the public API before 1.0, but I think WebP support could go in any non-final release as long as we take care with what is added to the public API, and add tests/docs.

However, even if this problem is solved, I think the commons-imaging API still relies too much on AWT.

Agreed.

I don't think so many BufferedImage based APIs should be provided, nor should the implementation of each format be based on AWT/Swing.

Maybe we can provide a general abstraction to hide these details. The implementation of the format should only use a common interface, and behind the interface may be AWT/Swing, JavaFX, Android, or even other file formats writter.

This is my rough idea. Do you have any opinion on it?

Hmmm, this sounds interesting!

I just remembered that when I went looking for ways to support Android, I found someone that had a project with the library-something.jar, with the interfaces, and library-something-awt.jar and library-something-android.jar. But I found that solution harder to implement & maintain.

But let's give it another try before the 1.0. I don't mind postponing this release in order to have more alphas/betas. This is the issue for Android, we can rename/use it for AWT: https://issues.apache.org/jira/browse/IMAGING-208 (if you do not have an account, let me know and I'll prepare the sign up process for you, ASF blocked new sign ups due to Spam).

Thanks!!!

-Bruno

@Glavo
Copy link
Contributor Author

Glavo commented Dec 19, 2022

@kinow Can you open this issue for me?

I'm sorry to reply you so late, because I'm not feeling well recently (probably infected with COVID-19), and I need a rest for a while.

Besides, I'm not good at English. You can better describe this problem than let me use Google Translate myself.

@kinow
Copy link
Member

kinow commented Dec 19, 2022

@kinow Can you open this issue for me?

I'm sorry to reply you so late, because I'm not feeling well recently (probably infected with COVID-19), and I need a rest for a while.

Besides, I'm not good at English. You can better describe this problem than let me use Google Translate myself.

Hi @Glavo , your English is excellent for me (I am not a native speaker either) , but don't worry I left this notification unread on my GitHub list of notifications, and I will update it later with a JIRA issue.

Get some rest, and I hope you feel better soon.

Thanks
Bruno

Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JIRA created: https://issues.apache.org/jira/browse/IMAGING-339

Haven't tested nor read the spec yet, but first wanted to have a slow read of the code to leave any comments. Once these are addressed I will test and try to verify correctness. But everything looks great! Should be included in the next alpha or beta release.

Thanks!
-Bruno

@kinow kinow changed the title Basic WebP Support [IMAGING-339] Basic WebP Support Dec 25, 2022
@Glavo
Copy link
Contributor Author

Glavo commented Jan 12, 2023

@kinow I went back to this work and completed the missing documents.

Can you take a look at my question here at https://github.com/apache/commons-imaging/pull/254/files/08d6a0dff607eefa633c89d231a01623e3ad90c6#r1049312581?
I feel that the usage of ImageReadException and IOException is often confused, so I don't know which one to choose.

@Glavo
Copy link
Contributor Author

Glavo commented Jan 16, 2023

I updated this PR, added the @since tag and more detailed javadoc.

@Glavo Glavo requested a review from kinow August 25, 2023 07:40
@kinow
Copy link
Member

kinow commented Aug 25, 2023

Sorry I am overseas visiting family until the end of September. Not sure if I will be able to review it until then.

@spencerhakim
Copy link

Considering the libwebp CVE the industry is currently dealing with, would love to see this PR's conflicts resolved and this merged in sooner than later so I can move away from using a library that relies on JNI. 🙏

@kinow
Copy link
Member

kinow commented Sep 27, 2023

Considering the libwebp CVE the industry is currently dealing with, would love to see this PR's conflicts resolved and this merged in sooner than later so I can move away from using a library that relies on JNI. 🙏

@Glavo can you rebase and fix conflicts, please? I returned from my overseas trip last Friday. Just going through the backlog ($work/emails/etc.), but I should be able to review this pretty soon. Or if you are busy (and you trust us) just say so and we can try to fix the conflicts and review it. Thanks!

@Glavo
Copy link
Contributor Author

Glavo commented Sep 27, 2023

I've rebased and resolved conflicts.

I noticed that the project had changed a lot in my time away, so I simply updated this PR so that it would compile and pass tests. If there is anything else to update, it will take me some time to get back to this work.

@kinow
Copy link
Member

kinow commented Sep 27, 2023

Thank you @Glavo ! Hopefully we can solve simple issues and we won't need any major refactoring for the review 🤞 Coming back to this one soon... (allowed GH Actions to run)

Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a quick look at the code, looks in excellent state. Left some comments, but I think I can fix most, if not all, during the review/merge.

Remains to review the parser code, and go through the specification & docs confirming the code is working as expected. Maybe test with exiftool (or similar tool), and check the tests coverage.

Thanks @Glavo !

@garydgregory
Copy link
Member

We don't need to use since tags IMO since we don't have a 1.0 yet.

@kinow
Copy link
Member

kinow commented Sep 27, 2023

We don't need to use since tags IMO since we don't have a 1.0 yet.

I think there are other since tags (can't recall if only for 1.0-alphaX or for 0.98 (sanselan) as well). But I cannot recall if these were used in tests.

Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebasing onto master locally today to start reviewing it; hoping I can finish it this weekend.

I'm squashing all 23 commits into a single one, @Glavo. Here's the commits message I'm using (it will be attributed to your user; I used the text from your pull request, plus the prefix with the JIRA issue):

image

If I manage to review and it is ready to merge, I will proceed to push-force the squashed commits with this message 👍

This adds:

- WebP support for Imaging.guessFormat
- Read all of webp chunks
- Implement ImageParser::getImageSize for WebP files
- Implement ImageParser::getMetadata for WebP files
- Implement ImageParser::getImageINfo for WebP files

Note that this does not add support to read image from WebP files,
nor support to write to WebP files. It focuses on reading metadata
from WebP files.
@Glavo
Copy link
Contributor Author

Glavo commented Oct 1, 2023

I probably just named them arbitrarily, so feel free to rename them if necessary.

@kinow
Copy link
Member

kinow commented Oct 1, 2023

I probably just named them arbitrarily, so feel free to rename them if necessary.

Thanks @Glavo !

Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a new commit (thanks for marking it as OK for maintainers to update your branch, @Glavo ). Here's the summary of changes:

  • Javadocs added to every public method;
  • Renamed hasAlpha and similar boolean attributes to alpha, then the method became hasAlpha() (reverted following @Glavo 's feedback - thanks!);
  • VP8, VP8L, and VP8X seem to disagree on height, imageHeight, and canvasHeight. I thought they were all the same, but the developers docs use these terms, so I think the current values are the best choice;
  • Replaced some arithmetic operations by the JVM *exact methods, to avoid overflows and possible issues (especially security issues, fuzzing issues, etc.) - see SafeOperations class with a helper static method, in the internal package;
  • WebP chunk implementations were throwing ImagingException, but the parent WebPChunk constructor was throwing IllegalArgumentException, but since the value is read from the image, I changed that to throw ImagingException too (the constructor actually declared @throws ImagingException but never raised it);
  • Renamed classes like WebPChunkALPH to WebPChunkAlph (in other formats we have BlablaExif, BlablaIcc, etc.);
  • Checked all headers and comments were OK.

Also resolved all pending comments, but I think we have one or two new conversations started just now 🙂 we should be able to solve them all pretty easily.

The final step will be to review the tests. I reviewed warnings raised by IntelliJ, and the last one I have now is about public methods never used. This probably indicates these public methods are not being called in unit tests too. Can't tell if important or not until I run the tests with coverage. Will do that over this week (maybe tomorrow).

Gotta go now, thanks @Glavo !

-Bruno

Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current webp package coverage, with JaCoCo,

image

image

It doesn't seem too hard to increase that coverage, so we can add a few more tests before merging it, not necessarily 100%, but we can definitely get it a little higher.

Other than that had a look at the docs and everything looks OK. So the pending now is:

  • add a few more simple tests
  • write a quick java program to scan a directory and try to parse webp files (I have a bunch of life drawing reference that were saved as webp files, from multiple web sites and social media)
  • write in the changes.xml about the new format 🎉

@garydgregory
Copy link
Member

I hope we can get coverage above 90%.

@kinow
Copy link
Member

kinow commented Oct 4, 2023

Well, this is as much as I could get today.

image

image

Most of what's missing now are in the chunks. A bit tricky to reach that part, as apparently nothing is ever calling some methods, like vertical and horizontal scaling. The chunk reader is also private, so not sure if we should ignore the methods not tested, or if there was another idea on how to access those metadata fields, @Glavo ?

Copy link
Member

@kinow kinow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Glavo ,

I think I am done with the review and review and the changes. A couple of comments:

  • VP8X doesn't have a version number, and I couldn't find in the developer docs anything saying that it should have, but other chunks have it... I guess that chunk doesn't need it?
  • ALPH chunk was never used. I added it to the enum of chunk types... or was it intentional to not add that one?

No blockers, I think. And here's the latest coverage report:

image

image

So 89% (was 68%) line coverage, 80% (was 51%) branch coverage. Most of the lines not covered are error handling. And one chunk is completely not tested, the ALPH.

I think this is good to go. If @Glavo doesn't find anything wrong, then it should be ready to be merged. @garydgregory would you review it, since I added some code too, it'd be best if another person could have a look at it - if you have time.

Will add another commit with the change log.

Bruno

@Glavo
Copy link
Contributor Author

Glavo commented Oct 8, 2023

I re-checked this PR and it looks good.

@Glavo
Copy link
Contributor Author

Glavo commented Oct 8, 2023

VP8X doesn't have a version number, and I couldn't find in the developer docs anything saying that it should have, but other chunks have it... I guess that chunk doesn't need it?

I implemented it in full compliance with the developer documentation. I'm not sure why there's no version number, but I guess it's not needed.

ALPH chunk was never used. I added it to the enum of chunk types... or was it intentional to not add that one?

I don't remember the reason for this, so it should just be an oversight on my part. Thanks for the fix.

@kinow kinow requested a review from garydgregory October 8, 2023 13:01
@kinow
Copy link
Member

kinow commented Oct 8, 2023

Thank you @Glavo ! Let's wait a while to see if Gary has time to have a quick look just to see if I missed anything important. Then it should be ready to be merged. Thanks!!!

@garydgregory garydgregory merged commit 50b6745 into apache:master Oct 9, 2023
6 checks passed
@Glavo Glavo deleted the webp branch October 9, 2023 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants