Image Compression: A PCA-based approach enhanced by parallel processing

MIT cpp python

Introduction

In this project we implement the standard covariance method for PCA in modern C++17 and apply it to both grayscale and RGB image compression. The implementation relies on OpenBLAS and LAPACKE for high-performance matrix operations and linear algebra. To further improve performance, we use multithreading with OpenMP, cache-aligned memory layouts, and vectorization (SIMD) via CPU intrinsics.
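
For reference, the covariance method can be summarized as follows. This is the textbook formulation, written under the assumption that the image is treated as an n × p matrix X whose columns are the features; the symbols and the 1/(n-1) normalization are illustrative choices and are not taken from the project's code.

```latex
\begin{aligned}
z_{ij} &= \frac{x_{ij} - \mu_j}{\sigma_j} && \text{(Z-normalization of column } j\text{)} \\
C &= \frac{1}{n-1}\, Z^{\top} Z && \text{(covariance matrix, } p \times p\text{)} \\
C &= V \Lambda V^{\top}, \quad \lambda_1 \ge \dots \ge \lambda_p && \text{(eigendecomposition)} \\
Y &= Z W, \quad W = [v_1, \dots, v_k] && \text{(projection onto the top } k \text{ components)} \\
\hat{z}_{ij} &= (Y W^{\top})_{ij}, \quad \hat{x}_{ij} = \sigma_j \hat{z}_{ij} + \mu_j && \text{(inverse PCA)}
\end{aligned}
```

Storing Y, W, the column means and the column standard deviations instead of X is what yields the compression when k is much smaller than p.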

Key Features

  • The current version only loads images stored as gzipped binary files.
  • Compressed images can be converted and stored as binary, PNG, JPEG or gzip.
  • Fully functional PCA algorithm for grayscale and RGB images.
  • The primary procedure focuses on computing the eigenvalues of the covariance matrix, as detailed here.
  • Flattened storage of the input image, aligned to the cache line.
  • Parallel multithreaded fit and transform of the input image (Z-normalization).
  • Parallel multithreaded and vectorized computation of the covariance matrix.
  • Computation of eigenvectors with LAPACKE dsyev().
  • Parallel projection via matrix multiplication: CBLAS dgemm() for grayscale images, and an OpenMP-parallelized matrix multiplication for the RGB case.
  • Parallelized inverse PCA procedure (decompression) for efficient reconstruction of the original image data.
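
The normalization and covariance steps listed above can be sketched roughly as follows. This is a minimal illustration and not the project's implementation: the function names `standardize` and `covariance` are hypothetical, the data is assumed to be a flattened row-major matrix of doubles, and `#pragma omp simd` stands in for the hand-written CPU intrinsics the project uses. Compile with `-fopenmp` to enable the pragmas; without it the code still runs sequentially.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helpers (not the project's API). `data` is a flattened,
// row-major matrix with `rows` samples and `cols` features.

// Z-normalization in place: each column gets zero mean and unit sample
// standard deviation. Columns are independent, so they can run in parallel.
void standardize(std::vector<double>& data, std::size_t rows, std::size_t cols) {
    assert(rows > 1 && data.size() == rows * cols);
    #pragma omp parallel for
    for (std::size_t j = 0; j < cols; ++j) {
        double mean = 0.0;
        for (std::size_t i = 0; i < rows; ++i) mean += data[i * cols + j];
        mean /= static_cast<double>(rows);

        double ss = 0.0;  // sum of squared deviations from the mean
        for (std::size_t i = 0; i < rows; ++i) {
            const double d = data[i * cols + j] - mean;
            ss += d * d;
        }
        const double sd = std::sqrt(ss / static_cast<double>(rows - 1));

        for (std::size_t i = 0; i < rows; ++i) {
            // A constant column has sd == 0; map it to zeros instead of dividing by zero.
            data[i * cols + j] = (sd > 0.0) ? (data[i * cols + j] - mean) / sd : 0.0;
        }
    }
}

// Covariance of already standardized data: C = Z^T Z / (rows - 1).
// Only the upper triangle is computed; each value is mirrored to the lower one.
std::vector<double> covariance(const std::vector<double>& z,
                               std::size_t rows, std::size_t cols) {
    assert(rows > 1 && z.size() == rows * cols);
    std::vector<double> cov(cols * cols, 0.0);
    #pragma omp parallel for schedule(dynamic)
    for (std::size_t a = 0; a < cols; ++a) {
        for (std::size_t b = a; b < cols; ++b) {
            double sum = 0.0;
            #pragma omp simd reduction(+ : sum)
            for (std::size_t i = 0; i < rows; ++i) {
                sum += z[i * cols + a] * z[i * cols + b];
            }
            const double c = sum / static_cast<double>(rows - 1);
            cov[a * cols + b] = c;
            cov[b * cols + a] = c;
        }
    }
    return cov;
}
```

Each (a, b) pair is written by exactly one thread, so no synchronization is needed. The column accesses are strided in this layout; a production version would likely transpose or block the data first.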

Note

dsyev() operates sequentially, which may lead to significant overhead, especially when processing larger images.

TODOs

  • Explore matrix-free methods as an alternative PCA procedure, eliminating the need to compute all eigenvectors and to construct and store the covariance matrix.
  • Enable the loading of images from additional formats.
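
To illustrate what a matrix-free approach from the first TODO item could look like, here is a minimal power-iteration sketch. It is only one possible direction and not something the project implements: the function name `leadingComponent`, the fixed iteration count and the uniform starting vector are all assumptions made for this example. It finds the leading principal direction using only products with Z and Z^T, so the p × p covariance matrix is never formed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not part of the project). `z` is centered or
// standardized data, flattened row-major, with `rows` samples and `cols` features.
// Returns a unit vector approximating the leading eigenvector of Z^T Z.
std::vector<double> leadingComponent(const std::vector<double>& z,
                                     std::size_t rows, std::size_t cols,
                                     std::size_t iterations = 200) {
    assert(rows > 1 && cols > 0 && z.size() == rows * cols);
    // Uniform unit start vector; it must not be orthogonal to the leading direction.
    std::vector<double> v(cols, 1.0 / std::sqrt(static_cast<double>(cols)));
    std::vector<double> t(rows), w(cols);

    for (std::size_t it = 0; it < iterations; ++it) {
        // t = Z v
        for (std::size_t i = 0; i < rows; ++i) {
            double s = 0.0;
            for (std::size_t j = 0; j < cols; ++j) s += z[i * cols + j] * v[j];
            t[i] = s;
        }
        // w = Z^T t, which equals (Z^T Z) v without ever building Z^T Z
        for (std::size_t j = 0; j < cols; ++j) w[j] = 0.0;
        for (std::size_t i = 0; i < rows; ++i) {
            for (std::size_t j = 0; j < cols; ++j) w[j] += z[i * cols + j] * t[i];
        }
        // Normalize; a zero vector means the data has no variance along v.
        double norm = 0.0;
        for (std::size_t j = 0; j < cols; ++j) norm += w[j] * w[j];
        norm = std::sqrt(norm);
        if (norm == 0.0) break;
        for (std::size_t j = 0; j < cols; ++j) v[j] = w[j] / norm;
    }
    return v;
}
```

Further components would require deflation or a block method such as subspace iteration, and convergence slows down when the top eigenvalues are close together.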

Prerequisites

  • Docker
  • Python: 3.10.*
  • A list of Python libraries available in the requirements file here

Installation

We provide a custom Dockerfile that manages all required dependencies, including GCC, OpenBLAS and LAPACKE.

git clone https://github.com/DmMeta/ParallelPCA-ImgCompressor
cd ParallelPCA-ImgCompressor
docker build -t pca:v0.4 .

Note

Naming the image pca and tagging it as v0.4 keeps it consistent with the provided run scripts.

Usage

By default, we provide a run script that compiles and executes the main driver program inside a container. You can alter the behavior of the driver program by setting two environment variables, DEBUG and SIMD. To run the script with both debug mode and SIMD optimizations disabled:

DEBUG=0 SIMD=0 ./run.sh

The execution above compresses the lena_hd image and stores two images in a results folder, one from before and one from after the procedure. With DEBUG=1, the program also outputs the runtime of each part of the computation, the intermediate results, and a series of images.

The following example illustrates the main functionalities of the current project:

#include <cstdint>
#include "pca.hpp"

// number of principal components to retain
constexpr uint16_t N_COMPONENTS = 20;

// height, width and channels are the dimensions of the stored image
ImgMatrix<uint8_t> img {"path/to/image.bin.gz", height, width, channels, Order::ROW_MAJOR};
PCA<uint8_t> pca;
// compress, then reconstruct via inverse PCA
auto compressedImg = pca.performPCA(img, N_COMPONENTS);
auto decompressedImg = pca.inversePCA(compressedImg);

// save the decompressed image
decompressedImg.saveImg("path/to/decompressedImg.png", ImageFormat::PNG);
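
Conceptually, the compression and decompression calls above boil down to two matrix products. The sketch below shows them with plain loops under stated assumptions: `project` and `reconstruct` are hypothetical names rather than the project's API, all matrices are flattened row-major doubles, and the input is assumed to be already standardized. The project itself uses CBLAS dgemm() for the grayscale case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers (not the project's API); matrices are flattened row-major.

// scores = Z * W, where Z is rows x cols and W is cols x k.
// The columns of W are the retained principal directions.
std::vector<double> project(const std::vector<double>& z, const std::vector<double>& w,
                            std::size_t rows, std::size_t cols, std::size_t k) {
    assert(z.size() == rows * cols && w.size() == cols * k);
    std::vector<double> scores(rows * k, 0.0);
    #pragma omp parallel for
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            const double zij = z[i * cols + j];
            for (std::size_t c = 0; c < k; ++c) {
                scores[i * k + c] += zij * w[j * k + c];
            }
        }
    }
    return scores;
}

// Zhat = scores * W^T, mapping the k-dimensional scores back to rows x cols.
std::vector<double> reconstruct(const std::vector<double>& scores,
                                const std::vector<double>& w,
                                std::size_t rows, std::size_t cols, std::size_t k) {
    assert(scores.size() == rows * k && w.size() == cols * k);
    std::vector<double> zhat(rows * cols, 0.0);
    #pragma omp parallel for
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            double sum = 0.0;
            for (std::size_t c = 0; c < k; ++c) {
                sum += scores[i * k + c] * w[j * k + c];
            }
            zhat[i * cols + j] = sum;
        }
    }
    return zhat;
}
```

To recover pixel values, the reconstructed matrix would still need to be multiplied by the stored column standard deviations and shifted by the column means; that step is omitted here.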

Demo

Below, a short demo video is provided to illustrate the functionality and usage of the project. The process starts by loading the input image from lena_hd.bin.gz (822x1200), which is then saved as a PNG file. The image undergoes compression using our parallel PCA implementation, retaining 80 principal components. Afterward, the image is reconstructed via inverse PCA. Finally, both the original and decompressed images are displayed side by side for a quick visual comparison.

Tests

Some initial tests have been implemented with the pytest library; however, test coverage is still limited. We aim to extend the test suite in the future to improve the coverage and reliability of the code.

Install the required dependencies and run the tests:

pip3 install -r requirements.txt
cd scripts
# ensure execute permissions are granted for run_tests.sh
chmod +x run_tests.sh
./run_tests.sh

Contact

License

Distributed under the MIT License. See LICENSE.md for more details.