Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nydusify: introduce chunkdict generate subcommand #1401

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

newthifans
Copy link
Contributor

Brief introduction

Add functionlity to generate chunk dictionary by database file and algorithm "exponential_smoothing"
Implement the command nydus-image chunkdict generate --database command

Basic Usage

nydus-image chunkdict generate --database ./metadata.db

Details

  1. Write command “nydus-image chunkdict generate” entries
  2. Call the database interface and get the chunk information in the database
  3. Implement the exponential smoothing algorithm and obtain chunk dictionary
  4. To be continued, dump chunk dictionary and save, and then compact chunk

@newthifans newthifans requested a review from a team as a code owner August 15, 2023 06:31
@newthifans newthifans requested review from liubogithub, jiangliu and luodw and removed request for a team August 15, 2023 06:31
Copy link
Member

@adamqqqplay adamqqqplay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the failing unit tests, thanks!

@newthifans newthifans changed the title nydus-image: introduce chunkdict generate subcommand [WIP] nydus-image: introduce chunkdict generate subcommand Sep 8, 2023
@codecov
Copy link

codecov bot commented Sep 19, 2023

Codecov Report

Attention: Patch coverage is 54.43466% with 673 lines in your changes are missing coverage. Please review.

Project coverage is 61.24%. Comparing base (a0ec880) to head (6093309).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1401      +/-   ##
==========================================
- Coverage   61.34%   61.24%   -0.11%     
==========================================
  Files         144      145       +1     
  Lines       47001    48383    +1382     
  Branches    44537    45906    +1369     
==========================================
+ Hits        28835    29630     +795     
- Misses      16685    17194     +509     
- Partials     1481     1559      +78     
Files Coverage Δ
builder/src/lib.rs 64.66% <ø> (ø)
rafs/src/metadata/cached_v5.rs 80.79% <100.00%> (+0.02%) ⬆️
rafs/src/metadata/layout/v6.rs 85.23% <100.00%> (+0.04%) ⬆️
storage/src/cache/filecache/mod.rs 67.25% <50.00%> (-0.25%) ⬇️
storage/src/meta/mod.rs 72.51% <89.47%> (+0.12%) ⬆️
rafs/src/metadata/layout/v5.rs 85.03% <25.00%> (-0.21%) ⬇️
storage/src/device.rs 68.98% <62.50%> (-0.06%) ⬇️
utils/src/digest.rs 88.32% <0.00%> (-3.74%) ⬇️
contrib/nydusify/cmd/nydusify.go 14.79% <0.00%> (-0.31%) ⬇️
builder/src/core/context.rs 65.94% <25.92%> (-1.09%) ⬇️
... and 3 more

... and 2 files with indirect coverage changes

@newthifans newthifans changed the title [WIP] nydus-image: introduce chunkdict generate subcommand nydus-image: introduce chunkdict generate subcommand Sep 22, 2023
@newthifans newthifans force-pushed the yuanzhao08 branch 4 times, most recently from 9d17ca3 to b04d80e Compare September 26, 2023 09:43
Algorithms Introduction.md Outdated Show resolved Hide resolved
src/bin/nydus-image/main.rs Outdated Show resolved Hide resolved
src/bin/nydus-image/main.rs Outdated Show resolved Hide resolved
src/bin/nydus-image/main.rs Outdated Show resolved Hide resolved
src/bin/nydus-image/main.rs Outdated Show resolved Hide resolved
Algorithms Introduction.md Outdated Show resolved Hide resolved
@newthifans newthifans force-pushed the yuanzhao08 branch 6 times, most recently from e4b7288 to c7654ba Compare October 26, 2023 18:20
src/bin/nydus-image/main.rs Show resolved Hide resolved
src/bin/nydus-image/deduplicate.rs Outdated Show resolved Hide resolved
inode.set_blocks(256);
let node_info = NodeInfo {
explicit_uidgid: true,
src_dev: 66305,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value is meaningless, maybe set it to zero.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

let node_info = NodeInfo {
explicit_uidgid: true,
src_dev: 66305,
src_ino: 24772610,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

let node_info = NodeInfo {
explicit_uidgid: true,
src_dev: 66305,
src_ino: 24775126,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

) -> Result<()> {
for chunk_info in chunkdict.iter() {
let chunk_size: u32 = chunk_info.chunk_compressed_size;
let file_offset = 1 as u64 * chunk_size as u64;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this value valid?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corr.

// update blob context
let (blob_index, blob_ctx) =
blob_mgr.get_or_cerate_blob_for_chunkdict(ctx, &chunk_info.chunk_blob_id)?;
if blob_ctx.blob_id.is_empty() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A chunk must belong to a blob, when will the blob_id be empty?

@@ -0,0 +1,257 @@
// Copyright (C) 2022 Nydus Developers. All rights reserved.
Copy link
Collaborator

@imeoer imeoer Dec 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file should be removed?

.num_args(1..),
)
.arg(
Arg::new("digester")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need this option?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted.

)
.arg(
Arg::new("features")
.long("features")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need this option?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted.

let path_name: Vec<&str> = path.split('/').collect();

// Extract the image name and version name from the bootstrap directory
let bootstrap_dir = match path_name.get(path_name.len() - 2) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let full_image_name: Vec<&str> = bootstrap_dir.split(':').collect();
let image_name = match full_image_name.get(full_image_name.len() - 2) {
Some(&second_last) => second_last.to_string(),
None => bail!("Invalid image name"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also print the image name in the error message.

Some(&second_last) => second_last.to_string(),
None => bail!("Invalid image name"),
};
let version_name = match full_image_name.last() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It better be let image_tag.

.set_blob_accessible(matches.get_one::<String>("config").is_some());
build_ctx.configuration = config;
build_ctx.blob_storage = Some(chunkdict_bootstrap_path);
build_ctx.blob_features = BlobFeatures::CAP_TAR_TOC;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blob_features should be from the base bootstrap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

// Build_ctx.blob_features.insert(BlobFeatures::ENCRYPTED);
build_ctx.features = features;

let digester = matches
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

digester should also be from the base bootstrap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

docs/chunk-deduplication.md Outdated Show resolved Hide resolved
@imeoer imeoer changed the title nydus-image: introduce chunkdict generate subcommand nydusify: introduce chunkdict generate subcommand Dec 25, 2023



| image_name | version number | total_image_size(OCI) | total_image_size(nydus) | total_image_size (nydus after dedulicating) | chunkdict_image_size | dedulicating rate |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be number of versions?


| image_name | version number | total_image_size(OCI) | total_image_size(nydus) | total_image_size (nydus after dedulicating) | chunkdict_image_size | dedulicating rate |
|------------|----------------|-----------------------|-------------------------|---------------------------------------------|----------------------|-------------------|
| redis | 10 | 341.78 | 419.37 | 319.48 | 41.87 | 23.82% |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All image size should be MB unit.

| image_name | version number | total_image_size(OCI) | total_image_size(nydus) | total_image_size (nydus after dedulicating) | chunkdict_image_size | dedulicating rate |
|------------|----------------|-----------------------|-------------------------|---------------------------------------------|----------------------|-------------------|
| redis | 10 | 341.78 | 419.37 | 319.48 | 41.87 | 23.82% |
| ubuntu | 10 | 290.26 | 308.59 | 140.28 | 30.8 | 54.54% |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's split the table into two, one for "image info" and one for "deduplication result"


// jsonData := `{
// "endpoint": "oss-cn-zhangjiakou.aliyuncs.com",
// "access_key_id": "LTAI5tKHuSQQXVjSE7PgKYhf",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AK is leaked.

ctx.Binary.Nydusify, logLevel, image2, target2v6, ctx.Binary.Builder, ctx.Env.TempDir,
)
tool.RunWithoutOutput(i.T, convertCmd2)
convertCmd3 := fmt.Sprintf(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should write a helper func for image conversion, instead of writing three times.

// "bucket_name": "testcompact1"
// }`

// formattedData, err := json.MarshalIndent(json.RawMessage(jsonData), "", " ")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the unused codes.

)
tool.RunWithoutOutput(i.T, checkCmd)

// Generate v5 chunkdcit
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should write a helper func for each of the v5 and v6.

)
tool.RunWithoutOutput(i.T, checkCmd1)

// Test v5 chunkdict convert
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the codes is unformatted.



### Exponential smoothing algorithm test table
Step 1: Download 10 OCI versions of an image and count the total size
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the list format for steps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants