How to quickly extract the dataset #20

Raion-Shin · 2024-09-02T12:49:10Z

I downloaded the .tar.gz file from https://huggingface.co/datasets/TIGER-Lab/M-BEIR, but it is really large, and the pv command estimates that extracting it will take 2.5 days!
Could you provide smaller zip files, with each dataset packaged into its own zip file? Thanks very much!

nrdyava · 2024-09-30T22:05:37Z

After downloading the .tar.gz part files, use the following command to combine them into a single file:
sh -c 'cat mbeir_images.tar.gz.part-00 mbeir_images.tar.gz.part-01 mbeir_images.tar.gz.part-02 mbeir_images.tar.gz.part-03 > mbeir_images.tar.gz'

Next extract images from the combined file:
tar -xzf mbeir_images.tar.gz

It will not take 2.5 days. I was able to complete the whole process in just 10 hours.
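
As a possible variation on the two steps above, the parts can be piped straight into tar, so the combined mbeir_images.tar.gz never has to be written to disk and read back. This is only a minimal sketch: `extract_parts` is a hypothetical helper name (not something shipped with M-BEIR), and how much time it saves, if any, depends on disk speed, since gzip decompression itself still runs on a single stream.

```shell
#!/bin/sh
# Hypothetical helper (not part of M-BEIR): stream split archive parts
# directly into tar instead of first writing one combined .tar.gz file.
extract_parts() {
    dest="$1"   # directory to extract into
    shift       # remaining arguments are the part files, in order
    mkdir -p "$dest" || return 1
    # cat concatenates the parts in the order given;
    # "-f -" makes tar read the archive from standard input.
    cat "$@" | tar -xzf - -C "$dest"
}
```

Assuming the four part files named in the command above are in the current directory, it could be called as `extract_parts . mbeir_images.tar.gz.part-*`. The shell expands the glob in sorted order, so part-00 through part-03 are concatenated in the right sequence.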

Raion-Shin · 2024-11-21T01:26:38Z

Thanks. But I'm extracting it on a 2-core CPU, so it takes a long time. It would be better if you split it into many smaller zip files.
