Unable to preprocess Criteo Kaggle Display Advertising Challenge Dataset #374

JerryQGui · 2024-01-21T20:24:39Z

I have downloaded and unzipped the 4GB dataset. It contains three files: readme.txt, train.txt, and test.txt. They are stored in a folder named dataset, which sits next to my cloned dlrm folder.

I believe this is the command needed to preprocess the data, as implied by the README:

python dlrm_s_pytorch.py --raw-data-file=../dataset/train.txt

However, the output is:

world size: 1, current rank: 0, local rank: 0
Using CPU...
time/loss/accuracy (if enabled):
Finished training it 1/1 of epoch 0, -1.00 ms/it, loss 0.083850

In another issue, #274, someone posted the lines that should be printed when preprocessing runs:
Reading raw data=/my_raw_data_path/train.txt

Additionally, no .npz file is created in my input directory.

Is this because some other flags are required?

chinmayjainnnn · 2024-09-16T04:54:11Z

You should try this:

python dlrm_s_pytorch.py --data-generation=dataset --raw-data-file=
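A hedged sketch of a fuller invocation: as far as I can tell, `--data-generation` in `dlrm_s_pytorch.py` defaults to `random`, which would explain why the first run finished a single synthetic iteration without ever reading `train.txt`. The helpers below (`build_preprocess_cmd` and `processed_file_ready` are hypothetical names, not part of DLRM) assemble the command with `--data-set=kaggle` and `--processed-data-file`, which I believe are the flags the repository's Criteo Kaggle benchmark script passes. The paths, including the `.npz` file name, are placeholders based on the layout described in the question.

```python
import shlex
from pathlib import Path


def build_preprocess_cmd(raw_data_file: str, processed_data_file: str) -> list[str]:
    """Hypothetical helper: argv for running DLRM on the Criteo Kaggle data.

    Assumes dlrm_s_pytorch.py is in the current directory and accepts
    --data-generation, --data-set, --raw-data-file and --processed-data-file.
    """
    return [
        "python",
        "dlrm_s_pytorch.py",
        # "dataset" reads real data instead of the default synthetic "random" data
        "--data-generation=dataset",
        # assumed to select the Kaggle (rather than Terabyte) preprocessing path
        "--data-set=kaggle",
        f"--raw-data-file={raw_data_file}",
        # assumed output location of the preprocessed .npz
        f"--processed-data-file={processed_data_file}",
    ]


def processed_file_ready(processed_data_file: str) -> bool:
    """Hypothetical check: True once the preprocessed file exists and is non-empty."""
    path = Path(processed_data_file)
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    cmd = build_preprocess_cmd(
        "../dataset/train.txt",
        "../dataset/kaggleAdDisplayChallenge_processed.npz",  # placeholder name
    )
    # Print a copy-pasteable shell command; subprocess.run(cmd, check=True) would run it
    print(shlex.join(cmd))
```

Preprocessing the full train.txt can take a long time. Afterwards, `processed_file_ready(...)` is a quick way to confirm that the .npz was actually written before starting a training run.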

hrwleo · 2024-09-16T04:54:51Z

Your email has been received, thank you very much! If I don't reply promptly, please call 15868848097 or contact me on QQ: 812737452.
