Unable to preprocess Criteo Kaggle Display Advertising Challenge Dataset #374

Open
JerryQGui opened this issue Jan 21, 2024 · 2 comments

JerryQGui commented Jan 21, 2024

I have downloaded and unzipped the 4GB dataset. It consists of three files: readme.txt, train.txt, and test.txt. They are stored in a folder called dataset, which is a sibling of my cloned dlrm folder.

I believe this is the command needed to preprocess the data, as implied by the README:

python dlrm_s_pytorch.py --raw-data-file=../dataset/train.txt

However, the output is:

world size: 1, current rank: 0, local rank: 0
Using CPU...
time/loss/accuracy (if enabled):
Finished training it 1/1 of epoch 0, -1.00 ms/it, loss 0.083850

I have seen another issue, #274, where someone posted the lines that should appear when preprocessing runs, for example:
Reading raw data=/my_raw_data_path/train.txt

Additionally, there are no .npz files in my input directory.

Is this because other flags are required?
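A quick way to confirm the missing-output observation above is to list the .npz files in the data folder before and after a run. This is only a sketch: the helper name `find_npz_files` is made up for illustration, and it assumes the preprocessed files would land in the same folder as train.txt (here `../dataset`, the path from this post), which may not hold for every configuration.

```python
from pathlib import Path


def find_npz_files(directory):
    """Return the sorted names of .npz files directly inside `directory`.

    Hypothetical helper for this thread, not part of the dlrm repository.
    """
    return sorted(p.name for p in Path(directory).glob("*.npz"))


if __name__ == "__main__":
    # Assumption: preprocessed output is written next to the raw train.txt.
    names = find_npz_files("../dataset")
    print(names if names else "no .npz files found")
```

If the list is still empty after a run, that suggests the preprocessing step was never triggered, rather than the output being written somewhere unexpected.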

@chinmayjainnnn

You should try adding --data-generation=dataset:
python dlrm_s_pytorch.py --data-generation=dataset --raw-data-file=
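For anyone scripting this, the suggestion above can be sketched as a small Python helper that assembles the command line. The helper name `build_preprocess_command` is hypothetical, and the path `../dataset/train.txt` is taken from the original post, not from this comment. My understanding, which is an assumption rather than something confirmed in this thread, is that without `--data-generation=dataset` the script trains on generated random data and ignores the raw file, which would match the one-iteration output shown above.

```python
import shlex


def build_preprocess_command(raw_data_file, script="dlrm_s_pytorch.py"):
    """Return the argv list for a run that reads the raw dataset.

    Hypothetical helper; only the two flags mentioned in this thread are used.
    """
    return [
        "python",
        script,
        "--data-generation=dataset",
        f"--raw-data-file={raw_data_file}",
    ]


# Assumption: the raw file sits where the original post says it does.
command = build_preprocess_command("../dataset/train.txt")
print(shlex.join(command))
```

The list can be passed to `subprocess.run(command, check=True)` from inside the cloned dlrm folder. The README's full benchmark command may need further flags that are not covered here.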

hrwleo commented Sep 16, 2024 via email
