PGHash

On-device training of large networks via Locality-Sensitive Hashing (LSH) and Federated Learning (FL).
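LSH-based training frameworks typically use hashing to select, for each input, only the neurons whose weight vectors are likely to have high cosine similarity with it, so most of a large layer can be skipped. To illustrate that idea, here is a minimal Python/NumPy sketch of a generic signed-random-projection (SimHash) table. This is a standard LSH family chosen for illustration, not the PGHash hash function itself. The function names `simhash_codes` and `active_neurons` are hypothetical.

```python
import numpy as np


def simhash_codes(vectors: np.ndarray, planes: np.ndarray) -> np.ndarray:
    """Return one integer hash code per row of `vectors`.

    Each bit records which side of a random hyperplane the vector falls on,
    so vectors with high cosine similarity tend to share codes.
    """
    bits = (vectors @ planes.T) >= 0            # shape: (n, num_bits)
    weights = 1 << np.arange(planes.shape[0])   # bit i contributes 2**i
    return bits.astype(np.int64) @ weights      # pack the bits into one integer


def active_neurons(x: np.ndarray, weights: np.ndarray, planes: np.ndarray) -> np.ndarray:
    """Indices of neurons (rows of `weights`) whose code matches the input's code."""
    neuron_codes = simhash_codes(weights, planes)
    x_code = simhash_codes(x[None, :], planes)[0]
    return np.flatnonzero(neuron_codes == x_code)


rng = np.random.default_rng(0)
dim, num_neurons, num_bits = 16, 1000, 8
W = rng.standard_normal((num_neurons, dim))     # one weight vector per output neuron
P = rng.standard_normal((num_bits, dim))        # random hyperplanes defining the hash
x = rng.standard_normal(dim)                    # a single input vector

idx = active_neurons(x, W, P)
print(f"{idx.size} of {num_neurons} neurons selected")
```

Practical systems typically use several hash tables (and rebuild them as weights change) so that an input is rarely left with no matching neurons; a single 8-bit table is used here only for brevity.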

Datasets

Extreme classification datasets, along with an overview and benchmark results, are available at the Extreme Classification Repository: http://manikvarma.org/downloads/XC/XMLRepository.html. To run our code and reproduce our experiments, you must download Delicious-200K, Amazon-670K, and Wiki-325K. Some notes on this process:

Download the version of each dataset with BoW (bag-of-words) features.
To parse the training and test files, you will also need pyxclib: https://github.com/kunaldahiya/pyxclib
Store the data in the data folder under the corresponding dataset name, with the files named train.txt and test.txt.
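For reference, the BoW files from the Extreme Classification Repository use a sparse text format: a header line with the number of samples, features, and labels, followed by one line per sample of the form `label1,label2,... feat:val feat:val ...`. pyxclib provides a reader for this format (`xclib.data.data_utils.read_data`); the standard-library sketch below only illustrates the format. The names `parse_xc_lines` and `read_xc_file` are hypothetical, and the `data/<dataset>/<split>.txt` path is an assumption based on the folder convention described above.

```python
import os
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[List[int], Dict[int, float]]  # (label ids, {feature id: value})


def parse_xc_lines(lines: Iterable[str]) -> Tuple[Tuple[int, int, int], List[Sample]]:
    """Parse XC Repository BoW text: a header line, then one sample per line."""
    it = iter(lines)
    num_samples, num_features, num_labels = (int(t) for t in next(it).split())
    samples: List[Sample] = []
    for line in it:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        # Labels come first, comma-separated; a sample with no labels starts with a space.
        label_part, _, feature_part = line.partition(" ")
        labels = [int(tok) for tok in label_part.split(",") if tok]
        features: Dict[int, float] = {}
        for token in feature_part.split():
            idx, val = token.split(":")
            features[int(idx)] = float(val)
        samples.append((labels, features))
    return (num_samples, num_features, num_labels), samples


def read_xc_file(dataset: str, split: str = "train", root: str = "data"):
    """Read <root>/<dataset>/<split>.txt (hypothetical helper; layout assumed)."""
    path = os.path.join(root, dataset, f"{split}.txt")
    with open(path) as f:
        return parse_xc_lines(f)


# Tiny in-memory example: 2 samples, 10 features, 4 labels.
sample_text = ["2 10 4", "0,3 1:0.5 7:2.0", " 4:1.0"]
header, data = parse_xc_lines(sample_text)
print(header)   # (2, 10, 4)
print(data[0])  # ([0, 3], {1: 0.5, 7: 2.0})
```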

Code Dependencies

Our code was built and tested with the following Python packages:

tensorflow 2.10.0
numpy 1.23.4
mpi4py 3.1.4
pyxclib, which requires the following packages:
1. sklearn 0.0 (installing scikit-learn directly also works)
2. Cython

Installation instructions for pyxclib are available at https://github.com/kunaldahiya/pyxclib (the installation can be somewhat error-prone). We also use MPI, specifically Open MPI 4.1.4; slightly older versions should also work.
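Because the pyxclib and MPI setup can be fiddly, it may help to confirm the environment before launching a run. The sketch below compares installed package versions against the tested versions listed above using the standard-library `importlib.metadata`. The function names are hypothetical, and a differing version is reported rather than treated as an error, since other versions may also work. It assumes pyxclib is importable as `xclib`.

```python
from importlib import metadata, util
from typing import Callable, Dict, Optional

# Versions listed as tested in this README.
TESTED = {"tensorflow": "2.10.0", "numpy": "1.23.4", "mpi4py": "3.1.4"}


def installed_version(dist: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_versions(tested: Dict[str, str],
                   lookup: Callable[[str], Optional[str]] = installed_version) -> Dict[str, str]:
    """Return a status per package: 'ok', 'missing', or 'differs (found X)'."""
    report = {}
    for name, wanted in tested.items():
        found = lookup(name)
        if found is None:
            report[name] = "missing"
        elif found == wanted:
            report[name] = "ok"
        else:
            report[name] = f"differs (found {found})"
    return report


if __name__ == "__main__":
    for name, status in check_versions(TESTED).items():
        print(f"{name:12s} {status}")
    # Assumes pyxclib installs the `xclib` module; only check that it can be found.
    print(f"{'xclib':12s} {'found' if util.find_spec('xclib') else 'missing'}")
```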

Running the Code

To run our code, first test it with a single process:

mpirun -np 1 python run_pg.py --hash_type pghash --dataset Delicious200K --name test-run-single-worker

If this works, you can run the scripts included in the codebase (e.g., pg_del.sh, pg_amz.sh, and pg_wiki.sh).
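Moving beyond the single-process test mainly means changing the process count passed to `mpirun -np`. The sketch below builds the same command for an arbitrary number of processes, reusing only the flags shown above (`--hash_type`, `--dataset`, `--name`). The helper name `pg_command` is hypothetical, and any other options `run_pg.py` may accept are not covered, so the provided scripts remain the reference for full experiment settings.

```python
import shlex
from typing import List


def pg_command(num_workers: int, dataset: str = "Delicious200K",
               hash_type: str = "pghash", name: str = "test-run") -> List[str]:
    """Build an mpirun argument list that launches run_pg.py on `num_workers` processes."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [
        "mpirun", "-np", str(num_workers),
        "python", "run_pg.py",
        "--hash_type", hash_type,
        "--dataset", dataset,
        "--name", name,
    ]


if __name__ == "__main__":
    cmd = pg_command(4, name="test-run-4-workers")
    print(shlex.join(cmd))
    # To launch it from Python (requires Open MPI):
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as an argument list (rather than one shell string) avoids quoting issues if a run name ever contains spaces.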

Citation

@inproceedings{
    rabbani2023pghash,
    title={Large-Scale Distributed Learning via Private On-Device LSH},
    author={Tahseen Rabbani and Marco Bornstein and Furong Huang},
    booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
    year={2023},
    url={https://openreview.net/forum?id=dpdbbN7AKr},
}
