Compress cache files #113
Comments
Yes, we also share our cache. Compression seems like a good idea. Do you have an idea of decompression times? That is the cost of saving space on disk.
I haven't measured, but I used rootusers.com/gzip-vs-bzip2-vs-xz-performance-comparison to guide the selection. In any case, compared to the download rate from Scopus of 30-40 MB per hour, any delay due to (de)compression will be negligible.
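For anyone who wants to check this on their own cache, a minimal timing sketch (plain standard-library Python, not pybliometrics code; it assumes the `~/.scopus/scopus_search/` directory mentioned in the issue and grabs an arbitrary cached file):

```python
import gzip
import time
from pathlib import Path

# Pick an arbitrary md5-named file from the local cache directory.
cache_dir = Path("~/.scopus/scopus_search").expanduser()
cache_file = next(p for p in cache_dir.rglob("*") if p.is_file())
raw = cache_file.read_bytes()

t0 = time.perf_counter()
blob = gzip.compress(raw, compresslevel=9)   # strongest gzip setting
t1 = time.perf_counter()
restored = gzip.decompress(blob)
t2 = time.perf_counter()

assert restored == raw
print(f"original {len(raw):,} B -> compressed {len(blob):,} B")
print(f"compress {t1 - t0:.4f} s, decompress {t2 - t1:.4f} s")
```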
Okay, this sounds good and certainly makes sense. I am thinking about how best to implement the compression; there are a few open questions.
Depending on the answers, all previously cached files will be useless, which I'd like to avoid. In any case, that's something for pybliometrics 3.0.
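For the sake of discussion, one possible shape of the write path. This is a minimal sketch; `write_cache`, the `compress` flag, and its default are hypothetical, not existing pybliometrics API:

```python
import gzip
from pathlib import Path

def write_cache(path: Path, text: str, compress: bool = True) -> None:
    """Store a downloaded Scopus response, optionally gzip-compressed."""
    if compress:
        # gzip's text mode handles the encoding; the md5 file name stays unchanged.
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            fh.write(text)
    else:
        path.write_text(text, encoding="utf-8")
```

Whether `compress` should default to on or off is exactly the question about priorities raised below.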
I presume that in this case, some kind of inference …
Having used only the latter, I still guess that significant benefits are possible for each search class.
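The comment above is cut off, but if the inference refers to telling compressed from uncompressed cache files, one sketch (assuming gzip, with a hypothetical `read_cache` helper) is to check the two-byte gzip magic number, so that existing uncompressed caches remain readable:

```python
import gzip
from pathlib import Path

def read_cache(path: Path) -> str:
    """Load a cached response, decompressing only if the file is gzipped."""
    with open(path, "rb") as fh:
        gzipped = fh.read(2) == b"\x1f\x8b"   # gzip magic number
    if gzipped:
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            return fh.read()
    return path.read_text(encoding="utf-8")
```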
Yes, please :-) Different situations require different prioritisations of speed over storage, or the other way round. (De)compression will most likely add some delay. The main question is probably: what should the default be? I vote for …
As a team of scientometricians, my colleagues and I are considering sharing our `~/.scopus/scopus_search/` directories to avoid re-downloading data and to parallelise multiple downloads for a single project. In order to speed up synchronisation and to avoid filling up our local drives too much, `gz` compression (or any other) of the md5-named cache files would be tremendously helpful.
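For the sharing workflow described above, a one-off batch compression of an existing cache directory could look like the sketch below. It assumes plain gzip files with a `.gz` suffix are acceptable on the receiving end; this is not pybliometrics functionality:

```python
import gzip
import shutil
from pathlib import Path

cache_dir = Path("~/.scopus/scopus_search").expanduser()

for cached in cache_dir.rglob("*"):
    if cached.is_file() and cached.suffix != ".gz":
        # Write an md5-named .gz copy, then drop the uncompressed original.
        with open(cached, "rb") as src, gzip.open(f"{cached}.gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        cached.unlink()
```

Note that pybliometrics as-is would not read these compressed files back, which is exactly why built-in support as discussed above would help.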