
Request for Alternative Download Methods for the ISIA Food-500 dataset #1

Open
jyp-studio opened this issue Aug 21, 2024 · 3 comments


@jyp-studio

I am currently downloading the external open-source datasets required for your database. However, the download speed from the dataset provider's server for the ISIA Food-500 dataset is extremely slow: a single file is estimated to take around 4 days and, with 10 files in total, the entire download would take approximately 40 days, which is excessively long.

Given that you've already downloaded this dataset during the creation of your database and that they are all open-source, I would greatly appreciate it if you could provide an alternative method for accessing these files. For example, uploading the compressed datasets to Google Drive or another faster and more reliable hosting service would be highly beneficial.

Thank you for considering this request. Your assistance in this matter would be invaluable and would greatly expedite my work with your database.

@michaeledeprospo

An alternative download for ISIA Food-500 would be greatly appreciated if possible!

@jyp-studio
Author

Thank you for your response. I have already downloaded the dataset, but I need to run database_generation.py to verify that it was downloaded correctly, so I haven't closed the issue yet. You might still want to prepare the dataset in case there are any missing files in what I downloaded.

Currently, I am encountering three main issues:

  1. I placed all the datasets in the src folder, at the same level as database_generation.py, and unzipped all of them. However, when running database_generation.py, an error occurs indicating that the datasets cannot be found. This might be due to the code checking for the datasets using len(os.listdir(path)). To address this, I manually changed the initial paths of the datasets from None to the corresponding paths. I'm not sure if this is the correct approach.

  2. When executing the function download_file(path) with the URL http://atvs.ii.uam.es/atvs/AI4Food-NutritionDB/AI4Food-NutritionDB.txt, I encountered the following error:

    requests.exceptions.ConnectionError: HTTPConnectionPool(host='atvs.ii.uam.es', port=80): 
    Max retries exceeded with url: /atvs/AI4Food-NutritionDB/AI4Food-NutritionDB.txt 
    (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f721edeabe0>: 
    Failed to establish a new connection: [Errno 110] Connection timed out'))
    

    It seems I cannot connect to your server; I also tried opening atvs.ii.uam.es/atvs/ directly in a browser and couldn't reach it either. Could you please check this issue on your end?

  3. Additionally, in database_generation.py, there is an issue on line 195 where correspondence_file is used before it is defined, resulting in a "local variable referenced before assignment" error.
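
For reference, the three issues above could be patched along the following lines. This is only a minimal sketch based on assumptions about database_generation.py: the names `resolve_dataset_path` and the retry parameters of `download_file`, the `opener` hook, and the `None` initialisation of `correspondence_file` are my own inventions, not the script's actual API.

```python
import os
import time
import urllib.request

# Issue 1: instead of hand-editing hard-coded None paths, resolve each
# dataset directory by checking that it exists and is non-empty
# (mirroring the len(os.listdir(path)) check mentioned above).
def resolve_dataset_path(*candidates):
    for path in candidates:
        if path is not None and os.path.isdir(path) and len(os.listdir(path)) > 0:
            return path
    raise FileNotFoundError(f"No non-empty dataset directory among {candidates!r}")

# Issue 2: wrap the download in a timeout plus simple exponential-backoff
# retries, so a transient "Connection timed out" does not abort the run.
def download_file(url, dest, retries=3, timeout=30, opener=urllib.request.urlopen):
    last_exc = None
    for attempt in range(retries):
        try:
            with opener(url, timeout=timeout) as resp:
                data = resp.read()
            with open(dest, "wb") as f:
                f.write(data)
            return dest
        except OSError as exc:  # covers timeouts and refused/reset connections
            last_exc = exc
            time.sleep(2 ** attempt)
    raise last_exc

# Issue 3: bind correspondence_file before any conditional branch reads it,
# so a "referenced before assignment" error can no longer occur.
correspondence_file = None
```

The `opener` parameter exists only so the retry logic can be exercised without network access; `urllib.request.urlopen` is the default.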

Thank you for your attention to these issues.

@zs1314 commented Sep 17, 2024

@jyp-studio Hello! The official link seems to be dead; may I ask where you got the dataset from? Or could you help me obtain it? I'm willing to pay. Thanks!
