Data Inconsistencies #21
Just as a follow-up: the total number of indices in the train-test-validation split files does not equal the total number of data points in the dataset, so something is definitely wrong there. There are also exactly 40 indices that do not correspond to any data point in the dataset (are those samples missing?).
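One way to reproduce this kind of check is to compare the IDs listed in the split files against the IDs actually present in the dataset. The sketch below is a minimal illustration, assuming each split file is plain text with one ID per line and that the dataset IDs have already been collected into a set; `audit_splits` and `read_ids` are hypothetical helpers, not part of the dataset's tooling.

```python
from collections import Counter


def read_ids(path: str) -> list[str]:
    """Read one ID per line, skipping blank lines (assumed split-file format)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def audit_splits(splits: dict[str, list[str]], dataset_ids: set[str]) -> dict:
    """Compare train/val/test ID lists against the IDs present in the dataset.

    `splits` maps a split name (e.g. "train") to the IDs read from its index
    file; `dataset_ids` holds the IDs of data files that actually exist.
    """
    all_ids = [i for ids in splits.values() for i in ids]
    counts = Counter(all_ids)
    listed = set(all_ids)
    return {
        # Sum of all split sizes, to compare against the dataset size
        "total_listed": len(all_ids),
        "total_in_dataset": len(dataset_ids),
        # IDs listed more than once (within or across splits)
        "duplicates": sorted(i for i, n in counts.items() if n > 1),
        # IDs referenced by a split file but with no matching data file
        "missing_from_dataset": sorted(listed - dataset_ids),
        # Data files not assigned to any split
        "unassigned": sorted(dataset_ids - listed),
    }
```

For example, with the toy splits `{"train": ["a", "b"], "val": ["c"], "test": ["d", "a"]}` and dataset IDs `{"a", "b", "c", "e"}`, the report lists `"a"` as a duplicate, `"d"` as missing, and `"e"` as unassigned. IDs should be normalized (for instance, for the filename prefix discussed in this issue) before the comparison, or every mismatched prefix will be reported as missing.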
Hi Nick, thank you so much for your detailed feedback and for bringing these issues to our attention! This will definitely help us refine the dataset for better usability.
Again, we truly appreciate your contribution and the time you’ve taken to test the dataset. If you have any further feedback or suggestions, please don’t hesitate to share them!
Hi,
First off, great work! I’m excited to test some new models on this dataset. However, I’ve noticed several discrepancies in the data that I wanted to bring to your attention:
In the train-test-validation split files, some filenames are prefixed with "DrivAer_" while others are not, but none of the files in the actual dataset include this prefix. I assume the prefix should be removed from the index files for consistency.
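Until the index files are corrected, a user-side workaround is to normalize every ID before matching it to a file. This minimal sketch assumes the only difference is a literal leading `DrivAer_` prefix, as described above; `normalize_id` is a hypothetical helper, and the prefixed ID in the example is illustrative.

```python
PREFIX = "DrivAer_"


def normalize_id(name: str) -> str:
    """Strip surrounding whitespace and an optional "DrivAer_" prefix."""
    # str.removeprefix (Python 3.9+) leaves the string unchanged if the prefix is absent
    return name.strip().removeprefix(PREFIX)


# Illustrative mix of prefixed and unprefixed split entries
ids = ["DrivAer_E_S_WWC_WM_640", "N_S_WW_WM_229"]
print([normalize_id(i) for i in ids])  # ['E_S_WWC_WM_640', 'N_S_WW_WM_229']
```

Using `removeprefix` rather than `replace` avoids accidentally altering an ID that happens to contain the substring elsewhere.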
Some files listed in the train-test-validation split files are missing from the dataset uploaded to Dataverse. For example, "E_S_WWC_WM_640" is included in the test index file but does not exist in the Pressure or Shear zip files uploaded to Dataverse. This issue appears to affect many files, not just this one.
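A sketch of how to list every split entry that has no counterpart in a downloaded archive (such as the Pressure or Shear zip) without extracting it. It assumes each sample is stored as `<ID>.vtk`, possibly inside folders; the `.vtk` suffix and the helper names `ids_from_names`, `ids_in_zip`, and `missing_ids` are assumptions for illustration.

```python
import zipfile
from pathlib import PurePosixPath


def ids_from_names(names: list[str], suffix: str = ".vtk") -> set[str]:
    """Extract sample IDs from archive member names ending in `suffix`."""
    # Zip member names always use "/" separators, so PurePosixPath is appropriate;
    # directory entries (ending in "/") are skipped by the suffix check.
    return {
        PurePosixPath(n).name[: -len(suffix)]
        for n in names
        if n.endswith(suffix)
    }


def ids_in_zip(zip_path: str, suffix: str = ".vtk") -> set[str]:
    """Read the member list of a zip archive and return the sample IDs in it."""
    with zipfile.ZipFile(zip_path) as zf:
        return ids_from_names(zf.namelist(), suffix)


def missing_ids(split_ids: list[str], available: set[str]) -> list[str]:
    """Return split IDs (with any "DrivAer_" prefix removed) that have no file."""
    return sorted({i.strip().removeprefix("DrivAer_") for i in split_ids} - available)
```

Running `missing_ids(read_split_file, ids_in_zip("Pressure.zip"))` for each split and archive (file names hypothetical) would give the full list of affected samples rather than one example at a time.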
There is significant inconsistency in how surface shear values are stored in the dataset. Some VTK files store shear as a cell_data field, others as a point_data field, and some contain both. It would be best to standardize on one format: converting between cell_data and point_data on my end introduces inaccuracies, whereas exporting the field directly from the CFD simulation would preserve its accuracy.
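To make the accuracy concern concrete: a common way to turn cell data into point data (the approach described for VTK's `vtkCellDataToPointData` filter) is to give each point the average of the values of all cells that use it. The NumPy sketch below implements that unweighted averaging for a mesh whose cells all have the same number of points; it is an illustration only, not the conversion used in the reporter's pipeline, and `cell_to_point_average` is a hypothetical name.

```python
import numpy as np


def cell_to_point_average(cells, cell_values, n_points):
    """Average cell-centred values onto mesh points (unweighted mean).

    cells: (n_cells, k) point indices per cell; every cell has k points
        (e.g. k=3 for a triangle-only surface mesh).
    cell_values: (n_cells,) scalars or (n_cells, c) vectors.
    Points not used by any cell come out as NaN.
    """
    cells = np.asarray(cells, dtype=np.intp)
    vals = np.asarray(cell_values, dtype=float)
    scalar = vals.ndim == 1
    if scalar:
        vals = vals[:, None]
    k = cells.shape[1]
    sums = np.zeros((n_points, vals.shape[1]))
    counts = np.zeros(n_points)
    # Scatter-add every cell's value to each of its k points.
    # np.repeat(..., axis=0) gives row0 x k, row1 x k, ..., matching cells.ravel().
    np.add.at(sums, cells.ravel(), np.repeat(vals, k, axis=0))
    np.add.at(counts, cells.ravel(), 1.0)
    with np.errstate(invalid="ignore", divide="ignore"):
        out = sums / counts[:, None]
    return out[:, 0] if scalar else out
```

For two triangles `[[0, 1, 2], [1, 3, 2]]` with cell values `[1.0, 3.0]`, the point values are `[1, 2, 2, 3]`; averaging those back onto the cells gives 5/3 and 7/3 instead of 1 and 3. That smoothing is the kind of error a round trip between the two layouts introduces, and a NaN in one cell also spreads to every point of that cell.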
Lastly, many of the Shear VTK files contain NaN values, which suggests a problem with either the export process or the CFD simulation itself. For example, in "N_S_WW_WM_229" I found more than 2,500 cells with NaN values. This file also contains no point_data, as mentioned in the previous point.
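A minimal sketch of how such a NaN count can be produced from a shear array once it has been loaded as a NumPy array of per-cell values. It assumes the field is either a scalar per cell or a vector with one row per cell; `nan_cell_report` is a hypothetical helper.

```python
import numpy as np


def nan_cell_report(values) -> tuple[int, np.ndarray]:
    """Count cells whose shear value contains at least one NaN component.

    values: (n_cells,) scalars or (n_cells, 3) vectors, e.g. a wall-shear
    field taken from a VTK file's cell data. Returns (count, cell indices).
    """
    values = np.asarray(values, dtype=float)
    bad = np.isnan(values)
    if bad.ndim > 1:
        bad = bad.any(axis=1)  # a vector cell counts as bad if any component is NaN
    idx = np.flatnonzero(bad)
    return int(idx.size), idx
```

If the file is read with PyVista (`pyvista.read(path)`), the array could come from `mesh.cell_data[name]`, where `name` is whatever the shear field is called in the export; the field name is not stated in this issue. Checking `mesh.point_data.keys()` on the same object would also show whether a file has any point_data at all.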
Please let me know if you need further clarification. Unfortunately, until these issues are resolved, I am unable to use the shear data in my models.