ProjectInventory question #524

Open
bhbraswell opened this issue Feb 24, 2020 · 1 comment

@bhbraswell

If I have a gips_export output that looks like this

% tree /data/modis_test 
modis_test
├── 0
│   ├── 2019100_MCD_ndvi.tif
│   └── 2019101_MCD_ndvi.tif
└── 1
    ├── 2019100_MCD_ndvi.tif
    └── 2019101_MCD_ndvi.tif

and if I say inv = ProjectInventory('/data/modis_test', None)
then inv will only have the inventory of the second feature in the export.

(Pdb) inv[inv.dates[0]].filenames
{('MCD', 'ndvi'): '/data/modis_test/1/2019100_MCD_ndvi.tif'}

But I'm wondering if it should raise an error instead.

I'm asking this because I'm trying to find a simple way to extend gips_stats to create a single summary file for all the features in the export. Catching an error and looping over subdirectories would make that easier. Usual disclaimer: unless I'm missing something, which I often do.

@ircwaves
Collaborator

There's definitely a bug here. As it is, the output of ProjectInventory('/data/modis_test', None) is non-deterministic, and I don't believe it was intended to be constructed as such.

The gips.data.core.Data.discover code does use os.walk to find files, but there is no code that does anything with the sub-directories. So, I would vote for changing it to use os.listdir, which would return an empty list for /data/modis_test. This isn't as good as getting an error message, but I think it is as good as we can do, because it could just be that /outdir/feat1 is an empty export, which we would want to handle gracefully.
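
To illustrate the difference between the two approaches, here is a minimal sketch; it is not the actual gips code. It assumes that discovered files end up in a dict keyed by the date, sensor and product parsed from names like 2019100_MCD_ndvi.tif. The names parse_key, discover_walk, discover_listdir and make_export are made up for this illustration.

```python
import os
import tempfile


def parse_key(filename):
    """Split '2019100_MCD_ndvi.tif' into ('2019100', 'MCD', 'ndvi')."""
    stem, _ext = os.path.splitext(filename)
    date, sensor, product = stem.split("_", 2)
    return date, sensor, product


def discover_walk(root):
    """Recursive discovery: files in different subdirectories share a key."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".tif"):
                # A later directory silently overwrites an earlier one; which
                # one comes later depends on the filesystem's listing order.
                found[parse_key(name)] = os.path.join(dirpath, name)
    return found


def discover_listdir(root):
    """Flat discovery: only files directly inside root are considered."""
    found = {}
    for name in os.listdir(root):
        path = os.path.join(root, name)
        if os.path.isfile(path) and name.endswith(".tif"):
            found[parse_key(name)] = path
    return found


def make_export(root):
    """Recreate the two-feature layout from the issue under root."""
    for feature in ("0", "1"):
        os.makedirs(os.path.join(root, feature))
        for date in ("2019100", "2019101"):
            open(os.path.join(root, feature, date + "_MCD_ndvi.tif"), "w").close()


with tempfile.TemporaryDirectory() as root:
    make_export(root)
    # Four files exist on disk, but only one path per key survives the walk.
    print(len(discover_walk(root)))
    # The top level holds no .tif files, so the flat listing is empty.
    print(discover_listdir(root))
```

With the flat version, pointing at /data/modis_test yields an empty result, while pointing at a single feature directory yields that feature's files only.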

I've thought before that there should be a ProjectTree class which handles the iteration over inventory directory trees, and can handle application & aggregation of an algorithm (i.e. gips_stats). In the end, I've always resorted to using Pool or GNU parallel to apply-and-then-aggregate.
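
A rough sketch of what such a class might look like, under several assumptions: no ProjectTree exists in gips today, the method names feature_dirs and apply are hypothetical, and count_tifs is only a stand-in for a real per-feature algorithm such as gips_stats. ThreadPool from multiprocessing.pool is used so the example runs anywhere; multiprocessing.Pool offers the same map method.

```python
import os
from multiprocessing.pool import ThreadPool


class ProjectTree:
    """Hypothetical helper: iterate over the per-feature subdirectories of an export."""

    def __init__(self, root):
        self.root = root

    def feature_dirs(self):
        """Return the immediate subdirectories of root, sorted by path."""
        return sorted(
            entry.path for entry in os.scandir(self.root) if entry.is_dir()
        )

    def apply(self, func, workers=2):
        """Run func on every feature directory, then aggregate by directory name."""
        dirs = self.feature_dirs()
        with ThreadPool(workers) as pool:
            results = pool.map(func, dirs)
        return {os.path.basename(d): r for d, r in zip(dirs, results)}


def count_tifs(feature_dir):
    """Stand-in algorithm: count the .tif files in one feature directory.

    In gips, this is where something like ProjectInventory(feature_dir, None)
    (the call form used in this issue) could be constructed instead.
    """
    return sum(1 for name in os.listdir(feature_dir) if name.endswith(".tif"))
```

Calling ProjectTree('/data/modis_test').apply(count_tifs) on the export above would give one entry per feature directory, and an empty feature directory would contribute a zero rather than an error.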
