BIEN_trait_mean performance #16
Comments
Okay, playing with this further, I was able to determine the following:
- I think it will be common for people to want to pull every trait and to call the function as I did above. Right now you have to write a for-loop to do it one at a time (which works great and is pretty fast). However, it might be better either to prevent passing multiple traits to BIEN_trait_mean or to make sure it supports vectorized trait lists.
- Finally, querying DBH also tends to be extremely slow, as you suggested, and I think you're right that this is what crashes the console. In particular, calculating mean DBH at the family level could draw many thousands of records without being very informative. The trait 'whole plant height' seems to behave the same way. The R process gets 'Killed', probably because the SQL query returns far too many records. Maybe DBH data should only be available through the stem.R module? Both traits take more than 15 minutes to query and then eventually crash the console, so higher-density measurements may need some special treatment. The other traits return values in less than 30 seconds.
I'm trying to pull as many trait means as possible for the following list of species:
names.txt
Calling the vectorized form, BIEN_trait_mean(vector_of_species_names, vector_of_traits), usually crashes my R console. I'm not sure whether the problem is on the backend, but returning the list of trait IDs by default could be part of it. That list greatly increases the size of the data frame that gets returned. Maybe we could add a flag so that it is only included on request?
So what I'm doing now is querying means one-by-one:
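traits <- NULL
for (species in species_list) {
  for (trait in trait_list) {
    # Query one species/trait pair at a time, then append the result
    new_trait <- BIEN_trait_mean(species, trait)
    traits <- rbind(traits, new_trait)
  }
}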
This isn't the 'R' way of doing it, but it works quickly. By contrast, passing a vector of 20 species in a single call crashes my console.
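For what it's worth, the same one-pair-at-a-time workaround can be written without growing the data frame inside the loop. This is only a sketch: `collect_trait_means` and its `fetch` argument are hypothetical names, not part of the BIEN package, and it assumes every successful query returns a data frame with the same columns.

```r
# Hypothetical helper (not part of the BIEN package): query one species/trait
# pair at a time and combine the results. `fetch` is any function that takes
# (species, trait) and returns a data frame.
collect_trait_means <- function(species_list, trait_list, fetch) {
  # Every species/trait combination, one row per query
  pairs <- expand.grid(species = species_list, trait = trait_list,
                       stringsAsFactors = FALSE)
  results <- lapply(seq_len(nrow(pairs)), function(i) {
    # A query that raises an R error yields NULL instead of aborting the run
    tryCatch(fetch(pairs$species[i], pairs$trait[i]),
             error = function(e) NULL)
  })
  # Drop failed queries, then bind everything once at the end
  do.call(rbind, Filter(Negate(is.null), results))
}
```

It would be called as `traits <- collect_trait_means(species_list, trait_list, BIEN_trait_mean)`. Note that `tryCatch` only catches ordinary R errors; it would not help if the whole R process gets killed, as happens with the very large queries.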