Update download.R to allow year vectors for baci data #250

Merged 2 commits on Sep 6, 2024

OlivazShai
Contributor

Description of the bug

While downloading BACI data for a single year works, downloading data for a vector of years raises an error. For example, setting year = 1995:2022 to download the entire series results in the following:

Error in utils::download.file(url = path, destfile = temp, method = "curl",  : 
 	 'url' must be a length-one character vector

Causes

The year parameter is used both for data cleaning and, for sources where each year's data has a different url, for building the download url. For BACI data, however, the download comes from a single url, so there is no need to generate a path for each year.

The download path is first defined as:

path <- param$url

Which would work. But, to account for sources with different urls by year, we have:

if (!is.null(param$year)) {
  path <- path %>%
    stringr::str_replace("\\$year\\$", as.character(param$year))
}

If the user-provided year is a length-one vector or NULL, path stays a single url and the download proceeds. But if year is a vector of length n, path also becomes a vector, with n repetitions of the url, because str_replace recycles over its replacement argument.
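The recycling can be reproduced in isolation. This is a minimal sketch, assuming a url without a $year$ placeholder; the example.org address is hypothetical and is not the real BACI link.

```r
library(stringr)

# Hypothetical single-file url with no $year$ placeholder
url <- "https://example.org/baci_all_years.zip"
years <- 1995:2022

# str_replace() is vectorised over `replacement`, so the result has one
# element per year even though the pattern never matches
path <- str_replace(url, "\\$year\\$", as.character(years))

length(path)        # one element per year
all(path == url)    # every element is the unchanged url
```

Passing this path to utils::download.file() is what triggers the length-one check.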

Then the download operation below raises the error shown at the beginning:

if (download_method == "curl") {
  utils::download.file(url = path, destfile = temp, method = "curl", quiet = quiet)
}

Workaround year = NULL fails

The possible workaround of setting year = NULL downloads everything to dat, including the files “country_codes_V202401b” and “product_codes_HS92_V202401b”, since without a year the regex contains only the term “V202401b”.

So rbindlist() in baci.R fails, returning:

Error in data.table::rbindlist(.) : 
  Item 29 has 4 columns, inconsistent with item 1 which has 6 columns. To fill missing columns use fill=TRUE.
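The effect of the shortened regex can be illustrated with plain file names. This is only a sketch: the two code-table names come from the report above, while the yearly file names and both patterns are assumptions, not the actual expressions used in baci.R.

```r
# File names as they might appear in the extracted archive; the yearly
# names are assumed for illustration
files <- c(
  "BACI_HS92_Y1995_V202401b.csv",
  "BACI_HS92_Y1996_V202401b.csv",
  "country_codes_V202401b.csv",
  "product_codes_HS92_V202401b.csv"
)

# With years supplied, a pattern can require the Y<year> token
year_pattern <- paste0("Y(", paste(1995:1996, collapse = "|"), ")_V202401b")
with_year <- files[grepl(year_pattern, files)]

# With year = NULL only the version string is left, so the code tables
# match as well and end up in the list passed to rbindlist()
without_year <- files[grepl("V202401b", files)]
```

Here with_year keeps only the two yearly files, whereas without_year keeps all four, which is how tables with a different number of columns reach rbindlist().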

Solution

I added a new (simple) exception to the definition of the download path in external_download. It ensures 'path' is the single url in 'param$url' when sourcing baci files.
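A rough sketch of what such an exception could look like follows. The helper name build_path and the param$source field are hypothetical, introduced only for illustration, and the url is a placeholder; the actual change lives in external_download.

```r
# Hypothetical helper mirroring the path logic described above;
# the `source` field name is an assumption made for this sketch
build_path <- function(param) {
  path <- param$url
  is_baci <- identical(param$source, "baci")
  # Skip the year substitution for baci, which has a single url
  if (!is.null(param$year) && !is_baci) {
    path <- stringr::str_replace(path, "\\$year\\$", as.character(param$year))
  }
  path
}

baci <- list(source = "baci", url = "https://example.org/baci.zip", year = 1995:2022)
length(build_path(baci))  # a single url, regardless of how many years
```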

Ensures 'path' is the single url in 'param$url' for baci source in external_download.
@IgorRigolon
Contributor

Very nice, thank you. I'll make some other changes later today, as I'm afraid this error might be affecting other functions as well. I added the str_replace bit without realising how it interacts with year vectors.

now this won't happen for baci, but also for other functions
@IgorRigolon
Contributor

From what I tested, this other approach also solves it. Some links look like www.../...$year$...., and we use those placeholders to substitute the year automatically. I added an if (str_detect(path, "$year$")) so that str_replace is only attempted in those cases and this error is not raised when year is a vector.
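A minimal sketch of that guard, under some assumptions: resolve_path is a hypothetical name, the urls are placeholders, and stringr::fixed() is used here so that "$year$" is matched literally (as an unescaped regex, $ is an end-of-string anchor). How the merged code escapes the placeholder is not shown in this thread.

```r
library(stringr)

# Hypothetical helper: substitute the year only when the url actually
# contains the literal $year$ placeholder
resolve_path <- function(url, year = NULL) {
  path <- url
  if (!is.null(year) && str_detect(path, fixed("$year$"))) {
    path <- str_replace(path, fixed("$year$"), as.character(year))
  }
  path
}

# No placeholder: the url is returned untouched, even for a year vector
resolve_path("https://example.org/baci.zip", 1995:2022)

# Placeholder present: one url per requested year
resolve_path("https://example.org/data_$year$.csv", 2001:2002)
```

Note that a templated url combined with several years still yields one path per year, so a caller relying on a length-one url would have to download those one at a time.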

@IgorRigolon IgorRigolon merged commit 85204d2 into datazoompuc:master Sep 6, 2024
4 of 5 checks passed
@OlivazShai
Contributor Author

Very nice, that really is better!
