-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
filter option in get_eurostat() doesn't work behind firewall #185
Comments
Thank you for reporting and proposing a solution. These connection issues are really tricky. This could also create problems, as in some systems Do someone have idea, what could be most coherent solution? Also, have you tried to set proxy options in |
I have a lot of sympathy for anyone who has to deal with connection issues! Thanks for taking the time to respond to me :) I had a look through #181 and also through the read_tsv calls. I think in the two cases that count wrapping the variable price_url<-eurostat:::eurostat_json_url("prc_hicp_midx",
filters = list(geo="EA",
unit="I96",
coicop= c('CP00', 'CP011', 'CP02', 'CP03')),
lang ='en')
options('url.method'= 'default')
url(price_url)
# A connection with
# description "http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/prc_hicp_midx?geo=EA&unit=I96&coicop=CP00&coicop=CP011&coicop=CP02&coicop=CP03"
# class "url-wininet"
# mode "r"
# text "text"
# opened "closed"
# can read "yes"
# can write "no"
options('url.method'= 'libcurl')
url(price_url)
# A connection with
# description "http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/prc_hicp_midx?geo=EA&unit=I96&coicop=CP00&coicop=CP011&coicop=CP02&coicop=CP03"
# class "url-libcurl"
# mode "r"
# text "text"
# opened "closed"
# can read "yes"
# can write "no"
options('url.method'= 'internal')
url(price_url)
# A connection with
# description "http://ec.europa.eu/eurostat/wdds/rest/data/v2.1/json/en/prc_hicp_midx?geo=EA&unit=I96&coicop=CP00&coicop=CP011&coicop=CP02&coicop=CP03"
# class "url"
# mode "r"
# text "text"
# opened "closed"
# can read "yes"
# can write "no"
I think without wrapping a the URL value with the Nevertheless, I looked into the internals of get_eurostat_json_2<-function(id, filters = NULL, type = c("code", "label",
"both"), lang = c("en", "fr", "de"),
stringsAsFactors = default.stringsAsFactors(), ...){
json_url <- eurostat_json_url(id = id, filters = filters,
lang = lang)
file<-url(json_url)
name<- readr:::source_name(file)
file<- readr:::standardise_path(file)
if (readr:::is.connection(file)) {
data <- readr:::datasource_connection(file, skip=F, skip_empty_rows=F)
} else{
stop("file is not a valid connection")
}
content <- file(data[[1]])
json <- readLines(content, warn = F)
close(content)
jdat<-jsonlite::fromJSON(json)
type <- 'code'
dims <- jdat$dimension
ids <- jdat$id
dims_list <- lapply(dims[rev(ids)], function(x) {
y <- x$category$label
if (type[1] == "label") {
y <- unlist(y, use.names = FALSE)
}
else if (type[1] == "code") {
y <- names(unlist(y))
}
else if (type[1] == "both") {
y <- unlist(y)
}
else {
stop("Invalid type ", type)
}
})
variables <- expand.grid(dims_list, KEEP.OUT.ATTRS = FALSE,
stringsAsFactors = stringsAsFactors)
dat <- as.data.frame(variables[rev(names(variables))])
vals <- unlist(jdat$value, use.names = FALSE)
dat$values <- rep(NA, nrow(dat))
inds <- 1 + as.numeric(names(jdat$value))
if (!length(vals) == length(inds)) {
stop("Complex indexing not implemented.")
}
dat$values[inds] <- vals
tibble::as_tibble(dat)
} One last thought though, on how to merge this version with the existing version, if you wrap the URL generated in a get_eurostat_json<-function (id, filters = NULL, type = c("code", "label",
"both"), lang = c("en", "fr", "de"),
stringsAsFactors = default.stringsAsFactors(), ...)
{
if (!check_access_to_data()) {
message("You have no access to ec.europe.eu. \n Please check your connection and/or review your proxy settings")
}
else {
url <- eurostat_json_url(id = id, filters = filters,
lang = lang)
file<- url(url)
if(summary(file)$class == 'url-wininet'){
name<- readr:::source_name(file)
file<- readr:::standardise_path(file)
if (readr:::is.connection(file)) {
data <- readr:::datasource_connection(file, skip=F, skip_empty_rows=F)
} else{
stop("file is not a valid connection")
}
content <- file(data[[1]])
json <- readLines(content, warn = F)
close(content)
jdat<-jsonlite::fromJSON(json)
} else{
resp <- httr::GET(url)
if (httr::http_error(resp)) {
stop(paste("The requested url cannot be found within the get_eurostat_json function:",
url))
}
status <- httr::status_code(resp)
msg <- ". Some datasets are not accessible via the eurostat\n interface. You can try to search the data manually from the comext\n \t database at http://epp.eurostat.ec.europa.eu/newxtweb/ or bulk\n \t download facility at\n \t http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing\n \t or annual Excel files\n \t http://ec.europa.eu/eurostat/web/prodcom/data/excel-files-nace-rev.2"
if (status == 200) {
jdat <- jsonlite::fromJSON(url)
}
else if (status == 400) {
stop("Failure to get data. Probably invalid dataset id. Status code: ",
status, msg)
}
else if (status == 500) {
stop("Failure to get data. Probably filters did not return any data \n or data exceeded query size limitation. Status code: ",
status, msg)
}
else {
stop("Failure to get data. Status code: ",
status, msg)
}
}
dims <- jdat$dimension
ids <- jdat$id
dims_list <- lapply(dims[rev(ids)], function(x) {
y <- x$category$label
if (type[1] == "label") {
y <- unlist(y, use.names = FALSE)
}
else if (type[1] == "code") {
y <- names(unlist(y))
}
else if (type[1] == "both") {
y <- unlist(y)
}
else {
stop("Invalid type ", type)
}
})
variables <- expand.grid(dims_list, KEEP.OUT.ATTRS = FALSE,
stringsAsFactors = stringsAsFactors)
dat <- as.data.frame(variables[rev(names(variables))])
vals <- unlist(jdat$value, use.names = FALSE)
dat$values <- rep(NA, nrow(dat))
inds <- 1 + as.numeric(names(jdat$value))
if (!length(vals) == length(inds)) {
stop("Complex indexing not implemented.")
}
dat$values[inds] <- vals
tibble::as_tibble(dat)
}
}
The problem for setting the proxy on my side is not one for me, but for new users internally, for whom any speak of networking is already 'very deep' technical knowledge :) . |
If this can be solved, it would be great and highly relevant. The only thing I am concerned about is that we might like to be careful with how much extra overheads and instability the possible solution would introduce in the code. This is not the only package with this problem, and in general institutions may need to develop more generic solutions for these anyway if there is a need to use these workflows. Although, making it easier for some packages might help to advance those efforts.. |
Certainly. If this can't be incorporated because of complexity/stability risks, perhaps it is sufficient to leave this issue here. Others may copy the code snippet above if they face similar problems. |
Thanks @cbizzo for working on this issue. I am also bit hesitant to use non-exported functions as solution. However, your first solution using My understandin is that majorty of users do not have these issues, but
Question is how to solve them both with a robust solution. |
Hi, I noticed that the
filter
option ofget_eurostat
breaks when working on Windows and behind a corporate firewall.However if I don't specify the filters this works fine:
Digging deeper, I can see it is caused by the
httr::GET()
command inget_eurostat_json
, which is invoked when the user specifies a filter value. The difference is, when no filter option is calledfile.download
is used to get the data, which may use WinINet, or libcurl depending on the users installation, whereas if the filter option is used the user is forced to use libcurl (via httr) which may require proxy settings.Could you add an option or a separate function which allows the user to download using
file()
instead of httr?file()
will use the system default (libcurl on UNIX/Linux, WinINet on Windows).The below code replicates your excellent work but has some changes that allow the download to work behind a corporate firewall, without need to revert to the proxy:
I think this issue is related (at least thematically) to an issue I raised before #150
The text was updated successfully, but these errors were encountered: