PROPOSAL: Use prettyunits::pretty_bytes to prettify logging for multipart suggestion in put_object #417

Open
wants to merge 6 commits into
base: master
6 changes: 4 additions & 2 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: aws.s3
Type: Package
Title: 'AWS S3' Client Package
-Version: 0.3.22
+Version: 0.3.23
Authors@R: c(person("Thomas J.", "Leeper", role = "aut",
email = "[email protected]",
comment = c(ORCID = "0000-0003-4097-6326")),
@@ -15,6 +15,7 @@ Authors@R: c(person("Thomas J.", "Leeper", role = "aut",
person("Andrii", "Degtiarov", role = "ctb"),
person("Dhruv", "Aggarwal", role = "ctb"),
person("Alyssa", "Columbus", role = "ctb"),
+person("Matt", "Kaye", role = "ctb"),
person("Simon", "Urbanek", role = c("cre", "ctb"),
email = "[email protected]")
)
@@ -33,7 +34,8 @@ Imports:
xml2 (> 1.0.0),
base64enc,
digest,
-aws.signature (>= 0.3.7)
+aws.signature (>= 0.3.7),
+prettyunits
Suggests:
testthat,
datasets
1 change: 1 addition & 0 deletions NAMESPACE
@@ -93,6 +93,7 @@ importFrom(curl,curl)
importFrom(curl,handle_setheaders)
importFrom(curl,new_handle)
importFrom(digest,digest)
+importFrom(prettyunits,pretty_bytes)
importFrom(tools,file_ext)
importFrom(tools,md5sum)
importFrom(utils,URLencode)
7 changes: 7 additions & 0 deletions NEWS.md
@@ -1,3 +1,10 @@
+# aws.s3 0.3.23
+
+## Features
+
+* `put_object` now uses `prettyunits::pretty_bytes` to prettify log messages printed when suggesting
+using `multipart = TRUE`.
+
# aws.s3 0.3.22

## API changes
37 changes: 19 additions & 18 deletions R/put_object.R
@@ -14,48 +14,48 @@
#' @param partsize numeric, size of each part when using multipart upload. AWS imposes a minimum size (currently 5MB) so setting a too low value may fail. Note that it can be set to \code{Inf} in conjunction with \code{multipart=FALSE} to silence the warning suggesting multipart uploads for large content.
#' @template dots
#' @details This provides a generic interface for storing objects to S3. Some convenience wrappers are provided for common tasks: e.g., \code{\link{s3save}} and \code{\link{s3saveRDS}}.
#'
#'
#' Note that S3 is a flat file store. So there is no folder hierarchy as in a traditional hard drive. However, S3 allows users to create pseudo-folders by prepending object keys with \code{foldername/}. The \code{put_folder} function is provided as a high-level convenience function for creating folders. This is not actually necessary as objects with slashes in their key will be displayed in the S3 web console as if they were in folders, but it may be useful for creating an empty directory (which is possible in the web console).
#'
#' \strong{IMPORTANT}: In aws.s3 versions before 0.3.22 the first positional argument was \code{file} and \code{put_object} changed behavior depending on whether the file could be found or not. This is inherently very dangerous since \code{put_object} would only store the filename in cases there was any problem with the input. Therefore the first argument was changed to \code{what} which is always the content to store and now also supports connection. If not used, \code{file} is still a named argument and can be set instead - it will be always interpreted as a filename, failing with an error if it doesn't exist.
#'
#' When using connections in \code{what} it is preferrable that they are either unopened or open in binary mode. This condition is mandatory for multipart uploads. Text connections are inherently much slower and may not deliver identical results since they mangle line endings. \code{put_object} will automatically open unopened connections and always closes the connection before returning.
#'
#'
#' @return If successful, \code{TRUE}.
#' @examples
#' \dontrun{
#' library("datasets")
#'
#'
#' # write file to S3
#' tmp <- tempfile()
#' on.exit(unlink(tmp))
#' utils::write.csv(mtcars, file = tmp)
#' # put object with an upload progress bar
#' put_object(file = tmp, object = "mtcars.csv", bucket = "myexamplebucket", show_progress = TRUE)
#'
#'
#' # create a "folder" in a bucket (NOT required! Folders are really just 0-length files)
#' put_folder("example", bucket = "myexamplebucket")
#' ## write object to the "folder"
#' put_object(file = tmp, object = "example/mtcars.csv", bucket = "myexamplebucket")
#'
#'
#' # write serialized, in-memory object to S3
#' x <- rawConnection(raw(), "w")
#' utils::write.csv(mtcars, x)
#' put_object(rawConnectionValue(x), object = "mtcars.csv", bucket = "myexamplebucketname")
#'
#'
#' # use `headers` for server-side encryption
#' ## require appropriate bucket policy
#' ## encryption can also be set at the bucket-level using \code{\link{put_encryption}}
#' put_object(file = tmp, object = "mtcars.csv", bucket = "myexamplebucket",
#' headers = c('x-amz-server-side-encryption' = 'AES256'))
#'
#'
#' # alternative "S3 URI" syntax:
#' put_object(rawConnectionValue(x), object = "s3://myexamplebucketname/mtcars.csv")
#' close(x)
#'
#'
#' # read the object back from S3
#' read.csv(text = rawToChar(get_object(object = "s3://myexamplebucketname/mtcars.csv")))
#'
#'
#' # multi-part uploads for objects over 5MB
#' \donttest{
#' x <- rnorm(3e6)
@@ -68,6 +68,7 @@
#' @references \href{http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPUT.html}{API Documentation}
#' @seealso \code{\link{put_bucket}}, \code{\link{get_object}}, \code{\link{delete_object}}, \code{\link{put_encryption}}
#' @importFrom utils head
+#' @importFrom prettyunits pretty_bytes
#' @export
put_object <-
function(
@@ -104,7 +105,7 @@ function(

## we cache connection info
what.info <- if (inherits(what, "connection")) summary(what) else NULL

## auto-detect file name if object is not provided
if (missing(object) && inherits(what, "connection") && what.info$class == "file") {
if (missing(bucket))
Expand Down Expand Up @@ -190,7 +191,7 @@ function(
headers = headers,
...)
id <- initialize[["UploadId"]]

# function to call abort if any part fails (otherwise the user pays for incomplete payload!)
abort.upload <- function(id) delete_object(object = object, bucket = bucket, query = list(uploadId = id), ...)

@@ -214,7 +215,7 @@ function(
if (length(data) == 0) ## end of payload
break

-r <- s3HTTP(verb = "PUT",
+r <- s3HTTP(verb = "PUT",
bucket = bucket,
path = paste0('/', object),
query = list(partNumber = i, uploadId = id),
@@ -247,7 +248,7 @@ function(
}

if (!is.na(size) && size > partsize)
-message("File size is ", size, ", consider setting using multipart=TRUE")
+message("File size is ", pretty_bytes(size), ", consider setting using multipart=TRUE")
Author comment:
This is the meat of it
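
For reference, a minimal sketch of how the changed line affects the logged hint, assuming `prettyunits` is installed. `multipart_hint` is a hypothetical helper written only for this illustration (it is not part of aws.s3), and the size used is an arbitrary example value.

```r
library(prettyunits)

# Hypothetical helper (not part of aws.s3): builds the hint text the same
# way the changed line does, so the old and new wording can be compared.
multipart_hint <- function(size, pretty = TRUE) {
  shown <- if (pretty) pretty_bytes(size) else size
  paste0("File size is ", shown, ", consider setting using multipart=TRUE")
}

size <- 104857600  # example payload size in bytes (an assumption)

old_msg <- multipart_hint(size, pretty = FALSE)  # raw byte count, as before
new_msg <- multipart_hint(size, pretty = TRUE)   # human-readable size

message(old_msg)
message(new_msg)
```

The only difference between the two messages is how the size is rendered; the condition that triggers the hint (`size > partsize`) is unchanged by this PR.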


## httr doesn't support connections so we have to read it all into memory first
if (inherits(what, "connection")) {
@@ -271,10 +272,10 @@ function(
}
}

-r <- s3HTTP(verb = "PUT",
+r <- s3HTTP(verb = "PUT",
bucket = bucket,
path = paste0('/', object),
-headers = headers,
+headers = headers,
request_body = what,
verbose = verbose,
show_progress = show_progress,
@@ -303,10 +304,10 @@ post_object <- function(file, object, bucket, headers = list(), ...) {
if (!"Content-Length" %in% names(headers)) {
headers <- c(headers, list(`Content-Length` = formatSize(calculate_data_size(file))))
}
-r <- s3HTTP(verb = "POST",
+r <- s3HTTP(verb = "POST",
bucket = bucket,
path = paste0("/", object),
-headers = headers,
+headers = headers,
request_body = file,
...)
structure(r, class = "s3_object")
@@ -334,7 +335,7 @@ complete_parts <- function(object, bucket, id, parts, ...) {
bucket <- get_bucketname(object)
}
object <- get_objectkey(object)

tmp <- tempfile()
xml2::write_xml(xml2::as_xml_document(list(CompleteMultipartUpload = parts)), tmp, options = "no_declaration")
post_object(file = tmp, object = object, bucket = bucket, query = list(uploadId = id), ...)
2 changes: 1 addition & 1 deletion man/put_object.Rd

Generated file; its diff is not rendered by default.