Merge pull request #14 from richardrodgers/archive_ext
Refactor archive timestamp suppression - API change. Closes #13
Richard Rodgers authored Jan 8, 2017
2 parents 9a8ed87 + 8262f01 commit 0b4d7bf
Showing 7 changed files with 96 additions and 123 deletions.
20 changes: 11 additions & 9 deletions README.md
@@ -6,15 +6,17 @@ commons compression library for support of tarred Gzip archive format (".tgz"),

[![Build Status](https://travis-ci.org/richardrodgers/bagit.svg?branch=master)]
(https://travis-ci.org/richardrodgers/bagit)
+[![Dependency Status](https://dependencyci.com/github/richardrodgers/bagit/badge)]
+(https://dependencyci.com/github/richardrodgers/bagit)

## Use Cases ##

The library attempts to simplify a few of the most common use cases/patterns involving bag packages.
The first (the _producer_ pattern) is where content is assembled and placed into a bag, and the bag is then serialized
for transport/hand-off to another component or system. The goal here is to ensure that the constructed bag is correct.
-A helper class - bag _Filler_ - is used to orchestrate this assembly. Sequence: new Filler -> add content -> add more content -> serialize.
+A helper class - _Filler_ - is used to orchestrate this assembly. Sequence: new Filler -> add content -> add more content -> serialize.
The second (the _consumer_ pattern) is where a bag serialization (or a loose directory) is given and must
-be interpreted and validated for use. Here another helper class _Loader_ is used to deserialize.
+be interpreted and validated for use. Here another helper class - _Loader_ - is used to deserialize.
Sequence: new Loader -> load serialization -> convert to Bag -> process contents. If you have more complex needs
in java, (e.g. support for multiple spec versions), you may wish to consider the [Library of Congress Java Library](https://github.com/LibraryOfCongress/bagit-java).

@@ -35,7 +37,7 @@ archive. To convert the same bag to use a compressed tar format:

InputStream bagStream = filler.toStream("tgz");

-We don't always need bag I/O streams - suppose we wish obtain a reference to an archive file object instead:
+We don't always want bag I/O streams - suppose we wish to obtain a bag archive file package instead:

Path bagPackage = new Filler().payload(file1).metadata("External-Identifier", "mit.edu.0001").toPackage();

@@ -93,10 +95,10 @@ See the [Javadoc](http://richardrodgers.github.io/bagit/javadoc/index.html) for
## Archive formats ##

Bags are commonly serialized to standard archive formats such as ZIP. The library supports two archive formats:
-'zip' and 'tgz' and the variants 'zip.nt' and 'tgz.nt' (no time). If these variants are used, the library
-suppresses the file creation/modification time attributes, in order that checksums of archives produced at different times
-may accurately reflect only bag contents. That is, the checksum of a zip.nt bag (of the same name) is time-of-archiving-
-and filesystem-time-invariant, but content-sensitive.
+'zip' and 'tgz' and a variant in each of these. If the variant is used, the library suppresses the file
+creation/modification time attributes, in order that checksums of archives produced at different times
+may accurately reflect only bag contents. That is, the checksum of a zipped bag (with no timestamp variant) is
+time-of-archiving and filesystem-time-invariant, but content-sensitive. The variant is requested with an API call.
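
The timestamp suppression described in this README hunk can be illustrated with the JDK's own `java.util.zip` classes. This is only a sketch under stated assumptions, not the library's implementation (which builds on commons-compress): the class name `NoTimeZipSketch`, the helpers `zipNoTime` and `md5Hex`, and the constant `FIXED_TIME` are hypothetical names introduced here. Stamping every entry with one fixed time makes the archive bytes, and therefore the checksum, depend only on entry names and contents.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class NoTimeZipSketch {
    // Arbitrary fixed timestamp (2000-01-01T00:00:00Z); an assumption of this sketch.
    public static final long FIXED_TIME = 946684800000L;

    /** Zips the entries in sorted name order, stamping each with the same fixed time. */
    public static byte[] zipNoTime(Map<String, byte[]> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> file : new TreeMap<>(files).entrySet()) {
                ZipEntry entry = new ZipEntry(file.getKey());
                // Without an explicit time, ZipOutputStream records the current time.
                entry.setTime(FIXED_TIME);
                zip.putNextEntry(entry);
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns the MD5 digest of the data as lowercase hex. */
    public static String md5Hex(byte[] data) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> bag = new TreeMap<>();
        bag.put("data/hello.txt", "hello".getBytes(StandardCharsets.UTF_8));
        // Two archives built at different moments still share a checksum.
        System.out.println(md5Hex(zipNoTime(bag)).equals(md5Hex(zipNoTime(bag))));
    }
}
```

Changing any payload byte changes the checksum, which is the "content-sensitive" half of the claim above.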

## Extras ##

@@ -131,14 +133,14 @@ Fat jars include all dependencies in a single executable jar (no classpath decla
The distribution jars are kept at [Bintray](https://bintray.com), so make sure that repository is declared.
Then (NB: using the most current version), for Gradle:

-compile 'edu.mit.lib:bagit:0.7'
+compile 'edu.mit.lib:bagit:0.8'

or Maven:

<dependency>
<groupId>edu.mit.lib</groupId>
<artifactId>bagit</artifactId>
-<version>0.7</version>
+<version>0.8</version>
</dependency>

in a standard pom.xml dependencies block.
4 changes: 2 additions & 2 deletions build.gradle
@@ -17,7 +17,7 @@ sourceCompatibility = 1.8

group = 'edu.mit.lib'
archivesBaseName = 'bagit'
-version = '0.7'
+version = '0.8'
description = 'Compact Java BagIt library'

ext {
@@ -28,7 +28,7 @@ ext {
}

dependencies {
-compile group: 'org.apache.commons', name: 'commons-compress', version: '1.9'
+compile group: 'org.apache.commons', name: 'commons-compress', version: '1.13'
testCompile group: 'junit', name: 'junit', version: '4.11'
}

30 changes: 11 additions & 19 deletions src/main/java/edu/mit/lib/bagit/Bag.java
@@ -14,7 +14,6 @@
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.security.DigestInputStream;
-import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
@@ -44,7 +43,7 @@ public class Bag {
static final String ENCODING = "UTF-8";
static final String CS_ALGO = "MD5";
static final String BAGIT_VSN = "0.97";
-static final String LIB_VSN = "0.7";
+static final String LIB_VSN = "0.8";
static final String DFLT_FMT = "zip";
static final String TGZIP_FMT = "tgz";
static final String SPACER = " ";
@@ -77,9 +76,9 @@ public enum MetadataName {
INTERNAL_SENDER_DESC("Internal-Sender-Description"),
BAG_SOFTWARE_AGENT("Bag-Software-Agent"); // not in IETF spec

-private String mdName;
+private final String mdName;

-private MetadataName(String name) {
+MetadataName(String name) {
mdName = name;
}

@@ -168,11 +167,9 @@ public boolean isComplete() throws IOException {
// # tag files and # manifest entries must agree
// tag files consist of any top-level files except:
// tagmanifest itself, and the payload directory.
-DirectoryStream.Filter<Path> filter = new DirectoryStream.Filter<Path>() {
-public boolean accept(Path file) throws IOException {
-String name = file.getFileName().toString();
-return ! (name.startsWith(TAGMANIF_FILE) || name.startsWith(DATA_DIR));
-}
+DirectoryStream.Filter<Path> filter = file -> {
+String name = file.getFileName().toString();
+return ! (name.startsWith(TAGMANIF_FILE) || name.startsWith(DATA_DIR));
};
int tagCount = 0;
try (DirectoryStream<Path> stream = Files.newDirectoryStream(baseDir, filter)) {
@@ -277,10 +274,9 @@ public Map<String, String> payloadRefs() throws IOException {
*
* @param relPath the relative path of the file from the bag root directory
* @return tagfile the tag file path, or null if no file at the specified path
-* @throws IOException if unable to access tag file
* @throws IllegalAccessException if bag is sealed
*/
-public Path tagFile(String relPath) throws IOException, IllegalAccessException {
+public Path tagFile(String relPath) throws IllegalAccessException {
if (sealed) {
throw new IllegalAccessException("Sealed Bag: no file access allowed");
}
@@ -354,7 +350,7 @@ public List<String> property(String relPath, String name) throws IOException {
try (BufferedReader reader = Files.newBufferedReader(bagFile(relPath), StandardCharsets.UTF_8)) {
String propName = null;
StringBuilder valSb = new StringBuilder();
-String line = null;
+String line;
while ((line = reader.readLine()) != null) {
// if line does not start with spacer, it is a new property
if (! line.startsWith(SPACER)) {
@@ -414,7 +410,7 @@ public Map<String, String> tagManifest() throws IOException {
public Map<String, String> manifest(String relPath) throws IOException {
Map<String, String> mfMap = new HashMap<>();
try (BufferedReader reader = Files.newBufferedReader(bagFile(relPath), StandardCharsets.UTF_8)) {
-String line = null;
+String line;
while((line = reader.readLine()) != null) {
String[] parts = line.split(" ");
mfMap.put(parts[1], parts[0]);
@@ -439,11 +435,7 @@ private int fileCount(Path dir) throws IOException {
}

private void addProp(String name, String value, Map<String, List<String>> mdSet) {
-List<String> vals = mdSet.get(name);
-if (vals == null) {
-vals = new ArrayList<>();
-mdSet.put(name, vals);
-}
+List<String> vals = mdSet.computeIfAbsent(name, k -> new ArrayList<>());
vals.add(value.trim());
}

@@ -484,7 +476,7 @@ static Map<String, String> payloadRefs(Path refFile) throws IOException {
Map<String, String> refMap = new HashMap<>();
if (Files.exists(refFile)) {
try (BufferedReader reader = Files.newBufferedReader(refFile, StandardCharsets.UTF_8)) {
-String line = null;
+String line;
while((line = reader.readLine()) != null) {
String[] parts = line.split(" ");
refMap.put(parts[2], parts[0]);
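
The `addProp` change in Bag.java swaps a get/null-check/put sequence for `Map.computeIfAbsent`. The sketch below is a standalone illustration that the two forms build the same multi-valued map. The class `MultiValueSketch` and the method names `addOld` and `addNew` are hypothetical names introduced here; only the bodies mirror the before and after lines of the diff.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiValueSketch {
    /** Pre-Java 8 form: look up the list, create and register it when missing. */
    public static void addOld(Map<String, List<String>> mdSet, String name, String value) {
        List<String> vals = mdSet.get(name);
        if (vals == null) {
            vals = new ArrayList<>();
            mdSet.put(name, vals);
        }
        vals.add(value.trim());
    }

    /** Java 8 form: computeIfAbsent returns the existing list or stores a new one. */
    public static void addNew(Map<String, List<String>> mdSet, String name, String value) {
        List<String> vals = mdSet.computeIfAbsent(name, k -> new ArrayList<>());
        vals.add(value.trim());
    }

    public static void main(String[] args) {
        Map<String, List<String>> oldStyle = new HashMap<>();
        Map<String, List<String>> newStyle = new HashMap<>();
        String[][] props = {
            {"Source-Organization", " MIT "},
            {"Source-Organization", "Libraries"},
            {"Contact-Name", "A. Person"}
        };
        for (String[] prop : props) {
            addOld(oldStyle, prop[0], prop[1]);
            addNew(newStyle, prop[0], prop[1]);
        }
        // Both maps hold the same names with the same ordered, trimmed values.
        System.out.println(oldStyle.equals(newStyle));
    }
}
```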
19 changes: 11 additions & 8 deletions src/main/java/edu/mit/lib/bagit/Bagger.java
@@ -26,13 +26,14 @@
public class Bagger {
/* A bit clunky in the cmd-line arg handling, but deliberately so as to limit
external dependencies for those who want to only use the library API directly. */
-private List<String> payloads = new ArrayList<>();
-private List<String> references = new ArrayList<>();
-private List<String> tags = new ArrayList<>();
-private List<String> statements = new ArrayList<>();
+private final List<String> payloads = new ArrayList<>();
+private final List<String> references = new ArrayList<>();
+private final List<String> tags = new ArrayList<>();
+private final List<String> statements = new ArrayList<>();
private String archFmt = "directory";
+private boolean noTime = false;
private String csAlg = "MD5";
-private List<String> optFlags = new ArrayList<>();
+private final List<String> optFlags = new ArrayList<>();
private int verbosityLevel;

public static void main(String[] args) throws IOException, IllegalAccessException {
@@ -47,6 +48,7 @@ public static void main(String[] args) throws IOException, IllegalAccessExceptio
case "-r": bagger.references.add(args[i+1]); break;
case "-t": bagger.tags.add(args[i+1]); break;
case "-m": bagger.statements.add(args[i+1]); break;
+case "-n": bagger.noTime = Boolean.valueOf(args[i+1]); break;
case "-a": bagger.archFmt = args[i+1]; break;
case "-c": bagger.csAlg = args[i+1]; break;
case "-o": bagger.optFlags.add(args[i+1]); break;
@@ -80,7 +82,8 @@ public static void usage() {
"-r <bag path>=<URL> - payload reference\n" +
"-t [<bag path>=]<tag file>\n" +
"-m <name>=<value> - metadata statement\n" +
-"-a <archive format> - e.g. 'zip', 'zip.nt', 'tgz', 'tgz.nt' (default: loose directory)\n" +
+"-a <archive format> - e.g. 'zip', 'tgz', (default: loose directory)\n" +
+"-n <noTime> - 'true' or 'false'\n" +
"-c <checksum algorithm> - default: 'MD5'\n" +
"-o <optimization flag>\n" +
"-v <level> - output level to console (default: 0 = no output)");
@@ -121,11 +124,11 @@ private void fill(Path baseDir) throws IOException {
String[] parts = statement.split("=");
filler.metadata(parts[0], parts[1]);
}
-Path bagPath = null;
+Path bagPath;
if (archFmt.equals("directory")) {
bagPath = filler.toDirectory();
} else {
-bagPath = filler.toPackage(archFmt);
+bagPath = filler.toPackage(archFmt, noTime);
}
if (verbosityLevel > 0) {
message(bagPath.getFileName().toString(), true, "created");
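
The new `-n` option in Bagger.java is read with `Boolean.valueOf`, which follows the same rule as `Boolean.parseBoolean`: the result is true only when the text equals "true", ignoring case. Any other value, such as `yes` or `1`, silently leaves timestamps enabled. The sketch below shows that behaviour in isolation. The class `NoTimeFlagSketch`, the method `noTime`, and the assumption that arguments arrive as flag/value pairs are all hypothetical, since the loop around the `switch` is not visible in this diff.

```java
public class NoTimeFlagSketch {
    /**
     * Returns the parsed value that follows "-n", or false when the flag is absent.
     * Assumes arguments come as flag/value pairs (an assumption of this sketch).
     */
    public static boolean noTime(String[] args) {
        for (int i = 0; i + 1 < args.length; i += 2) {
            if ("-n".equals(args[i])) {
                // true only for "true" in any letter case; everything else is false
                return Boolean.parseBoolean(args[i + 1]);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(noTime(new String[] {"-a", "zip", "-n", "true"}));
        System.out.println(noTime(new String[] {"-a", "zip", "-n", "yes"}));
    }
}
```

A stricter command line could reject values other than 'true' or 'false' instead of treating them as false; whether that is worth the extra code is a design choice for the tool.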