Merge pull request #3384 from trailofbits/release-0-5-0
Release v0.5.0
ESultanik authored Nov 22, 2022
2 parents 8ff4e62 + 625889f commit ec0216f
Showing 3 changed files with 68 additions and 64 deletions.
97 changes: 54 additions & 43 deletions README.md
@@ -25,7 +25,7 @@ pip3 install polyfile

To install PolyFile from source, in the same directory as this README, run:
```
pip3 install -e .
pip3 install .
```

Important: Before installing from source, make sure Java is installed. Java is used to
@@ -35,11 +35,33 @@ This will automatically install the `polyfile` and `polymerge` executables in yo

## Usage

Running `polyfile` on a file with no arguments will mimic the behavior of `file --keep-going`:
```console
$ polyfile png-polyglot.png
PNG image data, 256 x 144, 8-bit/color RGB, non-interlaced
Brainfu** Program
Malformed PDF
PDF document, version 1.3, 1 pages
ZIP end of central directory record Java JAR archive
```
To generate an interactive hex viewer for the file, use the `--html` option:
```console
$ polyfile --html output.html png-polyglot.png
Found a file of type application/pdf at byte offset 0
Found a file of type application/x-brainfuck at byte offset 0
Found a file of type image/png at byte offset 0
Found a file of type application/zip at byte offset 0
Found a file of type application/java-archive at byte offset 0
Saved HTML output to output.html
```
usage: polyfile [-h] [--format {mime,html,json,sbud}] [--output OUTPUT]
[--filetype FILETYPE] [--list] [--html HTML]

Full usage instructions follow:
```
usage: polyfile [-h] [--format {file,mime,html,json,sbud}] [--output OUTPUT]
[--filetype FILETYPE] [--list] [--html HTML] [--explain]
[--only-match-mime] [--only-match] [--require-match]
[--max-matches MAX_MATCHES] [--debugger] [--no-debug-python]
[--max-matches MAX_MATCHES] [--debugger]
[--eval-command EVAL_COMMAND] [--no-debug-python]
[--quiet | --debug | --trace] [--version] [-dumpversion]
[FILE]
@@ -48,43 +70,46 @@ A utility to recursively map the structure of a file.
positional arguments:
FILE the file to analyze; pass '-' or omit to read from STDIN
optional arguments:
options:
-h, --help show this help message and exit
--format {mime,html,json,sbud}, -r {mime,html,json,sbud}
--format {file,mime,html,json,sbud}, -r {file,mime,html,json,sbud}
PolyFile's output format
Output formats are:
mime ... the detected MIME types associated with the file,
like the output of the `file` command
html ... an interactive HTML-based hex viewer
json ... a modified version of the SBUD format in JSON syntax
sbud ... equivalent to 'json'
file ...... the detected formats associated with the file,
like the output of the `file` command
mime ...... the detected MIME types associated with the file,
like the output of the `file --mime-type` command
explain ... like 'mime', but adds a human-readable explanation
for why each MIME type matched
html ...... an interactive HTML-based hex viewer
json ...... a modified version of the SBUD format in JSON syntax
sbud ...... equivalent to 'json'
Multiple formats can be output at once:
polyfile INPUT_FILE -f mime -f json
Their output will be concatenated to STDOUT in the order that
they occur in the arguments.
To save each format to a separate file, see the `--output` argument.
If no format is specified, PolyFile defaults to `--format sbud`,
but this will change to `--format mime` in v0.5.0
If no format is specified, PolyFile defaults to `--format file`
--output OUTPUT, -o OUTPUT
an optional output path for `--format`
Each instance of `--output` applies to the previous instance
of the `--format` option.
For example:
polyfile INPUT_FILE --format html --output output.html \
--format sbud --output output.json
will save HTML to `output.html` and SBUD to `output.json`.
No two outputs can be directed at the same file path.
The path can be '-' for STDOUT.
If an `--output` is omitted for a format,
then it will implicitly be printed to STDOUT.
@@ -93,6 +118,7 @@ optional arguments:
--list, -l list the supported filetypes for the `--filetype` argument and exit
--html HTML, -t HTML path to write an interactive HTML file for exploring the PDF;
equivalent to `--format html --output HTML`
--explain equivalent to `--format explain`
--only-match-mime, -I
"just print out the matching MIME types for the file, one on each line;
equivalent to `--format mime`
@@ -101,6 +127,8 @@ optional arguments:
--max-matches MAX_MATCHES
stop scanning after having found this many matches
--debugger, -db drop into an interactive debugger for libmagic file definition matching and PolyFile parsing
--eval-command EVAL_COMMAND, -ex EVAL_COMMAND
execute the given debugger command
--no-debug-python by default, the `--debugger` option will break on custom matchers and prompt to debug using PDB. This option will suppress those prompts.
--quiet, -q suppress all log output
--debug, -d print debug information
@@ -109,17 +137,6 @@ optional arguments:
-dumpversion print PolyFile's raw version information to STDOUT and exit
```
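
The pairing rule described in the help text (each `--output` applies to the preceding `--format`, a format without an `--output` goes to STDOUT, and no two outputs may share a path) can be sketched as a small standalone function. This is only an illustrative model: `pair_formats_with_outputs` is a hypothetical name, it is not PolyFile's actual argument parser, and the error raised for an `--output` with no pending `--format` is an assumption.

```python
from typing import List, Optional, Tuple


def pair_formats_with_outputs(argv: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: pair each --format with the --output that follows it.

    A format with no --output is mapped to '-', meaning STDOUT.
    """
    pairs: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in ("--format", "-r"):
            # Start a new format entry with no output assigned yet.
            pairs.append((argv[i + 1], None))
            i += 2
        elif arg in ("--output", "-o"):
            # Assumption: an --output without a pending --format is an error.
            if not pairs or pairs[-1][1] is not None:
                raise ValueError("--output must follow a --format that has no output yet")
            pairs[-1] = (pairs[-1][0], argv[i + 1])
            i += 2
        else:
            # Anything else (e.g. the input file) is ignored by this sketch.
            i += 1
    resolved = [(fmt, out if out is not None else "-") for fmt, out in pairs]
    # The help text says no two outputs can be directed at the same file path.
    paths = [out for _, out in resolved if out != "-"]
    if len(paths) != len(set(paths)):
        raise ValueError("no two outputs can be directed at the same file path")
    return resolved
```

Applied to the example from the help text, the sketch pairs `html` with `output.html` and `sbud` with `output.json`.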

To generate a JSON mapping of a file, run:

```
polyfile INPUT_FILE > output.json
```

You can optionally have PolyFile output an interactive HTML page containing a labeled, interactive hexdump of the file:
```
polyfile INPUT_FILE --html output.html > output.json
```

### Interactive Debugger

PolyFile has an interactive debugger both for its file matching and parsing. It can be used to debug a libmagic pattern
@@ -140,7 +157,7 @@ It currently has support for parsing and semantically mapping the following form

For an example that exercises all of these file formats, run:
```bash
curl -v --silent https://www.sultanik.com/files/ESultanikResume.pdf | polyfile --html ESultanikResume.html - > ESultanikResume.json
curl -v --silent https://www.sultanik.com/files/ESultanikResume.pdf | polyfile --html ESultanikResume.html -
```

Prior to PolyFile version 0.3.0, it used the [TrID database](http://mark0.net/soft-trid-deflist.html) for file
@@ -150,13 +167,7 @@ TrID matching code is still shipped with PolyFile and can be invoked programmati

### Output Format

PolyFile outputs its mapping in an extension of the [SBuD](https://github.com/corkami/sbud) JSON format described [in the documentation](docs/json_format.md).

PolyFile can also emit a standalone HTML document that contains an interactive hex viewer as well as syntax trees for
the discovered file formats. Simply pass the `--html` argument to PolyFile with an output path:
```console
$ polyfile input_file --html output.html
```
PolyFile has several options for outputting its results, specified by its `--format` option. For computer-readable output, PolyFile has an extension of the [SBuD](https://github.com/corkami/sbud) JSON format described [in the documentation](docs/json_format.md). Prior to version 0.5.0 this was the default output format of PolyFile. However, now the default output format is to mimic the behavior of the `file` command. To maintain the original behavior, use the `--format sbud` option.
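
For scripts that relied on the old default, one way to keep the SBuD/JSON behavior is to request the format explicitly. The sketch below only builds the command line from the `--format` and `--output` options documented in the usage text; `sbud_command` is a hypothetical helper name, and the file paths are placeholders.

```python
from typing import List


def sbud_command(input_path: str, output_path: str) -> List[str]:
    """Hypothetical helper: build a polyfile invocation that pins the SBuD output format."""
    # `--format sbud` restores the pre-0.5.0 default output;
    # `--output` sends that format to a file instead of STDOUT.
    return ["polyfile", input_path, "--format", "sbud", "--output", output_path]
```

The resulting list can be passed to `subprocess.run(..., check=True)` on a system where `polyfile` is installed.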

### libMagic Implementation

33 changes: 13 additions & 20 deletions polyfile/__main__.py
@@ -58,8 +58,7 @@ def __exit__(self, exc_type, exc_val, exc_tb):

class FormatOutput:
valid_formats = ("mime", "html", "json", "sbud", "explain")
# TODO: Change this from "sbud" to "mime" in v0.5.0:
default_format = "sbud"
default_format = "file"

def __init__(self, output_format: Optional[str] = None, output_path: Optional[str] = None):
if output_format is None:
@@ -144,8 +143,7 @@ def main(argv=None):
To save each format to a separate file, see the `--output` argument.
If no format is specified, PolyFile defaults to `--format sbud`,
but this will change to `--format file` in v0.5.0"""))
If no format is specified, PolyFile defaults to `--format file`"""))

parser.add_argument('--output', '-o', action=ValidateOutput, type=str, # nargs=2,
# metavar=(f"{{{','.join(ValidateOutput.valid_outputs)}}}", "PATH"),
@@ -304,16 +302,7 @@ def main(argv=None):
stack.enter_context(debugger)
elif args.no_debug_python:
log.warning("Ignoring `--no-debug-python`; it can only be used with the --debugger option.")
if not sys.stdout.isatty() or not sys.stdin.isatty():
log.warning("""WARNING
!!!!!!!
The default output format for PolyFile will be changing in forthcoming release v0.5.0!
Currently, the default output format is SBUD/JSON.
In release v0.5.0, it will switch to the equivalent of the current `--format file` option.
To preserve the original behavior, add the `--format sbud` command line option.
Please update your scripts!
""")

analyzer = Analyzer(file_path, parse=not args.only_match, magic_matcher=magic_matcher)

needs_sbud = any(output_format.output_format in {"html", "json", "sbud"} for output_format in args.format)
@@ -339,14 +328,18 @@ def main(argv=None):
with output_format.output_stream as output:
if output_format.output_format == "file":
istty = sys.stderr.isatty() and output.isatty() and logging.root.level <= logging.INFO
lines = set()
with KeyboardInterruptHandler():
for match in analyzer.magic_matches():
if istty:
log.clear_status()
output.write(f"{match!s}\n")
output.flush()
else:
output.write(f"{match!s}\n")
line = str(match)
if line not in lines:
lines.add(line)
if istty:
log.clear_status()
output.write(f"{line}\n")
output.flush()
else:
output.write(f"{line}\n")
if istty:
log.clear_status()
elif output_format.output_format in ("mime", "explain"):
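
The deduplication introduced in the hunk above (a `set` of already-printed lines, checked before each write) can be isolated as a small generator. This is a sketch of the same idea rather than the code in `__main__.py`: `unique_lines` is a hypothetical name, and plain strings stand in for PolyFile's match objects.

```python
from typing import Iterable, Iterator


def unique_lines(matches: Iterable[object]) -> Iterator[str]:
    """Hypothetical helper: yield str(match) for each match, skipping repeated lines."""
    seen = set()
    for match in matches:
        line = str(match)
        if line not in seen:
            # Remember the line so later duplicates are suppressed,
            # while first-seen order is preserved.
            seen.add(line)
            yield line
```

Because lines are yielded as they are first seen, output can still be streamed match by match, as the `file` format branch does.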
2 changes: 1 addition & 1 deletion setup.py
@@ -20,7 +20,7 @@
long_description_content_type="text/markdown",
url='https://github.com/trailofbits/polyfile',
author='Trail of Bits',
version="0.4.2",
version="0.5.0",
packages=find_packages(exclude=("tests",)),
python_requires='>=3.7',
install_requires=[