Wrapping FileOptimizer
FileOptimizer is a GNU LGPLv3-licensed tool for Windows that reduces file sizes for many dozens of file types, PDF among them. It is actually a frontend that delegates the work to a large number of highly specialized compression tools, which it runs as plugins.
If you want to squeeze your PDFs as much as possible, consider trying FileOptimizer.
For PDFs, FileOptimizer uses plugins for Ghostscript and smpdf. Compression results can be quite impressive: I often get size reductions of 30% to 50%, but I have seen 90%, too.
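To put a number on results like these for your own files, a small helper is enough. This is only a sketch: the function name `size_reduction_percent` is made up for illustration and is not part of FileOptimizer or PyMuPDF.

```python
import os


def size_reduction_percent(size_before, size_after):
    """Return the percentage by which a file shrank (negative if it grew)."""
    if size_before <= 0:
        raise ValueError("size_before must be positive")
    return 100.0 * (size_before - size_after) / size_before


def file_size(path):
    """Return the size of the file at path in bytes."""
    return os.path.getsize(path)
```

Call `file_size()` on the PDF once before and once after running the optimizer, then pass both values to `size_reduction_percent()`.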
Here is the issue: smpdf is free software for personal use only. After you run FileOptimizer on a PDF, you will find that both metadata fields /Producer and /Creator have been overwritten with the text "Coherent Lossless PDF Compressor. Not for commercial use. http://www.coherentpdf.com".
Annoying.
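If you want to see exactly which fields were touched, you can compare the metadata dictionaries taken before and after optimization. The sketch below assumes plain dictionaries with lowercase keys such as `producer` and `creator`, as returned by PyMuPDF's `doc.metadata`. The helper name `changed_fields` and the sample values are made up for illustration.

```python
def changed_fields(before, after):
    """Return {key: (old, new)} for every metadata key whose value differs."""
    keys = sorted(set(before) | set(after))
    return {
        k: (before.get(k), after.get(k))
        for k in keys
        if before.get(k) != after.get(k)
    }


# Made-up sample values, only to show the shape of the result.
before = {"producer": "SomeProducer", "creator": "SomeCreator", "title": "Report"}
after = {"producer": "overwritten", "creator": "overwritten", "title": "Report"}

for key, (old, new) in changed_fields(before, after).items():
    print(key, ":", old, "->", new)
```

In practice, `before` and `after` would be the values of `doc.metadata` read from the PDF before and after the FileOptimizer run.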
The following script (a wrapper-wrapper) restores a PDF's original metadata after optimization.
'''
Optimizes a PDF with FileOptimizer. As "/Producer" and "/Creator" get
overwritten by this, we first save the metadata and restore it afterwards.
We accept the cost of non-compressed object definitions (as created by
FileOptimizer).
'''
from __future__ import print_function
import fitz
import sys, os, subprocess, tempfile, time
assert len(sys.argv) == 2, "need filename parameter"
fn = sys.argv[1]
assert fn.lower().endswith(".pdf"), "must be a PDF file"
fullname = os.path.abspath(fn) # get the full path & name
t0 = time.perf_counter() # start timing (time.clock() was removed in Python 3.8)
doc = fitz.open(fullname) # open PDF to save metadata
meta = doc.metadata
doc.close()
t1 = time.perf_counter() # metadata has been saved
subprocess.call(["fileoptimizer64", fullname]) # now invoke optimizer
t2 = time.perf_counter() # optimizer has finished
cdir = os.path.split(fullname)[0] # split dir from filename
fd, fnout = tempfile.mkstemp(suffix = ".pdf", dir = cdir) # create temp pdf name
os.close(fd) # close temp file handle, we only need the name
doc = fitz.open(fullname) # open optimized PDF
doc.set_metadata(meta) # restore old metadata (setMetadata in older PyMuPDF versions)
doc.save(fnout, garbage = 4) # save temp PDF with it
doc.close() # close it
os.remove(fullname) # remove super optimized file
os.rename(fnout, fullname) # and rename temp PDF to original filename
t3 = time.perf_counter() # metadata has been restored
# put out runtime statistics
print("Timings:")
print(str(round(t1-t0, 4)).rjust(10), "save old metadata")
print(str(round(t2-t1, 4)).rjust(10), "execute FileOptimizer")
print(str(round(t3-t2, 4)).rjust(10), "restore old metadata")
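The script removes the optimized file and then renames the temporary file in two separate steps, so a crash in between would leave no file under the original name. A possible variation, shown here only as a sketch, is to let `os.replace` overwrite the target in one call. The helper names `make_temp_pdf_path` and `swap_in` are made up for illustration.

```python
import os
import tempfile


def make_temp_pdf_path(directory):
    """Reserve a unique .pdf file name in directory and return its path."""
    fd, path = tempfile.mkstemp(suffix=".pdf", dir=directory)
    os.close(fd)  # only the name is needed; the file is written later
    return path


def swap_in(tmp_path, target_path):
    """Move tmp_path over target_path, overwriting the target in one call."""
    os.replace(tmp_path, target_path)
```

In the script above, `swap_in(fnout, fullname)` would take the place of the `os.remove` / `os.rename` pair. Creating the temporary file in the same directory as the target, as the script already does, keeps both paths on the same file system.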
- Beware, however, that this treatment does not lift the restriction to non-commercial use.
- FileOptimizer has been reported to run under WINE on platforms other than Windows.