Inserting Pages from other PDFs
The recipe further below is no longer required as of v1.16.8: PyMuPDF can now be run as a module and thus supports execution via the command line. To join an arbitrary number of pages from several PDFs, execute the following command:
python -m fitz join -o output.pdf -password <of output> input1 input2 ...
Specify each input file like so: filename[,password[,pages]].
To join the complete input, just specify the filename, optionally followed by a comma and the password. If you only want specific pages, specify them 1-based, either as single integers or as ranges "m-n", separated from each other by commas. To specify the last page, you can use the symbolic name "N" (capital N). Numbers and ranges can appear in any sequence, may repeat, and may overlap. If m > n for a range "m-n", the pages are copied in reversed sequence.
For example, consider joining the following files:
- file1.pdf: all pages, but back to front, no password
- file2.pdf: last page, first page, password: "secret"
- file3.pdf: pages 5 to last, no password
specify this and forget the rest of this Wiki:
python -m fitz join -o output.pdf file1.pdf,,N-1 file2.pdf,secret,N,1 file3.pdf,,5-N
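The input syntax described above can be illustrated with a small pure-Python sketch. This is not the parser PyMuPDF itself uses: `parse_pages` and `parse_input` are hypothetical helpers written only for illustration, and the page counts passed in are made-up values, since resolving "N" requires knowing how many pages each file has.

```python
def parse_pages(spec, page_count):
    """Expand a 1-based page spec like "N-1", "N,1" or "5-N" into 0-based page numbers."""

    def to_number(token):
        token = token.strip()
        if token == "N":  # symbolic name for the last page
            return page_count
        number = int(token)
        if not 1 <= number <= page_count:
            raise ValueError(f"page {number} is outside 1..{page_count}")
        return number

    pages = []
    for item in spec.split(","):
        if "-" in item:
            first, last = (to_number(t) for t in item.split("-", 1))
            step = 1 if first <= last else -1  # m > n means reversed sequence
            pages.extend(range(first - 1, last - 1 + step, step))
        else:
            pages.append(to_number(item) - 1)
    return pages


def parse_input(arg, page_count):
    """Split "filename[,password[,pages]]" into (filename, password, pages)."""
    parts = arg.split(",", 2)
    filename = parts[0]
    password = parts[1] if len(parts) > 1 and parts[1] else None
    spec = parts[2] if len(parts) > 2 and parts[2] else "1-N"  # default: all pages
    return filename, password, parse_pages(spec, page_count)


# The three inputs from the example above, assuming 3, 4 and 7 pages respectively:
print(parse_input("file1.pdf,,N-1", 3))        # ('file1.pdf', None, [2, 1, 0])
print(parse_input("file2.pdf,secret,N,1", 4))  # ('file2.pdf', 'secret', [3, 0])
print(parse_input("file3.pdf,,5-N", 7))        # ('file3.pdf', None, [4, 5, 6])
```

Splitting on at most two commas keeps everything after the password together, so the page list itself can contain commas.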
Method fitz.Document.insertPDF() allows you to insert page ranges from another PDF document. Usage looks like this:
doc1 = fitz.open("file1.pdf")  # must be a PDF
doc2 = fitz.open("file2.pdf")  # must be a PDF
doc1.insertPDF(
    doc2,           # cannot be the same object as doc1
    from_page=n,    # first page to copy, default: 0
    to_page=m,      # last page to copy, default: last page
    start_at=k,     # target location in doc1, default: at end
    rotate=deg,     # rotate copied pages
    links=True,     # also copy links
    annots=True,    # also copy annotations
)
Except doc2, all parameters are optional.
This makes the MuPDF CLI tool mutool merge available to Python. In technical PDF terms, the following keys are copied for every page object: /Contents, /Resources, /MediaBox, /CropBox, /BleedBox, /TrimBox, /ArtBox, /Rotate, /UserUnit, /Annots.
Bookmarks / outlines of doc2 are not copied, but the TOC structure of doc1 remains intact after the copy operation.
In PyMuPDF we have extended the copy scope in the following way:
- Links are copied if they point to pages in the copy range, or to some outside resource.
- Optionally rotate copied pages.
- doc1 and doc2 must not be the same object, but may be the same file (opened twice under different objects).
Obviously, from_page may equal to_page, in which case only one page is copied.
Less obvious: if you specify from_page > to_page (!), then the range is copied back to front.
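The range rules above can be checked without opening any PDF. The following is a minimal sketch, assuming 0-based page numbers and the defaults stated earlier (first page and last page). `copied_pages` is a hypothetical helper, not part of PyMuPDF, and rejecting out-of-range values is an assumption of this sketch rather than a statement about how the library reacts.

```python
def copied_pages(page_count, from_page=None, to_page=None):
    """Return the 0-based source pages in the order they would be copied."""
    first = 0 if from_page is None else from_page
    last = page_count - 1 if to_page is None else to_page
    for value in (first, last):
        if not 0 <= value < page_count:
            raise ValueError(f"page {value} is outside 0..{page_count - 1}")
    step = 1 if first <= last else -1  # from_page > to_page: back to front
    return list(range(first, last + step, step))


print(copied_pages(5))        # [0, 1, 2, 3, 4]  (whole document)
print(copied_pages(5, 2, 2))  # [2]              (single page)
print(copied_pages(5, 3, 1))  # [3, 2, 1]        (reversed range)
```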
It is quite easy to create joined tables of contents (TOC) when concatenating complete files, as shown below. For a more sophisticated solution, look at this example, which can join arbitrary ranges of PDF files together with their respective TOC pieces.
This will concatenate two PDFs, also joining their tables of contents:
len1 = len(doc1)            # number of doc1 pages
toc1 = doc1.getToC(False)   # full TOC of doc1
toc2 = doc2.getToC(False)   # full TOC of doc2
for bm in toc2:             # bookmarks of doc2 ...
    bm[2] += len1           # need increased page numbers
toc = toc1 + toc2           # concatenate full TOCs
doc1.insertPDF(doc2)        # concatenate PDFs
doc1.setToC(toc)            # new TOC
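The page-number shift at the heart of this recipe can be tried on plain lists. The sketch below assumes, as the snippet above does, that each TOC entry is a list whose third item (index 2) is the page number. `join_tocs` is a hypothetical helper, and the sample entries are invented for illustration. Unlike the loop above, it leaves its inputs unmodified.

```python
def join_tocs(toc1, toc2, page_offset):
    """Return toc1 followed by toc2, with toc2 page numbers shifted by page_offset."""
    joined = [list(entry) for entry in toc1]  # copy, so toc1 stays untouched
    for entry in toc2:
        shifted = list(entry)                 # copy, so toc2 stays untouched
        shifted[2] += page_offset             # index 2 holds the page number
        joined.append(shifted)
    return joined


# Invented sample data: a 3-page first document and a 2-page second document.
toc1 = [[1, "Introduction", 1], [2, "Scope", 2]]
toc2 = [[1, "Appendix", 1], [2, "Tables", 2]]
print(join_tocs(toc1, toc2, page_offset=3))
# [[1, 'Introduction', 1], [2, 'Scope', 2], [1, 'Appendix', 4], [2, 'Tables', 5]]
```

Here `page_offset` plays the role of `len1` in the recipe: the number of pages already present in the target document before the insert.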
Copy pages 10 to 20 (0-based) of doc2, rotated and in reversed order, to the front of doc1:
doc1.insertPDF(
    doc2,
    from_page=20,
    to_page=10,
    start_at=0,
    rotate=-90,
)
This snippet will create a new PDF from the last pages of a bunch of input files. Please especially note how we specify those last pages:
>>> import fitz
>>> flist = ("1.pdf", "2.pdf", "3.pdf", "4.pdf",)
>>> doc = fitz.open()
>>> for f in flist:
    infile = fitz.open(f)
    lastPage = len(infile) - 1
    doc.insertPDF(infile, from_page=lastPage, to_page=lastPage, rotate=90)
    infile.close()
>>> doc.save("out.pdf", deflate=True, garbage=3)