toreenter.blogg.se - Pdfwriter module python pypi


Since Google Code shut down, I finally moved the project to GitHub. During the transition I’ve fixed bugs, incorporated some tests, added support for Python 3, and merged some code that someone contributed for parsing PDF 1.5 stream objects. Now pdfrw is at version 0.2, and I hope not to get so far behind in the future. Since I’ve started cleaning it up, I figured I might as well also put some effort into telling people about it. In this tutorial, I’ll provide a primer on pdfrw, complete with an overview of its features and some examples.

What good is it?

As you may have gathered from either the introduction, or from the name of the library, pdfrw can read and write PDF files. It also has no dependencies except Python, and the current version (0.2) is available on PyPI for both Python 2 and Python 3 (2.6, 2.7, 3.3, and 3.4).

As discussed in Tim’s tutorial, the two most popular pure Python PDF libraries are pdfminer and PyPDF2. In terms of focus, pdfrw is much closer to PyPDF2 than it is to pdfminer, so the rest of this article discusses pdfrw in relation to PyPDF2. (I’m not an expert with PyPDF2 by any means, so please let me know in the comments if I have made any egregious errors.)

PyPDF2 supports more PDF features than pdfrw, including decryption and more types of decompression. It also has specialized functions for several things such as merging bookmarks from two different PDFs. I am actively working on bookmark support for pdfrw, but it has none at present.

One area where pdfrw shines is in reusing PDFs in conjunction with reportlab. Due to pdfrw’s form XObject support, I believe that it is the only package, aside from reportlab’s proprietary pagecatcher software, that supports reuse of elements from preexisting PDF files in reportlab output.

Pdfrw has (I believe) a faster parser than the other libraries. Also, rather than trying to create full-featured objects that provide attributes for every single thing you could do with a document, pdfrw has a simpler model that is built on modelling low-level PDF objects, and then adding some domain-specific procedural code on top of that for a few different tasks. It also looks and feels a bit different, because of this focus on lower-level PDF container objects.

There are several examples at the pdfrw home page, including examples that use pdfrw in conjunction with reportlab. They need a bit more documentation, and the library needs more documentation, but I’m slowly working on that. For the purposes of this article, I’m simply going to take the PyPDF2 examples from Tim’s tutorial, and rework them to use pdfrw.

The layer merge example from Tim’s tutorial applies a watermark to a PDF by opening a source PDF and a watermark PDF, and modifying each page object by drawing the first page of the watermark PDF on top of every source PDF page. Since pdfrw gives you low-level access to PDF objects, you could mimic this behavior with pdfrw and a small bit of graphics code, but the canonical pdfrw version of this example uses a form XObject to represent the watermark. This is, in some ways, easier to get right in many cases, because there are fewer possible resource dictionary conflicts between the watermark page and the page it is applied to.

The cases where it doesn’t work well are where the contents of the watermark page are in an array of compressed objects that pdfrw doesn’t yet know how to decompress. This is problematic because the contents of the form XObject have to be a PDF dictionary, not an array. If pdfrw rejects your watermark, you can probably fix it by running the watermark PDF through pdftk or some other package that can decompress it, perhaps even PyPDF2.

Unlike many of the pdfrw examples, this one leaves things like bookmarks intact, because it watermarks the pages in-place, and then writes out the pre-existing PDF document. (Most other pdfrw examples construct new pages, and construct a new PDF document incorporating those.)

from pdfrw import PdfReader, PdfWriter, PageMerge

Watermarks are a special case, because we want to create a form XObject of the watermark page, and then reuse the same XObject on every page, so we don’t increase the size of the output file too much. Rather than complicate our loop by creating a watermark on the first page, and then pulling it out to use on subsequent pages, we simply create, and watermark, a completely blank page. We immediately discard the blank page itself, after extracting the watermark XObject from it. (The watermark will be the first and only XObject in the PageMerge list.)
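The watermarking approach described in this article (build one form XObject from the watermark page, reuse it on every source page, then write the existing document back out) can be sketched roughly as follows. This is a minimal sketch, not the original listing: the function name `watermark_in_place` and its three path parameters are placeholders invented for illustration, and it assumes pdfrw is installed and that the watermark PDF is one pdfrw can read without trouble.

```python
def watermark_in_place(source_path, watermark_path, output_path):
    """Stamp the first page of watermark_path onto every page of source_path.

    Hypothetical helper for illustration; not part of pdfrw itself.
    Returns the number of pages that were watermarked.
    """
    # pdfrw is a third-party package (pip install pdfrw). It is imported here
    # so that this sketch can still be loaded where pdfrw is not installed.
    from pdfrw import PdfReader, PdfWriter, PageMerge

    source = PdfReader(source_path)
    watermark_page = PdfReader(watermark_path).pages[0]

    # Merge the watermark page onto an empty PageMerge (in effect a blank
    # page) and keep only the resulting form XObject, which is the first and
    # only entry in the PageMerge list. The blank page is never rendered.
    watermark = PageMerge().add(watermark_page)[0]

    # Reuse the same XObject on every page, modifying each page in place.
    for page in source.pages:
        PageMerge(page).add(watermark).render()

    # Write out the document that was read, rather than building a new one
    # from its pages, so that existing structures are carried over.
    PdfWriter().write(output_path, source)
    return len(source.pages)
```

A call such as `watermark_in_place('source.pdf', 'watermark.pdf', 'newfile.pdf')` (file names are placeholders) would leave the input files untouched and write the stamped copy to the third path. Because every page refers to the one shared XObject, the watermark content is stored once in the output rather than once per page.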