Shell API

pypdfium2 can also be used from the command line. The sections below reproduce the `--help` output of each subcommand.

Version

$ pypdfium2 --version
pypdfium2 5.12.1+1.g397216d5
pdfium 152.0.7947.0 at /home/docs/checkouts/readthedocs.org/user_builds/pypdfium2/envs/stable/lib/python3.14/site-packages/pypdfium2_raw/libpdfium.so

Main Help

$ pypdfium2 --help
usage: pypdfium2 [-h] [-v]
                 {arrange,attachments,extract-images,extract-text,imgtopdf,pageobjects,pdfinfo,fonts,default-fonts,render,tile,toc} ...

pypdfium2 is a Python binding to PDFium, a PDF processing library.
This is the command-line interface. Invoke as `pypdfium2` or `python -m pypdfium2_cli`.

pypdfium2's CLI mainly serves testing purposes, similar to pdfium_test upstream.
It is not meant as a feature-complete PDF toolkit for end users.
There are no API stability promises; backward incompatible changes may be made.

Environment variables:
- PYPDFIUM_LOGLEVEL {debug,info,warning,error,critical} = debug
  Controls the logging level.
- DEBUG_AUTOCLOSE {debug,warning,critical} = warning
  How much info to print about (auto-)closing of PDFium objects.
- DEBUG_UNSUPPORTED {0,1} = 1
  Whether to enable or disable the unsupported feature handler.
- DEBUG_SYSFONTS {0,1} = 0
  Whether to install a sysfont listener.

positional arguments:
  {arrange,attachments,extract-images,extract-text,imgtopdf,pageobjects,pdfinfo,fonts,default-fonts,render,tile,toc}
    arrange             Rearrange/merge documents
    attachments         List/extract/edit embedded files
    extract-images      Extract images
    extract-text        Extract text
    imgtopdf            Convert images to PDF
    pageobjects         Print info on pageobjects
    pdfinfo             Print info on document and pages
    fonts               List a document's fonts
    default-fonts       Dump info about default fonts
    render              Rasterize pages
    tile                Tile pages (N-up)
    toc                 Print table of contents

options:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
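
The environment variables listed in the help text can be set for a single invocation, leaving the rest of the shell session untouched. The following is a minimal sketch: the wrapper name `pdfium_debug` is hypothetical (not part of pypdfium2), and the two values are simply picked from the choices shown above.

```shell
#!/bin/sh
# pdfium_debug (hypothetical helper, not part of pypdfium2):
# run a command with a quieter log level and verbose auto-close reporting.
# `env` applies the variables to that one command only.
pdfium_debug() {
    env PYPDFIUM_LOGLEVEL=warning DEBUG_AUTOCLOSE=debug "$@"
}

# Example call (assumes pypdfium2 is installed and input.pdf exists):
#   pdfium_debug pypdfium2 pdfinfo input.pdf
```

Wrapping the call in `env` keeps the settings scoped to one process, which is convenient when comparing output with and without the extra diagnostics.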

Arranger

$ pypdfium2 arrange --help
usage: pypdfium2 arrange [-h] [--pages PAGES [PAGES ...]]
                         [--passwords PASSWORDS [PASSWORDS ...]]
                         --output OUTPUT
                         inputs [inputs ...]

Rearrange/merge documents

positional arguments:
  inputs                Sequence of PDF files.

options:
  -h, --help            show this help message and exit
  --pages PAGES [PAGES ...]
                        Sequence of page texts, defining the pages to include from each PDF. Use '_' as placeholder for all pages.
  --passwords PASSWORDS [PASSWORDS ...]
                        Passwords to unlock encrypted PDFs. Any placeholder may be used for non-encrypted documents.
  --output, -o OUTPUT   Target path for the output document
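
As an illustration of the `--pages` placeholder, the sketch below merges every page of each input by passing one `_` per file. The helper name `merge_all` is hypothetical. Only options shown in the help text above are used, and the syntax for selecting specific pages is deliberately left out because the help output does not spell it out.

```shell
#!/bin/sh
# merge_all (hypothetical helper): merge all pages of each input PDF.
# Usage: merge_all OUTPUT INPUT [INPUT ...]
merge_all() {
    output=$1
    shift
    # Build one '_' placeholder per input, meaning "all pages".
    pages=""
    for _f in "$@"; do
        pages="$pages _"
    done
    # $pages is intentionally unquoted so it splits into separate arguments.
    pypdfium2 arrange "$@" --pages $pages --output "$output"
}

# Example call (assumes pypdfium2 is installed and both inputs exist):
#   merge_all merged.pdf a.pdf b.pdf
```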

Attachments

$ pypdfium2 attachments --help
usage: pypdfium2 attachments [-h] [--password PASSWORD]
                             input {list,extract,edit} ...

List/extract/edit embedded files

positional arguments:
  input                Input PDF document
  {list,extract,edit}

options:
  -h, --help           show this help message and exit
  --password PASSWORD  A password to unlock the PDF, if encrypted

$ pypdfium2 attachments file.pdf list --help
usage: pypdfium2 attachments input list [-h]

options:
  -h, --help  show this help message and exit

$ pypdfium2 attachments file.pdf extract --help
usage: pypdfium2 attachments input extract [-h] [--numbers NUMBERS]
                                           --output-dir OUTPUT_DIR

options:
  -h, --help            show this help message and exit
  --numbers NUMBERS
  --output-dir, -o OUTPUT_DIR

$ pypdfium2 attachments file.pdf edit --help
usage: pypdfium2 attachments input edit [-h] [--del-numbers DEL_NUMBERS]
                                        [--add-files F [F ...]]
                                        --output OUTPUT

options:
  -h, --help            show this help message and exit
  --del-numbers, -d DEL_NUMBERS
  --add-files, -a F [F ...]
  --output, -o OUTPUT
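
The subcommands follow the input file, as in the invocations above. A possible list-then-extract workflow is sketched below. The helper name `dump_attachments` is hypothetical, creating the output directory beforehand is a precaution rather than a documented requirement, and it is an assumption that omitting `--numbers` extracts every attachment.

```shell
#!/bin/sh
# dump_attachments (hypothetical helper): list embedded files, then extract them.
# Usage: dump_attachments INPUT OUTPUT_DIR
dump_attachments() {
    # Precaution: make sure the target directory exists.
    mkdir -p "$2" || return 1
    pypdfium2 attachments "$1" list || return 1
    # Assumption: without --numbers, all attachments are extracted.
    pypdfium2 attachments "$1" extract --output-dir "$2"
}

# Example call (assumes pypdfium2 is installed and file.pdf exists):
#   dump_attachments file.pdf attachments_out
```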

Image Extractor

$ pypdfium2 extract-images --help
usage: pypdfium2 extract-images [-h] [--password PASSWORD] [--pages PAGES]
                                --output-dir OUTPUT_DIR
                                [--max-depth MAX_DEPTH] [--use-bitmap]
                                [--format FORMAT] [--render]
                                [--scale-to-original | --no-scale-to-original]
                                input

Extract images

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --pages PAGES         Page numbers and ranges to include
  --output-dir, -o OUTPUT_DIR
                        Output directory to take the extracted images
  --max-depth MAX_DEPTH
                        Maximum recursion depth to consider when looking for pageobjects.
  --use-bitmap          Enforce the use of bitmaps rather than attempting a smart extraction of the image.
  --format FORMAT       Image format to use when saving bitmaps. (Fallback if doing smart extraction.)
  --render              When --use-bitmap is given, whether to get rendered bitmaps, taking masks and transform matrices into account.
  --scale-to-original, --no-scale-to-original
                        When --use-bitmap --render is given, whether to scale the image so it is rendered at its native resolution, or close to that. This should improve output quality. The default is True, but you may opt out.

Text Extractor

$ pypdfium2 extract-text --help
usage: pypdfium2 extract-text [-h] [--password PASSWORD] [--pages PAGES]
                              [--strategy {range,bounded}]
                              input

Extract text

Note that PDFium outputs CRLF (\r\n) style line breaks.
This may be undesirable or confusing in some situations, e.g. when processing the output with an (unaware) parser on the command line.
If this is an issue, run e.g. `dos2unix` on the output, or use the Python API.

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --pages PAGES         Page numbers and ranges to include
  --strategy {range,bounded}
                        PDFium text extraction strategy (range, bounded).
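
Where `dos2unix` is not available, the CRLF line breaks mentioned above can also be normalized with `tr`. This is a minimal sketch with hypothetical helper names (`strip_cr`, `extract_text_lf`). Note that it removes every carriage return, including any that are not part of a line break.

```shell
#!/bin/sh
# strip_cr (hypothetical helper): read stdin and drop all carriage returns,
# so that CRLF line breaks become plain LF.
strip_cr() {
    tr -d '\r'
}

# extract_text_lf (hypothetical helper): extract text with LF line breaks.
# Usage: extract_text_lf INPUT
extract_text_lf() {
    pypdfium2 extract-text "$1" | strip_cr
}

# Example call (assumes pypdfium2 is installed and input.pdf exists):
#   extract_text_lf input.pdf | grep -n "keyword"
```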

Image Converter

$ pypdfium2 imgtopdf --help
usage: pypdfium2 imgtopdf [-h] --output OUTPUT [--inline] images [images ...]

Convert images to PDF

positional arguments:
  images               Input images

options:
  -h, --help           show this help message and exit
  --output, -o OUTPUT  Target path for the new PDF
  --inline             If JPEG, whether to use PDFium's inline loading function.

Pageobjects Info

$ pypdfium2 pageobjects --help
usage: pypdfium2 pageobjects [-h] [--password PASSWORD] [--pages PAGES]
                             [--n-digits N_DIGITS] [--filter T [T ...]]
                             [--max-depth MAX_DEPTH]
                             [--info {pos,imginfo,text} [{pos,imginfo,text} ...]]
                             input

Print info on pageobjects

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --pages PAGES         Page numbers and ranges to include
  --n-digits N_DIGITS   Number of digits to which coordinates/sizes shall be rounded
  --filter T [T ...]    Object types to include. Choices: ['?', 'text', 'path', 'image', 'shading', 'form']
  --max-depth MAX_DEPTH
                        Maximum recursion depth to consider when descending into Form XObjects.
  --info {pos,imginfo,text} [{pos,imginfo,text} ...]
                        Object details to show.

Document Info

$ pypdfium2 pdfinfo --help
usage: pypdfium2 pdfinfo [-h] [--password PASSWORD] [--pages PAGES]
                         [--n-digits N_DIGITS]
                         input

Print info on document and pages

positional arguments:
  input                Input PDF document

options:
  -h, --help           show this help message and exit
  --password PASSWORD  A password to unlock the PDF, if encrypted
  --pages PAGES        Page numbers and ranges to include
  --n-digits N_DIGITS  Number of digits to which coordinates/sizes shall be rounded

Font Info

$ pypdfium2 fonts --help
You may want to install `tabulate` for prettier output.
usage: pypdfium2 fonts [-h] [--password PASSWORD] [--pages PAGES] input

List a document's fonts

Font objects are compared by memory address, so the same font name may occur multiple times
in different configurations (e.g. differing weights, or even hidden differences like /Subtype).
This is intentional. Nameless fonts may also occur.

positional arguments:
  input                Input PDF document

options:
  -h, --help           show this help message and exit
  --password PASSWORD  A password to unlock the PDF, if encrypted
  --pages PAGES        Page numbers and ranges to include

Renderer

$ pypdfium2 render --help
usage: pypdfium2 render [-h] [--password PASSWORD] [--pages PAGES]
                        --output OUTPUT [--prefix PREFIX] [--format FORMAT]
                        [--engine ENGINE_CLS] [--scale SCALE]
                        [--rotation {0,90,180,270}] [--fill-color C C C C]
                        [--optimize-mode {lcd,print}] [--crop C C C C]
                        [--draw-annots | --no-draw-annots]
                        [--draw-forms | --no-draw-forms]
                        [--no-antialias {text,image,path} [{text,image,path} ...]]
                        [--force-halftone]
                        [--bitmap-maker {native,foreign,foreign_packed,foreign_simple}]
                        [--grayscale] [--byteorder REV_BYTEORDER]
                        [--x-channel | --no-x-channel]
                        [--maybe-alpha | --no-maybe-alpha] [--linear [LINEAR]]
                        [--processes PROCESSES]
                        [--parallel-strategy {spawn,forkserver,fork}]
                        [--parallel-lib {mp,ft}] [--parallel-map PARALLEL_MAP]
                        [--sample-theme] [--path-fill C C C C]
                        [--path-stroke C C C C] [--text-fill C C C C]
                        [--text-stroke C C C C] [--fill-to-stroke]
                        [--invert-lightness] [--exclude-images]
                        input

Rasterize pages

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --pages PAGES         Page numbers and ranges to include
  --output, -o OUTPUT   Output directory where the serially numbered images shall be placed.
  --prefix PREFIX       Custom prefix for the images. Defaults to the input filename's stem.
  --format, -f FORMAT   The image format to use (default: conditional).
  --engine ENGINE_CLS   The saver engine to use ('pil', 'numpy+pil', 'numpy+cv2')
  --scale SCALE         Define the resolution of the output images. By default, one PDF point (1/72in) is rendered to 1x1 pixel. This factor scales the number of pixels that represent one point.
  --rotation {0,90,180,270}
                        Rotate pages by 90, 180 or 270 degrees.
  --fill-color C C C C  Color the bitmap will be filled with before rendering. Shall be given in RGBA format as a sequence of integers ranging from 0 to 255. Defaults to white.
  --optimize-mode {lcd,print}
                        The rendering optimisation mode. None if not given.
  --crop C C C C        Amount to crop from (left, bottom, right, top).
  --draw-annots, --no-draw-annots
                        Whether annotations may be shown (default: true).
  --draw-forms, --no-draw-forms
                        Whether forms may be shown (default: true).
  --no-antialias {text,image,path} [{text,image,path} ...]
                        Item types that shall not be smoothed.
  --force-halftone      Always use halftone for image stretching.

Bitmap options:
  Bitmap config, including pixel format.

  --bitmap-maker {native,foreign,foreign_packed,foreign_simple}
                        The bitmap maker to use.
  --grayscale           Whether to render in grayscale mode (no colors).
  --byteorder REV_BYTEORDER
                        Whether to use BGR or RGB byteorder (default: conditional).
  --x-channel, --no-x-channel
                        Whether to prefer BGRx/RGBx over BGR/RGB (default: conditional).
  --maybe-alpha, --no-maybe-alpha
                        Whether to use BGRA if page content has transparency. Note, this makes format selection page-dependent. As this behavior can be confusing, it is not currently the default, but recommended for performance in these cases.

Parallelization:
  Options for rendering with multiple processes.

  --linear [LINEAR]     Render non-parallel if page count is less or equal to the specified value (default: 4). If this flag is given without a value, then render linear regardless of document length.
  --processes PROCESSES
                        The maximum number of parallel rendering processes. Defaults to the number of CPU cores.
  --parallel-strategy {spawn,forkserver,fork}
                        The process start method to use. ('fork' is discouraged due to stability issues.)
  --parallel-lib {mp,ft}
                        The parallelization module to use (mp = multiprocessing, ft = concurrent.futures).
  --parallel-map PARALLEL_MAP
                        The map function to use (backend specific, the default is an iterative map).

Flat color scheme:
  Options for using pdfium's color scheme renderer. Note that this may flatten different colors into one, so the usability of this is limited. Alternatively, consider post-processing with lightness inversion (see below).

  --sample-theme        Use a dark background sample theme as base. Explicit color params override selectively.
  --path-fill C C C C
  --path-stroke C C C C
  --text-fill C C C C
  --text-stroke C C C C
  --fill-to-stroke      When rendering with custom color scheme, only draw borders around fill areas using the `path_stroke` color, instead of filling with the `path_fill` color. This is actually recommended, since with a single fill color for paths the boundaries of adjacent fill paths are less visible.

Post processing:
  Options to post-process rendered images. Note, this may have a strongly negative impact on performance.

  --invert-lightness    Invert lightness using the HLS color space (e.g. white<->black, dark_blue<->light_blue). The intent is to achieve a dark theme for documents with light background, while providing better visual results than classical color inversion or a flat pdfium color scheme. However, note that --optimize-mode lcd is not recommended when inverting lightness.
  --exclude-images      Whether to exclude PDF images from lightness inversion.
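
Since `--scale` counts pixels per PDF point and one point is 1/72 in, a target resolution in DPI corresponds to a scale of DPI / 72 (for example, 300 DPI gives about 4.17). The sketch below turns that arithmetic into a small wrapper. The helper names `dpi_to_scale` and `render_at_dpi` are hypothetical, and creating the output directory first is a precaution rather than a documented requirement.

```shell
#!/bin/sh
# dpi_to_scale (hypothetical helper): print the --scale factor for a DPI value.
# One PDF point is 1/72 in, so scale = DPI / 72.
dpi_to_scale() {
    awk -v dpi="$1" 'BEGIN { printf "%.4f\n", dpi / 72 }'
}

# render_at_dpi (hypothetical helper): rasterize INPUT into OUTDIR at DPI.
# Usage: render_at_dpi INPUT OUTDIR DPI
render_at_dpi() {
    mkdir -p "$2" || return 1
    pypdfium2 render "$1" --output "$2" --scale "$(dpi_to_scale "$3")"
}

# Example call (assumes pypdfium2 is installed and input.pdf exists):
#   render_at_dpi input.pdf rendered 300
```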

Page Tiler

$ pypdfium2 tile --help
usage: pypdfium2 tile [-h] [--password PASSWORD] --output OUTPUT --rows ROWS
                      --cols COLS --width WIDTH --height HEIGHT [--unit UNIT]
                      input

Tile pages (N-up)

positional arguments:
  input                Input PDF document

options:
  -h, --help           show this help message and exit
  --password PASSWORD  A password to unlock the PDF, if encrypted
  --output, -o OUTPUT  Target path for the new document
  --rows, -r ROWS      Number of rows (horizontal tiles)
  --cols, -c COLS      Number of columns (vertical tiles)
  --width WIDTH        Target width
  --height HEIGHT      Target height
  --unit, -u UNIT      Unit for target width and height (pt, mm, cm, in)
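
As a usage sketch, the wrapper below tiles pages onto A4 portrait sheets (210 x 297 mm) using the `mm` unit from the list above. The helper name `tile_a4` is hypothetical, and only options shown in the help text are used.

```shell
#!/bin/sh
# tile_a4 (hypothetical helper): N-up tiling onto A4 portrait (210 x 297 mm).
# Usage: tile_a4 INPUT OUTPUT ROWS COLS
tile_a4() {
    pypdfium2 tile "$1" --output "$2" --rows "$3" --cols "$4" \
        --width 210 --height 297 --unit mm
}

# Example call for a 2x2 layout (assumes pypdfium2 is installed and input.pdf exists):
#   tile_a4 input.pdf tiled.pdf 2 2
```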

TOC Reader

$ pypdfium2 toc --help
usage: pypdfium2 toc [-h] [--password PASSWORD] [--n-digits N_DIGITS]
                     [--max-depth MAX_DEPTH]
                     [--color-indicator | --no-color-indicator]
                     input

Print table of contents

positional arguments:
  input                 Input PDF document

options:
  -h, --help            show this help message and exit
  --password PASSWORD   A password to unlock the PDF, if encrypted
  --n-digits N_DIGITS   Number of digits to which coordinates/sizes shall be rounded
  --max-depth MAX_DEPTH
                        Maximum recursion depth to consider when parsing the table of contents
  --color-indicator, --no-color-indicator
                        Whether to add a color indicator to bookmarks that declare a color. The indicator is a Unicode symbol wrapped in an ANSI escape sequence. Default is enabled.