ocrmypdf

add an OCR text layer to scanned PDF files

  1. Package version
    ocrmypdf-16.1.1
  2. Maintainer
    The OpenBSD ports mailing-list

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to
be searched or copied and pasted.

- Generates a searchable PDF/A file from a regular PDF
- Places OCR text accurately below the image to ease copy / paste
- Keeps the exact resolution of the original embedded images
- When possible, inserts OCR information as a "lossless" operation
  without disrupting any other content
- Optimizes PDF images, often producing files smaller than the input file
- If requested, deskews and/or cleans the image before performing OCR
- Validates input and output files
- Distributes work across all available CPU cores
- Uses the Tesseract OCR engine to recognize more than 100 languages
  (use "pkg_info -Q tesseract" to locate language packs to install)
- Keeps your private data private
- Scales properly to handle files with thousands of pages
- Battle-tested on millions of PDFs
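
Before running OCR in several languages, it can help to confirm that the
matching Tesseract language packs are installed. The sketch below is
illustrative only: the helper names (parse_list_langs, missing_languages,
installed_languages) are hypothetical, and it assumes that
"tesseract --list-langs" prints one header line followed by one language
name per line on standard output, which may differ between Tesseract
versions.

```python
import subprocess


def parse_list_langs(output: str) -> set[str]:
    """Parse the text printed by `tesseract --list-langs`.

    Assumption: the first line is a header and every later non-empty
    line holds exactly one language name.
    """
    lines = [line.strip() for line in output.splitlines()]
    return {line for line in lines[1:] if line}


def missing_languages(spec: str, installed: set[str]) -> list[str]:
    """Return the languages in an `-l` spec such as "eng+fra" that are
    not present in the installed set, in the order they were requested."""
    return [lang for lang in spec.split("+") if lang and lang not in installed]


def installed_languages() -> set[str]:
    """Ask the local tesseract binary which languages it can use.

    Assumption: the list is written to standard output.
    """
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_list_langs(result.stdout)
```

Any language reported by missing_languages("eng+fra", installed_languages())
would then be a candidate to look up with "pkg_info -Q tesseract".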

ocrmypdf                      # it's a scriptable command line program
   -l eng+fra                 # it supports multiple languages
   --rotate-pages             # it can fix pages that are misrotated
   --deskew                   # it can deskew crooked PDFs!
   --title "My PDF"           # it can change output metadata
   --jobs 4                   # it uses multiple cores by default
   --output-type pdfa         # it produces PDF/A by default
   input_scanned.pdf          # takes PDF input (or images)
   output_searchable.pdf      # produces validated PDF output
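
Because ocrmypdf is a scriptable command line program, the annotated
invocation above can also be assembled from a script. The following is a
minimal sketch, not part of the package: build_ocrmypdf_command is a
hypothetical helper, and it uses only the options shown above. It prints
the command instead of running it; passing the list to
subprocess.run(cmd, check=True) would execute it, assuming ocrmypdf is
installed and on PATH.

```python
import shlex
from typing import Optional, Sequence


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    languages: Sequence[str] = ("eng",),
    rotate_pages: bool = False,
    deskew: bool = False,
    title: Optional[str] = None,
    jobs: Optional[int] = None,
    output_type: Optional[str] = None,
) -> list[str]:
    """Build an argument list for ocrmypdf (hypothetical helper)."""
    # Multiple languages are joined with "+", as in "-l eng+fra".
    cmd = ["ocrmypdf", "-l", "+".join(languages)]
    if rotate_pages:
        cmd.append("--rotate-pages")
    if deskew:
        cmd.append("--deskew")
    if title is not None:
        cmd += ["--title", title]
    if jobs is not None:
        cmd += ["--jobs", str(jobs)]
    if output_type is not None:
        cmd += ["--output-type", output_type]
    # Input and output paths come last, input first.
    cmd += [input_pdf, output_pdf]
    return cmd


if __name__ == "__main__":
    cmd = build_ocrmypdf_command(
        "input_scanned.pdf",
        "output_searchable.pdf",
        languages=["eng", "fra"],
        rotate_pages=True,
        deskew=True,
        title="My PDF",
        jobs=4,
        output_type="pdfa",
    )
    # Show a shell-quoted form of the command without executing it.
    print(shlex.join(cmd))
```

Building the arguments as a list avoids shell quoting problems with values
such as a title that contains spaces.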

Build dependencies

  • lang/python/3.10
  • devel/py-build,python3
  • devel/py-installer,python3
  • devel/py-setuptools,python3
  • devel/py-wheel,python3
  • devel/py-setuptools_scm,python3

Run dependencies

  • graphics/py-Pillow,python3
  • textproc/py-coloredlogs,python3
  • devel/py-deprecation,python3
  • graphics/img2pdf
  • sysutils/py-packaging,python3
  • textproc/py-pdfminer,python3
  • print/py-pikepdf,python3
  • devel/py-pluggy,python3
  • print/py-reportlab,python3
  • devel/py-rich,python3
  • devel/py-tqdm,python3
  • graphics/tesseract/tesseract
  • graphics/tesseract/tessdata,-osd
  • graphics/pngquant
  • print/ghostscript/gnu
  • print/unpaper
  • lang/python/3.10