
DCR-CORE - Application - Requirements


The required software is listed below. The corresponding software versions are detailed in the Release Notes.

1. Operating System

Continuous integration and delivery (CI/CD) runs on Ubuntu, and development is done on Windows 10. On Windows, the functionality of the grep, make and sed tools must additionally be made available, e.g. via Grep for Windows, Make for Windows or sed for Windows.
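As a quick way to verify this prerequisite, the following minimal sketch checks whether the three tools can be found on the PATH. It assumes the executables are named exactly grep, make and sed; the function name missing_tools is hypothetical and not part of DCR-CORE.

```python
import shutil

# Tool names taken from the text above; the executable names are assumed to match.
REQUIRED_TOOLS = ("grep", "make", "sed")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools are available.")
```

On Ubuntu these tools are typically preinstalled, so the check is mainly useful on Windows.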

2. Pandoc & TeX Live

To convert non-PDF documents into PDF files for PDFlib TET processing, the universal document converter Pandoc and the TeX typesetting system TeX Live are used and must therefore be installed. The TeX Live Frontend does not need to be installed.
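To illustrate how Pandoc and TeX Live work together, here is a minimal sketch that calls the pandoc command-line tool with a TeX Live PDF engine. This is not necessarily how DCR-CORE invokes Pandoc; the choice of xelatex as the engine and the function names are assumptions made for illustration.

```python
import subprocess


def pandoc_pdf_command(source, target, pdf_engine="xelatex"):
    """Build a pandoc command line that converts `source` into the PDF `target`.

    The default engine (xelatex) is an assumption; any PDF engine
    shipped with TeX Live and supported by Pandoc could be used instead.
    """
    return ["pandoc", str(source), f"--pdf-engine={pdf_engine}", "-o", str(target)]


def convert_to_pdf(source, target, pdf_engine="xelatex"):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_pdf_command(source, target, pdf_engine), check=True)
```

A call such as convert_to_pdf("report.docx", "report.pdf") requires both pandoc and the chosen engine to be on the PATH.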

3. PDFlib TET

The software library PDFlib TET is used to tokenize the PDF documents. DCR-CORE contains the free version of PDFlib TET, which is limited to files of at most 1 MB and at most 10 pages. To process larger files, a licence must be purchased from PDFlib GmbH, which also provides details on the conditions.

4. Poppler

To convert scanned PDF documents into image files for Tesseract OCR, the rendering library Poppler is used and must therefore be installed.
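As an illustration of this step, the sketch below renders the pages of a PDF to PNG images with Poppler's pdftoppm command-line tool. Whether DCR-CORE calls pdftoppm directly or goes through a wrapper library is not stated here, so treat the function names and the 300 dpi default as assumptions.

```python
import subprocess


def pdftoppm_command(pdf_path, output_prefix, dpi=300):
    """Build a pdftoppm command that writes one PNG per page.

    The output files are named after `output_prefix` plus a page number.
    The resolution of 300 dpi is an assumed default, not a documented setting.
    """
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(output_prefix)]


def render_pdf_pages(pdf_path, output_prefix, dpi=300):
    """Run pdftoppm; raises CalledProcessError if rendering fails."""
    subprocess.run(pdftoppm_command(pdf_path, output_prefix, dpi), check=True)
```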

5. Python

DCR-CORE uses recent typing features and therefore requires a sufficiently recent Python version; the exact version is given in the Release Notes.
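A version requirement like this can be checked at start-up. The following minimal sketch compares the running interpreter against a minimum version passed in by the caller; the helper name is hypothetical, and the minimum version is deliberately left as a parameter because the required version is not stated in this section.

```python
import sys


def python_is_recent_enough(minimum):
    """Return True if the running interpreter is at least `minimum`.

    `minimum` is a tuple such as (major, minor); take the actual value
    from the Release Notes.
    """
    return sys.version_info >= minimum
```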

6. Tesseract OCR

To convert image files into PDF files, Tesseract OCR is required.
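For illustration, the sketch below converts an image into a PDF with the tesseract command-line tool, using its pdf output configuration. The function names are hypothetical, and DCR-CORE may call Tesseract OCR differently, for example through a Python wrapper.

```python
import subprocess


def tesseract_pdf_command(image_path, output_base):
    """Build a tesseract command that writes `<output_base>.pdf`.

    Tesseract appends the .pdf extension itself, so `output_base`
    must be given without an extension.
    """
    return ["tesseract", str(image_path), str(output_base), "pdf"]


def image_to_pdf(image_path, output_base):
    """Run tesseract; raises CalledProcessError if OCR fails."""
    subprocess.run(tesseract_pdf_command(image_path, output_base), check=True)
```

For example, image_to_pdf("page-1.png", "page-1") would produce page-1.pdf, provided tesseract is on the PATH.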