DCR-CORE - Application - Requirements
The required software is listed below. The corresponding software versions are detailed in the Release Notes.
1. Operating System
Continuous integration / delivery (CI/CD) runs on Ubuntu, and development is done on Windows 10.
On Windows, the functionality of the grep, make and sed tools must additionally be made available, e.g. via Grep for Windows, Make for Windows and sed for Windows.
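Whether the three tools are reachable can be checked before a build is started. The following is only a minimal sketch: the function name missing_tools is hypothetical and not part of DCR-CORE, and it only assumes that the tools are expected on the PATH.

```python
import shutil

# Tool names taken from the requirement above.
REQUIRED_TOOLS = ("grep", "make", "sed")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names of all tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools are available.")
```

An empty result means that all listed tools were found.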
2. Pandoc & TeX Live
To convert the non-PDF documents into pdf files for PDFlib TET processing, the universal document converter Pandoc and the TeX typesetting system TeX Live are used and must therefore also be installed.
The installation of the TeX Live Frontend is not required.
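As an illustration of how the two programs work together, the sketch below builds a Pandoc command line that writes a pdf file through a TeX Live engine. The function names and the choice of xelatex as PDF engine are assumptions made for this example. They do not describe how DCR-CORE itself calls Pandoc.

```python
import subprocess
from pathlib import Path


def pandoc_pdf_command(source, pdf_engine="xelatex"):
    """Build a Pandoc call that converts `source` into a pdf file next to it.

    Assumption: the engine (xelatex here) is provided by TeX Live.
    """
    source = Path(source)
    target = source.with_suffix(".pdf")
    return ["pandoc", str(source), f"--pdf-engine={pdf_engine}", "-o", str(target)]


def convert_to_pdf(source):
    """Run the conversion; raises CalledProcessError if Pandoc fails."""
    subprocess.run(pandoc_pdf_command(source), check=True)
```

Calling convert_to_pdf("report.docx") would then be expected to create report.pdf, provided that Pandoc and the chosen engine are installed.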
3. PDFlib TET
The software library PDFlib TET is used to tokenize the pdf documents.
DCR-CORE contains the free version of PDFlib TET.
This free version is limited to files of at most 1 MB in size and at most 10 pages.
To process larger files, a licence must be purchased from PDFlib GmbH.
Details on the conditions can be found here.
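The two limits of the free version can be expressed as a simple pre-check. This is a hedged sketch only: the function name is hypothetical, "1 MB" is interpreted here as 1024 * 1024 bytes (the exact threshold applied by PDFlib TET is not stated above), and the page count has to be determined by the caller.

```python
# Assumption: "1 MB" is read as 1 MiB; adjust if PDFlib TET counts differently.
MAX_BYTES = 1024 * 1024
MAX_PAGES = 10


def within_free_limits(size_bytes, page_count,
                       max_bytes=MAX_BYTES, max_pages=MAX_PAGES):
    """Return True if a file of this size and page count fits both limits."""
    return size_bytes <= max_bytes and page_count <= max_pages
```

For example, a 500,000-byte file with 10 pages passes the check, while the same file with 11 pages does not.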
4. Poppler
To convert the scanned PDF documents into image files for Tesseract OCR, the rendering library Poppler is used and must therefore also be installed.
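Poppler ships the command line tool pdftoppm, which renders PDF pages as image files. The sketch below only builds such a call. The function name, the PNG format and the resolution of 300 DPI are assumptions for this example, and DCR-CORE may access Poppler in a different way.

```python
from pathlib import Path


def pdftoppm_command(pdf, dpi=300):
    """Build a pdftoppm call that renders each page of `pdf` as a PNG file.

    The output files share the prefix of the input file name and get a
    page number appended by pdftoppm.
    """
    pdf = Path(pdf)
    prefix = pdf.with_suffix("")
    return ["pdftoppm", "-png", "-r", str(dpi), str(pdf), str(prefix)]
```

The resulting list can be passed to subprocess.run(..., check=True) once Poppler is installed.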
5. Python
Because of the use of the new typing features, Python is required.
6. Tesseract OCR
To convert image files into pdf files, Tesseract OCR is required.
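The Tesseract command line program can write its result as a pdf file when pdf is given as the output configuration. The following sketch builds such a call. The function name and the language code eng are assumptions for this example and are not taken from DCR-CORE.

```python
from pathlib import Path


def tesseract_pdf_command(image, language="eng"):
    """Build a Tesseract call that turns `image` into a pdf file.

    Tesseract appends the .pdf extension to the output base itself.
    """
    image = Path(image)
    output_base = image.with_suffix("")
    return ["tesseract", str(image), str(output_base), "-l", language, "pdf"]
```

Running the command for page-1.png would be expected to produce page-1.pdf in the same directory.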