DCR - Release History

Version 0.9.6

Release Date: 07.08.2022

1 New Features

  • API documentation added.
  • Determination of bulleted lists.
  • Determination of numbered lists.
  • Determination of headings.

2 Modified Features

  • Code refactoring.

3 Applied Software

Software Version Remark Status
DBeaver 22.1.0 for virtual machine only [optional]
Docker Desktop 20.10.17 base version [Docker Image & VM]
Git 2.25.1 base version
Pandoc 2.18
PDFlib TET 5.3
Poppler 0.86.1 base version
Python3 3.10.6 upgrade
Python3 - pip 22.1.2
Tesseract OCR 5.1.0 base version
TeX Live 2019 base version
TeX Live - pdfTeX 3.14159265-2.6-1.40.20 base version

3.1 Unix-specific Software

Software Version Remark Status
asdf v0.10.2-7e7a1fa base version (optional)
cURL 7.68.0 base version
dos2unix 7.4.0 base version
GCC & G++ 9.4.0 base version
GNU Autoconf 2.69 base version
GNU Automake 1.16.1 base version
GNU make 4.2.1 base version
htop 3.2.1 optional
OpenSSL 1.1.1f base version
procps 3.3.16 base version (optional)
tmux 3.3a optional
Ubuntu 20.04.4 LTS base version
Vim 8.1.3741 base version (optional)
Wget 1.20.3

3.2 Windows-specific Software

Software Version Remark Status
Grep for Windows 2.5.4 base version
Make for Windows 3.81 base version
sed for Windows 4.2.1 base version

Version 0.9.3

Release Date: 17.06.2022

1 New Features

  • Description of the algorithms for determining the line type.
  • Determination of the lines belonging to the TOC (Table of Contents).

2 Modified Features

  • Major refactoring of the tokenizer.
  • pylint: Adjustments for latest version.

3 Applied Software

Software Version Remark Status
DBeaver 22.1.0 for virtual machine only [optional] upgrade
Docker Desktop 20.10.17 base version [Docker Image & VM] upgrade
Git 2.25.1 base version
Pandoc 2.18
PDFlib TET 5.3
Poppler 0.86.1 base version
Python3 3.10.5 upgrade
Python3 - pip 22.1.2 upgrade
Tesseract OCR 5.1.0 base version
TeX Live 2019 base version
TeX Live - pdfTeX 3.14159265-2.6-1.40.20 base version

3.1 Unix-specific Software

Software Version Remark Status
asdf v0.10.2-7e7a1fa base version (optional) upgrade
cURL 7.68.0 base version
dos2unix 7.4.0 base version
GCC & G++ 9.4.0 base version
GNU Autoconf 2.69 base version
GNU Automake 1.16.1 base version
GNU make 4.2.1 base version
htop 3.2.1 optional upgrade
OpenSSL 1.1.1f base version
procps 3.3.16 base version (optional)
tmux 3.3a optional upgrade
Ubuntu 20.04.4 LTS base version upgrade
Vim 8.1.3741 base version (optional)
Wget 1.20.3

3.2 Windows-specific Software

Software Version Remark Status
Grep for Windows 2.5.4 base version
Make for Windows 3.81 base version
sed for Windows 4.2.1 base version

Version 0.9.2

Release Date: 01.06.2022

1 New Features

  • object-oriented design.
  • selectable spaCy token attributes completed.

2 Modified Features

  • database schema refactored

3 Applied Software

Software Version Remark Status
DBeaver 22.0.5 for virtual machine only [optional] upgrade
Docker Desktop 20.10.16 base version [Docker Image & VM] upgrade
Git 2.25.1 base version
Pandoc 2.18
PDFlib TET 5.3
Poppler 0.86.1 base version
Python3 3.10.4
Python3 - pip 22.0.4
Tesseract OCR 5.1.0 base version
TeX Live 2019 base version
TeX Live - pdfTeX 3.14159265-2.6-1.40.20 base version

3.1 Unix-specific Software

Software Version Remark Status
asdf v0.10.1-711ad99 base version (optional) upgrade
cURL 7.68.0 base version
dos2unix 7.4.0 base version
GCC & G++ 9.4.0 base version
GNU Autoconf 2.69 base version
GNU Automake 1.16.1 base version
GNU make 4.2.1 base version
htop 3.2.0 optional
OpenSSL 1.1.1f base version
procps 3.3.16 base version (optional)
tmux 3.2a optional
Ubuntu 20.04.4 LTS base version upgrade
Vim 8.1.3741 base version (optional)
Wget 1.20.3

3.2 Windows-specific Software

Software Version Remark Status
Grep for Windows 2.5.4 base version
Make for Windows 3.81 base version
sed for Windows 4.2.1 base version

4 Open Issues

  1. Microsoft Windows Server 2019: (see here)

  2. MkApi: (see here)

  3. Tesseract OCR: (see here)

Version 0.9.1

Release Date: 05.05.2022

1. New Features

  • classification of lines into headers, footers and body lines
  • support for documents in different languages, with English, French, German and Italian supported as standard
  • tokenizer based on spaCy

2. Modified Features

  • extending the parser to the granularities page, line and word
  • refactoring to separate preprocessor and NLP specific processes

3. Applied Software

Software Version Remark Status
DBeaver 22.0.4 for virtual machine only [optional] upgrade
Docker Desktop 20.10.14 base version [Docker Image & VM] upgrade
Git 2.25.1 base version
Pandoc 2.18 upgrade
PDFlib TET 5.3
Poppler 0.86.1 base version
Python3 3.10.4 upgrade
Python3 - pip 22.0.4
Tesseract OCR 5.1.0 base version
TeX Live 2019 base version
TeX Live - pdfTeX 3.14159265-2.6-1.40.20 base version

3.1 Unix-specific Software

Software Version Remark Status
asdf v0.10.0-a9caa5b base version (optional) upgrade
cURL 7.68.0 base version
dos2unix 7.4.0 base version
GCC & G++ 9.4.0 base version
GNU Autoconf 2.69 base version
GNU Automake 1.16.1 base version
GNU make 4.2.1 base version
htop 3.2.0 optional upgrade
OpenSSL 1.1.1f base version
procps 3.3.16 base version (optional)
tmux 3.2a optional
Ubuntu 20.04.4 LTS base version
Vim 8.1.3741 base version (optional) upgrade
Wget 1.20.3

3.2 Windows-specific Software

Software Version Remark Status
Grep for Windows 2.5.4 base version
Make for Windows 3.81 base version
sed for Windows 4.2.1 base version

Version 0.9.0

Release Date: 06.04.2022

1. New Features

  • support for documents in different languages, with English, French, German and Italian supported as standard

2. Applied Software

Software Version Remark Status
DBeaver 22.0.2 for virtual machine only [optional] upgrade
Docker Desktop 20.10.14 base version [Docker Image & VM] upgrade
Git 2.25.1 base version
Pandoc 2.18 upgrade
PDFlib TET 5.3
Poppler 0.86.1 base version
Python3 3.10.4 upgrade
Python3 - pip 22.0.4
Tesseract OCR 5.1.0 base version
TeX Live 2019 base version
TeX Live - pdfTeX 3.14159265-2.6-1.40.20 base version

2.1 Unix-specific Software

Software Version Remark Status
asdf v0.9.0-e0d27e6 base version (optional)
cURL 7.68.0 base version
dos2unix 7.4.0 base version
GCC & G++ 9.4.0 base version
GNU Autoconf 2.69 base version
GNU Automake 1.16.1 base version
GNU make 4.2.1 base version
htop 3.1.2 optional
OpenSSL 1.1.1f base version
procps 3.3.16 base version (optional)
tmux 3.2a optional
Ubuntu 20.04.4 LTS base version
Vim 8.1.2269 base version (optional)
Wget 1.20.3

2.2 Windows-specific Software

Software Version Remark Status
Grep for Windows 2.5.4 base version
Make for Windows 3.81 base version
sed for Windows 4.2.1 base version

Version 0.8.0

Release Date: 18.03.2022

1. New Features

  • processing step tet: Extract text from PDF files.

2. Applied Software

Software Version Remark Status
DBeaver 22.0.0 for virtual machine only [optional]
Docker Desktop 20.10.13 base version [Docker Image & VM]
Git 2.25.1 base version
Pandoc 2.17.1.1
PDFlib TET 5.3 new
Poppler 0.86.1 base version
Python3 3.10.3 upgrade
Python3 - pip 22.0.4
Tesseract OCR 5.1.0 base version
TeX Live 2019 base version
TeX Live - pdfTeX 3.14159265-2.6-1.40.20 base version

2.1 Unix-specific Software

Software Version Remark Status
asdf v0.9.0-e0d27e6 base version (optional)
cURL 7.68.0 base version
dos2unix 7.4.0 base version
GCC & G++ 9.4.0 base version
GNU Autoconf 2.69 base version
GNU Automake 1.16.1 base version
GNU make 4.2.1 base version
htop 3.1.2 optional
OpenSSL 1.1.1f base version
procps 3.3.16 base version (optional)
tmux 3.2a optional
Ubuntu 20.04.4 LTS base version
Vim 8.1.2269 base version (optional)
Wget 1.20.3

2.2 Windows-specific Software

Software Version Remark Status
Grep for Windows 2.5.4 base version
Make for Windows 3.81 base version
sed for Windows 4.2.1 base version

Version 0.7.0

Release Date: 15.03.2022

1. New Features

  • processing step ocr: Convert appropriate image files to PDF files.

2. Applied Software

Software Version Remark Status
DBeaver 22.0.0 for virtual machine only [optional]
Docker Desktop 20.10.13 base version [Docker Image & VM] upgrade
Git 2.25.1 base version
Pandoc 2.17.1.1
Poppler 0.86.1 base version
Python3 3.10.2
Python3 - pip 22.0.4
Tesseract OCR 5.1.0 base version new
TeX Live 2019 base version
TeX Live - pdfTeX 3.14159265-2.6-1.40.20 base version

2.1 Unix-specific Software

Software Version Remark Status
asdf v0.9.0-e0d27e6 base version (optional)
cURL 7.68.0 base version
dos2unix 7.4.0 base version
GCC & G++ 9.4.0 base version
GNU Autoconf 2.69 base version
GNU Automake 1.16.1 base version
GNU make 4.2.1 base version
htop 3.1.2 optional
OpenSSL 1.1.1f base version
procps 3.3.16 base version (optional)
tmux 3.2a optional
Ubuntu 20.04.4 LTS base version
Vim 8.1.2269 base version (optional)
Wget 1.20.3

2.2 Windows-specific Software

Software Version Remark Status
Grep for Windows 2.5.4 base version
Make for Windows 3.81 base version
sed for Windows 4.2.1 base version

Version 0.6.5

Release Date: 10.03.2022

1. New Features

  • processing step n_2_p: Convert appropriate non-PDF documents to PDF files.

2. Applied Software

Software Version Remark Status
DBeaver 22.0.0 for virtual machine only [optional] upgrade
Docker Desktop 20.10.12 base version [Docker Image & VM]
Git 2.25.1 base version
Pandoc 2.17.1.1 new
Poppler 0.86.1 base version
Python3 3.10.2
Python3 - pip 22.0.4 upgrade
TeX Live 2019 base version new
TeX Live - pdfTeX 3.14159265-2.6-1.40.20 base version new

2.1 Unix-specific Software

Software Version Remark Status
asdf v0.9.0-e0d27e6 base version (optional)
cURL 7.68.0 base version
dos2unix 7.4.0 base version
GCC & G++ 9.4.0 base version upgrade
GNU Autoconf 2.69 base version
GNU Automake 1.16.1 base version
GNU make 4.2.1 base version
htop 3.1.2 optional
OpenSSL 1.1.1f base version
procps 3.3.16 base version (optional)
tmux 3.2a optional
Ubuntu 20.04.4 LTS base version upgrade
Vim 8.1.2269 base version (optional)
Wget 1.20.3

2.2 Windows-specific Software

Software Version Remark Status
Grep for Windows 2.5.4 base version
Make for Windows 3.81 base version
sed for Windows 4.2.1 base version

Version 0.6.0

Release Date: 04.03.2022

1. New Features

  • Processing step db_u: Upgrade the database.
  • Processing step p_2_i: Convert PDF documents into image files.

2. Applied Software

Software Version Remark Status
DBeaver 21.3.5 for virtual machine only [optional]
Docker Desktop 20.10.12 base version [Docker Image & VM] new
Git 2.25.1 base version
Poppler 0.86.1 base version new
Python3 3.10.2
Python3 - pip 22.0.3

2.1 Unix-specific Software

Software Version Remark Status
asdf v0.9.0-e0d27e6 base version (optional)
cURL 7.68.0 base version
dos2unix 7.4.0 base version
GCC & G++ 9.3.0 base version
GNU Autoconf 2.69 base version
GNU Automake 1.16.1 base version
GNU make 4.2.1 base version
htop 3.1.2 optional
OpenSSL 1.1.1f base version
procps 3.3.16 base version (optional)
tmux 3.2a optional
Ubuntu 20.04.3 LTS base version
Vim 8.1.3741 base version (optional)
Wget 1.20.3

2.2 Windows-specific Software

Software Version Remark Status
Grep for Windows 2.5.4 base version new
Make for Windows 3.81 base version new
sed for Windows 4.2.1 base version new

Version 0.5.0

Release Date: 14.02.2022

1. New Features

  • Setup of the entire development infrastructure
  • Creation of the first version of the user documentation
  • Processing of new document arrivals in the file directory inbox

2. Applied Software

Software Version Remark
asdf v0.9.0-e0d27e6 base version
cURL 7.68.0 base version
DBeaver 21.3.4 for virtual machine only
dos2unix 7.4.0 base version
GCC & G++ 9.3.0 base version
Git 2.25.1 base version
GNU Autoconf 2.69 base version
GNU Automake 1.16.1 base version
GNU make 4.2.1 base version
htop 3.1.2
OpenSSL 1.1.1f base version
procps-ng 3.3.16 base version
Python3 3.10.2
Python3 - pip 22.0.3
tmux 3.2a
Ubuntu 20.04.3 LTS base version
Vim 8.1.3741 base version
Wget 1.20.3