DCR - Release History
Version 0.9.6
Release Date: 07.08.2022
1 New Features
- API documentation added.
- Determination of bulleted lists.
- Determination of numbered lists.
- Determination of headings.
2 Modified Features
- Code refactoring.
3 Applied Software
Software | Version | Remark | Status |
---|---|---|---|
DBeaver | 22.1.0 | for virtual machine only [optional] | |
Docker Desktop | 20.10.17 | base version [Docker Image & VM] | |
Git | 2.25.1 | base version | |
Pandoc | 2.18 | ||
PDFlib TET | 5.3 | ||
Poppler | 0.86.1 | base version | |
Python3 | 3.10.6 | | upgrade |
Python3 - pip | 22.1.2 | ||
Tesseract OCR | 5.1.0 | base version | |
TeX Live | 2019 | base version | |
TeX Live - pdfTeX | 3.14159265-2.6-1.40.20 | base version |
3.1 Unix-specific Software
Software | Version | Remark | Status |
---|---|---|---|
asdf | v0.10.2-7e7a1fa | base version (optional) | |
cURL | 7.68.0 | base version | |
dos2unix | 7.4.0 | base version | |
GCC & G++ | 9.4.0 | base version | |
GNU Autoconf | 2.69 | base version | |
GNU Automake | 1.16.1 | base version | |
GNU make | 4.2.1 | base version | |
htop | 3.2.1 | optional | |
OpenSSL | 1.1.1f | base version | |
procps | 3.3.16 | base version (optional) | |
tmux | 3.3a | optional | |
Ubuntu | 20.04.4 LTS | base version | |
Vim | 8.1.3741 | base version (optional) | |
Wget | 1.20.3 |
3.2 Windows-specific Software
Software | Version | Remark | Status |
---|---|---|---|
Grep for Windows | 2.5.4 | base version | |
Make for Windows | 3.81 | base version | |
sed for Windows | 4.2.1 | base version |
Version 0.9.3
Release Date: 17.06.2022
1 New Features
- Description of the algorithms for determining the line type.
- Determination of the lines belonging to the TOC (table of contents).
2 Modified Features
- Major refactoring of the tokenizer.
- pylint: Adjustments for latest version.
3 Applied Software
Software | Version | Remark | Status |
---|---|---|---|
DBeaver | 22.1.0 | for virtual machine only [optional] | upgrade |
Docker Desktop | 20.10.17 | base version [Docker Image & VM] | upgrade |
Git | 2.25.1 | base version | |
Pandoc | 2.18 | ||
PDFlib TET | 5.3 | ||
Poppler | 0.86.1 | base version | |
Python3 | 3.10.5 | | upgrade |
Python3 - pip | 22.1.2 | | upgrade |
Tesseract OCR | 5.1.0 | base version | |
TeX Live | 2019 | base version | |
TeX Live - pdfTeX | 3.14159265-2.6-1.40.20 | base version |
3.1 Unix-specific Software
Software | Version | Remark | Status |
---|---|---|---|
asdf | v0.10.2-7e7a1fa | base version (optional) | upgrade |
cURL | 7.68.0 | base version | |
dos2unix | 7.4.0 | base version | |
GCC & G++ | 9.4.0 | base version | |
GNU Autoconf | 2.69 | base version | |
GNU Automake | 1.16.1 | base version | |
GNU make | 4.2.1 | base version | |
htop | 3.2.1 | optional | upgrade |
OpenSSL | 1.1.1f | base version | |
procps | 3.3.16 | base version (optional) | |
tmux | 3.3a | optional | upgrade |
Ubuntu | 20.04.4 LTS | base version | upgrade |
Vim | 8.1.3741 | base version (optional) | |
Wget | 1.20.3 |
3.2 Windows-specific Software
Software | Version | Remark | Status |
---|---|---|---|
Grep for Windows | 2.5.4 | base version | |
Make for Windows | 3.81 | base version | |
sed for Windows | 4.2.1 | base version |
Version 0.9.2
Release Date: 01.06.2022
1 New Features
- object-oriented design.
- selectable spaCy token attributes completed.
2 Modified Features
- database schema refactored.
3 Applied Software
Software | Version | Remark | Status |
---|---|---|---|
DBeaver | 22.0.5 | for virtual machine only [optional] | upgrade |
Docker Desktop | 20.10.16 | base version [Docker Image & VM] | upgrade |
Git | 2.25.1 | base version | |
Pandoc | 2.18 | ||
PDFlib TET | 5.3 | ||
Poppler | 0.86.1 | base version | |
Python3 | 3.10.4 | ||
Python3 - pip | 22.0.4 | ||
Tesseract OCR | 5.1.0 | base version | |
TeX Live | 2019 | base version | |
TeX Live - pdfTeX | 3.14159265-2.6-1.40.20 | base version |
3.1 Unix-specific Software
Software | Version | Remark | Status |
---|---|---|---|
asdf | v0.10.1-711ad99 | base version (optional) | upgrade |
cURL | 7.68.0 | base version | |
dos2unix | 7.4.0 | base version | |
GCC & G++ | 9.4.0 | base version | |
GNU Autoconf | 2.69 | base version | |
GNU Automake | 1.16.1 | base version | |
GNU make | 4.2.1 | base version | |
htop | 3.2.0 | optional | |
OpenSSL | 1.1.1f | base version | |
procps | 3.3.16 | base version (optional) | |
tmux | 3.2a | optional | |
Ubuntu | 20.04.4 LTS | base version | upgrade |
Vim | 8.1.3741 | base version (optional) | |
Wget | 1.20.3 |
3.2 Windows-specific Software
Software | Version | Remark | Status |
---|---|---|---|
Grep for Windows | 2.5.4 | base version | |
Make for Windows | 3.81 | base version | |
sed for Windows | 4.2.1 | base version |
Version 0.9.1
Release Date: 05.05.2022
1. New Features
- classification of lines into headers, footers and body lines
- support for documents in different languages - English, French, German and Italian as standard
- tokenizer based on spaCy
2. Modified Features
- extending the parser to the granularities page, line and word
- refactoring to separate preprocessor and NLP specific processes
3. Applied Software
Software | Version | Remark | Status |
---|---|---|---|
DBeaver | 22.0.4 | for virtual machine only [optional] | upgrade |
Docker Desktop | 20.10.14 | base version [Docker Image & VM] | upgrade |
Git | 2.25.1 | base version | |
Pandoc | 2.18 | | upgrade |
PDFlib TET | 5.3 | ||
Poppler | 0.86.1 | base version | |
Python3 | 3.10.4 | | upgrade |
Python3 - pip | 22.0.4 | ||
Tesseract OCR | 5.1.0 | base version | |
TeX Live | 2019 | base version | |
TeX Live - pdfTeX | 3.14159265-2.6-1.40.20 | base version |
3.1 Unix-specific Software
Software | Version | Remark | Status |
---|---|---|---|
asdf | v0.10.0-a9caa5b | base version (optional) | upgrade |
cURL | 7.68.0 | base version | |
dos2unix | 7.4.0 | base version | |
GCC & G++ | 9.4.0 | base version | |
GNU Autoconf | 2.69 | base version | |
GNU Automake | 1.16.1 | base version | |
GNU make | 4.2.1 | base version | |
htop | 3.2.0 | optional | upgrade |
OpenSSL | 1.1.1f | base version | |
procps | 3.3.16 | base version (optional) | |
tmux | 3.2a | optional | |
Ubuntu | 20.04.4 LTS | base version | |
Vim | 8.1.3741 | base version (optional) | upgrade |
Wget | 1.20.3 |
3.2 Windows-specific Software
Software | Version | Remark | Status |
---|---|---|---|
Grep for Windows | 2.5.4 | base version | |
Make for Windows | 3.81 | base version | |
sed for Windows | 4.2.1 | base version |
Version 0.9.0
Release Date: 06.04.2022
1. New Features
- support for documents in different languages - English, French, German and Italian as standard
2. Applied Software
Software | Version | Remark | Status |
---|---|---|---|
DBeaver | 22.0.2 | for virtual machine only [optional] | upgrade |
Docker Desktop | 20.10.14 | base version [Docker Image & VM] | upgrade |
Git | 2.25.1 | base version | |
Pandoc | 2.18 | | upgrade |
PDFlib TET | 5.3 | ||
Poppler | 0.86.1 | base version | |
Python3 | 3.10.4 | | upgrade |
Python3 - pip | 22.0.4 | ||
Tesseract OCR | 5.1.0 | base version | |
TeX Live | 2019 | base version | |
TeX Live - pdfTeX | 3.14159265-2.6-1.40.20 | base version |
2.1 Unix-specific Software
Software | Version | Remark | Status |
---|---|---|---|
asdf | v0.9.0-e0d27e6 | base version (optional) | |
cURL | 7.68.0 | base version | |
dos2unix | 7.4.0 | base version | |
GCC & G++ | 9.4.0 | base version | |
GNU Autoconf | 2.69 | base version | |
GNU Automake | 1.16.1 | base version | |
GNU make | 4.2.1 | base version | |
htop | 3.1.2 | optional | |
OpenSSL | 1.1.1f | base version | |
procps | 3.3.16 | base version (optional) | |
tmux | 3.2a | optional | |
Ubuntu | 20.04.4 LTS | base version | |
Vim | 8.1.2269 | base version (optional) | |
Wget | 1.20.3 |
2.2 Windows-specific Software
Software | Version | Remark | Status |
---|---|---|---|
Grep for Windows | 2.5.4 | base version | |
Make for Windows | 3.81 | base version | |
sed for Windows | 4.2.1 | base version |
Version 0.8.0
Release Date: 18.03.2022
1. New Features
- processing step `tet`: Extract text from `pdf` files.
2. Applied Software
Software | Version | Remark | Status |
---|---|---|---|
DBeaver | 22.0.0 | for virtual machine only [optional] | |
Docker Desktop | 20.10.13 | base version [Docker Image & VM] | |
Git | 2.25.1 | base version | |
Pandoc | 2.17.1.1 | ||
PDFlib TET | 5.3 | | new |
Poppler | 0.86.1 | base version | |
Python3 | 3.10.3 | | upgrade |
Python3 - pip | 22.0.4 | ||
Tesseract OCR | 5.1.0 | base version | |
TeX Live | 2019 | base version | |
TeX Live - pdfTeX | 3.14159265-2.6-1.40.20 | base version |
2.1 Unix-specific Software
Software | Version | Remark | Status |
---|---|---|---|
asdf | v0.9.0-e0d27e6 | base version (optional) | |
cURL | 7.68.0 | base version | |
dos2unix | 7.4.0 | base version | |
GCC & G++ | 9.4.0 | base version | |
GNU Autoconf | 2.69 | base version | |
GNU Automake | 1.16.1 | base version | |
GNU make | 4.2.1 | base version | |
htop | 3.1.2 | optional | |
OpenSSL | 1.1.1f | base version | |
procps | 3.3.16 | base version (optional) | |
tmux | 3.2a | optional | |
Ubuntu | 20.04.4 LTS | base version | |
Vim | 8.1.2269 | base version (optional) | |
Wget | 1.20.3 |
2.2 Windows-specific Software
Software | Version | Remark | Status |
---|---|---|---|
Grep for Windows | 2.5.4 | base version | |
Make for Windows | 3.81 | base version | |
sed for Windows | 4.2.1 | base version |
Version 0.7.0
Release Date: 15.03.2022
1. New Features
- processing step `ocr`: Convert appropriate image files to `pdf` files.
2. Applied Software
Software | Version | Remark | Status |
---|---|---|---|
DBeaver | 22.0.0 | for virtual machine only [optional] | |
Docker Desktop | 20.10.13 | base version [Docker Image & VM] | upgrade |
Git | 2.25.1 | base version | |
Pandoc | 2.17.1.1 | ||
Poppler | 0.86.1 | base version | |
Python3 | 3.10.2 | ||
Python3 - pip | 22.0.4 | ||
Tesseract OCR | 5.1.0 | base version | new |
TeX Live | 2019 | base version | |
TeX Live - pdfTeX | 3.14159265-2.6-1.40.20 | base version |
2.1 Unix-specific Software
Software | Version | Remark | Status |
---|---|---|---|
asdf | v0.9.0-e0d27e6 | base version (optional) | |
cURL | 7.68.0 | base version | |
dos2unix | 7.4.0 | base version | |
GCC & G++ | 9.4.0 | base version | |
GNU Autoconf | 2.69 | base version | |
GNU Automake | 1.16.1 | base version | |
GNU make | 4.2.1 | base version | |
htop | 3.1.2 | optional | |
OpenSSL | 1.1.1f | base version | |
procps | 3.3.16 | base version (optional) | |
tmux | 3.2a | optional | |
Ubuntu | 20.04.4 LTS | base version | |
Vim | 8.1.2269 | base version (optional) | |
Wget | 1.20.3 |
2.2 Windows-specific Software
Software | Version | Remark | Status |
---|---|---|---|
Grep for Windows | 2.5.4 | base version | |
Make for Windows | 3.81 | base version | |
sed for Windows | 4.2.1 | base version |
Version 0.6.5
Release Date: 10.03.2022
1. New Features
- processing step `n_2_p`: Convert appropriate non-pdf documents to `pdf` files.
2. Applied Software
Software | Version | Remark | Status |
---|---|---|---|
DBeaver | 22.0.0 | for virtual machine only [optional] | upgrade |
Docker Desktop | 20.10.12 | base version [Docker Image & VM] | |
Git | 2.25.1 | base version | |
Pandoc | 2.17.1.1 | | new |
Poppler | 0.86.1 | base version | |
Python3 | 3.10.2 | ||
Python3 - pip | 22.0.4 | | upgrade |
TeX Live | 2019 | base version | new |
TeX Live - pdfTeX | 3.14159265-2.6-1.40.20 | base version | new |
2.1 Unix-specific Software
Software | Version | Remark | Status |
---|---|---|---|
asdf | v0.9.0-e0d27e6 | base version (optional) | |
cURL | 7.68.0 | base version | |
dos2unix | 7.4.0 | base version | |
GCC & G++ | 9.4.0 | base version | upgrade |
GNU Autoconf | 2.69 | base version | |
GNU Automake | 1.16.1 | base version | |
GNU make | 4.2.1 | base version | |
htop | 3.1.2 | optional | |
OpenSSL | 1.1.1f | base version | |
procps | 3.3.16 | base version (optional) | |
tmux | 3.2a | optional | |
Ubuntu | 20.04.4 LTS | base version | upgrade |
Vim | 8.1.2269 | base version (optional) | |
Wget | 1.20.3 |
2.2 Windows-specific Software
Software | Version | Remark | Status |
---|---|---|---|
Grep for Windows | 2.5.4 | base version | |
Make for Windows | 3.81 | base version | |
sed for Windows | 4.2.1 | base version |
Version 0.6.0
Release Date: 04.03.2022
1. New Features
- Processing step `db_u`: Upgrade the database.
- Processing step `p_2_i`: Convert `pdf` documents into image files.
2. Applied Software
Software | Version | Remark | Status |
---|---|---|---|
DBeaver | 21.3.5 | for virtual machine only [optional] | |
Docker Desktop | 20.10.12 | base version [Docker Image & VM] | new |
Git | 2.25.1 | base version | |
Poppler | 0.86.1 | base version | new |
Python3 | 3.10.2 | ||
Python3 - pip | 22.0.3 |
2.1 Unix-specific Software
Software | Version | Remark | Status |
---|---|---|---|
asdf | v0.9.0-e0d27e6 | base version (optional) | |
cURL | 7.68.0 | base version | |
dos2unix | 7.4.0 | base version | |
GCC & G++ | 9.3.0 | base version | |
GNU Autoconf | 2.69 | base version | |
GNU Automake | 1.16.1 | base version | |
GNU make | 4.2.1 | base version | |
htop | 3.1.2 | optional | |
OpenSSL | 1.1.1f | base version | |
procps | 3.3.16 | base version (optional) | |
tmux | 3.2a | optional | |
Ubuntu | 20.04.3 LTS | base version | |
Vim | 8.1.3741 | base version (optional) | |
Wget | 1.20.3 |
2.2 Windows-specific Software
Software | Version | Remark | Status |
---|---|---|---|
Grep for Windows | 2.5.4 | base version | new |
Make for Windows | 3.81 | base version | new |
sed for Windows | 4.2.1 | base version | new |
Version 0.5.0
Release Date: 14.02.2022
1. New Features
- Setup of the entire development infrastructure
- Creation of the first version of the user documentation
- Processing of new document arrivals in the file directory `inbox`
2. Applied Software
Software | Version | Remark |
---|---|---|
asdf | v0.9.0-e0d27e6 | base version |
cURL | 7.68.0 | base version |
DBeaver | 21.3.4 | for virtual machine only |
dos2unix | 7.4.0 | base version |
GCC & G++ | 9.3.0 | base version |
Git | 2.25.1 | base version |
GNU Autoconf | 2.69 | base version |
GNU Automake | 1.16.1 | base version |
GNU make | 4.2.1 | base version |
htop | 3.1.2 | |
OpenSSL | 1.1.1f | base version |
procps-ng | 3.3.16 | base version |
Python3 | 3.10.2 | |
Python3 - pip | 22.0.3 | |
tmux | 3.2a | |
Ubuntu | 20.04.3 LTS | base version |
Vim | 8.1.3741 | base version |
Wget | 1.20.3 |