Digitization of documents
Among the many projects Neurosoft has carried out so far, we have also provided document digitization services. We have processed various types of documents, ranging from contemporary prints of Polish Standards (Polskie Normy) to old editions of the Polish Journal of Laws (Dziennik Ustaw, published since 1918) and the Polish Monitor (Monitor Polski). In the digitization process we have relied primarily on our own technologies and tools. The need to scan documents of very poor quality or in atypical formats led us to design our own equipment, a ‘book scanner’ (“skaner dziełowy”).
Our experience and the growing demand for such services led us to prepare a dedicated offer for the digitization of documents.
We digitize documents up to A3 format and, depending on the customer’s needs, deliver the final product in various formats.
There are four stages in the digitization process:
- preparation of materials
- scanning / photographing
- processing the scan output file
- generation of the final products of digitization (OCR, ICR, manual data input).
Preparation of materials
At this stage, the material to be digitized is verified. The criteria taken into account are as follows:
- type of document (an old print, font size, number of pages, …)
- the risk of damaging the original (for example, scanning books after cutting off their spines)
- the quality of the generated image
- the technical capabilities of the place where the scanning process is taking place.
This stage results in an appropriate schedule that takes into account both the technical and the substantive possibilities.
Scanning / photographing
Depending on the technical capabilities, the number of pages to be processed and the type of document, we use industrial Fujitsu scanners or our own equipment. With our equipment we can take up to 2,500 photos a day. Our solution does not cause any damage to the original document. Scanning directly produces output files in TIFF, JPEG, DjVu or BIP format.
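To illustrate how the daily throughput quoted above translates into a schedule, here is a minimal Python sketch. It assumes one photo per page and treats the quoted figure as a fixed daily capacity. The function name and the sample page count are hypothetical and are not part of Neurosoft's tooling.

```python
import math

# Upper bound on photos per day quoted in the text; real throughput may be lower.
PHOTOS_PER_DAY = 2500


def estimate_days(page_count, photos_per_day=PHOTOS_PER_DAY):
    """Return the minimum number of working days needed to photograph
    page_count pages, assuming one photo per page."""
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    return math.ceil(page_count / photos_per_day)


# Hypothetical collection of 12,000 pages.
days_needed = estimate_days(12000)
print(days_needed)
```

An actual schedule would also have to allow for material preparation and for the processing stages that follow scanning.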
Processing the scan output files
During image processing the image is straightened (especially important when scanning thick books) and spots and other artifacts are removed. Depending on the quality of the source material, various types of filters are also applied.
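The text does not say which filters are used. As an illustration of spot removal only, the sketch below applies a 3x3 median filter, a common despeckling technique, to a grayscale image stored as a list of rows. The choice of filter, the function name and the sample image are assumptions made for this example.

```python
from statistics import median


def despeckle(image):
    """Apply a 3x3 median filter to a grayscale image given as a list of rows.

    Border pixels are copied unchanged; the input image is not modified.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Collect the pixel and its eight neighbours.
            window = [
                image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            result[y][x] = int(median(window))
    return result


# Hypothetical 5x5 white page (255) with a single dark spot (0) in the middle.
page = [[255] * 5 for _ in range(5)]
page[2][2] = 0
cleaned = despeckle(page)
```

In this example the isolated dark pixel is replaced by the white value of its neighbours. A production pipeline would more likely rely on an image-processing library and would tune the filter to the quality of the source material.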
Generation of the final products of digitization
Generating the final products closes the digitization process, although from Neurosoft's point of view it may also be the first stage of a project to build a presentation of the prepared data. The source material consists of photographs of the originals, which can be subjected to text recognition (OCR), structuring, or the generation of keywords for full-text search. In the final stage we generate the final set of files:
- scans (photos of the originals: TIFF, JPEG, DjVu, BIP, …),
- XML files containing the recognized and possibly structured content of the original,
- auxiliary files containing the keywords necessary for full-text search.
We strongly prefer to use our own BIP format.
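The XML and keyword files mentioned in the list above could be produced along the following lines. This is a minimal sketch, not Neurosoft's actual format: the element names, the minimum keyword length and the sample text are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET


def build_page_xml(page_number, text):
    """Wrap the recognized text of one page in a simple, hypothetical XML layout."""
    page = ET.Element("page", number=str(page_number))
    ET.SubElement(page, "text").text = text
    return ET.tostring(page, encoding="unicode")


def extract_keywords(text, min_length=4):
    """Return the sorted, unique, lower-cased words of at least min_length characters."""
    words = re.findall(r"\w+", text.lower())
    return sorted({word for word in words if len(word) >= min_length})


# Hypothetical OCR output for a single page.
recognized = "Dziennik Ustaw 1918, the Journal of Laws"
page_xml = build_page_xml(1, recognized)
keywords = extract_keywords(recognized)
print(page_xml)
print(keywords)
```

A real full-text index would normally also handle stop words and inflected forms, which matters particularly for Polish source material.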