Plagiarism Detector has several methods of extracting plain text from each supported file type.
(The list of supported file formats)
Each file type has its own Text Extraction Engines.
A Text Extraction Engine (TEE) is a sub-program that extracts text from a specified file type.
By default, every time you start Document Manager, Plagiarism Detector automatically selects the optimal TEE for each supported file type.
Still, there are three cases in which you may need to change the TEE to get better text extraction:
- Incomplete Text Extraction (some parts of the document are missing).
- Dirty Text Extraction (HTML code is present in the extracted text).
- Wrong encoding detection for complex file formats.
A TEE configuration made with Document Manager is global: it is used in all cases.
A TEE configuration made with Advanced Report Viewer (ARV) is local: it is used only within the ARV session and is then reset to the global setting.
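The relationship between the global and local settings can be pictured with a small sketch. This is not Plagiarism Detector's actual code: the class, its methods, and the engine names are hypothetical and serve only to illustrate how a local choice overrides the global one during a session and disappears afterwards.

```python
from typing import Dict, Optional


class TeeSettings:
    """Hypothetical model of global and session-local TEE choices."""

    def __init__(self, global_choice: Dict[str, str]) -> None:
        # Global configuration, as it would be set in Document Manager.
        self.global_choice = dict(global_choice)
        # Session-local overrides, as they would be set in ARV.
        self.session_choice: Dict[str, str] = {}

    def set_local(self, file_type: str, engine: str) -> None:
        # Record an override that applies only to the current session.
        self.session_choice[file_type] = engine

    def engine_for(self, file_type: str) -> Optional[str]:
        # A local override wins; otherwise fall back to the global setting.
        return self.session_choice.get(file_type, self.global_choice.get(file_type))

    def end_session(self) -> None:
        # Closing the session discards every local override.
        self.session_choice.clear()


# "engine_a" and "engine_b" are placeholder names, not real TEEs.
settings = TeeSettings({"docx": "engine_a"})
settings.set_local("docx", "engine_b")
during_session = settings.engine_for("docx")
settings.end_session()
after_session = settings.engine_for("docx")
```

In this sketch the override is in effect for lookups made during the session, and the global choice is returned again once the session ends.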
To change the TEE for a specific file type, click to the right of it; a combo box appears:
(The example below shows how to change TEE for DocX files)
