Topics - Mike Sanders

"Silver Bullets" / What are "Use Last Session" and "Manually add Documents"?
« on: September 20, 2013, 10:50:40 AM »
If you need to check the same document (or several documents) a second time, with the same or different options,
you can check "use Last session docs" to run a new plagiarism check on them from scratch.

If you check "add docs manually", you will be prompted to select the documents manually.

This setting is reset every time DocManager starts.

+ Most important: increased plagiarism detection sensitivity. We used a number of manually rewritten documents to find the optimal detection behavior.
+ The default session option is reset to "manually adding documents" at program restart.
+ A large number of reports is now loaded with a progress bar showing the loading progress.
+ Report Viewer filters are now reset when the form opens. This fixes a bug where the Originality Report loaded incorrectly while filters were active.
+ Added more "Content-Types" to avoid resource processing failures due to the UnsupportedContentType error.
+ Added a "what's new in the update" link to the auto-update system, pointing to this board.
+ Fixed the Blue Help buttons.

Known issues:
- Support for TC configuration and presets has been dropped (we will fix/update this in approximately 2 weeks).

We would like to thank lordlegion for extensive feedback and
for providing files with rewritten sections, which helped us greatly in configuring the new version!

+ Added a "Wrong File" warning screen to better explain how to work with the Shell Context Menu.
+ Fixed the second-screen modal bug.
+ Added "Blue Help Buttons" linking to the related Community Forum threads.
+ Core Version is now hard-coded into the main executable.
+ Improved handling of the "File in use" issue. Tested with "Read non-exclusive Lock" and "Write-Read exclusive Lock".


"Silver Bullets" / Wrong File warning!
« on: September 15, 2013, 05:06:06 PM »
Dear client, you are doing it wrong!

You are trying to check the PROGRAM itself instead
of checking a target file.

Your target file is usually a Microsoft Word or PDF document.
So you need to use the CONTEXT menu on that file, and NOT the Plagiarism Detector shortcut (link)!

"Silver Bullets" / Text Extraction Engines Configuration
« on: September 15, 2013, 04:47:16 PM »
Plagiarism Detector has several methods of extracting plain text from each supported file type.
(See the list of supported file formats.)

Each file type has its own Text Extraction Engines.

Text Extraction Engine (TEE) is a sub-program that extracts text from a specified file type.

By default, every time you start Document Manager, Plagiarism Detector automatically selects the optimal TEE for each supported file type.

Still, there are 3 cases where you may need to change the TEE to get better text extraction:
  • Incomplete Text Extraction (some parts of the document are missing).
  • Dirty Text Extraction (HTML code is present in the document).
  • Wrong encoding detection for complex file formats.
TEE configuration made in Document Manager is global: it is used in all cases.
TEE configuration made in Advanced Report Viewer (ARV) is local: it is used only within the ARV session and is then reset to the global setting.
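
The global/local split behaves like a two-layer lookup: inside an ARV session a local choice wins, otherwise the global Document Manager setting applies, and the local layer is discarded when the session ends. Below is a minimal Python sketch of that idea using `collections.ChainMap`. The `ArvSession` class and the engine names are hypothetical and do not reflect the product's actual configuration format.

```python
from collections import ChainMap

# Hypothetical global TEE choices, as if set in Document Manager (engine names are made up).
GLOBAL_TEE = {".docx": "engine_a", ".pdf": "engine_b", ".txt": "engine_c"}


class ArvSession:
    """Hypothetical model of an Advanced Report Viewer session with local TEE overrides."""

    def __init__(self, global_settings: dict):
        self._local: dict = {}
        # Lookups check the local overrides first, then fall back to the global settings.
        self.tee = ChainMap(self._local, global_settings)

    def override(self, extension: str, engine: str) -> None:
        # Changes go to the local layer only; the global settings stay untouched.
        self._local[extension] = engine

    def close(self) -> None:
        # When the session ends, local choices are discarded ("reset to the global setting").
        self._local.clear()


session = ArvSession(GLOBAL_TEE)
session.override(".docx", "engine_d")
print(session.tee[".docx"], GLOBAL_TEE[".docx"])  # engine_d engine_a
session.close()
print(session.tee[".docx"])  # engine_a
```

Keeping the global settings as the bottom layer means an ARV experiment can never leak into Document Manager's configuration.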

To change the TEE for a specific file type, click to the right of it and a combo box appears.
(The example below shows how to change the TEE for DOCX files.)

"Silver Bullets" / Correct Encoding Detection
« on: September 15, 2013, 04:07:11 PM »
Plagiarism Detector tries to detect and use the correct encoding for every loaded document.

But sometimes it may detect the wrong encoding.
In the Originality Report this looks like either "lots of question marks" or "lots of unreadable symbols".
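
Why this happens: a text file is stored as bytes, and the same bytes turn into different characters depending on the encoding used to decode them. The hypothetical `decode_document` helper below is a minimal Python sketch that tries candidate encodings in order. It is not Plagiarism Detector's own detection logic, and GBK is used here only as an example of a legacy Chinese encoding.

```python
def decode_document(raw: bytes, encodings: list[str]) -> tuple[str, str]:
    """Return (text, encoding) using the first encoding that decodes `raw` cleanly.

    Hypothetical helper for illustration only.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding; try the next one
    # No candidate fits: decode with the first one anyway, replacing bad bytes with
    # U+FFFD, which many viewers render as a question mark.
    return raw.decode(encodings[0], errors="replace"), encodings[0]


# Chinese text saved in the legacy GBK encoding (chosen only as an example).
sample = "抄袭检测".encode("gbk")

print(decode_document(sample, ["utf-8"]))         # wrong guess: replacement characters
print(decode_document(sample, ["utf-8", "gbk"]))  # ('抄袭检测', 'gbk')
```

Order matters: latin-1 accepts every possible byte sequence, so listing it first would "succeed" and return unreadable symbols instead of question marks.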

To avoid this, you need to tell Plagiarism Detector which encoding to use.

To do this:

  • Start Plagiarism Detector Client.
  • Open New Check Wizard.
  • Select the Documents to check.
  • Choose the Manual Config option (a detailed configuration tab appears).

Before changing ANY values, you must click the following, otherwise the changes will not take effect:
1. Reset to Defaults.
2. Start Test.

Select one of the following options:

  • use automatic detection
  • use system default encoding
  • use custom encoding
  • use utf8 encoding
  • use iFilters

For example, to select a custom encoding for a text file:

Q: How can I make sure the encoding is chosen correctly?
A: Double-click any document in the Document Selector and you will see what its text will look like.
It must look like the expected text.
(In the example below, automatic detection is used to get the default Chinese encoding.)


"Silver Bullets" / Text Extraction Troubleshooting
« on: September 15, 2013, 03:30:08 PM »
Below you will find useful information on troubleshooting text extraction.

There are 3 main issues you may experience:

1. A notification balloon is displayed and Plagiarism Detector refuses to run the plagiarism check:
"Error Code Diagnostics"
Details here:,37.0.html

2. The check completes OK, but the extracted text is incomplete (some parts are missing):
"Incomplete Text Extraction issue"
Details here:,37.0.html

3. The Originality Report displays lots of question marks or unreadable symbols instead of text:
"Wrong encoding issue"
Details here:,36.msg37.html#msg37

"Silver Bullets" / Check for Plagiarism with Shell Context Menu
« on: September 14, 2013, 06:08:50 PM »
Plagiarism Detector has a really easy-to-use feature for ultra-fast automatic plagiarism checks: the Shell Context Menu.

To check a document you need to:

1. Go to the location where your document resides.
2. Right-click it.
3. Select the "Check for Plagiarism" option from the context menu.
4. Wait for the Originality Report to open.

You can find a video demonstration here: "url"

"Silver Bullets" / Document processing Error Codes Description
« on: September 14, 2013, 03:22:23 PM »
If Plagiarism Detector fails to process a document, it will return one of the following error codes:

If for some reason text extraction fails, Plagiarism Detector displays a message in a balloon with
detailed information, including the ERROR DESCRIPTION CODE and the document title:

desc.: unable to find a cause.
possible workaround: email us the document for further investigation.

desc.: unable to find the file.
possible workaround: double-check that the file is still in the location you pointed to. Make a copy and retry.
Make sure no application is using the file.

desc.: unable to define the valid file extension.
possible workaround: make sure the file has a full name and extension. Rename it properly.

desc.: some other program is using the file.
possible workaround: find that program, close it and retry. If this does not work, reboot the computer and retry.

desc.: file does not contain valid text.
possible workaround: make sure the file actually contains text.

desc.: the software was not able to extract any text from the file.
possible workaround: [Extended troubleshooting TEE config].

desc.: iFilters - one of the Text Extraction Engines failed to initialize.
possible workaround: [Extended troubleshooting iFilters].

desc.: iFilters - one of the Text Extraction Engines failed to initialize.
possible workaround: [Extended troubleshooting iFilters].

desc.: you are trying to load a file that is not supported at the moment.
possible workaround: double-check that the file you are trying to load is the correct one
[supported file types].

desc.: Plagiarism Detector failed to load the document in the allotted time.
possible workaround: try increasing the timeout for document processing.

desc.: the document you are trying to process is too large.
Currently the maximum allowed size is 100 MB.
possible workaround: try increasing the maximum allowed size for document processing in the main configuration file.

desc.: subprogram was not found.
possible workaround: reinstall the product.

desc.: internal failure.
possible workaround: email us the document.

desc.: internal failure.
possible workaround: email us the document.

desc.: subprogram was not found.
possible workaround: reinstall the product.
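
Several of the errors above (missing file, no extension, unsupported type, oversized document) can be spotted before a check is even started. The Python sketch below shows such a pre-flight check. The `precheck` function and its messages are hypothetical, the extension list is copied from the Supported file formats post on this page, and reading "100 MB" as 100 * 1024 * 1024 bytes is an assumption. It does not detect files locked by another program or non-empty files with no extractable text.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the "Supported file formats" post.
SUPPORTED_EXTENSIONS = {".docx", ".pptx", ".txt", ".rtf", ".doc",
                        ".htm", ".html", ".asp", ".php", ".pdf", ".odt"}
# Assumption: the documented 100 MB limit means 100 * 1024 * 1024 bytes.
MAX_SIZE_BYTES = 100 * 1024 * 1024


def precheck(path: Path, max_size: int = MAX_SIZE_BYTES) -> Optional[str]:
    """Return a problem description for `path`, or None if it looks loadable.

    Hypothetical helper; the messages paraphrase the error descriptions above.
    """
    if not path.is_file():
        return "unable to find the file"
    extension = path.suffix.lower()
    if not extension:
        return "unable to define the valid file extension"
    if extension not in SUPPORTED_EXTENSIONS:
        return "file type is not supported"
    size = path.stat().st_size
    if size == 0:
        return "file does not contain valid text"  # an empty file has no text at all
    if size > max_size:
        return "the document is too large"
    return None
```

For example, `precheck(Path("essay.docx"))` returns `None` only if the file exists, is non-empty and is under the size limit.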

"Silver Bullets" / Supported file formats
« on: September 14, 2013, 02:51:06 PM »
Plagiarism Detector handles documents in various formats as well as web pages.

The following document formats are supported:

*.docx - Microsoft Word (newest versions)
*.pptx - Microsoft PowerPoint (newest versions)
*.txt - Plain Text [* requires correct encoding detection]
*.rtf - Rich Text Format
*.doc - Microsoft Word (older versions)
*.htm *.html *.asp *.php - web pages [* requires correct encoding detection]
*.pdf - Adobe PDF [requires actual text, not images of text]
*.odt - OpenOffice Document

97% of all documents are processed OK.

The most common unsupported file type is a rasterized *.pdf.
Such a document contains images rather than text, and therefore requires OCR processing.

For example (you can download it using the link at the bottom):

+ Added extended error codes for resource processing in Originality Reports:
+ Changed the hard-coded K for plagiarism detection; this results in denser cluster detection.
+ Resource processing completely rewritten. Web page processing logic is now cleanly separated from external file downloads.


"Silver Bullets" / Resources Error Codes
« on: September 14, 2013, 12:34:48 PM »
[applies to core v. 790 and later]

Here you can find what to do if resource loading fails for some reason:
  • Define the exact error code.
  • See below for a possible workaround.
  • Try it out.
  • Contact us if you experience any issues.
Plagiarism Detector Client has the following resource processing error codes:

explanation: unable to diagnose the exact cause. Attach the problematic Originality Report and email it to us.
possible workaround: contact us and send the files for us to investigate.

explanation: unable to read/write the intermediate disk cache.
possible workaround: check that you are running as Administrator.

Timeouted (operation timed out)
explanation: the program was not able to process this resource within the resource processing timeout.
possible workaround: increase resource processing timeout value in settings.

explanation: the processed resource exceeds the maximum allowed size.
possible workaround: increase maximum allowed resource size in settings.

explanation: Plagiarism Detector was not able to correctly parse the resource URL.
possible workaround: --

explanation: the resource is probably down (the website is offline).
possible workarounds:
a) retry
b) try to access the resource later
c) check whether you can reach it with your default browser

explanation: text extraction subsystem failed to extract plaintext from the specified resource.
possible workaround: try to manually use a different text extraction engine for the specified resource.

explanation: Plagiarism Detector has failed to connect to the intermediary proxy.
possible workaround: check proxy settings.

explanation: during a local check, this file was missing from the file system.
possible workaround: Make sure the file is in place and can be opened. Then re-run the check.

explanation: the web server hosting the resource responds with an unsupported Content-Type.
possible workaround: --

Supported content-types are:

text/plain for: ".txt"
text/xml for: ".txt"
text/richtext for: ".rtf"
application/msword for: ".doc"
application/pdf for: ".pdf"
application/rtf for: ".rtf"
application/ for: ".ppt"
application/x-latex for: ".tex"
application/vnd.openxmlformats-officedocument.wordprocessingml.document for: ".docx"
application/vnd.openxmlformats-officedocument.presentationml.presentation for: ".pptx"

explanation: the web server hosting the resource returns no Content-Type (see above).
possible workaround: please email us the exact URL from the Originality Report.
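
Deciding whether a downloaded resource can be processed essentially means looking up the media type from its Content-Type header in a table like the supported list above. The Python sketch below is a hypothetical illustration of that lookup (`extension_for` is not part of the product). It strips parameters such as `; charset=utf-8`, compares case-insensitively because media types are case-insensitive, and treats a missing or empty header as the zero Content-Type case. The ".ppt" entry is left out because its content-type is not fully given in the list.

```python
from typing import Optional

# Lookup table copied from the supported content-types list (".ppt" omitted, see above).
SUPPORTED_CONTENT_TYPES = {
    "text/plain": ".txt",
    "text/xml": ".txt",
    "text/richtext": ".rtf",
    "application/msword": ".doc",
    "application/pdf": ".pdf",
    "application/rtf": ".rtf",
    "application/x-latex": ".tex",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation": ".pptx",
}


def extension_for(content_type: Optional[str]) -> Optional[str]:
    """Map a Content-Type header value to a file extension, or None if unusable."""
    if not content_type or not content_type.strip():
        return None  # the "zero Content-Type" case
    # Keep only the media type (drop "; charset=..." etc.) and normalise its case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return SUPPORTED_CONTENT_TYPES.get(media_type)  # None means unsupported
```

For example, `extension_for("text/plain; charset=utf-8")` gives `".txt"`, while `extension_for("image/png")` gives `None`.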

explanation: the resource points not to a web page but to a separate file. Plagiarism Detector was not able to correctly download and parse this file.
possible workaround: try to manually use a different text extraction engine for the specified resource.
