Messages - Mike Sanders

Pages: 1 2 [3]
"Silver Bullets" / Text Extraction Troubleshooting
« on: September 15, 2013, 03:30:08 PM »
Below is some guidance on Text Extraction Troubleshooting.

There are 3 main issues you may experience:

1. A notification balloon is displayed and Plagiarism Detector refuses to run the plagiarism check:
"Error Code Diagnostics"
Details here:,37.0.html

2. The check completes OK, but the extracted text is incomplete (some parts are missing):
"Incomplete Text Extraction issue"
Details here:,37.0.html

3. The Originality Report displays lots of question marks or unreadable symbols instead of text:
"Wrong encoding issue"
Details here:,36.msg37.html#msg37
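
To illustrate issue 3, here is a minimal Python sketch of how a wrong encoding choice produces question marks or unreadable symbols. It is only an illustration of the general mechanism, not a description of how Plagiarism Detector works internally; the sample text and the cp1251 / latin-1 pairing are assumptions chosen for the demo.

```python
# Illustration only: how a wrong encoding choice turns text into "?" or junk.
original = "Привет"  # hypothetical sample of non-ASCII document text

# Case 1: the target encoding cannot represent the characters,
# so every character is replaced with a question mark.
as_question_marks = original.encode("ascii", errors="replace").decode("ascii")

# Case 2: bytes written as cp1251 are read back as latin-1,
# which yields unreadable symbols instead of the original letters.
raw_bytes = original.encode("cp1251")
as_unreadable = raw_bytes.decode("latin-1")

# Reading the same bytes with the correct encoding recovers the text.
recovered = raw_bytes.decode("cp1251")
```

If a report looks like the first or second case, the text itself is usually intact and only the encoding used to read it is wrong.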

"Silver Bullets" / Check for Plagiarism with Shell Context Menu
« on: September 14, 2013, 06:08:50 PM »
Plagiarism Detector has a really easy-to-use feature for running an ultra-fast automatic plagiarism check: the Shell Context Menu.

To check a document you need to:

1. Go to the location where your document resides.
2. Right-click it (RMB).
3. Select the "Check for Plagiarism" option from the context menu.
4. Wait for the Originality Report to open.

You can find a video demonstration here: "url"

"Silver Bullets" / Document processing Error Codes Description
« on: September 14, 2013, 03:22:23 PM »
If Plagiarism Detector fails to process a document, it returns one of the error codes described below.

When text extraction fails, Plagiarism Detector displays a balloon message with
detailed information, including the ERROR DESCRIPTION CODE and the document title:

desc.: unable to find a cause.
possible workaround: email us the document for further investigation.

desc.: unable to find the file.
possible workaround: double-check that the file is still in the location you pointed to. Make a copy and retry.
Make sure no application is using the file.

desc.: unable to define the valid file extension.
possible workaround: make sure the file has a full name and extension. Rename it properly.

desc.: some other program is using the file.
possible workaround: find such a program, close it and retry. If this does not work, reboot the computer and retry.

desc.: file does not contain valid text.
possible workaround: make sure the file contains some text inside.

desc.: the software was not able to extract any text from the file.
possible workaround: [Extended troubleshooting TEE config].

desc.: iFilters - one of the Text Extraction Engines failed to initialize.
possible workaround: [Extended troubleshooting iFilters].

desc.: iFilters - one of the Text Extraction Engines failed to initialize.
possible workaround: [Extended troubleshooting iFilters].

desc.: you are trying to load a file that is not supported at the moment.
possible workaround: double-check that the file you are trying to load is the correct one
[supported file types].

desc.: Plagiarism Detector failed to load the document in the allotted time.
possible workaround: try to increase the timeout for document processing.

desc.: the document you are trying to process is too large.
Currently the max allowed size is 100 MB.
possible workaround: try to increase the maximum allowed size for document processing in the main configuration file.

desc.: subprogram was not found.
possible workaround: reinstall the product.

desc.: internal failure.
possible workaround: email us the document.

desc.: internal failure.
possible workaround: email us the document.

desc.: subprogram was not found.
possible workaround: reinstall the product.
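
Several of the causes above can be ruled out before a document is submitted. The following is a minimal sketch, in Python, of such a pre-check; it is not part of Plagiarism Detector. The function name `preflight_check` is hypothetical, the 100 MB limit is taken from the description above (counting 1 MB as 1024 * 1024 bytes is an assumption), and treating a zero-byte file as "no valid text" is only a rough proxy.

```python
import os

# Default limit quoted above; 1 MB = 1024 * 1024 bytes is an assumption.
MAX_SIZE_BYTES = 100 * 1024 * 1024


def preflight_check(path, max_size=MAX_SIZE_BYTES):
    """Return a list of likely problems; an empty list means none were found."""
    problems = []
    # "unable to find the file"
    if not os.path.isfile(path):
        problems.append("unable to find the file")
        return problems
    # "unable to define the valid file extension"
    if not os.path.splitext(path)[1]:
        problems.append("unable to define the valid file extension")
    size = os.path.getsize(path)
    if size == 0:
        # Rough proxy: an empty file cannot contain any text.
        problems.append("file does not contain valid text")
    elif size > max_size:
        problems.append("the document is too large")
    return problems
```

For example, `preflight_check("essay.docx")` returns an empty list when the file exists, has an extension, is not empty and is within the size limit. It cannot detect a file locked by another program or a failed text extraction engine.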

"Silver Bullets" / Supported file formats
« on: September 14, 2013, 02:51:06 PM »
Plagiarism Detector handles documents in various formats, as well as web pages.

The following document formats are supported:

*.docx - Microsoft Word (newest versions)
*.pptx - Microsoft PowerPoint (newest versions)
*.txt - Plain Text [* requires correct encoding detection]
*.rtf - Rich Text Format
*.doc - Microsoft Word (older versions)
*.htm *.html *.asp *.php - web pages [* requires correct encoding detection]
*.pdf - Adobe PDF [requires text, not images as text]
*.odt - Open Office Document
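
The list above can be expressed as a simple extension check. This is a minimal Python sketch, not the product's own logic; the name `is_supported` is hypothetical, and the case-insensitive comparison is an assumption.

```python
import os

# Extensions copied from the list of supported formats above.
SUPPORTED_EXTENSIONS = {
    ".docx", ".pptx", ".txt", ".rtf", ".doc",
    ".htm", ".html", ".asp", ".php", ".pdf", ".odt",
}


def is_supported(filename):
    """Return True if the file extension appears in the supported list."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in SUPPORTED_EXTENSIONS
```

Note that a *.pdf passing this check may still fail if it contains only images of text.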

97% of all documents are going to be processed Ok.

The most common unsupported file type is a rasterized *.pdf.
Such a document contains images rather than text, and thus requires OCR processing.

For example (you can download it using the link at the bottom):

+ Extended error codes for Resource processing added in Originality Reports:
+ Changed hardcoded K for Plagiarism Detection, this will result in more dense cluster detection.
+ Resource processing completely rewritten. Clean logic for web page processing is now separated from external file download.


"Silver Bullets" / Resources Error Codes
« on: September 14, 2013, 12:34:48 PM »
[relates to core v. 790 and up:]

Here you can find out what to do if resource loading fails for some reason.
  • Define the exact error code.
  • See below for a possible workaround.
  • Try it out.
  • Contact us if you experience any issues.
Plagiarism Detector Client has the following resource processing error codes:

explanation: not able to diagnose the exact cause - attach the problematic Originality Report and mail us.
possible workaround: contact us, send files for us to investigate.

explanation: not able to read/write to the intermediary disk cache.
possible workaround: check that you are running under an Administrator account.

Timeouted (operation timeouted)
explanation: the program was not able to process this resource within the resource processing timeout.
possible workaround: increase resource processing timeout value in settings.

explanation: the processed resource exceeds the maximum allowed size.
possible workaround: increase maximum allowed resource size in settings.

explanation: Plagiarism Detector was not able to parse the resource URL correctly.
possible workaround: --

explanation: the resource is probably down (website is offline) - not working.
possible workarounds:
a) retry
b) try to access the resource later
c) check if you could reach it with the default browser

explanation: text extraction subsystem failed to extract plaintext from the specified resource.
possible workaround: try to manually use a different text extraction engine for the specified resource.

explanation: Plagiarism Detector has failed to connect to the intermediary proxy.
possible workaround: check proxy settings.

explanation: during a Local check this file was missing from the file system.
possible workaround: Make sure the file is in place and can be opened. Then re-run the check.

explanation: the web server containing the resource replies with an unsupported Content-Type.
possible workaround: --

Supported content-types are:

text/plain for: ".txt"
text/xml for: ".txt"
text/richtext for: ".rtf"
application/msword for: ".doc"
application/pdf for: ".pdf"
application/rtf for: ".rtf"
application/ for: ".ppt"
application/x-latex for: ".tex"
application/vnd.openxmlformats-officedocument.wordprocessingml.document for: ".docx"
application/vnd.openxmlformats-officedocument.presentationml.presentation for: ".pptx"
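
The mapping above can be sketched as a lookup table. This is a minimal Python illustration, not the client's actual code; the name `extension_for` is hypothetical, ignoring parameters such as `charset` and comparing case-insensitively are assumptions, and the ".ppt" entry is left out because its content type is incomplete in the list above.

```python
# Content types copied from the list above (the truncated ".ppt" entry is omitted).
CONTENT_TYPE_TO_EXTENSION = {
    "text/plain": ".txt",
    "text/xml": ".txt",
    "text/richtext": ".rtf",
    "application/msword": ".doc",
    "application/pdf": ".pdf",
    "application/rtf": ".rtf",
    "application/x-latex": ".tex",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation": ".pptx",
}


def extension_for(content_type_header):
    """Return the mapped extension, or None for an empty or unsupported type."""
    if not content_type_header:
        return None
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return CONTENT_TYPE_TO_EXTENSION.get(media_type)
```

A result of `None` corresponds to the two cases described here: an unsupported Content-Type or an empty one.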

explanation: the web server containing the resource replies with an empty Content-Type (see above).
possible workaround: please email us the exact URL from the Originality Report.

explanation: the resource points to a separate file rather than a web page. Plagiarism Detector was not able to download and parse this file correctly.
possible workaround: try to manually use a different text extraction engine for the specified resource.

Plagiarism Detector core version 749 minor update.

Features added:

+ Added support for *.odt files (Open Office Document). Now you can run checks for Plagiarism for Open Office documents!
The support for Open Office presentations is coming soon.
We are investigating adding support for TeX/LaTeX formats.

+ "Kittens Screen" added.
+ Several bugs removed:
- on user switch
- on hibernate
- on shutdown
+ session memory is now working for RMB Check.
+ Many small fixes and improvements.
Plagiarism Detector is a bit better software now :-)

Versions - Release History / PDC core version 749 - minor update
« on: August 30, 2013, 08:51:37 PM »
Plagiarism Detector core version 749 minor update.

Features added:

+ Advanced Text Extraction engines (TEEs) configuration in Document Manager.
+ Alternative TEEs added.
+ Added auto-configuration of TEEs (not active yet).
+ REP sub-core completely rewritten thread-safe.
+ Much faster REP turnaround time.
+ Removed final REP timeout.
+ Same results for Q-Cache and Direct Download.

- iFilters diagnostics removed.
