Topics - Mike Sanders

"Silver Bullets" / Avoiding "Resource Processing Timeouts" error.
« on: October 10, 2013, 01:57:51 AM »
Plagiarism Detector has a default timeout of 30 seconds for processing a single resource.

A failure may occur if it does not have enough time to process a resource.

Resource processing includes:

1. Download resource.
2. Extract plain text.
3. Compare it with the checked document.

Stages 1 and 3 are the time-consuming ones.

This error may be caused by:

1. Slow Internet connection (low bandwidth).
2. Resource overload.
3. Resource is big (large amounts of text).

If Plagiarism Detector fails to process a resource in the allotted time, it increments a special counter in the main window:

A special warning is displayed if more than 10 resources have timed out:

To resolve this problem, try the following:

1. Make sure no other application is using your bandwidth; if one is, close it (torrent clients, online films\music, games, etc.).
2. Increase the Resource Processing timeout in Plagiarism Detector: Settings->Core Config->"Single Resource Processing Timeout" (in seconds).
3. Decrease the number of workers used simultaneously for Resource Processing: Settings->Core Config->"REP Threads". Set it to 1 thread.
4. Check the exact status of the resource in a web browser.
5. If none of the above helps, contact us!
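
For readers who want to see the timeout logic in code form, here is a minimal Python sketch of the behaviour described in this post: each resource gets a fixed time budget, timeouts are counted, and a warning is raised once more than 10 resources have timed out. This is not Plagiarism Detector's actual code. The function names and the `fake_resource` helper are assumptions made for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

WARNING_THRESHOLD = 10  # per the post: warn when more than 10 resources time out


def run_with_timeout(task, timeout_s):
    """Run task() in a worker thread. Return (True, result), or (False, None) on timeout."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(task)
    try:
        return True, future.result(timeout=timeout_s)
    except FutureTimeout:
        return False, None
    finally:
        # Do not block on a stuck task; the abandoned thread finishes in the background.
        pool.shutdown(wait=False)


def count_timeouts(tasks, timeout_s=30.0):
    """Process tasks one at a time (like 1 REP thread) and count the timeouts."""
    return sum(1 for task in tasks if not run_with_timeout(task, timeout_s)[0])


def should_warn(timeout_count):
    """True once the number of timed-out resources exceeds the threshold."""
    return timeout_count > WARNING_THRESHOLD


def fake_resource(delay_s):
    """Hypothetical stand-in for download + text extraction + comparison."""
    def task():
        time.sleep(delay_s)
        return delay_s
    return task
```

In this sketch, raising `timeout_s` plays the role of step 2 above: a slow resource that fails with a short budget succeeds with a longer one.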

"Silver Bullets" / Adobe PDF Support on x64 operating systems
« on: September 20, 2013, 02:48:58 PM »
To use iFilters as the text extraction engine for PDF files, please download and install:

This applies to:

  • WinXP x64
  • Win7 x64
  • Win8 x64

and other operating systems running on the x64 architecture.

"Silver Bullets" / What is PDAS?
« on: September 20, 2013, 10:59:24 AM »
This section explains what PDAS is.

"Silver Bullets" / Plagiarism Check types (plagiarism check scope)
« on: September 20, 2013, 10:57:55 AM »
This section explains which plagiarism check types Plagiarism Detector supports, and their scope.

Currently Plagiarism Detector supports the following types of plagiarism checks:

1. Global Internet Check.
2. Database Check (PDAS) (1 document against a very large collection, up to millions of documents).
3. Check 2 Documents against each other.
4. Check against Local Folder (1 document against many, up to thousands of documents).
5. Hybrid Check [Internet + PDAS Database].

1. Global Internet Check.

It runs a check against 2 major search engines: Google and Bing.

a) "If it is on the internet - it will probably be detected".
b) If it is not in the search index - it will not be detected.
c) If it is in the search index - Plagiarism Detector can still miss it (bad things happen too).

Most common reasons:
- Zipf's law.
- Poor search engine SERP relevancy (Recall\Precision).
- Fingerprint miss.

2. Database Check (PDAS = Plagiarism Detector Accumulator Server).

PDAS is a separately sold product that integrates with Plagiarism Detector client programs.
Plagiarism Detector is a client program.
PDAS (Plagiarism Detector Accumulator Server) is a server database.

To run checks:
1. You need to have a special need for this :-), that is, a corpus of documents or a small library.
2. You need to download PDAS (free demo is available).
3. You need to install PDAS.
4. You need to import your corpus of documents or small library into PDAS. This requires some time and effort.
5. At this point you will be able to run checks against your brand new PDAS Database.
6. Start the client (any Plagiarism Detector client); it will detect the PDAS instance and you are ready to go!

3. Check 2 Documents against each other.

This is pretty self-explanatory. One thing to note: you must select the 1st document
via the Document Manager, and the 2nd one at the last step of the New Check Wizard.

4. Check against Local Folder.

This is pretty self-explanatory. If you have a folder full of documents and you need to
figure out whether there is cross-plagiarism between them, this is the right place to start!
One thing to note: you must select the 1st document
via the Document Manager, and the target folder to check against at the last
step of the New Check Wizard.

5. Hybrid Check is currently in development.

"Silver Bullets" / References
« on: September 20, 2013, 10:55:24 AM »
Plagiarism Detector supports 2 types of references:

1. Reference by quotation marks.
2. Reference by URL.

Quotes are detected by a complex algorithm that is tolerant of incorrect quoting.
It adjusts itself to keep the error to a minimum.

The following glyph pairs are supported by Plagiarism Detector:

" " - ordinary double
' ' - apostrophe
« » - left-pointing double angle quotation mark \ right-pointing double angle quotation mark
‹ › - single left-pointing angle quotation mark \ single right-pointing angle quotation mark
‘ ’ - left single quotation mark \ right single quotation mark
‚ ‛ - single low-9 quotation mark \ single high-reversed-9 quotation mark
„ " - Romanian variant
„ ‟ - Romanian 1 double low-9 quotation mark \ double high-reversed-9 quotation mark
„ + Chr(148) - Romanian 2
„ + Chr(0) - Romanian 3
“ ” - left double quotation mark \ right double quotation mark (ChrW(8220) \ ChrW(8221))

Quotation marks in Chinese, Japanese, and Korean (CJK)
「 」 - left corner bracket \ right corner bracket
『 』 - left white corner bracket \ right white corner bracket
〝〝 - reversed double prime quotation mark
〞〟 - double prime quotation mark \ low double prime quotation mark

Alternate encodings:
﹁ ﹂ - presentation form for vertical left corner bracket \ presentation form for vertical right corner bracket
﹃ ﹄ - presentation form for vertical left corner white bracket \ presentation form for vertical right corner white bracket
＇ - fullwidth apostrophe
｢ ｣ - halfwidth left corner bracket \ halfwidth right corner bracket

We strongly recommend placing quotation marks correctly: this ensures the correct detection and
alignment of the quoted sections.

You can disable or enable the quoting detection via the Check wizard interface.
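
To illustrate why matching glyph pairs matter, here is a deliberately naive Python sketch that finds quoted spans using a subset of the pairs listed above. It is not the algorithm Plagiarism Detector uses (that one is described only as complex and tolerant of incorrect quoting). The names `QUOTE_PAIRS` and `find_quoted_spans` are made up for this example, and the plain apostrophe is left out on purpose because it would clash with words like "don't".

```python
# Opening glyph -> accepted closing glyphs (subset of the list above; illustrative only).
QUOTE_PAIRS = {
    '"': '"',                      # ordinary double
    "\u00ab": "\u00bb",            # « »
    "\u2039": "\u203a",            # ‹ ›
    "\u2018": "\u2019",            # ‘ ’
    "\u201c": "\u201d",            # “ ”
    "\u201e": "\u201f\u201d\"",    # „ may be closed by ‟, ” or "
    "\u300c": "\u300d",            # 「 」
    "\u300e": "\u300f",            # 『 』
}


def find_quoted_spans(text):
    """Return (start, end, inner_text) for each quoted span; unclosed quotes are skipped."""
    spans = []
    i = 0
    while i < len(text):
        closers = QUOTE_PAIRS.get(text[i])
        if closers:
            # Positions of every accepted closing glyph after the opener.
            ends = [p for p in (text.find(c, i + 1) for c in closers) if p != -1]
            if ends:
                j = min(ends)  # nearest closing glyph wins
                spans.append((i, j, text[i + 1:j]))
                i = j + 1
                continue
        i += 1
    return spans
```

In this sketch an opening glyph with no matching closing glyph yields no span at all, so the quoted passage would not be treated as a reference. That is the kind of miss that correct quoting avoids.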

URL references:

There is a setting called "Detect URL references".
If it is ON, Plagiarism Detector scans the checked document and
extracts all the URLs found in it. It then downloads each referenced resource and tests it against the
original checked document for shared sections. Any shared sections found are marked as referenced.
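
The URL extraction step can be pictured with a short Python sketch. The regular expression and the `extract_urls` name are assumptions for illustration. How Plagiarism Detector actually recognizes URLs is not documented in this post.

```python
import re

# Simplified pattern: http(s) URLs up to the next whitespace, quote or angle bracket.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text):
    """Return the URLs found in text, in order, without duplicates."""
    seen, urls = set(), []
    for match in URL_RE.finditer(text):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?)")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Each URL returned this way would then be downloaded and compared with the checked document, as described above.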

Exclusion \ Inclusion Lists

If you need to permanently add a URL to the analysis or remove it, use the
Include\Exclude Lists.

More info:,317.0.html

References by DOI are going to be implemented soon.

"Silver Bullets" / What is "Use Last Session"\"Manually add Documents"?
« on: September 20, 2013, 10:50:40 AM »
If you need to have the same document (or multiple documents) checked a second time, with the same or different options,
you can check "use Last session docs" to run a new plagiarism check from scratch.

If you check "add docs manually", you will be prompted to select documents manually.

This setting is reset each time Document Manager starts.

+ most important: increased Plagiarism Detection sensitivity. We used a number of manually rewritten documents to tune the detection behavior.
+ default session option is reset to "manually adding documents" at program restart.
+ a large number of reports is now loaded with a progress bar indicating the loading process.
+ filters in Report Viewer are reset when the form opens. This fixes a bug with incorrect Originality Report loading when filters were active.
+ added more "Content-Types" to avoid resource processing failures due to the UnsupportedContentType error.
+ added "what's new in the update" to the auto-update system, linking to this board.
+ fixed Blue-Help buttons.

Known issues:
- support dropped for TC configuration and pre-sets (we will fix\update this in approximately 2 weeks).

We would like to thank lordlegion for the extensive feedback and
for providing files with rewritten sections, which helped us greatly in configuring the new version!

+ Added "Wrong File" warning screen for better explanation on how to work with Shell Context Menu.
+ Second screen modal bug fixed.
+ Added "Blue Help Buttons" linking to the related Community Forum threads.
+ Core Version is now hard-coded into the main executable.
+ Better support for "File in use" issue. Tested with "Read non-exclusive Lock" and "Write-Read exclusive Lock".


"Silver Bullets" / Wrong File warning!
« on: September 15, 2013, 05:06:06 PM »
Dear client, you are doing it wrong!

You are trying to check the PROGRAM itself, instead
of checking a target file:

Your target file is usually a Microsoft Word or PDF document.
So you need to use the CONTEXT menu on that file, and NOT on the Plagiarism Detector link!

"Silver Bullets" / Text Extraction Engines Configuration
« on: September 15, 2013, 04:47:16 PM »
Plagiarism Detector has several methods of extracting plain text from each supported file type.
(The List of supported file formats)

Each file type has its own Text Extraction Engines.

A Text Extraction Engine (TEE) is a sub-program that extracts text from a specified file type.

By default, every time you start Document Manager, Plagiarism Detector automatically selects the optimal TEE for each supported file type.

Still, there are 3 cases in which you may need to change the TEE to get better text extraction:
  • Incomplete Text Extraction (some parts of the document are missing).
  • Dirty Text Extraction (when html code is present in the document).
  • Wrong encoding detection for complex file formats.
TEE configuration made with Document Manager is global: it is used in all cases.
TEE configuration made with Advanced Report Viewer (ARV) is local: it is used only within the ARV session and is then reset to the global setting.

To change the TEE for a specific file type, click to the right of it; a combo box appears:
(The example below shows how to change the TEE for DocX files)

"Silver Bullets" / Correct Encoding Detection
« on: September 15, 2013, 04:07:11 PM »
Plagiarism Detector tries to detect and use the correct encoding for every loaded document.

But sometimes it may detect the wrong encoding.
In the Originality Report this looks like either "lots of question marks" or "lots of unreadable symbols":

To avoid this situation you need to tell Plagiarism Detector which encoding to use.

To do this:

  • Start Plagiarism Detector Client.
  • Open New Check Wizard.
  • Select the Documents to check.
  • Choose Manual Config option: (detailed configuration tab appears)

Before changing ANY values you must click the following, otherwise the changes will not take effect!
1. Reset to Defaults.
2. Start Test.

Select one of the following options:

  • use automatic detection
  • use system default encoding
  • use custom encoding
  • use utf8 encoding
  • use iFilters

e.g. to select custom encoding for a text file:

Q - How can I make sure that the encoding is chosen correctly?
A - Double click any document in Document Selector and you will see what the text will look like.
It must look like the text you expect:
(in the example below automatic detection is used to get default Chinese encoding)


"Silver Bullets" / Text Extraction Troubleshooting
« on: September 15, 2013, 03:30:08 PM »
Below you will find information on Text Extraction Troubleshooting.

There are 3 main issues you may experience:

1. A Notification Balloon is displayed and Plagiarism Detector refuses to run the plagiarism check:
"Error Code Diagnostics"
Details here:,37.0.html

2. The check completes OK, but the extracted text is not complete (some parts are missing):
"Incomplete Text Extraction issue"
Details here:,37.0.html

3. The Originality Report displays lots of question marks instead of text or unreadable symbols:
"Wrong encoding issue"
Details here:,36.msg37.html#msg37
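
For the curious, both symptoms of the wrong encoding issue in item 3 can be reproduced with Python's built-in codecs. This is only a sketch of the general mechanism, not of how Plagiarism Detector detects encodings. The helper names and the candidate encoding list are assumptions.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1251")):
    """Try each candidate encoding in turn; return (text, encoding_used)."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, but undecodable bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


def looks_garbled(text, threshold=0.3):
    """Heuristic: flag text dominated by '?' or U+FFFD replacement characters."""
    if not text:
        return False
    bad = sum(ch in "?\ufffd" for ch in text)
    return bad / len(text) >= threshold


original = "Плагиат"

# "Lots of question marks": the text was forced into an encoding that cannot hold it.
question_marks = original.encode("ascii", errors="replace").decode("ascii")

# "Lots of unreadable symbols": UTF-8 bytes read back as Latin-1.
unreadable = original.encode("utf-8").decode("latin-1")

# Choosing the right encoding restores the text.
restored, used = decode_with_fallback(original.encode("cp1251"))
```

Here `question_marks` contains only "?" characters, `unreadable` is a string of accented Latin symbols, and `restored` equals the original text. This mirrors picking the correct encoding manually in the New Check Wizard.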
