Messages - Mike Sanders

Pages: 1 2 [3] 4
"Silver Bullets" / Adobe PDF Support on x64 operating systems
« on: September 20, 2013, 02:48:58 PM »
To use iFilters as the text extraction engine for PDF files, please download and install:

This relates to:

  • WinXP x64
  • Win7 x64
  • Win8 x64

and other operating systems running on the x64 architecture.

"Silver Bullets" / What is PDAS?
« on: September 20, 2013, 10:59:24 AM »
This section explains what PDAS is.

"Silver Bullets" / Plagiarism Check types (plagiarism check scope)
« on: September 20, 2013, 10:57:55 AM »
This section explains which plagiarism check types Plagiarism Detector supports, and their scope.

Currently Plagiarism Detector supports the following types of plagiarism checks:

1. Global Internet Check (includes SciPap check).
2. Custom Database Check (PDAS): a check against your own PDAS database, which can store very large numbers of documents.
3. Combined Check [Internet + PDAS Database].
4. Check against a Local Folder.
5. Check 2 Documents against each other.
6. Check against our internal scientific papers database (SciPap).

1. Global Internet Check.

It runs a check against 2 major search engines: Google and Bing. It also includes results from our SciPap database.

a) "If it is on the Internet - it will probably be detected".
b) If it is not in the search index - it will not be detected.
c) If it is in the search index - Plagiarism Detector can still miss it (bad things happen too).

The most common reasons for a miss:
- Zipf's law.
- Poor relevancy of the search engine results page (SERP), in terms of recall\precision.
- Fingerprint miss.
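
Plagiarism Detector's own fingerprinting algorithm is not described in this post. As a rough illustration of how a "fingerprint miss" can happen, here is a minimal sketch that uses word shingles, a generic technique assumed here for illustration only. The function names and the shingle size `k=3` are hypothetical.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def overlap(a, b, k=3):
    """Fraction of a's shingles that also occur in b (0.0 when a is too short)."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa) if sa else 0.0

source = "the quick brown fox jumps over the lazy dog"
copied = "the quick brown fox jumps over the lazy dog"
reworded = "a quick brown fox leaps over a lazy dog"

print(overlap(copied, source))    # identical text: every shingle matches
print(overlap(reworded, source))  # light rewording: far fewer shingles match
```

With this kind of scheme, replacing only a few words breaks most of the shingles around them, which is one way a reworded passage can slip past a fingerprint-based comparison.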

2. Database Check (PDAS = Plagiarism Detector Accumulator Server).

PDAS is a separately sold product that integrates with Plagiarism Detector client programs.
Plagiarism Detector is the client program.
PDAS (Plagiarism Detector Accumulator Server) is the server database.

To run checks:
1. You need to have a real need for this :-), that is, a corpus of documents or a small library.
2. Download PDAS (a free demo is available).
3. Install PDAS.
4. Import your corpus of documents or small library into PDAS. This requires some time and effort.
5. At this point you are able to run checks against your brand new PDAS database.
6. Start the client (any Plagiarism Detector); it will detect the PDAS instance and you are ready to go!

3. Combined check.
Combines the results of the Internet check and the PDAS check into a single report.

4. Check against a Local Folder.

This is pretty self-explanatory. If you have a folder full of documents and you need to
figure out whether there is cross-plagiarism between them, this is the right place to start!
One thing to note - you must select the first document
via the Document Manager, and select the target folder to check against at the last
step of the New Check Wizard.

5. Check 2 Documents against each other.

This is pretty self-explanatory. One thing to note - you must select the first document
via the Document Manager, and select the second one at the last step of the New Check Wizard.
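
To give an idea of what checking two documents against each other means, here is a minimal sketch using Python's standard `difflib` module. It is not how Plagiarism Detector works internally; the function name `shared_sections` and the `min_words` threshold are assumptions made for this example.

```python
from difflib import SequenceMatcher

def shared_sections(doc_a, doc_b, min_words=4):
    """Return word runs of at least min_words that appear in both documents."""
    a, b = doc_a.split(), doc_b.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [
        " ".join(a[m.a:m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words
    ]

first = "Our results show that the method converges quickly on small inputs"
second = "Earlier work claimed that the method converges quickly on large graphs"
print(shared_sections(first, second))
# ['that the method converges quickly on']
```

The comparison is exact and case-sensitive, so it only finds verbatim overlaps; a real checker has to cope with rewording as well.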

6. SciPap check.

Check against our internal scientific papers database (SciPap). Important note: no client documents are added to this database. As of now this is more of a supplement to the Internet check.

"Silver Bullets" / References
« on: September 20, 2013, 10:55:24 AM »
Plagiarism Detector supports 2 types of references:

1. Reference by quotation (quoted text).
2. Reference by URL.


Quotes are detected by a complex algorithm that is robust to incorrect quoting.
It adjusts itself to keep the error to a minimum.

The following glyph pairs are supported by Plagiarism Detector:

" " - ordinary double
' ' - apostrophe
« » - left-pointing double angle quotation mark \ right-pointing double angle quotation mark
‹ › - single left-pointing angle quotation mark \ single right-pointing angle quotation mark
‘ ’ - left single quotation mark \ right single quotation mark
‚ ‛ - single low-9 quotation mark \ single high-reversed-9 quotation mark
„ " - Romanian variant
„ ‟ - Romanian 1 double low-9 quotation mark \ double high-reversed-9 quotation mark
„ + Chr(148) - Romanian 2
„ + Chr(0) - Romanian 3
“ ” (ChrW(8220), ChrW(8221)) - left double quotation mark \ right double quotation mark

Quotation marks in Chinese, Japanese, and Korean (CJK)
「 」 - left corner bracket \ right corner bracket
『 』 - left white corner bracket \ right white corner bracket
〝〝 - reversed double prime quotation mark
〞〟 - double prime quotation mark \ low double prime quotation mark

Alternate encodings:
﹁ ﹂ - presentation form for vertical left corner bracket \ presentation form for vertical right corner bracket
﹃  ﹄ - presentation form for vertical left corner white bracket \ presentation form for vertical right corner white bracket
＇ - fullwidth apostrophe
｢ ｣ - halfwidth left corner bracket \ halfwidth right corner bracket

It is strongly recommended to place quotation marks correctly - this ensures correct detection and
alignment of the quoted sections.

You can disable or enable the quoting detection via the Check wizard interface.
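
To show the basic idea of pairing opening and closing glyphs, here is a deliberately naive sketch. It is not the program's algorithm (which, as said above, is more complex and tolerates incorrect quoting); the `QUOTE_PAIRS` table holds only a few of the pairs listed above, and the function name is made up for this example.

```python
# Opening glyph -> closing glyph, taken from a few of the pairs listed above.
QUOTE_PAIRS = {
    '"': '"',
    "«": "»",
    "‹": "›",
    "‘": "’",
    "“": "”",
    "„": "‟",
    "「": "」",
    "『": "』",
}

def quoted_sections(text):
    """Return the text found between each supported opening/closing pair."""
    sections = []
    i = 0
    while i < len(text):
        closer = QUOTE_PAIRS.get(text[i])
        if closer is not None:
            end = text.find(closer, i + 1)
            if end != -1:
                sections.append(text[i + 1:end])
                i = end  # continue scanning after the closing glyph
        i += 1
    return sections

sample = 'He wrote «to be or not» and later "that is the question".'
print(quoted_sections(sample))
# ['to be or not', 'that is the question']
```

An opening mark without its closing mark is simply skipped here, which is exactly the kind of case where correct quoting in the document makes a difference.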

URL references:

There is a setting called "Detect URL references".
If it is ON, Plagiarism Detector scans the checked document and
extracts all URLs found in it. It then downloads each referenced resource and tests it against the
original checked document for shared sections. Any shared sections found are marked as referenced.
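
The first step, pulling URLs out of the document text, can be sketched like this. The regular expression is a simple assumption for the example and not the pattern the program uses; downloading the resources and comparing them is not shown.

```python
import re

# Deliberately simple pattern: http(s) URLs up to the next whitespace or bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")

def extract_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    seen = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen

document = (
    "As noted in https://example.com/paper.html, the effect is small "
    "(see also http://example.org/data). Again: https://example.com/paper.html."
)
print(extract_urls(document))
# ['https://example.com/paper.html', 'http://example.org/data']
```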

Exclusion \ Inclusion Lists

If you need to permanently include any URL in the analysis or exclude it - use the
Include\Exclude Lists.

More info:,317.0.html

References by DOI are going to be implemented soon.

"Silver Bullets" / What is "Use Last Session"\"Manually add Documents"?
« on: September 20, 2013, 10:50:40 AM »
If you need to check the same document (or several documents) a second time, with the same or with different options,
you can select "use Last session docs" to run a new plagiarism check from scratch.

If you select "add docs manually" - you will be prompted to select documents manually.

This setting is reset at each DocManager start.

+ most important: increased Plagiarism Detection sensitivity. We used a number of manually rewritten documents to tune the detection behavior.
+ default session option is reset to "manually adding documents" at program restart.
+ a large number of reports is now displayed with a progress bar indicating the loading process.
+ filters in Report Viewer are reset when the form opens. This fixes a bug with incorrect Originality Report loading when filters were active.
+ added more "Content-Types" to avoid resource processing failures due to the UnsupportedContentType error.
+ added "what's new in the update" - to the auto-update system linking to this board.
+ fixed Blue-Help buttons.

Known issues:
- support dropped for TC configuration and pre-sets. (We will fix\update this in approximately 2 weeks.)

We would like to thank lordlegion for the extensive feedback and
for providing files with rewritten sections - that helped us greatly in configuring the new version!

+ Added "Wrong File" warning screen for better explanation on how to work with Shell Context Menu.
+ Second screen modal bug fixed.
+ Added "Blue Help Buttons" linking to the related Community Forum threads.
+ Core Version is now hard-coded into the main executable.
+ Better support for "File in use" issue. Tested with "Read non-exclusive Lock" and "Write-Read exclusive Lock".


"Silver Bullets" / Wrong File warning!
« on: September 15, 2013, 05:06:06 PM »
Dear client, you are doing it wrong!

You are trying to check the PROGRAM itself, instead
of checking a target file:

Your target file is usually a Microsoft Word or PDF document.
So you need to use the CONTEXT menu on that file, and NOT on the Plagiarism Detector link!
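
The difference between a target file and the program itself can be sketched as a simple check on the file extension. This is only an illustration: the extension sets and the function name below are assumptions, and the real list of supported formats may differ.

```python
from pathlib import PureWindowsPath

# Assumed for illustration only; the real list of supported formats may differ.
DOCUMENT_EXTENSIONS = {".doc", ".docx", ".pdf"}
PROGRAM_EXTENSIONS = {".exe", ".lnk"}

def classify_target(path):
    """Return 'document', 'program', or 'other' based on the file extension."""
    ext = PureWindowsPath(path).suffix.lower()
    if ext in DOCUMENT_EXTENSIONS:
        return "document"
    if ext in PROGRAM_EXTENSIONS:
        return "program"
    return "other"

print(classify_target(r"C:\Users\me\Desktop\Plagiarism Detector.lnk"))  # program
print(classify_target(r"C:\Users\me\Documents\thesis.docx"))            # document
```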

"Silver Bullets" / Text Extrcation Engines Configuration
« on: September 15, 2013, 04:47:16 PM »
Plagiarism Detector has several methods of Extracting Plain text from each supported file type.
(The List of supported file formats)

Each file type has its own Text Extraction Engines.

Text Extraction Engine (TEE) is a sub-program that extracts text from a specified file type.

By default, every time you start Document Manager, Plagiarism Detector automatically selects the optimal TEE for each supported file type.

Still, there are 3 cases in which you may need to change the TEE to get better text extraction:
  • Incomplete Text Extraction (some parts of the document are missing).
  • Dirty Text Extraction (when html code is present in the document).
  • Wrong encoding detection for complex file formats.

TEE configuration made with Document Manager is global - it is used in all cases.
TEE configuration made with Advanced Report Viewer (ARV) is local - it is used only within the ARV session and is then reset to the global setting.
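
The global\local relationship can be pictured as a local layer sitting on top of the global settings. The sketch below uses Python's `ChainMap` purely as an analogy; the engine names and the dictionary layout are hypothetical and say nothing about how the program stores its configuration.

```python
from collections import ChainMap

# Hypothetical global configuration: file extension -> engine name.
global_tees = {".docx": "engine_a", ".pdf": "engine_b"}

# A session-local layer sits on top of the global one; writes go to the
# first mapping only, so the global settings are never modified.
session = ChainMap({}, global_tees)
session[".docx"] = "engine_c"

print(session[".docx"])      # engine_c (local override)
print(session[".pdf"])       # engine_b (inherited from global)
print(global_tees[".docx"])  # engine_a (global is unchanged)

# Ending the session discards the local layer, restoring global behavior.
session = ChainMap({}, global_tees)
print(session[".docx"])      # engine_a
```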

To change the TEE for a specific file type, click on the right and a combobox appears:
(The example below shows how to change TEE for DocX files)

"Silver Bullets" / Correct Encoding Detection
« on: September 15, 2013, 04:07:11 PM »
Plagiarism Detector tries to detect and use the correct encoding for every loaded document.

But sometimes it may detect the wrong encoding.
In the Originality Report this looks like either "lots of question marks" or "lots of unreadable symbols":

To avoid this situation you need to tell Plagiarism Detector which encoding to use.

To do this:

  • Start Plagiarism Detector Client.
  • Open New Check Wizard.
  • Select the Documents to check.
  • Choose Manual Config option: (detailed configuration tab appears)

Before changing ANY values you must click the following, otherwise the changes do not take effect!
1. Reset to Defaults.
2. Start Test.

Select one of the following options:

  • use automatic detection
  • use system default encoding
  • use custom encoding
  • use utf8 encoding
  • use iFilters
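
Why the choice of encoding matters can be seen in a few lines of Python. This is a generic illustration of decoding the same bytes in different ways, not the program's detection logic; the sample word and the code pages are chosen only for the example.

```python
original = "Плагиат"             # Russian for "plagiarism"
raw = original.encode("cp1251")  # bytes as stored in the file

# Wrong guess 1: decoding as UTF-8 fails, so invalid bytes become U+FFFD,
# which many viewers render as question marks.
as_utf8 = raw.decode("utf-8", errors="replace")

# Wrong guess 2: a different single-byte code page yields unreadable symbols.
as_latin1 = raw.decode("latin-1")

# Correct custom encoding: the text round-trips.
as_cp1251 = raw.decode("cp1251")

print(as_utf8)
print(as_latin1)
print(as_cp1251 == original)  # True
```

The bytes in the file never change; only the interpretation does, which is why picking the right encoding option fixes the report without touching the document.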

e.g. to select custom encoding for a text file:

Q - How do I make sure that the encoding is chosen correctly?
A - Double-click any document in Document Selector and you will see what the text will look like.
It must look like the desired text:
(in the example below automatic detection is used to get default Chinese encoding)


"Silver Bullets" / Text Extraction Troubleshooting
« on: September 15, 2013, 03:30:08 PM »
Below you will find useful information on Text Extraction troubleshooting.

There are 3 main issues you may experience:

1. A notification balloon is displayed and Plagiarism Detector refuses to run the plagiarism check:
"Error Code Diagnostics"
Details here:,37.0.html

2. The check completes OK, but the extracted text is incomplete (some parts are missing):
"Incomplete Text Extraction issue"
Details here:,37.0.html

3. The Originality Report displays lots of question marks instead of text or unreadable symbols:
"Wrong encoding issue"
Details here:,36.msg37.html#msg37

"Silver Bullets" / Check for Plagiarism with Shell Context Menu
« on: September 14, 2013, 06:08:50 PM »
Plagiarism Detector has a really easy-to-use feature for running an ultra-fast automatic plagiarism check - the Shell Context Menu.

To check a document you need to:

1. Go to the location where your document resides.
2. Right-click it (RMB).
3. Select the "Check for Plagiarism" option from the context menu.
4. Wait for the Originality Report to open.

You can find a video demonstration here: "url"
