Plagiarism Detector against directory of files? Size limit?

Started by fredw1970, April 02, 2014, 02:09:51 AM

fredw1970

I just purchased a pro license of this tool, but am having issues.  I am trying to compare a document against a rather large directory of other Word documents.  The system locks up every time.  I need to know what the directory size limits are, i.e. how many documents can this system compare a document against?  1000 files, 5000 files, 50,000 files?

Thanks.

Alexei B.

Comparison against a folder was always considered a secondary feature rather than a primary function, largely because such a check takes a long time.
To compare against a large document store, we recommend PDAS (Plagiarism Detector Accumulator Server), which was created specifically to check against a database containing thousands of documents, and does so much faster.

So it was a surprise to us that some customers started using Folder comparison this way. We never really tested the feature with such large sets of files, nor did we design it for that task.

I will definitely ask our R&D team to look into it.

May I ask for some details about your computer, such as the CPU and the amount of RAM? It would also help to know how many documents you are trying to check against.

Mike Sanders

Several points that you may find useful:

01. "I am trying to compare a document against a rather large directory of other word documents." - we strongly advise trying our product PDAS (http://www.plagiarism-detector.com/plagiarism-detector-accumulator-server-demo-download.php). It was developed for exactly this task: comparison against a large number of documents in minimal time.

The difference between the two methods is as follows:

- "Folder Check" in PDC always takes linear time, e.g.:


  • 1000 documents :: 10 minutes
  • 2000 documents :: 20 minutes
  • 3000 documents :: 30 minutes
etc.

- A PDAS (Plagiarism Detector Accumulator Server) check takes a small, nearly fixed time irrespective of the number of documents in the database, e.g.:


  • 1000 documents :: 10 seconds
  • 2000 documents :: 10.2 seconds
  • 3000 documents :: 10.4 seconds
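
To make the difference concrete, here is a rough sketch in Python that only extrapolates the example figures above. The function names and the constants (10 minutes per 1000 documents for Folder Check, about 9.8 seconds plus 0.2 seconds per 1000 documents for PDAS) are assumptions fitted to those illustrative numbers, not measured values or anything taken from the products themselves.

```python
def folder_check_minutes(num_docs, minutes_per_1000=10.0):
    """Estimated Folder Check time: grows linearly with the document count.

    The default rate is an assumption taken from the example figures.
    """
    return num_docs / 1000 * minutes_per_1000


def pdas_seconds(num_docs, base_seconds=9.8, seconds_per_1000=0.2):
    """Estimated PDAS check time: a nearly fixed cost plus a small increment.

    Both constants are assumptions fitted to the example figures.
    """
    return base_seconds + num_docs / 1000 * seconds_per_1000


# Compare both estimates for the example sizes and for a 50,000-file folder.
for n in (1000, 2000, 3000, 50000):
    print(f"{n:>6} documents: Folder Check ~{folder_check_minutes(n):.0f} min, "
          f"PDAS ~{pdas_seconds(n):.1f} s")
```

Under these assumed rates, a 50,000-document folder would take roughly 500 minutes with Folder Check, while the PDAS estimate stays in the range of seconds.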
--

02. "The system locks up every time" - could you specify the exact moment this happens?
I have already forwarded your complaint to our R&D department so that they can look into this.

03. "I need to know what the directory size limits are?"
At the moment there are no limits. PDC recursively locates all the files within the target folder and compares them one by one,
aggregating the results. In short, PDC will check ALL the files it finds.
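
The general shape of such a recursive, one-by-one check can be sketched in Python as below. This is only an illustration of the approach and not PDC's actual code: the function names, the plain-text `*.txt` pattern, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
import difflib
from pathlib import Path


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio (stand-in for a real plagiarism metric)."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def folder_check(target: Path, folder: Path, pattern: str = "*.txt"):
    """Compare `target` against every matching file under `folder`, recursively.

    Each file is read and compared in turn, so the total time grows
    linearly with the number of files found.
    """
    target_text = target.read_text(encoding="utf-8", errors="ignore")
    results = []
    for path in sorted(folder.rglob(pattern)):
        # Skip directories and the target document itself.
        if not path.is_file() or path.resolve() == target.resolve():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        results.append((path, similarity(target_text, text)))
    # Aggregate: most similar sources first.
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

Calling `folder_check(Path("essay.txt"), Path("archive"))` (both paths hypothetical) would return a list of `(path, ratio)` pairs with the closest matches first. Nothing here imposes a limit on the number of files; only the running time increases.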

04. "How many documents can this system compare a document against?" As many as you need. The time will grow linearly, though. Using PDAS is strongly advised.
--
Feel free to ask any additional questions!
Plagiarism Detector is a swiss army knife.

Mike Sanders

Dear Sir,

I have contacted our R&D team and they promptly responded with a small update.

Please install and try out the following:

--outdated version link removed --
(this will be soon released as an update)

Core version is 839.

This should work without a pre-start freeze.
--
"Local check" contained a bug (some leftover debug code) that caused a freeze before the
actual check started. In this update the code has been completely removed, which makes this stage much faster.

The bug was cumulative: the more files, the longer the freeze. The slower the machine, the longer the freeze.
The less RAM, the longer the freeze.

We have tested it with 11000 sources against a single file;
the resulting check ETA was 14 minutes, on a relatively powerful desktop with an HDD.
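
For a rough sense of what that figure means per file, the arithmetic can be done in a few lines of Python. The variable names are just for illustration, and the result applies only to the test machine mentioned above.

```python
# Figures from the test run: 11000 sources checked against 1 file in 14 minutes.
sources = 11_000
total_minutes = 14

total_seconds = total_minutes * 60
seconds_per_source = total_seconds / sources
sources_per_second = sources / total_seconds

print(f"{seconds_per_source:.3f} s per source, "
      f"{sources_per_second:.1f} sources per second")
# prints "0.076 s per source, 13.1 sources per second"
```

Slower hardware or less RAM would give a lower rate, so treat this as an upper-end reference point rather than a guarantee.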
--
Looking forward to your feedback!