Author Topic: Plagiarism Detector against directory of files? Size limit?  (Read 31397 times)

fredw1970

  • Newbie
  • *
  • Posts: 3
Plagiarism Detector against directory of files? Size limit?
« on: April 02, 2014, 05:09:51 AM »
I just purchased a pro license of this tool, but am having issues. I am trying to compare a document against a rather large directory of other Word documents. The system locks up every time. I need to know what the directory size limits are, i.e., how many documents can this system compare a document against? 1,000 files, 5,000 files, 50,000 files?

Thanks.

Alexei B.

  • Community Manager
  • Administrator
  • Jr. Member
  • *****
  • Posts: 85
Re: Plagiarism Detector against directory of files? Size limit?
« Reply #1 on: April 02, 2014, 05:47:57 PM »
Comparison against a folder was always considered more of an extra feature than a primary function, largely because such a check takes a long time.
To compare against a large document store, we recommend using PDAS (Plagiarism Detector Accumulator Server), which was created specifically to check against a database containing thousands of documents, and does so much faster.

So it was a surprise to us that some customers started using Folder comparison this way. We never really tested this feature with such large sets of files, nor did we design it for that task.

I will definitely ask our RnD to look into it.

May I ask for some information on your computer's specifications, such as the CPU and the amount of RAM? It would also help to know how many documents you are trying to check against.

Mike Sanders

  • Plagiarism Detector support
  • Administrator
  • Jr. Member
  • *****
  • Posts: 51
  • I will gladly help, just ask!
    • Support Superman
Re: Plagiarism Detector against directory of files? Size limit?
« Reply #2 on: April 03, 2014, 12:13:16 AM »
Several points that you may find useful:

1. "I am trying to compare a document against a rather large directory of other Word documents." - we strongly advise trying out our product PDAS (http://www.plagiarism-detector.com/plagiarism-detector-accumulator-server-demo-download.php). It was developed for exactly your task: comparison against a large number of documents in minimal time.

The difference between the two methods is the following:

- "Folder Check" in PDC always takes linear time, e.g.:

  • 1000 documents :: 10 minutes
  • 2000 documents :: 20 minutes
  • 3000 documents :: 30 minutes
etc.

- A PDAS (Plagiarism Detector Accumulator Server) check takes a small, nearly fixed time irrespective of the number of documents in the database, e.g.:

  • 1000 documents :: 10 seconds
  • 2000 documents :: 10.2 seconds
  • 3000 documents :: 10.4 seconds
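
To put the two sets of example figures side by side, here is a rough sketch in Python. It is only a model fitted to the numbers quoted above, not a measurement: the function names and the 0.2 seconds per extra 1000 documents are assumptions read off the examples, and extrapolating to 50,000 documents assumes both trends simply continue.

```python
def folder_check_minutes(documents: int, minutes_per_thousand: float = 10.0) -> float:
    """Linear model: time grows in proportion to the number of documents."""
    return documents / 1000 * minutes_per_thousand


def pdas_check_seconds(documents: int, base_seconds: float = 10.0,
                       extra_seconds_per_thousand: float = 0.2) -> float:
    """Near-constant model anchored at the example figure of 10 s for 1000 documents."""
    return base_seconds + (documents - 1000) / 1000 * extra_seconds_per_thousand


# 50000 is an extrapolation beyond the quoted examples (assumption).
for n in (1000, 2000, 3000, 50000):
    print(f"{n:>6} documents: Folder Check ~{folder_check_minutes(n):.0f} min, "
          f"PDAS ~{pdas_check_seconds(n):.1f} s")
```

Under this model a 50,000-document folder would take about 500 minutes with Folder Check, while the PDAS estimate stays under 20 seconds. Real timings depend on document size and hardware.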
--

2. "The system locks up every time" - could you specify the exact moment this happens?
I've already sent your complaint to our RnD department so that they will look into this.

3. "I need to know what the directory size limits are?"
At the moment there are no limits. PDC recursively locates all the files within the target folder and compares them one by one,
aggregating the results. In short, PDC will check ALL the files it finds.
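
For readers who want a picture of what "recursively locate, compare one by one, aggregate" looks like, here is a minimal Python sketch. It is not PDC's code: `folder_check` and `similarity` are hypothetical names, the score comes from the standard library's `difflib` as a stand-in for the real detection algorithm, and it reads plain `.txt` files only (Word documents would need a text extraction step first).

```python
from difflib import SequenceMatcher
from pathlib import Path
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; NOT PDC's actual algorithm."""
    return SequenceMatcher(None, a, b).ratio()


def folder_check(target: Path, folder: Path, pattern: str = "*.txt") -> List[Tuple[Path, float]]:
    """Compare `target` against every matching file under `folder`, one by one."""
    target_text = target.read_text(encoding="utf-8", errors="ignore")
    results = []
    for source in sorted(folder.rglob(pattern)):  # rglob walks subfolders too
        # Skip directories and the target itself if it lives in the same folder.
        if not source.is_file() or source.resolve() == target.resolve():
            continue
        source_text = source.read_text(encoding="utf-8", errors="ignore")
        results.append((source, similarity(target_text, source_text)))
    # Aggregate: most similar sources first.
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Calling `folder_check(Path("essay.txt"), Path("sources"))` (both paths are placeholders) returns every file found with its score, highest first. Each file costs one comparison, which is why the total time grows linearly with the number of files.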

4. "How many documents can this system compare a document against?" As many as you need. The time will grow linearly, though. Using PDAS is strongly advised.
--
Feel free to ask any additional questions!
« Last Edit: April 03, 2014, 02:25:00 PM by Mike Sanders »
Plagiarism Detector is a swiss army knife.

Mike Sanders

  • Plagiarism Detector support
  • Administrator
  • Jr. Member
  • *****
  • Posts: 51
  • I will gladly help, just ask!
    • Support Superman
Re: Plagiarism Detector against directory of files? Size limit?
« Reply #3 on: April 03, 2014, 02:23:32 PM »
Dear Sir,

I have contacted our RnD and they promptly responded with a little update.

Please install and try out the following:

http://78.47.128.158/pdc_setup_generic_839.exe
(this will soon be released as an update)

Core version is 839.

This should work without a pre-start freeze.
--
"Local check" contained a bug (some debug source code) that caused a freeze before the
actual check started. In this update that code is completely removed, which makes this stage much faster.

The bug was cumulative: the more files, the longer the freeze. The slower the machine, the longer the freeze,
and the less RAM, the longer the freeze.

We have tested it with 11000 sources against 1 single file;
the resulting check ETA was 14 minutes, on a relatively powerful desktop with an HDD.
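
As a back-of-the-envelope check on that test result, the short Python sketch below works out the per-source time and scales it to other folder sizes. The function names are made up for illustration, and the scaling assumes time stays linear in the number of sources on comparable hardware with similar documents.

```python
def seconds_per_source(sources: int, total_minutes: float) -> float:
    """Average time spent on each source document, in seconds."""
    return total_minutes * 60 / sources


def estimate_minutes(new_sources: int, sources: int, total_minutes: float) -> float:
    """Scale a measured run linearly to a different number of sources (assumption)."""
    return new_sources * total_minutes / sources


print(f"{seconds_per_source(11000, 14):.3f} s per source")
print(f"{estimate_minutes(50000, 11000, 14):.1f} min for 50000 sources")
```

With the quoted figures this comes to roughly 0.076 seconds per source, or a little over an hour for 50,000 files if the linear trend holds.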
--
Looking forward to your feedback!