Your Intellectual Property and Plagiarism Detection Software

Started by Alexei B., February 18, 2014, 01:11:09 AM


Alexei B.

In our years of work on plagiarism detection we have answered a lot of questions like "Are our documents stored elsewhere?" The answer is always the same: your documents don't leave your computer during Plagiarism Detector checks.

Not long ago, though, a user started a broader discussion about the safety of such products in general. The safety of your documents, that is.

With that person's kind permission, I am publishing our conversation here (lightly edited). It may be useful to anyone worried about this problem.

_____________________

User:

I have a question regarding this software.

If I check my documents for plagiarism, why must my documents and my scientific works remain stored on a server online? This software must be 100% confidential. There is no need for my scientific works to be uploaded to the server and stay there permanently.

PD Team:

No checked documents leave your computer when searching for plagiarism (except for the fragments used for the search itself), nor do we have any database of users' documents on our side. Thus your documents are confidential.

You may have misunderstood the description of the PDAS software. PDAS is a program that a client can use to store documents in their own database and check against them. It is not used for the Plagiarism Detector Internet check.

User:

That's why I asked: there is a lingering suspicion that these anti-plagiarism programs take the documents we scan with them, and that you receive them, publish them before we do and upload them to the internet. That is not good for our work.
On many websites people complain about this. Many do not want to scan for plagiarism for this reason: they would have worked in vain, because the software developers receive all scanned documents and publish them or upload them to websites.
It is intellectual theft.

PD Team:

We haven't researched how our competitors work, but we can assure you that no documents leave your computer during a check with Plagiarism Detector.

What you are worried about would be rather counter-productive for any service that takes care of its clients. So the first risk factor I would name is "free of charge": I wouldn't trust any plagiarism-check service that is free to all. The question stands: "what's their interest then?"

But for any serious and well-established service, the revenue lost from unsatisfied customers leaving (and generating bad PR) would matter more than their documents. At least I believe so.

Once again: with our software no documents leave your computer to be stored elsewhere.

As you have raised a serious problem, I kindly ask your permission to publish this conversation (text only, no names) on our forums, as it may be of interest to other people.

User:

I raised this issue because, as I said, there is a lingering suspicion that this kind of anti-plagiarism software steals the documents on our PCs, scientific works etc., when we add a document to the software and check and scan it.
And I don't mean just this software, Plagiarism Detector. With this kind of software in general, the kind that checks whether a document is plagiarized or not, people worry that their work would be compromised through intellectual theft, i.e. theft by the software.
Yes, you can use these phrases and put them on the application's forum, but without my name or my e-mail. Thank you in advance!

PD Team:

Well, I totally understand your concerns. So let me analyze the risks from my experience.

In addition to the already mentioned "free cheese can be a mouse-trap":

1. Software installed on your computer is more secure than an Internet site providing the check service. Everything that follows applies to installed software, since no one knows what happens server-side.

2. With certain skills, you can check what data leaves your computer during the check. I can't go into the details of the algorithms we use, but to my knowledge one cannot reconstruct the document from the search requests generated by our software. Besides, any traffic analysis will show the requests going to different places (since several search engines are used), which separates the fragments even more. If such an analysis shows a whole document uploaded somewhere in one piece - that doesn't look secure.

3. An easier way: if the software is observed to load the CPU heavily - it is working on your side. If the CPU is not loaded - either the software is not very good, or the document is uploaded to some server and checked there. Evidently, that is less secure.

4. Additional data. You can always search the Internet for third-party sites mentioning the software you are interested in. The more it looks like a "serious business" - the more secure it is likely to be. As I said before - a well-established business is unlikely to risk its profits by stealing clients' intellectual property. But if the site looks like a post-graduate student's homework and no one has ever heard of the product before - well, use it at your own risk.
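
To illustrate the traffic-analysis idea from point 2, here is a minimal Python sketch. It assumes you have already captured the outgoing request bodies with some traffic-analysis tool and grouped them by destination host; the function names, the 3-word window and the 50% threshold are all hypothetical choices made for this example, and none of it describes how Plagiarism Detector itself works. It only estimates how much of a document each destination received.

```python
import re


def shingles(text, n=3):
    """Return the set of n-word sequences in text (lowercased, punctuation ignored)."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def coverage(document, payloads, n=3):
    """Fraction of the document's n-word sequences found in the given payloads."""
    doc = shingles(document, n)
    if not doc:
        return 0.0
    sent = set()
    for payload in payloads:
        sent |= shingles(payload, n)
    return len(doc & sent) / len(doc)


def suspicious_hosts(document, payloads_by_host, threshold=0.5, n=3):
    """Map every host that received at least `threshold` of the document to its share."""
    result = {}
    for host, payloads in payloads_by_host.items():
        share = coverage(document, payloads, n)
        if share >= threshold:
            result[host] = share
    return result


if __name__ == "__main__":
    # Hypothetical captured traffic; the host names are placeholders.
    document = "the quick brown fox jumps over the lazy dog near the quiet river bank"
    captured = {
        "search-a.example": ["quick brown fox jumps"],
        "search-b.example": ["lazy dog near the"],
        "upload.example": [document],
    }
    print(suspicious_hosts(document, captured))
```

In this made-up capture only upload.example is reported, because it received the whole text in one piece, while each search host saw a short fragment.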

Besides, I have just consulted our R&D team about this, and they added a few points from their perspective:

1. It is rather unlikely that clients' works are stolen for their scientific value. More likely they are stored as part of the routine process of filling the database that documents are checked against. For a widely used service, analyzing all incoming documents for scientific value looks impossible.

2. Indeed, some services do store all the documents that are checked with them (we don't think it right to name them). You can follow the list of criteria above to reduce the chance of using such a service. Even if a service stores clients' documents, there are two possibilities: the documents are stored for the service's internal use only (later documents are checked against older ones), or the documents are later indexed by search engines, which is indeed a serious threat, as your document becomes publicly available.

3. Anyone interested in researching this in detail can make a set of "trap documents" that are 100% original and check them with different services, then repeat the same checks in about 1.5 months. If a document remains 100% original with the same service - no documents are stored. If it is found plagiarized by the same service but by none of the others - documents are stored in an internal database. But if different services start finding plagiarism in a clean document that was checked with one service only - those services are just mirrors of a single document-storing server, or the document has become publicly available.
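
The decision rules of the trap-document experiment in point 3 can be written down as a small Python sketch. The function name, the result labels and the service names are hypothetical and exist only for illustration. The input is assumed to be the originality percentage each service reports on the repeat check, for a trap document that was initially checked with a single service.

```python
def interpret_recheck(checked_with, recheck):
    """Interpret the repeat check of a 100% original trap document.

    checked_with: name of the only service the document was first checked with.
    recheck: dict mapping service name -> originality percent on the repeat check.
    """
    # Services that never saw the document but now report plagiarism.
    others_flagging = [
        service
        for service, originality in recheck.items()
        if service != checked_with and originality < 100
    ]
    if others_flagging:
        # Either the services share one storage server, or the text is now public.
        return "mirrored or publicly available"
    if recheck[checked_with] < 100:
        # Only the original service recognizes the text: it kept a copy.
        return "stored in internal database"
    return "not stored"


if __name__ == "__main__":
    # Hypothetical services and results.
    print(interpret_recheck("service-a", {"service-a": 100, "service-b": 100}))
    print(interpret_recheck("service-a", {"service-a": 40, "service-b": 100}))
    print(interpret_recheck("service-a", {"service-a": 40, "service-b": 35}))
```

The check for other services comes first because it signals the more serious outcome, whatever the original service reports.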

We hope you find this information useful.

Alexei B.

In the years since the initial discussion, some features have been added to the software that require more data to be uploaded to our servers. To keep things transparent, I'll list the important changes here.
The main thing remains the same: no document checked with the Internet check of Plagiarism Detector is stored in any database to run future checks against.

Cases in which additional information, the file itself or your report is uploaded to our servers:
1. Text cannot be extracted from a document on your side (all local text extractors have failed). The file is uploaded to our servers for text extraction and is deleted as soon as the task is completed.
2. Report export to PDF - this feature works server-side, so any report exported to PDF is uploaded to our servers and the resulting PDF is then downloaded from them. The PDF remains available on our servers for a few hours and is then deleted. Only the program that uploaded the file has the link to the PDF.
3. Report sharing - if a client decides to share a report online, it is stored on our servers for a long time and is available via the link the client shares. These reports are accessible only via that link, so if you share the link - be ready for search engines to index it!
4. We store statistical information on reports, which may include options used, file names and check result percentages (but not the text!). These are only used internally to detect possible problems and otherwise improve the program.
5. We are considering an option to sometimes temporarily, anonymously and securely save clients' documents or reports for internal development or testing needs only. At the time of writing we don't have such an option. Some features we'd like to add to the program require a lot of real-world material to implement. For example, we need documents with suspected obfuscation cheating in order to counter it, or documents that let the program start differentiating between "various kinds of Plagiarism".

If you have any concerns regarding any of these points - feel free to contact our support service.