Messages - Alexei B.

31
"Silver Bullets" / Re: Avoiding "Resource Processing Timeouts" error.
« on: September 26, 2014, 12:46:10 AM »
It has come to our attention that in some cases users are displeased with the search efficiency of the software when the actual problem is the Internet access speed. Starting with version 852, we will be adding some features to make this problem more visible and understandable.

First: all reports with the number of failed resources higher than the threshold (20% at present) will have the "Failed resources" cell colored red.



More than that: every time the mouse hovers over such a report, the whole line will be highlighted in red, to make sure you see it even on small screens.

Second: when you open such a report, a special message box will appear, describing the problem and linking to this topic.


Now, back to the problem itself:

Q. What can be the problem?
A. During the second stage of the check, Plagiarism Detector downloads a lot of possible resources for a detailed analysis. This creates a significant load on the Internet connection, and if the bandwidth is insufficient, some resources fail to download.

Q. Why is that a problem?
A. If a source of plagiarism is not downloaded, the plagiarism taken from it will not be detected. The impact of this problem differs between documents, but it can lead to very poor detection.

Q. How can I know the number of failed resources for my report?
A. The "Rs Failed" column in the Available reports list shows it. Nearby columns also show the total number of resources and the number of "OK" resources. A report also has these counters right between the chart and the graph.

Q. I have 5 resources failed! Is that a problem?
A. Not necessarily. Almost all reports show some failed resources, for example when a site is not available at the moment. It only becomes a problem when there are a lot of such failures in comparison with the overall number of resources.
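
As a rough illustration of the answer above, the rule can be thought of as a simple ratio test. This is only a sketch: the function name and the exact comparison (failed resources divided by total resources, strictly above the threshold) are assumptions, and only the 20% figure comes from this topic.

```python
def failed_ratio_too_high(failed: int, total: int, threshold: float = 0.20) -> bool:
    """Return True when the share of failed resources exceeds the threshold.

    Hypothetical helper: the real software may compute this differently.
    """
    if total <= 0:
        # No resources were processed at all, so there is nothing to compare.
        return False
    return failed / total > threshold


# 5 failed out of 120 is about 4%, far below the assumed 20% threshold.
print(failed_ratio_too_high(5, 120))
# 40 failed out of 120 is about 33%, which would be flagged.
print(failed_ratio_too_high(40, 120))
```

So a handful of failed resources in a report with more than a hundred resources is normal, while the same count in a report with only a dozen resources is not.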

Q. Can I get to know more about the failed resources?
A. Yes. Press "Toggle other sources" in the report. Failed resources are at the beginning of the list. Here is the list of possible error codes: http://www.plagiarism-detector.com/smf_bb/index.php/topic,27.0.html For the problem in question, the code is usually "Timouted".

Q. What can be done about it?
A. Please refer to the first message in this topic (at the end of it).

We hope it will help to make your work with our software even better!

32
I'm sorry for the delayed reply.

Indeed, 2TC features have now been added to Plagiarism Detector.
Additionally, you can use it to check against a folder, not just a single file.
You can get a two-tab comparison by clicking a "resource" in the generated report (for example, the one shown after a detected fragment).
There is currently no way to adjust the plagiarism threshold on the fly. I will poke our RnD about this.
The usual 600-word Demo limitation applies to such a check. You can contact us for an extended Demo without this limitation, if you like. You can still get a side-by-side comparison in the report.
We understand that the usual limitation of 600 words can be a hard one, and we may reconsider it. But we also have some serious reasons for it.

Please send an e-mail to our support for an extended Demo before you buy it.

33
Version 848 has a problem: only one report is shown in the list of reports, and this same report is opened each time any document is checked.

It happens when there are more than 200 reports in the report folder. An easy workaround is to move older reports into a separate folder.
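
For those who prefer to script the workaround, here is a minimal sketch. It assumes that reports are ordinary files in a single folder and that file modification time is a good enough proxy for report age. The function name, the folder paths and the keep count of 150 are illustrative only and are not part of Plagiarism Detector.

```python
import shutil
from pathlib import Path


def archive_old_reports(report_dir, archive_dir, keep=150):
    """Move all but the `keep` most recently modified files to archive_dir.

    Returns the names of the files that were moved.
    """
    report_dir, archive_dir = Path(report_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)

    # Only plain files are considered; sub-folders are left alone.
    files = [p for p in report_dir.iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)  # newest first

    moved = []
    for old in files[keep:]:
        shutil.move(str(old), str(archive_dir / old.name))
        moved.append(old.name)
    return moved


# Example call (hypothetical paths, adjust to your own report folder):
# archive_old_reports(r"C:\PD\Reports", r"C:\PD\Reports_archive", keep=150)
```

Close Plagiarism Detector before moving files, and keep the archive folder somewhere you can find it again if you need an older report.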

This will be fixed in the next released version.

34
We have received several requests regarding references in the footnotes of DOCX documents not being detected.

The problem is this: text from footnotes is not extracted. This is easy to verify: none of the links present in the footnotes appear in the report. We can do nothing here, since even Microsoft iFilters don't do this.

We recommend adding all the links from the footnotes at the end of the document as text, for example as a list of used resources. In this case it will work just fine (remember to have URL references enabled in the program).
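
If a document has many footnotes, collecting the links by hand is tedious. The sketch below is one possible way to list them, assuming a standard DOCX package in which footnote text lives in word/footnotes.xml and footnote hyperlink targets in word/_rels/footnotes.xml.rels. The function name and the URL pattern are illustrative, and this is not how Plagiarism Detector itself works.

```python
import re
import xml.etree.ElementTree as ET
import zipfile

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
R_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
URL_RE = re.compile(r"https?://[^\s<>\"]+")  # simple pattern, an assumption


def footnote_urls(docx_path):
    """Return the URLs found in the footnotes of a DOCX file, without duplicates."""
    urls = []
    with zipfile.ZipFile(docx_path) as z:
        names = set(z.namelist())

        # URLs typed as plain text inside footnote paragraphs.
        if "word/footnotes.xml" in names:
            root = ET.fromstring(z.read("word/footnotes.xml"))
            for para in root.iter(W_NS + "p"):
                text = "".join(t.text or "" for t in para.iter(W_NS + "t"))
                urls += [u.rstrip(".,;)") for u in URL_RE.findall(text)]

        # Targets of clickable hyperlinks used in footnotes.
        rels = "word/_rels/footnotes.xml.rels"
        if rels in names:
            root = ET.fromstring(z.read(rels))
            for rel in root.iter(R_NS + "Relationship"):
                if rel.get("TargetMode") == "External" and rel.get("Target"):
                    urls.append(rel.get("Target"))

    return list(dict.fromkeys(urls))  # drop duplicates, keep order


# Example (hypothetical file name):
# for url in footnote_urls("thesis.docx"):
#     print(url)
```

The printed list can then be pasted at the end of the document as plain text before running the check.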

I hope it helps.

35
Comparison against a folder was always considered more of an extra feature than a primary function, largely because such a check takes a long time.
To compare against a big document storage, we recommend using PDAS (Plagiarism Detector Accumulator Server), which is specifically designed to check against a database containing thousands of documents and does so much faster.

So it was a surprise to us that some customers started using Folder comparison in this way. We never really tested this feature with such large sets of files, nor did we design it for the task.

I will definitely ask our RnD to have a look into it.

May I ask you for some info on the computer specifications, such as the CPU and amount of RAM your computer has? It can be helpful. It would also help to know how many documents you are trying to check against.

36
Versions - Release History / PDC core version 838 - minor update
« on: March 22, 2014, 02:34:39 PM »
- Fixed an error with a document loading failure in Demo version
- Compare a document to a folder of documents now compares with all the documents in a folder and all sub-folders

37
Over the time of our work in the field of Plagiarism Detection we have answered a lot of questions like "Are our documents stored elsewhere?". The answer is always the same: Your documents don't leave your computer during Plagiarism Detector checks.

But not so long ago a user started a more general discussion about the safety of such products. The safety of your documents, that is.

With permission from that kind person, I am publishing our conversation here (lightly edited). It may be useful for those worried about the problem.

_____________________

User:

I have a question regarding this software..

If I test my documents for plagiarism, why must my documents and my scientific works remain stored on an online server? This software must be 100% confidential. There is no need for my scientific works to be uploaded to a server and stay there permanently.

PD Team:

No checked documents leave your computer when searching for plagiarism (except for fragments used for the search itself), nor do we have any database on our side with users' documents. Thus your documents are confidential.

You may have misunderstood the PDAS software description. PDAS is software that a client can use to store documents in their own database and check against them. It is not used for the Plagiarism Detector Internet check.

User:

I asked because there is a suspicion that anti-plagiarism software takes the documents we scan, and that the developers receive them, publish them before we do and upload them to the Internet. That is not good for our work.
On many websites people complain about this. Many do not want to scan for plagiarism for this reason: they fear they have worked in vain, because software developers receive all scanned documents and publish them or upload them to websites.
It is intellectual theft.

PD Team:

We have not researched how our competitors work, but we can assure you that no documents leave your computer during a check with Plagiarism Detector.

The thing you are worried about would be rather counter-productive for any service that takes care of its clients. Thus the first risk factor I would name is "free of charge": I wouldn't trust any plagiarism-check service that is free to all, because the question arises: "what's their interest then?".

But for any serious and well-established service, the revenue lost from dissatisfied customers leaving (and generating bad PR) would matter more than their documents. At least I believe so.

Once again: with our software no documents leave your computer to be stored elsewhere.

As you have raised a serious problem, I kindly ask your permission to publish this conversation (text only, no names) on our forums, as it may be of interest for other people.

User:

I raised this issue because, as I said, there is a suspicion that this kind of anti-plagiarism software steals our documents, scientific works and so on from our PCs when we add a document to the software and scan it.
And I don't mean just Plagiarism Detector. With this kind of software in general, people worry that their work will have been done in vain because of intellectual theft, that is, theft by the software.
Yes, you can use these phrases and put them on the application's forum, but without my name or my e-mail. Thank you in advance!

PD Team:

Well, I totally understand your concerns. So let me analyze the risks from my experience.

In addition to the already mentioned "free cheese can be a mouse-trap":

1. Software installed on your computer is more secure than a website providing the check service. Everything below applies to installed software, since no one knows what happens server-side.

2. With certain skills, you can check what data leaves your computer during the check. I can't go into the details of the algorithms we use, but to my knowledge one cannot reproduce the document from the search requests generated by our software. Besides, any traffic analysis will show the requests going to different places (since several search engines are used), thus separating the fragments even more. If such an analysis shows a whole document uploaded somewhere in one piece, the software does not look secure.

3. An easier way: if software is observed to load the CPU heavily, it is working on your side. If the CPU is not loaded, either the software is not very good, or the document is uploaded to some server and checked there. Evidently, that is less secure.

4. Additional data. You can always search the Internet for third-party sites mentioning the software you are interested in. The more it looks like "serious business", the more secure it is likely to be. As I have said before, one is unlikely to risk the profits of a well-established business by stealing clients' intellectual property. But if the site looks like a post-graduate student's homework and no one has ever heard of the product before, well, use it at your own risk.

Besides, I have just consulted our RnD about this and they provided some additions from their perspective:

1. It is rather unlikely that clients' works are stolen for their scientific value. More likely, it happens as part of the regular process of filling the database that documents are checked against. For a widely used service, it looks impossible to analyze all the incoming documents for scientific value.

2. Indeed, some services do store all the documents that are checked with them (we don't think it right to name them). You can follow the above list of criteria to reduce the chance of using such a service. Even if the service stores clients' documents, there are two options: documents are stored for the service's internal use only (later documents are checked against older ones), or documents are later indexed by search engines, which is indeed a serious threat, as your document becomes publicly available.

3. Someone interested in detailed research in the field can make a set of "trap documents" that are 100% original and check them with different services, then repeat the same check in about 1.5 months. If a document remains 100% original with the same service, no documents are stored. If it is found plagiarized by the same service, but by none of the others, documents are stored in an internal database. But if different services start finding plagiarism in a clean document that was checked with one service only, those services are just mirrors of a single document-storing server, or the document has become publicly available.
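
The decision rule from the trap-document test above can be written down compactly. This is just a sketch of that reasoning: the function name, the argument names and the returned wording are illustrative, and the case where only other services flag the document is not covered by the description above, so it is grouped with the last outcome as an assumption.

```python
def interpret_trap_result(checked_with: str, flagged_by: set) -> str:
    """Interpret a re-check of an original "trap document".

    checked_with: the one service the document was first checked with.
    flagged_by:   services that report plagiarism in it on the later re-check.
    """
    if not flagged_by:
        # Still 100% original everywhere: nothing points to storage.
        return "no documents stored"
    if flagged_by == {checked_with}:
        # Only the original service now finds it: kept in its own database.
        return "stored in the service's internal database"
    # Other services find it too (assumption: also covers the case where
    # the original service itself does not flag it).
    return "shared storage or document became publicly available"


print(interpret_trap_result("service_a", set()))
print(interpret_trap_result("service_a", {"service_a"}))
print(interpret_trap_result("service_a", {"service_a", "service_b"}))
```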

We hope you find this information useful.

38
FAQ - Frequently Asked Questions / Exclude/Include lists
« on: January 13, 2014, 03:44:17 AM »
On the last page of the Step-By-Step Wizard you can open the second tab, Exclusion lists. It allows you to edit two different lists that give the search engine specific instructions.

Exclude list: if any exclusion mask in this list is present in the found page URL - the page is ignored during check.
For example, adding the title of the document to the list will ignore all the Internet sources, that use the document name in the page address.
Example:
If the exclusion list contains "wikipedia" then ALL URLs to wikipedia.org will be ignored.
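
The exclude rule amounts to a substring test on the page URL. A minimal sketch follows, assuming the match is a plain case-insensitive substring search; the function name is illustrative, and whether the program ignores letter case is not stated here.

```python
def is_excluded(url: str, exclude_masks) -> bool:
    """Return True if any non-empty exclusion mask occurs in the URL.

    Case-insensitive matching is an assumption, not documented behavior.
    """
    url_lower = url.lower()
    return any(mask.lower() in url_lower for mask in exclude_masks if mask)


masks = ["wikipedia"]
print(is_excluded("https://en.wikipedia.org/wiki/Plagiarism", masks))
print(is_excluded("https://example.com/essay.html", masks))
```

Note that a short mask matches anywhere in the address, so a mask such as "wiki" would also exclude unrelated sites whose URLs happen to contain it.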

Include list: any URL in the list is thoroughly checked against during the search.
If you expect a page to be the source of part of the checked document, but the software does not find it, you can add the page to this list to make sure it is checked in detail. The page may go undetected without this because the fast SEP-stage check, which finds candidates for the detailed REP comparison, is less specific and detailed and relies instead on the quantity of sites checked.

39
+ "Detect URL references" now works fine. If a document contains a section with a direct link to the Internet page containing it - the section is marked as Referenced (if the check-box is checked)
+ Exclude list now works as intended: if any word in this list is present in the found page URL - the page is ignored during check
+ Include list now works as intended: any URL in the list is thoroughly checked against during the search.
+ "Check against folder of documents" will now always ignore the file that is checked, if it is contained in the same folder.
+ Improved PDAS compatibility.

Planned for the next minor release: faster loading of Available Reports list.

40
FAQ - Frequently Asked Questions / Generation Time and Date
« on: November 13, 2013, 01:52:28 PM »
This section shows the time and date of the document check.

41
FAQ - Frequently Asked Questions / Document Words Count
« on: November 13, 2013, 01:51:50 PM »
This section shows the number of words in the original document checked.

42
FAQ - Frequently Asked Questions / Document Name
« on: November 13, 2013, 01:51:12 PM »
This section shows the name of the original document checked.

43
FAQ - Frequently Asked Questions / Document Location
« on: November 13, 2013, 01:50:35 PM »
This section shows the location of the original document on the computer where it was checked.

44
FAQ - Frequently Asked Questions / Processed Resources List
« on: November 13, 2013, 01:49:53 PM »
This section gives some information about the sources that were considered as possible sources of Plagiarism and the status of their check.
The Processed OK count refers to the number of sources that were successfully downloaded and analyzed, while the Failed counter refers to the number of sources that failed to load and thus were not checked.

For example: a big PDF file on the Internet was considered as a possible source and the download was initiated. However, it was not available at the moment or it failed to load in the given time*. In both cases the document is considered Failed.

By pressing Toggle Resources List you can get more detailed information on all the possible sources for the document under consideration.

First goes the status of the source: either OK or Failed.
Sources found to contain the text from the document under consideration are listed first, with the numbers of Chars and Words that are similar to those in the checked document.
Failed sources and sources that were not confirmed as having the same text are listed later. Please note, unconfirmed sources are colored with a different tone of Green.


* You can alter the timeout for downloading a document in the Settings > Core Config > Single resource processing timeout section of Plagiarism Detector. The longer the timeout, the longer a document check can take.

45
FAQ - Frequently Asked Questions / Plagiarism Detection Chart
« on: November 13, 2013, 01:47:44 PM »
A special chart that shows the proportions of the different parts of the checked document: the relative amounts of the Plagiarized part, the Original part and, if any, the Referenced part.

Original part is in Green
All the text that was not found to be taken from any source after the check is considered Original.

Plagiarized part is in Red
All the text that was found to originate from sources not referenced in the document is considered Plagiarized.

Referenced part is in Blue
Depending on the settings of the Plagiarism Detector Step-by-step Wizard, this may refer to:
- Text taken from any sources that are explicitly mentioned in the document by means of their URLs. This is the default setting, "Detect Active References".
- Any text between quotation marks. This is the "Detect Passive References" setting. Please note that this is not a recommended setting!

Linked part is in Yellow
Small sections of text between two Plagiarized fragments are considered Linked and usually indicate altered Plagiarism, for example one word added into a Plagiarized fragment.
