
Messages - Alexei B.

31
"Silver Bullets" / Re: Check Type: Word-to-Word VS Re-Written
« on: September 18, 2015, 05:03:31 PM »
The Exact Sciences.


Texts in these fields have features of their own that make the obfuscated-Plagiarism detection algorithms described in the Arts post of this topic unacceptable in many cases. Texts in Physics, Maths, etc. are usually much less flexible and make heavy use of domain-specific constructions and expressions that recur across many texts from the same domain of knowledge. One of the best examples was a certain medical prescription, which was reported as Plagiarized upon checking. However, a manual check did not confirm it. It turned out that most (if not all) prescriptions use the same text structure as well as the same words and expressions. Only the components change.

Let us take this example from Wikipedia:
“Take of pentobarbitone sodium, three grammes
of sulphate of morphia, two grammes
of hydrate of chloral, fifteen grammes
of table sugar, enough to make fifty grammes.”
And now let’s shuffle the ingredients randomly:
“Take of hydrate of chloral, three grammes
of pentobarbitone sodium, two grammes
of sulphate of morphia, fifty grammes
of table sugar, enough to make fifteen grammes.”

And now remember the example from the Arts section that was supposed to be detected as Plagiarism. It is rather evident that, because the same language is used, these two passages will be considered the “same” text that was obfuscated by changing the word order (one of the approaches to obfuscation).

Sure, it is an error, one that we call a false positive. Errors of this kind are common to all Plagiarism detection algorithms that are aimed at detecting obfuscated Plagiarism.

With this in mind, we modified the algorithm specifically for such texts, so that it detects only what we call “word-to-word” Plagiarism. This algorithm will correctly treat the two “prescriptions” as different texts, but it will also treat the pair from the “Arts” example as different texts.
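The difference between the two kinds of check can be shown with a toy sketch. This is not the algorithm used in Plagiarism Detector; the function names and the two similarity measures below are assumptions chosen only for illustration. An order-insensitive comparison (shared words) rates the shuffled prescription as identical to the original, while an order-sensitive comparison (shared runs of consecutive words) rates it as mostly different.

```python
import re

def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())

def bag_similarity(a, b):
    """Order-insensitive Jaccard similarity over the sets of words."""
    sa, sb = set(words(a)), set(words(b))
    return len(sa & sb) / len(sa | sb)

def shingle_similarity(a, b, n=5):
    """Order-sensitive Jaccard similarity over runs of n consecutive words."""
    def shingles(text):
        w = words(text)
        return {tuple(w[i:i + n]) for i in range(len(w) - n + 1)}
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

original = """Take of pentobarbitone sodium, three grammes
of sulphate of morphia, two grammes
of hydrate of chloral, fifteen grammes
of table sugar, enough to make fifty grammes."""

shuffled = """Take of hydrate of chloral, three grammes
of pentobarbitone sodium, two grammes
of sulphate of morphia, fifty grammes
of table sugar, enough to make fifteen grammes."""

# Same vocabulary, different order: the first score is 1.0,
# the second is much lower because few 5-word runs survive the shuffle.
print(bag_similarity(original, shuffled))
print(shingle_similarity(original, shuffled))
```

The same trade-off appears here in miniature: the order-sensitive measure avoids the false positive on the prescription, but it would also miss a re-written sentence that keeps the meaning and changes the wording.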

So this “word-to-word” Plagiarism Detection algorithm has the following features:
-   Detects only similar parts of texts
-   Prevents false-positive results
-   Usually shows less Plagiarism than a regular algorithm
-   Is bad at finding even slightly obfuscated Plagiarism

In recent years we have had several versions of Plagiarism Detector using this algorithm, and they were provided to customers who required this kind of check. However, maintaining two very different versions is not what we consider right, so our RnD spent much time incorporating both algorithms into a single product!


32
"Silver Bullets" / Re: Check Type: Word-to-Word VS Re-Written
« on: September 18, 2015, 05:02:28 PM »
The Arts.

Texts in these subjects are very flexible in nature and allow much modification without actually changing the meaning. Any analysis of a piece of literature is a good example of this. To detect Plagiarism in the best possible way, software has to detect obfuscated, “re-written” cases of Plagiarism, where sentences are modified (manually or automatically) to keep the meaning but avoid detection. We have multiple cases of such modified documents, provided by our customers at different times, which show how hard some students work to avoid Plagiarism detection. For example, the sentence “It was a need for him to have the computer fixed” should ideally be detected as similar to “he must have had the PC fixed”. Please note: these examples are hypothetical and very simplified; the algorithm is much more complex, and this pair may or may not be detected, depending on the context.

This approach to Plagiarism detection is perceived to be better not only by us, but also by many competitors, and that is due to several advantages:
-   Obfuscated Plagiarism detection
-   More Plagiarism detected: users often compare software by the detection percentage for the same document

That is why it has usually been the default setting for our software.
However, this approach was found to have a significant drawback:
-   False-positive results for certain documents (see the Exact Sciences post in this topic)

33
"Silver Bullets" / OR Section: "Check Type" Word-to-Word vs Re-Write
« on: September 18, 2015, 12:32:09 AM »
The Plagiarism Detector team is proud to announce an absolutely awesome feature added to Plagiarism Detector: presets for different kinds of documents are now available “out of the box”!

Some theory behind this:
It has been some time since we first observed cases where our usual Plagiarism detection algorithms gave unsatisfactory results for certain documents. Additional research highlighted some features common to such cases. These are the specific features of texts in the Arts subjects and the Exact Sciences subjects, as we started to call them. While the language of these documents is the same, they usually have some very interesting characteristics, which require a different approach from the Plagiarism detection algorithms.

So, starting with version 885, a user can select the preferred check algorithm in the Step-by-Step Wizard! Please select “detect Text Rewrite (maximum detection)” if your documents are in the Arts subjects or similar fields, and “detect Word-to-Word (maximum exactness)” if your documents are in the Exact Sciences’ fields.

We really believe that this new feature (which we have not seen anywhere else) will help our customers in the ever-ongoing struggle against copy-paste!


34
We are aware of the false-positive Norton alert about the setup file (v885 at the time of posting).

You are most likely to see WS.Reputation.1 as the alert reason. It looks like Norton reacts to our file because we have released a new version and their anti-virus software is not yet "familiar" with it.

You can read more on the problem here:
http://community.norton.com/forums/clarification-wsreputation1-detection

There should be an option to remove the file from Quarantine, which would allow you to use it.

We are sure the install package for our software does not contain any virus, as the VirusTotal online service confirms:
https://www.virustotal.com/en/file/e500af153be86bd6ce305f1440f04f183a38eff8b059b04bcae01937dd59c4eb/analysis/

It is possible that later setup files will also be falsely detected, so clients are encouraged to use VirusTotal to check them.
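Before opening VirusTotal, a downloaded setup file can also be compared locally against the analysed one. The sketch below assumes that the 64-character hexadecimal string in the VirusTotal link above is the SHA-256 hash of the v885 setup file; the function names and the file path in the usage note are hypothetical.

```python
import hashlib

# Assumption: this is the SHA-256 taken from the VirusTotal link for v885.
EXPECTED_SHA256 = "e500af153be86bd6ce305f1440f04f183a38eff8b059b04bcae01937dd59c4eb"

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file on disk has the same hash as the analysed file."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_published_hash("setup.exe")` (hypothetical file name) returns True only if the downloaded file is byte-for-byte the one in that analysis. A mismatch only means the file is a different build; in that case upload it to VirusTotal for a fresh check.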

We hope it helps.

35
Just discussed the situation with RnD.

1. Some kind of detailed setting for the T-Comparator is planned for the future.

2. Please send a pair of reports (old and new), with a detailed explanation of what is wrong, to our support e-mail (check your forum PM). Our RnD asks that the problem be described clearly, so that we can be sure that "this" is what you are writing about. Screenshots can help a lot.

Any user-provided materials are used for internal testing only.

36
The latest version uses a different detection algorithm, and this setting is no longer available: changing the detection mode now requires altering more than one parameter. This algorithm is much better in general, so we moved to it.

In this version we keep two possible settings: one for re-written and one for word-to-word Plagiarism. I will soon post a detailed explanation of these, but in general: "re-written" is close to "maximum detection" on the previous scale, and "word-to-word" is close to "maximum exactness". These settings are the result of extensive tests on big collections of Plagiarized pieces and gave the best results we could reach in recent years.

We are sorry to hear this was so important to you, and I will ask our RnD about the possibility of bringing back some manual settings in the future.

The forced update to the current version was needed because of some changes on the third-party side that made previous versions unusable.

P.S. Any other users in need of the manual settings: please let us know either in this topic or by e-mail to our support. The more people need it, the more urgent the task will be.

37
Versions before 874 have a problem with occasional license loss, somehow connected with the USB storage devices used. If you need to have your license re-initialized, please send us an e-mail mentioning your license number.

38
We switched back from v874 to v849 (v850 is almost identical) due to negative feedback.

When you need your license re-initialized (new OS/PC), please send us an e-mail mentioning your license code.

39
I'm sorry for the much-delayed reply.

Version 874 and later versions will have a special notification for a significant number of timed-out resources. It turned out that many users with slow Internet connections did not pay attention to this number, so we added this message to make it more prominent.

Please take a look at this topic to know more about avoiding this problem: http://www.plagiarism-detector.com/smf_bb/index.php/topic,48.0.html

Version 874 uses a different check algorithm and does not have detailed settings so far.

Anyway, version 849 is back on our site, due to all these problems.

40
The type of the license is mentioned in the purchase notification e-mail.
In the program you can see the type of your license on the right side of the screen, in the middle.

41
"Plagiarism Detector" general discussion / Re: self sitation
« on: March 27, 2015, 11:44:30 AM »
You have to use the "Active URL references" feature.
The previously published paper has to be available on the Internet.
First, you need to add the exact URL of the published paper to the body of your document. Pay attention: do not put it in the "notes" of doc/docx documents, as only the main text is extracted.

Second, start the program and follow the wizard up to the last screen.
Enable the Detect "URL" references option.

Check your document.

If everything is done right - all the text, available at the link present in the document, will be considered "referenced" and marked blue.

I hope it helps.

42
A Pro license can be activated on two different computers with the same code.

So, you have 1 code, and you can use it twice.

When you need to re-initialize any of your licenses (new PC, for example), then you need to contact our support service, mentioning your license code.

I hope it helps.

43
"Silver Bullets" / Re: Avoiding "Resource Processing Timeouts" error.
« on: September 26, 2014, 12:46:10 AM »
It has come to our attention that in some cases users are displeased with the search efficiency of the software when the problem is actually the Internet access speed. Starting with version 852 we will be adding some features to make this problem more visible and understandable.

First: all reports with the number of failed resources higher than the threshold (20% at present) will have the "Failed resources" cell colored red:



Moreover, every time the mouse pointer hovers over such a report, the whole line will be highlighted red, to make sure you see it even on small screens.

Second: when opening such a report a special message box will appear, telling about the problem and leading to this topic.


Now, back to the problem itself:

Q. What can be the problem?
A. During the second stage of the check, Plagiarism Detector downloads a lot of possible resources for a detailed analysis. This creates a significant load on the Internet connection, and if the bandwidth is not sufficient, some resources fail to download.

Q. Why is that a problem?
A. If none of the sources of plagiarism were downloaded, the plagiarism will not be detected. The impact of this problem differs between documents, but it can lead to very poor detection.

Q. How can I know the number of failed resources for my report?
A. There is a "Rs Failed" column in the Available reports list showing it. Nearby columns also show the total number of resources and the number of "OK" resources. A report also has these counters right between the chart and the graph.

Q. I have 5 resources failed! Is that a problem?
A. Not necessarily. Almost all reports show some failed resources, for example when a site is not available at the moment. It only becomes a problem when there are a lot of such failures in comparison with the overall number of resources.
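The rule of thumb from this answer can be written down as a small check. This is only a sketch of the arithmetic, assuming the 20% threshold mentioned earlier in this topic is applied to the share of failed resources among all resources; the function names are hypothetical and are not part of Plagiarism Detector.

```python
FAILED_THRESHOLD = 0.20  # the 20% threshold quoted in this topic

def failed_share(failed, total):
    """Share of resources that failed, as a fraction between 0 and 1."""
    if total <= 0:
        return 0.0
    return failed / total

def is_report_suspect(failed, total, threshold=FAILED_THRESHOLD):
    """True when the share of failed resources is higher than the threshold."""
    return failed_share(failed, total) > threshold
```

So 5 failed resources out of 100 (5%) would not be flagged, while 30 out of 100 (30%) would be.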

Q. Can I get to know more about the failed resources?
A. Yes. Press "Toggle other sources" in the report. Failed resources are at the beginning of the list. Here is the list of possible error codes: http://www.plagiarism-detector.com/smf_bb/index.php/topic,27.0.html For the problem in question that is usually "Timouted".

Q. What can be done about it?
A. Please refer to the first message in this topic (at the end of it).

We hope it will help to make your work with our software even better!

44
I'm sorry for the delayed reply.

Indeed, 2TC features are now added to Plagiarism Detector.
Additionally, you can use it to check against a folder, not a single file.
You can get a two-tab comparison by clicking on the "resource" in the generated report (for example, the one after the detected fragment).
There is no ability to adjust the plagiarism threshold on the fly at the moment. I will poke our RnD about this.
The usual 600-word Demo limitation is active for such a check. You can contact us for an extended Demo without this limitation, if you like. You can still get a side-by-side comparison in the report.
We understand that the usual limitation of 600 words can be a hard one, and we may reconsider it. But we also have some serious reasons for it.

Please send an e-mail to our support for an extended Demo before you buy it.

45
In version 848 there is a problem where only one report is shown in the list of reports, and this report is opened each time any document is checked.

It happens when there are more than 200 reports in the report folder. An easy workaround is to move older reports to a separate folder.
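If there are many reports, the workaround can be scripted. The sketch below is only an illustration: it assumes that every report is a separate file directly inside the report folder, and the function name, the folder paths and the `keep` value are hypothetical. Close the program and back up the folder before trying anything like this.

```python
import shutil
from pathlib import Path

def archive_old_reports(report_dir, archive_dir, keep=150):
    """Move all but the `keep` most recently modified files to archive_dir.

    Returns the number of files moved.
    """
    report_dir = Path(report_dir)
    archive_dir = Path(archive_dir)
    # Newest first, judged by modification time; subfolders are ignored.
    files = sorted(
        (p for p in report_dir.iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    to_move = files[keep:]
    if to_move:
        archive_dir.mkdir(parents=True, exist_ok=True)
    for p in to_move:
        shutil.move(str(p), str(archive_dir / p.name))
    return len(to_move)
```

Any `keep` value below 200 leaves the folder under the limit described above.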

This will be fixed in the next version released.
