Plagiarism Detector works with documents in various formats as well as web pages.
The following document formats are supported:
*.docx - Microsoft Word (newest versions)
*.pptx - Microsoft PowerPoint (newest versions)
*.txt - Plain Text [requires correct encoding detection]
*.rtf - Rich Text Format
*.doc - Microsoft Word (older versions)
*.htm *.html *.asp *.php - web pages [requires correct encoding detection]
*.pdf - Adobe PDF [requires text, not images as text]
*.odt - Open Office Document
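The format list and the "correct encoding detection" caveat can be illustrated with a short Python sketch. This is not how Plagiarism Detector itself works internally; the names `SUPPORTED_FORMATS`, `describe_format`, and `decode_text`, as well as the `cp1252` fallback, are assumptions made for illustration only.

```python
import codecs
from pathlib import Path

# Extension -> (description, caveat), mirroring the list above.
# Hypothetical table for illustration; not the product's real configuration.
SUPPORTED_FORMATS = {
    ".docx": ("Microsoft Word (newest versions)", None),
    ".pptx": ("Microsoft PowerPoint (newest versions)", None),
    ".txt": ("Plain Text", "requires correct encoding detection"),
    ".rtf": ("Rich Text Format", None),
    ".doc": ("Microsoft Word (older versions)", None),
    ".htm": ("web page", "requires correct encoding detection"),
    ".html": ("web page", "requires correct encoding detection"),
    ".asp": ("web page", "requires correct encoding detection"),
    ".php": ("web page", "requires correct encoding detection"),
    ".pdf": ("Adobe PDF", "requires text, not images as text"),
    ".odt": ("Open Office Document", None),
}


def describe_format(filename):
    """Return (description, caveat) for a supported file, or None."""
    return SUPPORTED_FORMATS.get(Path(filename).suffix.lower())


# Byte-order marks are checked first. UTF-32 comes before UTF-16 because
# the UTF-32 LE mark begins with the same two bytes as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def decode_text(raw, fallback="cp1252"):
    """Decode bytes from a text or web-page file with a simple heuristic.

    Order: byte-order mark, then strict UTF-8, then an assumed legacy
    fallback encoding (undecodable bytes are replaced, never raised).
    """
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return raw.decode(encoding)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace")
```

For instance, `describe_format("scan.pdf")` returns the PDF entry together with its caveat, while a file with an unlisted extension returns `None`. A heuristic like `decode_text` can still guess wrong for legacy encodings, which is why the list marks text and web-page files as depending on correct encoding detection.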
97% of all documents will be processed successfully.
The most common unsupported file type is a rasterized *.pdf.
Such a document contains images rather than text, so it requires OCR processing.
For example (you can download it using the link at the bottom):
