Author Topic: Supported file formats  (Read 18870 times)

Mike Sanders

  • Plagiarism Detector support
  • Administrator
Supported file formats
« on: September 14, 2013, 02:51:06 PM »
Plagiarism Detector works with documents in a variety of formats, as well as web pages.

The following document formats are supported:

*.docx - Microsoft Word (newest versions)
*.pptx - Microsoft PowerPoint (newest versions)
*.txt - Plain Text [* requires correct encoding detection]
*.rtf - Rich Text Format
*.doc - Microsoft Word (older versions)
*.htm *.html *.asp *.php - web pages [* requires correct encoding detection]
*.pdf - Adobe PDF [requires a text layer, not images of text]
*.odt - OpenDocument Text (OpenOffice)
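If you want to pre-check files before submitting them, the list above can be sketched as a simple extension lookup. This is only an illustration, not how Plagiarism Detector itself works: the names CAVEATS and check_format are made up for this example, and the check looks at the file name only, not at the file contents.

```python
from pathlib import Path

# Extensions from the list above, mapped to the caveat noted next to them
# (None means the post lists no caveat). Hypothetical helper table.
CAVEATS = {
    ".docx": None,
    ".pptx": None,
    ".txt": "requires correct encoding detection",
    ".rtf": None,
    ".doc": None,
    ".htm": "requires correct encoding detection",
    ".html": "requires correct encoding detection",
    ".asp": "requires correct encoding detection",
    ".php": "requires correct encoding detection",
    ".pdf": "requires text, not images as text",
    ".odt": None,
}


def check_format(filename):
    """Return (supported, caveat) based on the file extension only."""
    ext = Path(filename).suffix.lower()  # case-insensitive match
    if ext not in CAVEATS:
        return (False, None)
    return (True, CAVEATS[ext])


if __name__ == "__main__":
    for name in ["Thesis.DOCX", "scan.pdf", "photo.jpg"]:
        print(name, check_format(name))
```

A file that passes this check can still fail later, for instance a *.pdf that contains only scanned images.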


97% of all documents will be processed without problems.

The most common unsupported file type is a rasterized *.pdf.
Such a document contains images rather than text, and therefore requires OCR processing.
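A rough way to spot such a file yourself is to look at how much text can actually be extracted from it. The sketch below assumes you have already extracted the text of each page with some PDF tool (not shown here). The function name needs_ocr and the threshold of 20 characters per page are assumptions made for this illustration, not values used by Plagiarism Detector.

```python
def needs_ocr(page_texts, min_chars_per_page=20):
    """Guess whether a PDF is rasterized, given the text extracted per page.

    page_texts: list of strings, one per page (extraction itself is assumed
    to be done elsewhere). The threshold is an arbitrary illustrative value.
    """
    if not page_texts:
        # Nothing could be extracted at all.
        return True
    # Count non-whitespace characters across all pages.
    total = sum(len("".join(text.split())) for text in page_texts)
    return total / len(page_texts) < min_chars_per_page


if __name__ == "__main__":
    print(needs_ocr(["", "  ", "\n"]))
    print(needs_ocr(["This is a normal paragraph of text on page one."]))
```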

For example (you can download it using the link at the bottom):
 
« Last Edit: November 19, 2017, 04:38:15 PM by Alexei B. »
Plagiarism Detector is a Swiss Army knife.