Analyzing Non-Textual Content Elements to Detect Academic Plagiarism

Files
Meuschke_2-ll951b8bh8s30.pdf (size: 10.22 MB; downloads: 2539)
Date
2021
Author
Meuschke, Norman
Open access publication
Open Access Green
Publication type
Dissertation
Publication status
Published
Abstract

Identifying academic plagiarism is a pressing problem for research institutions, publishers, and funding organizations, among others. Detection approaches proposed so far analyze lexical, syntactical, and semantic text similarity. These approaches find copied, moderately reworded, and literally translated text. However, reliably detecting disguised plagiarism, such as strong paraphrases, sense-for-sense translations, and the reuse of non-textual content and ideas, is an open research problem.

The thesis addresses this problem by proposing plagiarism detection approaches that implement a different concept: analyzing non-textual content in academic documents, such as citations, images, and mathematical content.

The thesis makes the following research contributions.

It provides the most extensive literature review of plagiarism detection technology to date. The review shows the weaknesses of current detection approaches in identifying strongly disguised plagiarism. Moreover, it identifies a significant research gap regarding methods that analyze features other than text.

Subsequently, the thesis summarizes work that initiated the research on analyzing non-textual content elements to detect academic plagiarism by studying citation patterns in academic documents.
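The abstract does not spell out which citation-pattern measures the thesis uses. With that caveat, the minimal Python sketch below compares the in-text citation sequences of two documents in two generic ways. The first is bibliographic coupling, which counts the distinct sources both documents cite. The second is the longest common subsequence of citations, which also rewards a shared citation order. The citation identifiers and function names are hypothetical. A real system would first have to extract and disambiguate the references.

```python
from typing import Sequence


def bibliographic_coupling(refs_a: Sequence[str], refs_b: Sequence[str]) -> int:
    """Count the distinct sources cited by both documents."""
    return len(set(refs_a) & set(refs_b))


def longest_common_citation_subsequence(seq_a: Sequence[str], seq_b: Sequence[str]) -> int:
    """Length of the longest order-preserving run of shared citations
    (classic dynamic-programming LCS over citation identifiers)."""
    n, m = len(seq_a), len(seq_b)
    # table[i][j] holds the LCS length of seq_a[:i] and seq_b[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if seq_a[i - 1] == seq_b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


# Citations in the order they appear in each text (made-up identifiers).
source = ["Smith2001", "Lee2005", "Kim2010", "Smith2001", "Wu2012"]
suspicious = ["Lee2005", "Novak2015", "Kim2010", "Smith2001", "Wu2012"]

print(bibliographic_coupling(source, suspicious))                # → 4
print(longest_common_citation_subsequence(source, suspicious))   # → 4
```

The sketch compares only citation identifiers. It therefore yields the same scores whether the surrounding text was copied, paraphrased, or translated, which is the intuition behind citation-based detection.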

To enable plagiarism checks of figures in academic documents, the thesis introduces an image-based detection process that adapts itself to the forms of image similarity typically found in academic work. The process includes established image similarity assessments and newly proposed use-case-specific methods.
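Perceptual hashing is one established, generic way to assess image similarity. The sketch below implements a difference hash (dHash) in plain Python on a grayscale pixel grid. It box-averages the image to a 9x8 thumbnail, and each horizontally adjacent cell pair contributes one bit. Two images are then compared by the Hamming distance between their hashes. This is an illustrative stand-in, not the detection process from the thesis. The grid representation and all function names are assumptions, and decoding real figure files is left out.

```python
from typing import List

Grid = List[List[float]]  # grayscale intensities, indexed as pixels[row][column]


def downscale(pixels: Grid, width: int, height: int) -> Grid:
    """Box-average an image to height x width cells (source must be at least that large)."""
    src_h, src_w = len(pixels), len(pixels[0])
    small = []
    for r in range(height):
        r0, r1 = r * src_h // height, (r + 1) * src_h // height
        row = []
        for c in range(width):
            c0, c1 = c * src_w // width, (c + 1) * src_w // width
            cell = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(cell) / len(cell))
        small.append(row)
    return small


def dhash(pixels: Grid, hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent cell pair (1 if left > right)."""
    small = downscale(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small distances suggest visually similar images."""
    return bin(a ^ b).count("1")


# Toy 16-row x 18-column "figures" (hypothetical data).
figure = [[(x % 6) * 10 for x in range(18)] for _ in range(16)]
brightened = [[value + 20 for value in row] for row in figure]  # uniform brightness change
gradient = [[x * 10 for x in range(18)] for _ in range(16)]     # unrelated image

print(hamming(dhash(figure), dhash(brightened)))  # → 0
print(hamming(dhash(figure), dhash(gradient)))    # → 16
```

A hash like this is unaffected by uniform brightness shifts. On its own, however, it is unlikely to match a chart redrawn from the same data with different styling. Cases like that call for use-case-specific methods rather than a generic hash.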

To improve the identification of plagiarism in disciplines like mathematics, physics, and engineering, the thesis presents the first plagiarism detection approach that analyzes the similarity of mathematical expressions.
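The abstract does not detail how the similarity of mathematical expressions is computed. As one simple, hedged illustration, the sketch treats the identifiers occurring in a document's formulas as a bag. It assumes these identifiers were already extracted, for example from MathML markup. It then compares the two frequency histograms with a weighted Jaccard coefficient. The identifier lists and function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def identifier_histogram(identifiers: Iterable[str]) -> Counter:
    """Frequency of each identifier (e.g., E, m, c) across a document's formulas."""
    return Counter(identifiers)


def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Sum of per-identifier minima divided by the sum of maxima; 1.0 means identical histograms."""
    keys = set(a) | set(b)
    union = sum(max(a[k], b[k]) for k in keys)
    if union == 0:
        return 0.0
    return sum(min(a[k], b[k]) for k in keys) / union


# Identifiers extracted from the formulas of two documents (made-up example).
source = identifier_histogram(["E", "m", "c", "p", "c", "E"])
suspicious = identifier_histogram(["E", "m", "c", "c", "v"])

print(weighted_jaccard(source, suspicious))  # 4/7, about 0.571
```

Identifiers are language-independent and tend to survive rewording of the surrounding prose. A histogram, however, ignores formula structure and order, and consistently renamed variables would lower this score.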

To demonstrate the benefit of combining non-textual and text-based detection methods, the thesis describes the first plagiarism detection system that integrates the analysis of citation-based, image-based, math-based, and text-based document similarity. The system’s user interface employs visualizations that significantly reduce the effort and time users must invest in examining content similarity.
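The abstract does not say how the system combines the four kinds of similarity. The sketch below is a hedged illustration of why combining can help. It fuses per-feature scores in [0, 1] with a weighted mean over the features available for each candidate document, then ranks the candidates by the result. The weights, feature names, and document IDs are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"citation": 0.3, "image": 0.2, "math": 0.2, "text": 0.3}


def combined_score(scores: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted mean over the features that were actually computed for a document pair."""
    used = [feature for feature in scores if feature in weights]
    total = sum(weights[feature] for feature in used)
    if total == 0:
        return 0.0
    return sum(weights[feature] * scores[feature] for feature in used) / total


def rank_candidates(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Sort candidate documents by their combined score, highest first."""
    ranked = [(doc_id, combined_score(scores)) for doc_id, scores in candidates.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


candidates = {
    "docA": {"text": 0.1, "citation": 0.9, "math": 0.8},  # low text similarity
    "docB": {"text": 0.6, "citation": 0.1},
    "docC": {"text": 0.2, "image": 0.3},
}
for doc_id, score in rank_candidates(candidates):
    print(doc_id, round(score, 3))
```

In this toy example, docA ranks first despite its low text similarity, because its citation and math scores are high. A text-only ranking would push such a disguised case down. Averaging only over the available features avoids penalizing documents that contain no images, for instance. A max-based fusion would be an equally plausible design choice.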

To validate the effectiveness of the proposed detection approaches, the thesis presents five evaluations that use real cases of academic plagiarism and exploratory searches for unknown cases. Real plagiarism is committed by expert researchers with strong incentives to disguise their actions. Therefore, I consider the ability to identify such cases essential for assessing the benefit of any new plagiarism detection approach. The findings of these evaluations are as follows.

Citation-based plagiarism detection methods considerably outperformed text-based detection methods in identifying translated, paraphrased, and idea plagiarism instances. Moreover, citation-based detection methods found nine previously undiscovered cases of academic plagiarism.

The image-based plagiarism detection process proved effective in identifying frequently observed forms of image plagiarism in the image types that authors typically use in academic documents.

Math-based plagiarism detection methods reliably retrieved confirmed cases of academic plagiarism involving mathematical content and identified a previously undiscovered case. Math-based detection methods offered advantages for identifying plagiarism cases that text-based methods could not detect, particularly in combination with citation-based detection methods.

These results show that non-textual content elements contain a high degree of semantic information, are language-independent, and are largely unaffected by the alterations that authors typically make to conceal plagiarism. Analyzing non-textual content complements text-based detection approaches and increases detection effectiveness, particularly for disguised forms of academic plagiarism.

Subject (DDC)
004 Computer science
Keywords
Plagiarism Detection, Citation Analysis, Content-based Image Retrieval, Math Retrieval, Natural Language Processing, Information Visualization, User Interaction, Open Source Software
Cite
ISO 690
MEUSCHKE, Norman, 2021. Analyzing Non-Textual Content Elements to Detect Academic Plagiarism [Dissertation]. Konstanz: University of Konstanz
BibTeX
@phdthesis{Meuschke2021Analy-53952,
  year={2021},
  doi={10.5281/zenodo.4913345},
  title={Analyzing Non-Textual Content Elements to Detect Academic Plagiarism},
  author={Meuschke, Norman},
  address={Konstanz},
  school={Universität Konstanz}
}
RDF
<rdf:RDF
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:bibo="http://purl.org/ontology/bibo/"
    xmlns:dspace="http://digital-repositories.org/ontologies/dspace/0.1.0#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:void="http://rdfs.org/ns/void#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#" > 
  <rdf:Description rdf:about="https://kops.uni-konstanz.de/server/rdf/resource/123456789/53952">
    <foaf:homepage rdf:resource="http://localhost:8080/"/>
    <dcterms:title>Analyzing Non-Textual Content Elements to Detect Academic Plagiarism</dcterms:title>
    <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by-nc/4.0/"/>
    <void:sparqlEndpoint rdf:resource="http://localhost/fuseki/dspace/sparql"/>
    <dcterms:available rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-06-10T13:39:47Z</dcterms:available>
    <dcterms:issued>2021</dcterms:issued>
    <dc:rights>Attribution-NonCommercial 4.0 International</dc:rights>
    <dcterms:hasPart rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/53952/5/Meuschke_2-ll951b8bh8s30.pdf"/>
    <dspace:isPartOfCollection rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:contributor>Meuschke, Norman</dc:contributor>
    <dspace:hasBitstream rdf:resource="https://kops.uni-konstanz.de/bitstream/123456789/53952/5/Meuschke_2-ll951b8bh8s30.pdf"/>
    <bibo:uri rdf:resource="https://kops.uni-konstanz.de/handle/123456789/53952"/>
    <dcterms:isPartOf rdf:resource="https://kops.uni-konstanz.de/server/rdf/resource/123456789/36"/>
    <dc:date rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">2021-06-10T13:39:47Z</dc:date>
    <dc:creator>Meuschke, Norman</dc:creator>
    <dc:language>eng</dc:language>
    <dcterms:abstract xml:lang="eng">Identifying academic plagiarism is a pressing problem, among others, for research institutions, publishers, and funding organizations. Detection approaches proposed so far analyze lexical, syntactical, and semantic text similarity. These approaches find copied, moderately reworded, and literally translated text. However, reliably detecting disguised plagiarism, such as strong paraphrases, sense-for-sense translations, and the reuse of non-textual content and ideas, is an open research problem.&lt;br /&gt;&lt;br /&gt;The thesis addresses this problem by proposing plagiarism detection approaches that implement a different concept—analyzing non-textual content in academic documents, such as citations, images, and mathematical content.&lt;br /&gt;&lt;br /&gt;The thesis makes the following research contributions.&lt;br /&gt;&lt;br /&gt;It provides the most extensive literature review on plagiarism detection technology to date. The study presents the weaknesses of current detection approaches for identifying strongly disguised plagiarism. Moreover, the survey identifies a significant research gap regarding methods that analyze features other than text.&lt;br /&gt;&lt;br /&gt;Subsequently, the thesis summarizes work that initiated the research on analyzing non-textual content elements to detect academic plagiarism by studying citation patterns in academic documents.&lt;br /&gt;&lt;br /&gt;To enable plagiarism checks of figures in academic documents, the thesis introduces an image-based detection process that adapts itself to the forms of image similarity typically found in academic work. 
The process includes established image similarity assessments and newly proposed use-case-specific methods.&lt;br /&gt;&lt;br /&gt;To improve the identification of plagiarism in disciplines like mathematics, physics, and engineering, the thesis presents the first plagiarism detection approach that analyzes the similarity of mathematical expressions.&lt;br /&gt;&lt;br /&gt;To demonstrate the benefit of combining non-textual and text-based detection methods, the thesis describes the first plagiarism detection system that integrates the analysis of citation-based, image-based, math-based, and text-based document similarity. The system’s user interface employs visualizations that significantly reduce the effort and time users must invest in examining content similarity.&lt;br /&gt;&lt;br /&gt;To validate the effectiveness of the proposed detection approaches, the thesis presents five evaluations that use real cases of academic plagiarism and exploratory searches for unknown cases. Real plagiarism is committed by expert researchers with strong incentives to disguise their actions. Therefore, I consider the ability to identify such cases essential for assessing the benefit of any new plagiarism detection approach. The findings of these evaluations are as follows.&lt;br /&gt;&lt;br /&gt;Citation-based plagiarism detection methods considerably outperformed text-based detection methods in identifying translated, paraphrased, and idea plagiarism instances. Moreover, citation-based detection methods found nine previously undiscovered cases of academic plagiarism.&lt;br /&gt;&lt;br /&gt;The image-based plagiarism detection process proved effective for identifying frequently observed forms of image plagiarism for image types that authors typically use in academic documents.&lt;br /&gt;&lt;br /&gt;Math-based plagiarism detection methods reliably retrieved confirmed cases of academic plagiarism involving mathematical content and identified a previously undiscovered case. 
Math-based detection methods offered advantages for identifying plagiarism cases that text-based methods could not detect, particularly in combination with citation-based detection methods.&lt;br /&gt;&lt;br /&gt;These results show that non-textual content elements contain a high degree of semantic information, are language-independent, and largely immutable to the alterations that authors typically perform to conceal plagiarism. Analyzing non-textual content complements text-based detection approaches and increases the detection effectiveness, particularly for disguised forms of academic plagiarism.</dcterms:abstract>
  </rdf:Description>
</rdf:RDF>
Date of doctoral examination
March 5, 2021
Dissertation note
Konstanz, Univ., Diss., 2021
University bibliography
No
Description of research data
Data and source code for the experiments in the thesis.