SMLSE

Supervised Machine Learning Similarity Estimate

The project measures the rhetorical similarity of texts through supervised machine learning. This repository contains the reproducible code and a draft paper. The data stem from the ParlSpeech V2 data set developed by Christian Rauh and Jan Schwalbach (2020). The analyses should be fully reproducible; please open an issue if that is not the case.
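The README does not spell out the estimator itself. One common way to turn a classifier into a similarity measure is to train it to separate a target party's speeches from all other speeches and then read the predicted probability of the target class as "how similar" a new speech is. The sketch below assumes exactly that setup and uses a hand-rolled binary multinomial naive Bayes model so that it runs without third-party packages. Naive Bayes is a stand-in chosen for self-containedness, not necessarily the classifier the project uses (the German folder contains its own classifier selection). The class name `NaiveBayesSimilarity` and the toy speeches are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Minimal lowercase whitespace tokenizer (assumption; real pre-processing is richer).
    return text.lower().split()


class NaiveBayesSimilarity:
    """Hypothetical sketch: P(target party | speech) from binary multinomial naive Bayes."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, texts, is_target):
        # Word counts per class: True = target party, False = all other parties.
        self.counts = {True: Counter(), False: Counter()}
        n_docs = Counter()
        for text, label in zip(texts, is_target):
            label = bool(label)
            self.counts[label].update(tokenize(text))
            n_docs[label] += 1
        if n_docs[True] == 0 or n_docs[False] == 0:
            raise ValueError("need speeches from both the target party and other parties")
        n = n_docs[True] + n_docs[False]
        self.log_prior = {c: math.log(n_docs[c] / n) for c in (True, False)}
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        return self

    def _log_likelihood(self, tokens, c):
        # Smoothed log P(tokens | class); words unseen in training are ignored.
        denom = self.totals[c] + self.alpha * len(self.vocab)
        return sum(
            math.log((self.counts[c][t] + self.alpha) / denom)
            for t in tokens
            if t in self.vocab
        )

    def similarity(self, text):
        tokens = tokenize(text)
        scores = {c: self.log_prior[c] + self._log_likelihood(tokens, c) for c in (True, False)}
        # Normalise in log space (subtract the max) to get P(target | text) stably.
        m = max(scores.values())
        weights = {c: math.exp(s - m) for c, s in scores.items()}
        return weights[True] / (weights[True] + weights[False])


speeches = [
    "borders migration crisis",
    "migration borders nation",
    "climate policy investment",
    "pension reform investment",
]
labels = [True, True, False, False]  # True = speech by the (hypothetical) target party
model = NaiveBayesSimilarity().fit(speeches, labels)
print(model.similarity("migration and borders"))  # close to 0.9
```

Under this reading, a score near 1 marks a speech the classifier cannot tell apart from the target party, while a score near 0 marks distinctive rhetoric; averaging scores per speaker would give one value per member of parliament.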

Each country has its own folder, which contains the code for pre-processing, producing the estimates, and visualising them. The German folder contains additional files for classifier selection, Wordfish estimation, estimation of cosine and Jaccard similarity, and visualisation of these estimates. The NL folder contains an additional file exploring Geert Wilders' rhetoric in 2004.
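Cosine and Jaccard similarity serve as unsupervised benchmarks against which the supervised estimate can be compared. The sketch below shows both measures on raw bag-of-words counts. It is a minimal illustration assuming whitespace tokenization and unweighted counts; the project's own files may use a different document-feature representation (for example tf-idf weighting or stemming), and the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase whitespace tokenizer (assumption for illustration only).
    return text.lower().split()


def cosine_similarity(a, b):
    # Cosine of the angle between the two term-count vectors.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def jaccard_similarity(a, b):
    # Share of distinct terms that the two texts have in common.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


x = "the border must be secured"
y = "the border is open"
print(cosine_similarity(x, y))   # 2 / (sqrt(5) * sqrt(4)), about 0.447
print(jaccard_similarity(x, y))  # 2 shared terms out of 7 distinct terms, about 0.286
```

Cosine similarity uses term frequencies, whereas Jaccard similarity only records whether a term occurs, so the two can rank pairs of speeches differently.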

The plot below shows the similarity to (or distinctiveness from) AfD rhetoric for all speakers in the current German Bundestag.

Resources

Rauh, Christian; Schwalbach, Jan, 2020, "The ParlSpeech V2 data set: Full-text corpora of 6.3 million parliamentary speeches in the key legislative chambers of nine representative democracies", https://doi.org/10.7910/DVN/L4OAKN, Harvard Dataverse, V1
