Added encoding argument to TextReuseCorpus and TextReuseTextDocument #89

davidfuhry · 2020-01-15T20:10:46Z

On Windows, readLines assumes that text files use the system's native encoding, which is typically Windows-1252.

This PR adds an optional encoding argument to TextReuseCorpus and TextReuseTextDocument, which can be used to explicitly specify the encoding of the input files (in most cases UTF-8).

Because it defaults to "unknown", which is also the default for readLines, this should maintain backward compatibility.
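To illustrate the idea, here is a minimal sketch in base R of passing an encoding through to readLines. The helper `read_document` is hypothetical and is not the package's actual code; it only mirrors the proposed default of "unknown". Note that the `encoding` argument of readLines marks the strings as being in the given encoding; it does not re-encode them.

```r
# Write a small UTF-8 encoded file containing non-ASCII characters.
txt <- "Stra\u00dfe und Caf\u00e9"
path <- tempfile(fileext = ".txt")
writeBin(charToRaw(enc2utf8(txt)), path)

# Hypothetical helper (an assumption for illustration, not textreuse code):
# it forwards `encoding` to readLines and keeps readLines' own default.
read_document <- function(file, encoding = "unknown") {
  readLines(file, encoding = encoding, warn = FALSE)
}

# Declare the file as UTF-8 so the strings are marked accordingly.
lines_utf8 <- read_document(path, encoding = "UTF-8")
Encoding(lines_utf8)

# Without the argument, the native encoding of the platform is assumed.
lines_default <- read_document(path)
```

With the change proposed here, the corresponding call would presumably look like `TextReuseCorpus(dir = "texts/", encoding = "UTF-8")`, where the directory name is only a placeholder.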

Edit: I forgot to mention that this specific issue can be worked around by setting options(encoding = "UTF-8") before creating the corpus. However, this has some side effects, so I still think an encoding argument is the better way to deal with it.
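For reference, the workaround can be sketched as follows. The global option changes the default encoding for every connection opened afterwards (for example via file(), url(), or source()), which is the kind of side effect mentioned above. The wrapper `with_utf8_connections` is a hypothetical helper that restores the previous setting on exit; the sample file uses plain ASCII so that the sketch behaves the same in any locale.

```r
# Hypothetical helper (an assumption for illustration): evaluate `expr`
# with options(encoding = "UTF-8") set, then restore the old value.
with_utf8_connections <- function(expr) {
  old <- options(encoding = "UTF-8")  # options() returns the previous value
  on.exit(options(old), add = TRUE)   # restore even if `expr` fails
  force(expr)                         # `expr` is evaluated lazily, i.e. here
}

# Sample input file; ASCII only, so the result is locale independent.
path <- tempfile(fileext = ".txt")
writeLines("plain ASCII sample line", path)

before <- getOption("encoding")
lines <- with_utf8_connections(readLines(path))
after <- getOption("encoding")

# The global option is unchanged once the helper returns.
identical(before, after)
```

Scoping the option like this limits the side effects, but an explicit encoding argument still avoids touching global state at all.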

codecov-io · 2020-01-15T20:24:27Z

Codecov Report

Merging #89 into master will increase coverage by 0.04%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #89      +/-   ##
==========================================
+ Coverage   86.77%   86.81%   +0.04%     
==========================================
  Files          25       25              
  Lines         658      660       +2     
==========================================
+ Hits          571      573       +2     
  Misses         87       87

| Impacted Files | Coverage Δ | |
|---|---|---|
| R/TextReuseTextDocument.R | `91.52% <100%> (ø)` | ⬆️ |
| R/TextReuseCorpus.R | `82.69% <100%> (+0.33%)` | ⬆️ |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a96fec3...5d68a16. Read the comment docs.

David Fuhry added 2 commits January 15, 2020 21:00

added encoding argument to TextReuseCorpus and TextReuseTextDocument

63f480f

reverted roxygennote version

5d68a16
