Encoding issues with citations (Windows only) #103

Closed
lcolladotor opened this issue Oct 2, 2017 · 3 comments

@lcolladotor

Hi,

I've been using knitcitations for a while to handle citations in HTML vignettes. I had been using knitcitations::read.bibtex() until I realized that it no longer reads the entries in the order in which they were given in the bib file**. So I made a change and it all works... except on Windows. I finally updated my R installation on a Windows laptop and saw that the problem is with encoding.

This short code reproduces the issue:

## Load package
library('knitcitations')

## Tries to cite, prints package name and error when it fails
check_bib <- function() {
    xx <- sapply(bib, function(x) {
        tryCatch(citep(x), error = function(e) {
            message(paste('found an error attempting to cite', names(x)))
            print(e)
        })
    })
}


## list of citations
bib <- c(knitcitations = citation('knitcitations'),
    IRanges = citation('IRanges'),
    S4Vectors = citation('S4Vectors'))
check_bib()
citep(bib[['S4Vectors']])

## Error message:
Error in nchar(aut) : invalid multibyte string, element 1

## Entry that fails
> bib[['S4Vectors']]
Pag<U+653C><U+3E38>s H, Lawrence M and Aboyoun P (2017). _S4Vectors: S4 implementation of vector-like and list-like objects_. R package version 0.15.10.

I see that knitcitations::write.bibtex() writes a "?" for the authors in situations like this, which is why I didn't notice this issue before. From https://cran.r-project.org/doc/manuals/R-exts.html#The-DESCRIPTION-file I see that the 'Encoding' field of the DESCRIPTION file is used for the citation, and I do see "Encoding: UTF-8" in the S4Vectors DESCRIPTION file.
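
As a rough illustration of the kind of check involved, here is a minimal base R sketch that flags strings that are not valid UTF-8 and re-encodes them. The helper name fix_encoding() is hypothetical, and the latin1 source encoding is an assumption chosen for the example; this is not something knitcitations provides, and the bytes behind the S4Vectors entry above may be different.

```r
## Hypothetical helper (not part of knitcitations): re-encode any element
## that is not valid UTF-8, ASSUMING its bytes are latin1.
fix_encoding <- function(x, from = 'latin1') {
    bad <- !validUTF8(x)
    x[bad] <- iconv(x[bad], from = from, to = 'UTF-8')
    x
}

## 'Pag\xe8s' stores the e-grave as a single latin1 byte
aut <- c('Lawrence', 'Pag\xe8s')

## Which elements are already valid UTF-8?
validUTF8(aut)

## Re-encode the invalid ones and check again
fixed <- fix_encoding(aut)
validUTF8(fixed)
```

Whether this helps in practice depends on where the bytes get mangled; if the string is already damaged by the time citation() returns it, re-encoding will not recover the original name.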

I get this error with GenomeInfoDb, AnnotationDbi, S4Vectors and SummarizedExperiment (details and reproducibility info at https://gist.github.com/anonymous/a8c6374b381dc9c27f55487756cb4e1b) across the different vignettes I maintain. But I don't get it with IRanges, GenomicRanges and other packages where Hervé Pagès is an author (those packages cite the 2013 PLoS paper). For example, the IRanges package has an inst/CITATION file that uses citEntry( , textVersion = "Pag\\es"). So, specifying an inst/CITATION file works.

> citep(bib[['IRanges']])
[1] "(Lawrence, Huber, Pagès, et al., 2013)"

I imagine that there is a way to deal with the encoding problem properly but I haven't been able to find it. If you have ideas on how I can fix this please let me know.

Thanks!
Leo

** As you can see below read.bibtex() changes the order of the citations, so I can't cite them later using citep().

> write.bibtex(bib, file = 'test.bib')
Writing 3 Bibtex entries ... OK
Results written to file 'test.bib'
## test.bib contents
@Manual{boettiger2017knitcitations,
  title = {knitcitations: Citations for 'Knitr' Markdown Files},
  author = {Carl Boettiger},
  year = {2017},
  note = {R package version 1.0.8},
  url = {https://CRAN.R-project.org/package=knitcitations},
}

@Article{lawrence2013software,
  title = {Software for Computing and Annotating Genomic Ranges},
  author = {Michael Lawrence and Wolfgang Huber and Herv\'e Pag\`es and Patrick Aboyoun and Marc Carlson and Robert Gentleman and Martin Morgan and Vincent Carey},
  year = {2013},
  journal = {{PLoS} Computational Biology},
  volume = {9},
  issue = {8},
  doi = {10.1371/journal.pcbi.1003118},
  url = {http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118},
}

@Manual{pags2017s4vectors,
  title = {S4Vectors: S4 implementation of vector-like and list-like objects},
  author = {?},
  year = {2017},
  note = {R package version 0.15.10},
}
## read.bibtex() changes the order

> read.bibtex('test.bib')
[1] ? _S4Vectors: S4 implementation of vector-like and list-like objects_. R package version 0.15.10. 2017.

[2] C. Boettiger. _knitcitations: Citations for 'Knitr' Markdown Files_. R package version 1.0.8. 2017. <URL:
https://CRAN.R-project.org/package=knitcitations>.

[3] M. Lawrence, W. Huber, H. Pagès, et al.Software for Computing and Annotating Genomic Ranges. In: _PLoS Computational Biology_ 9 (8 2013). DOI:
10.1371/journal.pcbi.1003118. <URL: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1003118}.>
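
One way to avoid depending on the order in which entries come back is to look them up by key instead of by position. The sketch below uses a plain named list as a stand-in for the parsed entries; that the real object returned by read.bibtex() can be indexed by name in the same way is an assumption, not something verified here.

```r
## Plain named list standing in for the parsed entries, in the order
## printed above (assumption: the real object is indexable by key).
parsed <- list(
    pags2017s4vectors = 'S4Vectors entry',
    boettiger2017knitcitations = 'knitcitations entry',
    lawrence2013software = 'IRanges entry'
)

## Order in which the entries were written to test.bib
wanted <- c('boettiger2017knitcitations', 'lawrence2013software',
    'pags2017s4vectors')

## Restore the original order by matching on the keys
reordered <- parsed[match(wanted, names(parsed))]
names(reordered)
```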

Extra info for GitHub issue

@lcolladotor
Author

I also posted the above message at bioc-devel mailing list https://stat.ethz.ch/pipermail/bioc-devel/2017-October/011800.html cc'ing Hervé Pagès.

@cboettig
Owner

cboettig commented Oct 3, 2017

Yeah, encoding issues are tough. Looks like this is coming from bibtex::read.bib()? Maybe file an issue there? Or take a look at the RefManageR package, which may do a better job with all this.

@lcolladotor
Author

lcolladotor commented Oct 17, 2017

Hi,

Explicitly adding the citation using RefManageR::BibEntry() worked, just as in leekgroup/derfinderHelper@b63f8c4.

Thanks,
Leo

Citations I used

S4Vectors = RefManageR::BibEntry(bibtype = 'manual', key = 'S4Vectors',
    author = 'Hervé Pagès and Michael Lawrence and Patrick Aboyoun',
    title = "S4Vectors: S4 implementation of vector-like and list-like objects",
    year = 2017, doi = '10.18129/B9.bioc.S4Vectors')


GenomeInfoDb = RefManageR::BibEntry(bibtype = 'manual',
    key = 'GenomeInfoDb',
    author = 'Sonali Arora and Martin Morgan and Marc Carlson and H. Pagès',
    title = "GenomeInfoDb: Utilities for manipulating chromosome and other 'seqname' identifiers",
    year = 2017, doi = '10.18129/B9.bioc.GenomeInfoDb')
    
AnnotationDbi = RefManageR::BibEntry(bibtype = 'manual',
    key = 'AnnotationDbi',
    author = 'Hervé Pagès and Marc Carlson and Seth Falcon and Nianhua Li',
    title = 'AnnotationDbi: Annotation Database Interface',
    year = 2017, doi = '10.18129/B9.bioc.AnnotationDbi')
    
SummarizedExperiment = RefManageR::BibEntry(bibtype = 'manual',
    key = 'SummarizedExperiment',
    author = 'Martin Morgan and Valerie Obenchain and Jim Hester and Hervé Pagès',
    title = 'SummarizedExperiment: SummarizedExperiment container',
    year = 2017, doi = '10.18129/B9.bioc.SummarizedExperiment')
