Potential Bug: Change NAs in genlight objects when subsetting to zeros #83

Closed
green-striped-gecko opened this issue Aug 17, 2015 · 3 comments

@green-striped-gecko

Hi,
I tried to contact Thibaut directly, but maybe this is the better way.

While working with me on a script that uses genlight objects in adegenet 2, my colleague came across some strange behaviour: when you subset a genlight object, all NAs in the SNPs except those in the first column are converted to zeros.

Can you please advise whether this is a bug or whether we are doing something wrong? I attach a small working example together with an R script that shows it.

Cheers, Bernd

RSCRIPT

##### error when subsetting creating 0's for NA's (not in the first column)
load("xx.rdata")
xx
xx@pop
data.frame(xx)

#now subsetting it

xxx <- xx[xx@pop=="A",]
data.frame(xxx)

#now all the NAs from the second column onwards are converted to 0

xxx <- xx[xx@pop=="B",]
data.frame(xxx)

Console Output:

> ##### error when subsetting creating 0's for NA's (not in the first column)
> 
> load("xx.rdata")
> xx
/// GENLIGHT OBJECT /////////

// 8 genotypes,  5 binary SNPs, size: 15.5 Mb
4 (0.1 %) missing data

// Basic content
   @gen: list of 8 SNPbin
   @ploidy: ploidy of each individual  (range: 2-2)

// Optional content
   @ind.names:  8 individual labels
   @loc.names:  5 locus labels
   @loc.all:  5 alleles
   @position: integer storing positions of the SNPs
   @pop: population of each individual (group size range: 4-4)
   @other: a list containing: metrics  latlong  covariates 

> xx@pop
[1] A A A A B B B B
Levels: A B
> data.frame(xx)
        X13049 X13050 X13051 X13052 X13053
AA36881      2     NA      2      2      2
AA36883      2      2      2      2      2
AA36884      2      2      2      2      2
AA36802     NA      2      2      2      2
AA36803      2      2      2      2      2
AA36804      2     NA      2      2      2
AA36181      2     NA      2      2      2
AA36183      2      2      2      2      2
> 
> #now subsetting it
> 
> xxx <- xx[xx@pop=="A",] #here the conversion of NAs seems to occur
> data.frame(xxx)
        X13049 X13050 X13051 X13052 X13053
AA36881      2      0      2      2      2   #here the NA is now converted to zero!!!
AA36883      2      2      2      2      2
AA36884      2      2      2      2      2
AA36802     NA      2      2      2      2
> 
> #now all the NAs from the second column onwards are converted to 0
> 
> xxx <- xx[xx@pop=="B",]
> data.frame(xxx)
        X13049 X13050 X13051 X13052 X13053
AA36803      2      2      2      2      2
AA36804      2      0      2      2      2   #here the NA is now converted to zero!!!
AA36181      2      0      2      2      2  #here the NA is now converted to zero!!!
AA36183      2      2      2      2      2

Please advise

@zkamvar
Collaborator

zkamvar commented Aug 17, 2015

This is indeed a bug. Here's a minimum example that I will use:

x <- "
        X13049 X13050 X13051 X13052 X13053
AA36881      2     NA      2      2      2
AA36883      2      2      2      2      2
AA36884      2      2      2      2      2
AA36802     NA      2      2      2      2
AA36803      2      2      2      2      2
AA36804      2     NA      2      2      2
AA36181      2     NA      2      2      2
AA36183      2      2      2      2      2"

xxdf <- read.table(text = x)
library("adegenet")
xx <- new("genlight", xxdf)
NA.posi(xx)
# $AA36881
# [1] 2
# 
# $AA36883
# integer(0)
# 
# $AA36884
# integer(0)
# 
# $AA36802
# [1] 1
# 
# $AA36803
# integer(0)
# 
# $AA36804
# [1] 2
# 
# $AA36181
# [1] 2
# 
# $AA36183
# integer(0)
NA.posi(xx[])
# $AA36881
# integer(0)
# 
# $AA36883
# integer(0)
# 
# $AA36884
# integer(0)
# 
# $AA36802
# [1] 1 1 1 1 1
# 
# $AA36803
# integer(0)
# 
# $AA36804
# integer(0)
# 
# $AA36181
# integer(0)
# 
# $AA36183
# integer(0)

The issue is that logical and negative subscripts are not treated correctly in the internal subsetter for SNPbin objects. The problematic code is in lines 23 - 32 of glHandle.R. I will fix this in the morning. Thanks for the report.
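To illustrate the kind of subscript handling involved, here is a minimal base R sketch. It is not the code in glHandle.R and not the fix that was committed; `normalize_index` and `remap_na` are hypothetical helpers written only for this illustration. The idea is to turn any logical or negative subscript into plain positive positions first, and only then work out where the missing values land in the subset.

```r
# Hypothetical helper (not part of adegenet): convert a logical, negative or
# positive subscript into positive integer positions within 1..n.
normalize_index <- function(i, n) {
  if (missing(i)) return(seq_len(n))        # x[] means "keep everything"
  stopifnot(is.logical(i) || is.numeric(i))
  idx <- seq_len(n)[i]                      # base R resolves logical/negative subscripts
  idx[!is.na(idx)]                          # drop out-of-range or NA subscripts
}

# Hypothetical helper: given the original NA positions and the kept positions,
# return the NA positions relative to the subset.
remap_na <- function(na_pos, keep) {
  which(keep %in% na_pos)
}

normalize_index(c(TRUE, FALSE), 5)   # logical is recycled: 1 3 5
normalize_index(-1, 5)               # negative drops position 1: 2 3 4 5
normalize_index(n = 5)               # missing subscript: 1 2 3 4 5

# An NA at original position 2 stays at position 2 when all loci are kept,
# and moves to position 1 when the first locus is dropped.
remap_na(2L, normalize_index(n = 5))
remap_na(2L, normalize_index(-1, 5))
```

Resolving the subscript through `seq_len(n)[i]` lets base R apply its usual recycling and negative-index rules, so the NA bookkeeping only ever has to deal with positive positions.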

@zkamvar zkamvar added the bug label Aug 17, 2015
@zkamvar
Collaborator

zkamvar commented Aug 17, 2015

Until this is fixed, a workaround is to always specify the loci you want to keep. For example, this should work in the current situation:

(myloci <- seq(nLoc(xx)))
# [1] 1 2 3 4 5
xxx <- xx[xx@pop=="A",  myloci]
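A quick way to check that a subset kept its missing values is to compare the NA pattern before and after subsetting. The sketch below assumes that adegenet is installed and that `as.matrix()` on a genlight object returns the genotype matrix; the small matrix `mat` and the names `keep` and `same_na` are made up for this illustration.

```r
library("adegenet")

# Small genotype matrix with two missing values (made-up data)
mat <- matrix(2L, nrow = 4, ncol = 5,
              dimnames = list(paste0("ind", 1:4), paste0("loc", 1:5)))
mat[1, 2] <- NA
mat[4, 1] <- NA

xx <- new("genlight", mat)

# Subset individuals with a logical vector, naming the loci explicitly
keep   <- c(TRUE, TRUE, FALSE, TRUE)
myloci <- seq(nLoc(xx))
xxx    <- xx[keep, myloci]

# Compare the NA pattern of the subset with the matching rows of the original
na_before <- is.na(as.matrix(xx))[keep, , drop = FALSE]
na_after  <- is.na(as.matrix(xxx))
same_na   <- all(na_before == na_after)
same_na
```

If `same_na` is `FALSE`, some missing values were changed by the subsetting step.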

@thibautjombart
Owner

Thanks for spotting this. @zkamvar thanks for fixing it!

zkamvar added a commit that referenced this issue Aug 17, 2015
@zkamvar zkamvar mentioned this issue Aug 17, 2015