-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Summary seppop number of alleles per locus incorrect #234
Comments
Hello @apaijmans, Thank you for the detailed description and clear example. I was able to re-create this behavior with the "nancycats" data set: library("adegenet")
#> Loading required package: ade4
#>
#> /// adegenet 2.1.1 is loaded ////////////
#>
#> > overview: '?adegenet'
#> > tutorials/doc/questions: 'adegenetWeb()'
#> > bug reports/feature requests: adegenetIssues()
data("nancycats", package = "adegenet")
summary(nancycats)
#>
#> // Number of individuals: 237
#> // Group sizes: 10 22 12 23 15 11 14 10 9 11 20 14 13 17 11 12 13
#> // Number of alleles per locus: 16 11 10 9 12 8 12 12 18
#> // Number of alleles per group: 36 53 50 67 48 56 42 54 43 46 70 52 44 61 42 40 35
#> // Percentage of missing data: 2.34 %
#> // Observed heterozygosity: 0.67 0.67 0.68 0.71 0.63 0.57 0.65 0.62 0.45
#> // Expected heterozygosity: 0.87 0.79 0.8 0.76 0.87 0.69 0.82 0.76 0.61
summary(seppop(nancycats)[[8]])
#>
#> // Number of individuals: 10
#> // Group sizes: 10
#> // Number of alleles per locus: 16 11 10 9 12 8 12 12 18
#> // Number of alleles per group: 54
#> // Percentage of missing data: 0 %
#> // Observed heterozygosity: 0.5 0.9 0.7 0.7 0.4 0.5 0.7 0.7 0.4
#> // Expected heterozygosity: 0.66 0.73 0.75 0.66 0.69 0.82 0.66 0.75 0.7 Created on 2018-06-14 by the reprex package (v0.2.0). Immediate SolutionThe immediate solution is to specify summary(seppop(nancycats, drop = TRUE)[[8]])
#>
#> // Number of individuals: 10
#> // Group sizes: 10
#> // Number of alleles per locus: 6 5 7 5 7 7 4 6 7
#> // Number of alleles per group: 54
#> // Percentage of missing data: 0 %
#> // Observed heterozygosity: 0.5 0.9 0.7 0.7 0.4 0.5 0.7 0.7 0.4
#> // Expected heterozygosity: 0.66 0.73 0.75 0.66 0.69 0.82 0.66 0.75 0.7 Why this happened (gory details)I think this falls in the grey area between "feature" and "bug", though I'm leaning toward "bug". One of the features of the genind object is that it stores each allele as a separate column of a matrix. The boundaries of loci are denoted by the The problem is how this is computed. Currently, It's calculated by the above method: Line 19 in ac436e9
Instead, it could be calculated by counting up the number of non-zero columns per locus like so: present_alleles <- colSums(tab(x), na.rm = TRUE) > 0L
x@loc.n.all <- vapply(split(present_alleles, locFac(x)), sum, integer(1)) My only concern is that it may change people's validation workflows if they use I'll propose a change in this behavior, but @thibautjombart needs to approve it before it can be included in the next version. |
Thanks for the detailed explanation! That really helps! All the best, |
Hi,
I have 8 different populations, each with different number of alleles per locus. I used the total data file, including all populations, to make a GENIND object. I then used
seppop()
to be able to run some analyses on each population separately.However, when I use
summary()
to look at the basic statistics of a single population, the number of alleles per locus are incorrect. It displays the same numbers as for the total data file, instead of the numbers of alleles per locus for only the specific population.I have included an example below, which hopefully makes it more clear. The first summary is made using the GENIND object for the complete dataset 'all_data', which included 8 populations. I then used
seppop()
, and made a summary of only the 8th population. The number of alleles per locus are exactly the same as for the first summary, which is not correct. When I made a GENIND object with only the individuals of the 8th population, and usesummary()
on that, the correct number of alleles per locus are given. All other information, such as Hobs, Hexp and missing data seem to be correct, so the problem seems only to be with how the number of alleles per locus are calculated.I think this is a bug?
Thanks,
Anneke
The text was updated successfully, but these errors were encountered: