read.genepop does not return individual names #117

Closed
andersgs opened this issue Dec 10, 2015 · 13 comments

Comments

@andersgs

Hello Thibaut. Great package, really useful.

The new read.genepop() does not return individual names. It seems that the df2genind call at the end is missing the ind.names parameter:

 res <- df2genind(X = X, pop = pop, ploidy = 2, ncode = ncode)

Simple fix would be to:

res <- df2genind(X = X, pop = pop, ploidy = 2, ncode = ncode, ind.names = ind.names)

Thank you.

Anders.

@thibautjombart
Owner

Hi Anders,

I don't think genepop files actually store individual labels.

Cheers

Dr Thibaut Jombart
Lecturer, Department of Infectious Disease Epidemiology
Imperial College London
https://sites.google.com/site/thibautjombart/
Twitter: @thibautjombart https://twitter.com/thibautjombart?lang=en-gb


@romunov
Collaborator

romunov commented Dec 10, 2015

Small world. I'm importing a genepop file and we noticed the same behavior. I checked the genepop documentation on the input file format (http://genepop.curtin.edu.au/help_input.html), and it appears that individual names are optional. In fact, the last individual name in each population block defines the population name.

Here's an example from the website.

------------- The file starts here ---------------- 
Microsat on Chiracus radioactivus, a pest species 
     Loc1, Loc2, Loc3, Y-linked, Loc4 
POP 
AA8, 0405 0711 0304 0000      0505 
AA9, 0405 0609 0208 0000      0505 
A10, 0205 0609 0101 0000      0305 
A11, 0405 0606 0102 0000      0504 
A12, 0202 0609 0105 0000      0507 
A13, 0505 0909 0107 0000      0505 
A14, 0202 0609 0207 0000      0503 
A15, 0405 0609 0101 0000      0505 
Pop
AF, 0000 0000 0000 0000      0505 
AF, 0205 0307 0102 0000      0505 
AF, 0202 0609 0202 0000      0505 
AF, 0205 0909 0000 0000      0505 
AF, 0205 0307 0202 0000      0505 
AF, 0505 0303 0102 0000      0505 
AF, 0205 0700 0000 0000      0505 
AF, 0505 0900 0000 0000      0405 
AF, 0205 0600 0000 0000      0505 
AF, 0505 0606 0202 0000      0505 
pop 
C45, 0505 0606 0202 0000      0505 
C45, 0505 0909 0202 0000      0505 
C45, 0505 0306 0202 0000      0505 
C45, 0505 0909 0102 0000      0405 
C45, 0205 0303 0202 0000      0505 
C45, 0205 0909 0202 0000      0405 
------------- The file ends here -------------------
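
To make the labelling convention above concrete, here is a minimal base R sketch that splits genepop lines into individual labels and population names. The helper `parse_genepop_labels()` is hypothetical and is not part of adegenet; it assumes one individual per line, a comma after each label, and `POP` separators in any letter case.

```r
# Hypothetical helper (not adegenet code): extract individual labels and
# population names from the lines of a genepop file.
parse_genepop_labels <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]

  # Everything before the first "POP" line is the title and the locus names
  first_pop <- which(toupper(lines) == "POP")[1]
  body <- lines[first_pop:length(lines)]
  is_pop <- toupper(body) == "POP"

  # Each "POP" line opens a new population block
  pop_id <- cumsum(is_pop)[!is_pop]
  records <- body[!is_pop]

  # The label is whatever precedes the first comma on a record line
  label <- trimws(sub(",.*$", "", records))

  # By genepop convention, the last label of a block names the population
  pop_name <- vapply(split(label, pop_id),
                     function(x) x[length(x)], character(1))

  data.frame(label = label,
             pop = unname(pop_name[as.character(pop_id)]),
             stringsAsFactors = FALSE)
}

# A shortened version of the example file above
example <- c(
  "Microsat on Chiracus radioactivus, a pest species",
  "Loc1, Loc2, Loc3, Y-linked, Loc4",
  "POP",
  "AA8, 0405 0711 0304 0000      0505",
  "A15, 0405 0609 0101 0000      0505",
  "Pop",
  "AF, 0000 0000 0000 0000      0505",
  "AF, 0205 0307 0102 0000      0505"
)

parsed <- parse_genepop_labels(example)
parsed$label                     # "AA8" "A15" "AF" "AF"
parsed$pop                       # "A15" "A15" "AF" "AF"
anyDuplicated(parsed$label) > 0  # TRUE: the "AF" label is repeated
```

In the first block the labels are unique and could serve as individual names; in the second block every row carries the same label, so it can only be read as a population name.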

@thibautjombart
Owner

Ha, this is why I hate this format..

The example above is a good example of terrible practice: the same field
records two different types of information, individual labels and
population names. The best I could offer would be to handle individual
labels whenever they are unique, but that wouldn't help with this example.


@romunov
Collaborator

romunov commented Dec 10, 2015

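The function already isolates individual names (https://github.com/thibautjombart/adegenet/blob/master/R/import.R#L628), but they're not used: they're replaced by numbers (https://github.com/thibautjombart/adegenet/blob/master/R/import.R#L638). This can be solved in two ways: either change line 638, or add the ind.names argument to line 651 (https://github.com/thibautjombart/adegenet/blob/master/R/import.R#L651).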

If we decide to change this function, I would vote for the first option. I happen to have the patch ready in a separate branch (not yet thoroughly tested or pushed). :)

@thibautjombart
Owner

Yes, we could make
https://github.com/thibautjombart/adegenet/blob/master/R/import.R#L638
conditional on whether individual labels are duplicated, and use the labels
if there are no duplicates.
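
The rule proposed here (keep the labels only when none is duplicated) can be sketched in base R as follows. This is an illustration only, not the patch that went into adegenet: the function name `choose_ind_names()`, the warning text, and the fallback to row numbers as character strings are all assumptions.

```r
# Hypothetical helper (not adegenet code): use the labels read from the file
# only when every label is unique; otherwise fall back to row numbers.
choose_ind_names <- function(labels) {
  if (anyDuplicated(labels) > 0) {
    warning("duplicated individual labels; using row numbers instead")
    return(as.character(seq_along(labels)))
  }
  labels
}

unique_labels <- choose_ind_names(c("AA8", "AA9", "A10"))
unique_labels      # "AA8" "AA9" "A10"

fallback_labels <- suppressWarnings(choose_ind_names(c("AF", "AF", "AF")))
fallback_labels    # "1" "2" "3"
```

Applying the check to the whole file rather than per population means a single repeated label anywhere makes every individual fall back to a number, which matches the goal of never tolerating duplicated labels.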


@romunov
Collaborator

romunov commented Dec 10, 2015

And it brings us to #116. :)

@thibautjombart
Owner

Yup. Bootstrap is a particular case though. As a rule, I would really like
to avoid tolerating label duplication anywhere.


@romunov
Collaborator

romunov commented Dec 10, 2015

I've pushed to a branch (https://github.com/thibautjombart/adegenet/tree/read.genepop_with_individual_names). I tested it on nancycats and on my own file, and it appears to work.

@thibautjombart
Owner

Great, thanks


@andersgs
Author

Thank you Roman and Thibaut. Yeah, I know it is an edge case, but it is an important one for me: github.com/andersgs/irelr.

I have tried running it with my proposed solution. Because it calls df2genind, it automatically issues a warning and replaces the names with row numbers if there are duplicates.

Best. Anders.

@romunov
Collaborator

romunov commented Dec 11, 2015

Can we merge the changes to master, then?

@thibautjombart
Owner

Err... I can't see any pull request?
;)


zkamvar added a commit that referenced this issue Dec 11, 2015
This tests that the feature in #117 works.
@romunov romunov closed this as completed Dec 14, 2015
@andersgs
Author

Thank you.
