Fasta id should split after ANY whitespace #16

nh13 · 2023-11-29T17:49:35Z

No description provided.

markschl · 2024-02-17T11:52:51Z

Thanks for the pull request, and apologies for not having looked at it yet. The reason is that, although I agree that this is a good idea, I would prefer to treat byte strings (&[u8]) and &str types consistently. Right now, the implementation mostly recognizes only spaces and tabs, but id_desc() recognizes additional ASCII and Unicode whitespace characters through its use of char::is_whitespace. Recognizing a larger set of whitespace characters may also have performance implications.
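To make the inconsistency concrete, here is a small sketch comparing the three notions of whitespace mentioned above. The helper is_space_or_tab is a hypothetical stand-in for the "spaces and tabs" behaviour, not the crate's actual code; the other calls are standard library methods.

```rust
/// Hypothetical predicate standing in for "recognizes only spaces and tabs".
fn is_space_or_tab(b: u8) -> bool {
    b == b' ' || b == b'\t'
}

fn main() {
    // Form feed (0x0C) is ASCII whitespace, but neither a space nor a tab.
    assert!(!is_space_or_tab(b'\x0C'));
    assert!(b'\x0C'.is_ascii_whitespace());
    assert!('\u{0C}'.is_whitespace());

    // Vertical tab (0x0B) is whitespace for char::is_whitespace,
    // but not for u8::is_ascii_whitespace.
    assert!(!b'\x0B'.is_ascii_whitespace());
    assert!('\u{0B}'.is_whitespace());

    // No-break space (U+00A0) is Unicode whitespace only.
    assert!('\u{A0}'.is_whitespace());
    assert!(!'\u{A0}'.is_ascii_whitespace());
}
```

So the same header could yield different IDs depending on which of these predicates is applied.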

It may be worth investigating how other parsers handle the problem. Biopython uses title.split(None, 1)[0], which, according to my (superficial) research, is similar to u8::is_ascii_whitespace for byte strings and char::is_whitespace for &str.
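As a rough illustration, a consistent pair of ID functions along those lines might look like the sketch below. The names id_bytes and id_str are hypothetical and not part of the crate. Note that Python's split(None, 1) also skips leading whitespace, which this sketch does not replicate.

```rust
/// Hypothetical helper: ID of a FASTA header line given as bytes,
/// ending at the first ASCII whitespace byte.
fn id_bytes(head: &[u8]) -> &[u8] {
    let end = head
        .iter()
        .position(|b| b.is_ascii_whitespace())
        .unwrap_or(head.len());
    &head[..end]
}

/// Hypothetical helper: ID of a FASTA header line given as &str,
/// ending at the first Unicode whitespace character.
fn id_str(head: &str) -> &str {
    // `split` always yields at least one item, so the fallback is never used.
    head.split(char::is_whitespace).next().unwrap_or("")
}

fn main() {
    assert_eq!(id_bytes(b"seq1\tsome description"), &b"seq1"[..]);
    assert_eq!(id_str("seq1\tsome description"), "seq1");
}
```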

In addition, str.split as used by Biopython treats runs of consecutive whitespace as a single separator, so descriptions never start with whitespace. However, I'm inclined not to follow that behaviour, since it is computationally more expensive and users who want it can still trim the description themselves using str::trim [unfortunately, slice::trim_ascii is not stable yet...].
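A minimal sketch of that trade-off, assuming a hypothetical split_id_desc helper that splits only at the first whitespace character: the description may then begin with whitespace, and callers who care can trim it themselves.

```rust
/// Hypothetical helper: split a header at the first whitespace character only.
/// Consecutive whitespace is NOT collapsed, so the description may start with it.
fn split_id_desc(head: &str) -> (&str, Option<&str>) {
    match head.split_once(char::is_whitespace) {
        Some((id, desc)) => (id, Some(desc)),
        None => (head, None),
    }
}

fn main() {
    // Two spaces between ID and description.
    let (id, desc) = split_id_desc("seq1  some description");
    assert_eq!(id, "seq1");
    assert_eq!(desc, Some(" some description"));

    // Callers wanting the Biopython-like result trim on their side.
    assert_eq!(desc.map(str::trim_start), Some("some description"));
}
```

This keeps the parser itself cheap and leaves the extra scan over leading whitespace to the users who need it.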

Commit f56314f: Fasta id should split after ANY whitespace
