In main.rs the following optional inputs should be added and, if given by the user, replace the respective default set in default.rs.
Implementation
For all non-mandatory options, use the default value from default.rs. In main.rs, overwrite these default values if, and only if, the user provides a custom value on the command line.
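The override rule above can be sketched as follows. This is a minimal illustration, not the actual prot-scriber code: the constant `DEFAULT_SEPARATOR`, its placeholder value, and the helper `resolve_option` are hypothetical names, and the `Option<&str>` stands in for whatever the command-line parser returns for an option that may be absent.

```rust
/// Placeholder for a value that would live in default.rs.
/// Both the name and the content are hypothetical.
pub const DEFAULT_SEPARATOR: &str = "PLACEHOLDER_DEFAULT";

/// Returns the user's value if one was given on the command line,
/// otherwise falls back to the default.
pub fn resolve_option(user_value: Option<&str>, default_value: &str) -> String {
    user_value.unwrap_or(default_value).to_string()
}
```

Calling `resolve_option(None, DEFAULT_SEPARATOR)` yields the default, while `resolve_option(Some(","), DEFAULT_SEPARATOR)` yields the user's value.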
Multiple vs single options
Please note that prot-scriber uses clap, which allows certain command line options to be given multiple times. In prot-scriber one such option is seq-sim-table. This means that a user can call prot-scriber with multiple Blast result tables by repeating --seq-sim-table once per table.
Inside prot-scriber the order of appearance of these repeated options is kept. Because we also want to allow some options to be specified for one particular input, it is important to note that several input options can relate to each other by their order of appearance. See blacklist regex list below for a clear example.
Allow keyword default for all non mandatory options
Any non-mandatory option should accept the keyword default as its value. This enables any combination of position-sensitive custom and default options. See "Multiple vs single options" above for more details. So make sure that, if a command-line option is provided with default, the default value from default.rs is used for that specific option.
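A minimal sketch of the keyword handling described above, assuming the values of a repeated option arrive as an ordered slice of strings. The names `DEFAULT_KEYWORD` and `resolve_keyword_values` are hypothetical and not part of prot-scriber.

```rust
/// Keyword a user passes to request the built-in default for one position.
pub const DEFAULT_KEYWORD: &str = "default";

/// Replaces every occurrence of the `default` keyword with `default_value`,
/// keeping all other values and their order of appearance untouched.
pub fn resolve_keyword_values(values: &[&str], default_value: &str) -> Vec<String> {
    values
        .iter()
        .map(|v| {
            if *v == DEFAULT_KEYWORD {
                default_value.to_string()
            } else {
                v.to_string()
            }
        })
        .collect()
}
```

One consequence of this design is that a file literally named `default` cannot be passed as a custom value; whether that matters in practice is left open here.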
regex to split genes in gene families
This option should be named family-gene-separator-regex. The default has been discussed in another issue, #6.
regex to split gene identifier from gene-list in gene family input files
This option should be named family-id-gene-list-separator. The default has been discussed in the respective issue, #6. Enable the user to provide a custom regular expression.
blacklist regex list
Imagine you want to provide a custom blacklist for a blast input table that resulted from searching your query proteins against a non-standard database, e.g. a genome which does not adhere to the Uniprot stitle line standards. To apply such a blacklist only to the blast input table at the same position, the user provides --blacklist-regex-list once and --seq-sim-table twice.
Because the argument --blacklist-regex-list appears only once, while the argument --seq-sim-table appears twice, the first --blacklist-regex-list argument will be applied to the first --seq-sim-table my_proteins_vs_alien_genome.txt, but not to the second --seq-sim-table my_proteins_vs_Swissprot_blast_tableout.txt.
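The positional pairing described above could be sketched as follows. This is an illustration under stated assumptions: the type `ListSource` and the function `pair_by_position` are hypothetical, and so is the blacklist file name used in the usage note. Only the two table file names come from the example above.

```rust
/// Where the regex list for one seq-sim-table comes from.
#[derive(Debug, PartialEq)]
pub enum ListSource {
    /// Use the list defined in default.rs.
    Default,
    /// Use the user-supplied file at this path.
    Custom(String),
}

/// Pairs each table with the list given at the same position.
/// Tables without a list at their position, or with the keyword
/// `default`, fall back to the default list. Surplus lists are ignored.
pub fn pair_by_position(tables: &[&str], lists: &[&str]) -> Vec<(String, ListSource)> {
    tables
        .iter()
        .enumerate()
        .map(|(i, table)| {
            let source = match lists.get(i).copied() {
                None | Some("default") => ListSource::Default,
                Some(path) => ListSource::Custom(path.to_string()),
            };
            (table.to_string(), source)
        })
        .collect()
}
```

With the tables `my_proteins_vs_alien_genome.txt` and `my_proteins_vs_Swissprot_blast_tableout.txt` and a single list `alien_blacklist.txt`, the first table is paired with `Custom("alien_blacklist.txt")` and the second with `Default`.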
filter regex list
Enable custom lists of regular expressions with which the stitles of Blast hits are filtered. Remember that filtering cuts out undesired parts of stitle strings. See filter_stitle in model_funs.rs for more details.
Note that this option can be given multiple times and is position sensitive, like blacklist-regex-list above. So, depending on its position, i.e. whether it is the first, second, etc. time this option is provided with a value (a valid file path), it applies to the --seq-sim-table at the same position.
informative regex list
A method to distinguish informative from un-informative words is currently being implemented as specified here. Un-informative words are not removed from phrases, but are not scored either. They are detected by applying a list of regular expressions in sequence to each word. A match by any regex marks the word as un-informative. Enable users to supply their own regex lists of un-informative words. Again, this option is position sensitive: its order of appearance ties it to the --seq-sim-table input at the same position (see the examples above for more details).
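The tagging behaviour described above (words are kept, but marked so they can be skipped during scoring) might look roughly like this. To stay within the standard library, plain predicate functions stand in for compiled regular expressions. The names `Matcher`, `tag_words`, `is_placeholder_word` and `is_all_digits`, as well as the example word "putative", are assumptions made for illustration and not taken from prot-scriber.

```rust
/// A matcher standing in for one compiled regular expression.
pub type Matcher = fn(&str) -> bool;

/// Hypothetical matcher: flags one example word as un-informative.
pub fn is_placeholder_word(word: &str) -> bool {
    word.eq_ignore_ascii_case("putative")
}

/// Hypothetical matcher: flags words that consist of digits only.
pub fn is_all_digits(word: &str) -> bool {
    word.chars().all(|c| c.is_ascii_digit())
}

/// Splits a phrase into words and tags each word as informative (`true`)
/// or un-informative (`false`). No word is removed from the phrase.
/// A word is un-informative as soon as any matcher in the list matches it.
pub fn tag_words<'a>(phrase: &'a str, uninformative: &[Matcher]) -> Vec<(&'a str, bool)> {
    phrase
        .split_whitespace()
        .map(|word| {
            let informative = !uninformative.iter().any(|matches| matches(word));
            (word, informative)
        })
        .collect()
}
```

A scoring step would then iterate over the tagged words and only count those whose flag is `true`.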
Since the informative regex list "is position context-sensitive" and it's optional for the user to provide a list of uninformative words, then what should be in the defaults? Should it be empty or are there obvious uninformative word examples that could be included?