Type ./clamsa.py train -h to obtain the current usage:
./clamsa.py train -h
usage: clamsa.py [-h] [--basenames BASENAMES [BASENAMES ...]] [--clades CLADES [CLADES ...]] [--merge_behaviour MERGE_BEHAVIOUR [MERGE_BEHAVIOUR ...]] [--tuple_length TUPLE_LENGTH] [--split_specifications SPLIT_SPECIFICATIONS] [--use_amino_acids]
[--use_codons] [--model_hyperparameters MODEL_HYPERPARAMETERS] [--model_training_callbacks MODEL_TRAINING_CALLBACKS] [--batch_size BATCH_SIZE] [--batches_per_epoch BATCHES_PER_EPOCH] [--epochs EPOCHS]
[--log_basedir LOG_BASEDIR] [--saved_weights_basedir SAVED_WEIGHTS_BASEDIR] [--verbose]
INPUT_DIR
Train a series of models and hyperparameter configurations on an input multiple sequence alignment dataset generated by clamsa.
positional arguments:
INPUT_DIR Folder in which the converted MSA database is stored. By default the folder "msa/" is used.
optional arguments:
-h, --help show this help message and exit
--basenames BASENAMES [BASENAMES ...]
The base name of the input files.
--clades CLADES [CLADES ...]
Path(s) to the clades files (.nwk files, with branch lengths) used in the converting process. CAUTION: The same ordering as in the converting process must be used!
--merge_behaviour MERGE_BEHAVIOUR [MERGE_BEHAVIOUR ...]
In which ratio the respective splits for each basename shall be merged. The possible modes are "evenly", meaning all basenames have the same weight, or a set of custom weights "w_1 ... w_n" given
directly. Default is "evenly".
--tuple_length TUPLE_LENGTH
The MSAs will be exported as n-tuple-aligned sequences instead of nucleotide alignments, where n is the tuple_length. If n = 3, you can use the flag --use_codons instead.
--split_specifications SPLIT_SPECIFICATIONS
see test/train.sh for an example
--use_amino_acids Use amino acids instead of nucleotides as alphabet.
--use_codons The MSAs were exported as codon-aligned codon sequences instead of nucleotide alignments.
--model_hyperparameters MODEL_HYPERPARAMETERS
see test/train.sh for an example
--model_training_callbacks MODEL_TRAINING_CALLBACKS
--batch_size BATCH_SIZE
Number of MSAs per training batch.
--batches_per_epoch BATCHES_PER_EPOCH
Number of training batches in each epoch.
--epochs EPOCHS Number of epochs per hyperparameter configuration.
--log_basedir LOG_BASEDIR
Folder in which the Tensorboard training logs should be stored. Defaults to "./logs/".
--saved_weights_basedir SAVED_WEIGHTS_BASEDIR
Folder in which the weights for the best performing models should be stored. Defaults to "./saved_weights/".
--verbose Whether training information should be printed to console.
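A typical invocation combines only a few of these options. The sketch below is a hypothetical example: the input directory, clades file, and numeric settings are placeholders and must be replaced with the paths and values from your own conversion run (keeping the same clades ordering used during conversion).

```shell
# Hypothetical example: train on a codon-aligned dataset previously
# converted by clamsa into the folder msa/ (all paths are placeholders).
./clamsa.py train msa/ \
    --clades clades/example.nwk \
    --use_codons \
    --batch_size 30 \
    --batches_per_epoch 100 \
    --epochs 40 \
    --verbose
```

For the expected structure of --split_specifications and --model_hyperparameters, consult test/train.sh as referenced in the help text above.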