
# Training Usage

Type

```
./clamsa.py train -h
```

to obtain the current usage:

```
./clamsa.py train -h
usage: clamsa.py [-h] [--basenames BASENAMES [BASENAMES ...]] [--clades CLADES [CLADES ...]] [--merge_behaviour MERGE_BEHAVIOUR [MERGE_BEHAVIOUR ...]] [--tuple_length TUPLE_LENGTH] [--split_specifications SPLIT_SPECIFICATIONS] [--use_amino_acids]
                 [--use_codons] [--model_hyperparameters MODEL_HYPERPARAMETERS] [--model_training_callbacks MODEL_TRAINING_CALLBACKS] [--batch_size BATCH_SIZE] [--batches_per_epoch BATCHES_PER_EPOCH] [--epochs EPOCHS]
                 [--log_basedir LOG_BASEDIR] [--saved_weights_basedir SAVED_WEIGHTS_BASEDIR] [--verbose]
                 INPUT_DIR

Train a series of models and hyperparameter configurations on an input multiple sequence alignment dataset generated by clamsa.

positional arguments:
  INPUT_DIR             Folder in which the converted MSA database is stored. By default the folder "msa/" is used.

optional arguments:
  -h, --help            show this help message and exit
  --basenames BASENAMES [BASENAMES ...]
                        The base names of the input files.
  --clades CLADES [CLADES ...]
                        Path(s) to the clades files (.nwk files, with branch lengths) used in the converting process. CAUTION: The same ordering as in the converting process must be used!
  --merge_behaviour MERGE_BEHAVIOUR [MERGE_BEHAVIOUR ...]
                        The ratio in which the respective splits for each basename shall be merged. The possible modes are "evenly" and "w_1 ... w_n": "evenly" means all basenames have the same weight, or a set of custom weights can be given
                        directly. Default is "evenly".
  --tuple_length TUPLE_LENGTH
                        The MSAs will be exported as n-tuple-aligned sequences instead of nucleotide alignments, where n is the tuple length. If n = 3, you can use the flag --use_codons instead.
  --split_specifications SPLIT_SPECIFICATIONS
                        See test/train.sh for an example.
  --use_amino_acids     Use amino acids instead of nucleotides as the alphabet.
  --use_codons          The MSAs were exported as codon-aligned codon sequences instead of nucleotide alignments.
  --model_hyperparameters MODEL_HYPERPARAMETERS
                        See test/train.sh for an example.
  --model_training_callbacks MODEL_TRAINING_CALLBACKS
  --batch_size BATCH_SIZE
                        Number of MSAs per training batch.
  --batches_per_epoch BATCHES_PER_EPOCH
                        Number of training batches in each epoch.
  --epochs EPOCHS       Number of epochs per hyperparameter configuration.
  --log_basedir LOG_BASEDIR
                        Folder in which the TensorBoard training logs should be stored. Defaults to "./logs/".
  --saved_weights_basedir SAVED_WEIGHTS_BASEDIR
                        Folder in which the weights for the best performing models should be stored. Defaults to "./saved_weights/".
  --verbose             Whether training information should be printed to the console.
```
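
A training run might then look like the following sketch. The input folder, basename, and clade file below are hypothetical placeholders, not files shipped with ClaMSA; substitute the values you used in your own converting step, and see test/train.sh for the project's actual example configuration (including `--split_specifications` and `--model_hyperparameters`, whose formats are documented there).

```shell
# Hedged sketch of a training invocation on a codon dataset previously
# produced by `clamsa.py convert`. "msa/", "fly_train", and
# "fly_clades.nwk" are placeholder names for illustration only.
./clamsa.py train msa/ \
    --basenames fly_train \
    --clades fly_clades.nwk \
    --use_codons \
    --batch_size 30 \
    --batches_per_epoch 100 \
    --epochs 10 \
    --saved_weights_basedir ./saved_weights/ \
    --verbose
```

Note that `--clades` must list the same .nwk files, in the same order, as in the converting step, since the help text above warns that the ordering is significant.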