To provide metadata to the pipeline, create a TSV (tab-separated values) file containing the data and pass it as an argument. The TSV file must include a header line naming the fields, followed by the metadata of the genomes, one genome per line, with values in the same order as the header.
Sample metadata TSV files are provided for guidance. Visit the tutorial page to find out how to use them.
sample/meta_full.tsv contains the complete metadata of the genomes in the sample/seq/ directory.
sample/meta_simple.tsv contains the minimum metadata of the genomes.
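Before passing your own metadata file to the pipeline, it can help to confirm that it has the shape described above: a header line, then one row per genome with as many fields as the header. The following is a minimal sketch of such a check, not part of UFCG itself. The function name and the column names in the example are hypothetical; refer to the sample files for the actual header.

```python
import csv
import io


def check_metadata_tsv(text):
    """Return (header, rows) if every data row has as many fields as the header."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # ignore blank lines
    if not rows:
        raise ValueError("metadata file is empty; a header line is required")
    header, body = rows[0], rows[1:]
    for number, row in enumerate(body, start=1):
        if len(row) != len(header):
            raise ValueError(
                f"data row {number}: expected {len(header)} fields, found {len(row)}"
            )
    return header, body


# Hypothetical column names, for illustration only.
example = "filename\tlabel\ngenome_a.fasta\tGenome A\ngenome_b.fasta\tGenome B\n"
header, rows = check_metadata_tsv(example)
print(header, len(rows))
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to the function.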
Running profile module
Run the following command in your terminal to run the UFCG pipeline interactively. Interactive mode guides you through the options the pipeline requires and automatically builds the command to run it.
Single line command
You can also run the pipeline with a classic one-liner with options and arguments.
-i <DIR> : Path to the input file/directory with fungal genome(s)
-o <DIR> : Path to the output directory to store result files
The following options are not mandatory, but may be useful for configuring your run.
-m <FILE> : Path to the TSV file containing metadata
-c <INT> : Number of CPU thread(s) to use
-f <BOOL> : Force overwriting of the result files in the output directory
-v : Make the program verbose
To see all available options, run the pipeline with the -h option.
The pipeline will extract the core gene profiles of the given genome assemblies and store them as .ucg files.
.ucg file format
Files with the extension .ucg are JSON-formatted profiles containing extracted sequences of core genes, along with the metadata of the genome. These files can also be read and edited via any text editor.
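Because .ucg files are JSON, they can also be inspected programmatically. The sketch below loads a profile with Python's standard `json` module and lists its top-level keys and value types. It assumes only that the top level is a JSON object and makes no claim about the actual key names. The function name is hypothetical, and the demonstration uses a stand-in file with placeholder keys rather than a real profile.

```python
import json
import os
import tempfile


def summarize_profile(path):
    """Load a JSON-formatted profile and map each top-level key to its value type."""
    with open(path, encoding="utf-8") as handle:
        profile = json.load(handle)
    if not isinstance(profile, dict):
        raise ValueError("expected a JSON object at the top level")
    return {key: type(value).__name__ for key, value in profile.items()}


# Stand-in file with placeholder keys; a real .ucg profile will differ.
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "example.ucg")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"placeholder_metadata": {}, "placeholder_genes": []}, handle)
    print(summarize_profile(path))
```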
Note the path to the output directory, or copy its contents into another directory, to use them as input for the tree module.
With 32 CPU threads, the profile module requires about 55 seconds to extract the UFCG marker genes from a fungal whole genome assembly.
Running tree module
Align genes and infer tree
Run the following command in your terminal to align the genes and infer a phylogenetic tree with the UFCG pipeline.
-i <DIR> : Path to the input .ucg profiles to align and infer the tree from
-l <LIST> : Name the leaves of the phylogenetic tree from the metadata
Select at least one of the following options and join them with commas:
uid : Include unique integer ID
acc : Include accession number
label : Include full label
taxon : Include taxon name
strain : Include strain name
taxonomy : Include taxonomic relationship
Note that the selected options must be present in the .ucg profiles as metadata.
-l uid : Include unique IDs only
-l acc,label,taxon : Include accession, label and taxon names
-l uid,acc,label,taxon,strain,taxonomy : Include all metadata
-o <DIR> : Path to the directory for results
-n <STR> : Runtime name for this analysis
-a <STR> : Select sequence to align
nucleotide : Use given nucleotide sequence (default)
protein : Use given amino acid sequence
codon : Use codons encoding the given amino acid sequence
codon12 : Use codons without third bases
-t <INT> : Number of CPU threads to use
-p <BINARY> : Use a different tree-building program
To see all available options, run the module with the -h option.
The tree module produces the following result files. You may further analyze the trees with various phylogenetic tools that can handle Newick files (MEGA, ETE, ape, etc.).
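As an illustration of what the codon12 choice for -a means, the sketch below drops the third base of every codon from an in-frame nucleotide sequence. It is not UFCG's own implementation, and the function name is hypothetical; it only shows the transformation that the option description implies.

```python
def drop_third_positions(codon_sequence):
    """Keep the first two bases of each codon in an in-frame sequence."""
    if len(codon_sequence) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(
        codon_sequence[start:start + 2]
        for start in range(0, len(codon_sequence), 3)
    )


# ATG GCC TAA -> AT GC TA
print(drop_third_positions("ATGGCCTAA"))  # ATGCTA
```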
UFCG tree inferred from concatenated alignment
UFCG tree with GSI values computed with [N] genes, replacing bootstrap values
Gene tree inferred from each single gene
Concatenated sequences of entire core gene alignments
Aligned sequences of each single gene
Log file containing information about this run
JSON file containing entire trees and their metadata
With 32 CPU threads, the tree module requires about 413 seconds to produce the trees from 30 UFCG profiles.
Replace labels of tree
Run the following command to replace the leaf names with a different format.
-i <TRM> : Path to the .trm file from your run
-g <STR> : Specify the name of the gene tree to relabel
Use UFCG to relabel the UFCG tree.
Use a gene name (e.g. RPB2) to relabel the corresponding single-gene tree.
-l <LIST> : Leaf options, identical to those of the tree module
To relabel the UFCG tree using taxon names and strain names as leaves from the run myRun, execute:
To relabel the RPB2 gene tree using accessions and taxonomic relationships as leaves from the run Lorem_ipsum, execute:
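The -l value is a comma-separated list drawn from uid, acc, label, taxon, strain and taxonomy, with at least one entry. A small check like the one below can catch typos before a run starts. This is a sketch under that reading of the option list, not code taken from UFCG, and the names `ALLOWED_LEAF_FIELDS` and `parse_leaf_option` are hypothetical.

```python
ALLOWED_LEAF_FIELDS = ("uid", "acc", "label", "taxon", "strain", "taxonomy")


def parse_leaf_option(value):
    """Split a comma-separated -l value and reject empty or unknown fields."""
    fields = [field.strip() for field in value.split(",")]
    if any(not field for field in fields):
        raise ValueError("leaf option contains an empty field")
    unknown = [field for field in fields if field not in ALLOWED_LEAF_FIELDS]
    if unknown:
        raise ValueError("unknown leaf field(s): " + ", ".join(unknown))
    return fields


print(parse_leaf_option("acc,label,taxon"))  # ['acc', 'label', 'taxon']
```

The check does not verify that the chosen fields are actually present in the profiles' metadata; that still depends on the input files.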
Q. UFCG emits error "AUGUSTUS_CONFIG_PATH undefined or improperly defined". How can I solve it?
You have to define the environment variable $AUGUSTUS_CONFIG_PATH to run AUGUSTUS properly. Run the following code to define the variable.
If you are using bash, run this to add the variable to your system semi-permanently.