Example-driven guide
Here we show usage examples of the lovis4u command-line interface. Through this guide we will show you step-by-step how to optimise your visualisation starting from default parameters.
Before you start
Data preparation
The necessary sample data as well as adjustable tool configuration files are provided by lovis4u at the post-install step with the command:
lovis4u --data
This copies the lovis4u_data folder to your working directory.
If you work on a Linux machine: after installation you should run: lovis4u --linux
This command replaces the tool path (for MMseqs2) in the pre-made config files from the MacOS version (default) to the Linux version.
If you run this command for fun and want to change it back, you can use lovis4u --mac.
Downloading HMM models: LoVis4u uses pyhmmer for additional functional annotation of proteins with hmmscan against a set of databases. You can download these databases from our server (data-sharing.atkinson-lab.com/LoVis4u) by running the following command:
lovis4u --get-hmms
List of databases: AMR: AMRFinderPlus (v. 02.05.2024.2); Anti-defence: dbAPIS_Acr (v. 19.09.2023); Defence: DefenceFinder (v. 1.2.4), CasFinder (v. 3.1.0), PADLOC (v. 22.10.2024); Virulence: VFDB (v. 10.05.2024).
For demonstration we will use pharokka generated gff files with the sequences of five Enterobacteria P2-like phages.
Gff files are stored at: lovis4u_data/guide/gff_files.
The main difference between pharokka generated gff files and regular gff3 files (e.g. those you can download from NCBI) is that, in addition to the annotation rows, the file contains the corresponding nucleotide sequence in fasta format.
Building compatible gff files based on nucleotide sequences
If your query set of sequences for visualisation contains only nucleotide fasta files, below we describe an efficient way of using pharokka (phage annotation pipeline) and prokka (prokaryotic genome annotation pipeline) to prepare gff files compatible with LoVis4u.
Using pharokka for annotation of phage genomes
Before running pharokka you need to install pharokka databases (and pharokka itself, of course). Due to the number of non-python dependencies we recommend using a conda environment for this task. See pharokka documentation page for clear instructions.
First, merge multiple sequences into one fasta file, which can be done simply with cat:
cat folder_with_fasta_files/*.fa > merged_fasta.fa
We recommend merging multiple sequences into one file and using meta mode, since in that case the databases are not loaded again for each contig during annotation.
Then, you can use pharokka in meta mode with one command:
pharokka.py -i merged_fasta.fa -o pharokka_output --meta --split -t Num_of_threads \
--skip_mash --dnaapler --database path_to_pharokka_database
Gff files for each query contig will be stored at: pharokka_output/single_gffs
Using prokka for annotation of prokaryotic genomes
Prokka does not have an equivalent of the meta mode which we used with pharokka in the example above. Therefore, for any number of input sequences we prefer running prokka independently for each contig instead of splitting more complex gff files, which requires an additional step. Again, for prokka installation instructions and parameter descriptions see the prokka documentation page.
If your input fasta query files (one sequence per file) are located in the single_records folder, you can run:
for f in single_records/*.fa; do fb=$(basename "$f"); nm=${fb%.fa}; \
echo prokka --outdir prokka/$nm --prefix $nm --quiet --cpus 1 $f ; \
done | parallel -j num_of_available_threads
Configuration files
LoVis4u has a set of editable configuration files with parameters already adjusted for different tasks and page layouts:
- A4p1 - adjusted for one-column portrait A4 page layout. Figure width - 90mm.
- A4p2 - adjusted for two-column portrait A4 page layout. Figure width - 190mm.
- A4L - adjusted for landscape A4 page layout. Figure width - 240mm.
- standard - flexible figure width, scale - 0.05 mm per nucleotide with minimal width - 80mm.
A4 page layout examples
By default, the standard configuration file is used. You can change it with the -c, --config parameter. For example, to use the A4p2 config file run: lovis4u [other arguments] -c A4p2. Below you can find examples with sample data.
If you want to edit some config parameters, first copy the lovis4u_data folder (see above) to your working directory with the lovis4u --data command. In the folder you can find all configuration files. You can then edit any parameter there, rename the file if you wish, and specify the path to it using the same parameter: -c, --config path_to_new_file.cfg. Even if you break something there, you can always restore the default config files with the same lovis4u --data command.
Example run with default parameters
Let's start by running lovis4u without any optional arguments. The only mandatory argument is the path to a folder containing either pharokka generated gff files (-gff) or genbank files (-gb).
lovis4u -gff lovis4u_data/guide/gff_files
Running this command will create an output folder named lovis4u_{current_date} (e.g. lovis4u_2024_04_28-16_36) containing the result files. The name of the output folder can be changed with -o <output_folder_name>.
Output folder structure
- lovis4u.pdf - vector graphic output (file name can be changed with the --pdf-name <filename> parameter)
- locus_annotation_table.tsv - table containing annotation (sequence_id, length, coordinates, etc.) for each locus.
- feature_annotation_table.tsv - table containing annotation (feature_id, locus_id, coordinates, etc.) for each feature (e.g. CDS)
- mmseqs (folder)
- DB - folder with MMseqs databases.
- mmseqs_clustering.tsv - table with protein clustering results.
- mmseqs_(stdout/stderr).txt - mmseqs logs.
- input_proteins.fa - fasta file with all annotated protein sequences (input to mmseqs).
- proteome_similarity_matrix.tsv - pairwise proteome similarity scores indicating the fraction of shared protein homologues.
lovis4u data pre-processing steps
By default, lovis4u uses the following data pre-processing steps:
(For details see the Pipeline description page)
- The full length of each input sequence is taken for analysis (this can be adjusted using the locus annotation table, see below).
- All proteins annotated as encoded on the input sequences are used as input for MMseqs2 clustering (can be deactivated with the --mmseqs-off parameter). MMseqs2 arguments can be adjusted in the config file. Proteins clustered together are considered to be a set of homologues. Based on these results, the "group" attribute of each CDS is set.
- Taking into account the information about the set of homologues from the previous step, lovis4u applies similarity based hierarchical clustering of proteomes, which it uses to find the optimal order for visualisation and sets the "group" attribute for each locus. The purpose is to group together only related proteomes (keeping the average proteome set similarity > ~80%). This step can be skipped with the -cl-off, --clust_loci-off parameter.
- Defining the feature attribute "group_type". This allows the use of parameters that refer to a particular set of feature groups (e.g. set colour or show labels only for "group_type" = "variable"). By default it sets group_type "variable" for CDS features that are found in less than 0.25 of loci within the locus group, and "conserved" for proteins encoded in more than 0.75 of loci within a group. Others are "intermediate".
- Setting feature colour based on the feature "group" attribute (can be deactivated with -sgc-off or --set-group-colour-off). By default LoVis4u sets distinct colours only for features with group_type "variable". You can change this with -sgcf or --set-group-colour-for. For example, if you want to set colour only for features with group_type "conserved", run with --set-group-colour-for conserved.
- Defining labels to be shown. By default, lovis4u shows all labels for "variable" features and only the first occurrence for "conserved" features. You can show all labels with --show-all-feature-labels or specify group types for which all labels will be shown with -sflf, --show-feature-label-for <classes>. Additionally, by default lovis4u ignores the following labels: hypothetical protein, unknown protein. The list of ignored labels can be set with -ifl, --ignored-feature-labels <feature_label1 [feature_label2 ...]>. The list given to this argument can be left empty to not filter out labels by their name.
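The default "group_type" rule described in the list above can be sketched in Python as follows. This is only an illustration of the thresholds as stated in this guide, not LoVis4u's actual code; the function name and its arguments are hypothetical.

```python
def classify_group_type(n_loci_with_group, n_loci_in_locus_group,
                        variable_below=0.25, conserved_above=0.75):
    """Return a group_type label from the fraction of loci that encode a protein group.

    Hypothetical helper: the thresholds mirror the defaults quoted in this guide.
    """
    if n_loci_in_locus_group <= 0:
        raise ValueError("a locus group must contain at least one locus")
    fraction = n_loci_with_group / n_loci_in_locus_group
    if fraction < variable_below:
        return "variable"
    if fraction > conserved_above:
        return "conserved"
    return "intermediate"


# Five loci in one locus group, as in the P2-like phage example set.
for n_loci in (1, 3, 5):
    print(n_loci, classify_group_type(n_loci, 5))
```

With five loci, a protein group found in one locus falls below 0.25 and is "variable", while a group present in all five is "conserved".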
Useful parameters for basic runs
The following parameters are particularly useful for basic runs:
- --reorient_loci - Auto re-orient loci (set new strands) if they are not matched. This function tries to maximise co-orientation of homologous features.
- -hl, --homology-links - Draw homology link track (Sankey graph).
- -o <name> - Output dir name.
- -c <name> - Name of the configuration file.
- -scc, --set-category-colour - Set category colour for features and plot category colour legend (initially designed for pharokka generated gff files, see more detailed description below).
- -w, --windows <locus_id1:start1:end1:strand [locus_id1:start1:end1:strand ...]> - Specify window of visualisation (coordinates) for a locus or multiple loci.
As the loci in our test set are already correctly orientated, we do not need --reorient_loci here. Let's add the -hl parameter to draw the homology link track and select the configuration file for the two-column A4 page layout with the -c A4p2 parameter. The output will be adjusted for a publication figure (for instance, in terms of font sizes and a figure width of 190 mm). In addition, we will plot functional categories of CDSs with the --set-category-colour parameter.
Alternatively, we can select the A4L configuration file adjusted for the A4 page landscape layout:
As you can see, the style is slightly different since more space is expected to be available, and font and object sizes are adjusted for a 240 mm width. For instance, we have an individual x-axis for each track (which can be activated in any run with the -sxa, --show-x-axis parameter). Also, the sequence label is positioned on the left, not at the bottom, which can also be easily changed with the -llp, --locus-label-position parameter (e.g. -llp left or -llp bottom).
Using the locus annotation table
As mentioned above, the full length of each locus is taken for visualisation by default. However, you can also specify the sub-coordinates of multiple regions for each locus to be shown. The coordinates together with other information about each locus can be specified in the locus annotation table and used as input with the -laf or --locus-annotation-file parameter.
Additionally, after each run lovis4u saves the locus_annotation_table.tsv with annotation parameters used in this particular run. If no table was specified by input then all annotation columns are set with default values.
Default table generated from previous runs:
sequence_id | length | coordinates | circular | description | order | group | initial_order |
---|---|---|---|---|---|---|---|
NC_005056.1 | 32684 | 1:32684:1 | 1 | Escherichia phage Wphi | 0 | 1 | 4 |
NC_049457.1 | 33807 | 1:33807:1 | 1 | Escherichia phage vB_EcoM-12474III | 1 | 1 | 2 |
NC_001895.1 | 33593 | 1:33593:1 | 1 | Enterobacteria phage P2 | 2 | 1 | 1 |
NC_049461.1 | 31834 | 1:31834:1 | 1 | Bacteriophage R18C | 3 | 1 | 0 |
NC_028943.1 | 29237 | 1:29237:1 | 1 | Escherichia phage pro483 | 4 | 1 | 3 |
The locus annotation table can also be used as input for customising the regions and orientations of the sequences shown. After the default run we can take the output locus_annotation_table and edit the information we want to change.
Important to note: it is possible to input a table containing only a subset of columns (only the sequence_id column is essential). For other columns or empty cells, lovis4u will set the default values.
For example, let's use this table as input:
sequence_id | coordinates | order | group |
---|---|---|---|
NC_005056.1 | 18320:24320:1,31684:32684:1,1:9360:1 | 0 | 1 |
NC_049461.1 | 16745:22745:1,30834:31834:1,1:7016:1 | 1 | 1 |
NC_001895.1 | 18294:24294:1,32593:33593:1,1:8460:1 | 2 | 1 |
NC_049457.1 | 19308:25308:1,32807:33807:1,1:9474:1 | 3 | 1 |
NC_028943.1 | 14839:20839:1,28237:29237:1,1:6321:1 | 4 | 1 |
Here we specified only the coordinates, order and group for each locus. Order and group are kept from the previous run because it is logical to reuse the clustering results obtained with the full-length loci and to turn off a new attempt to cluster the sequences. This is done with the -cl-off, --clust_loci-off parameter.
Format for coordinates specification: comma-separated list of start:end:strand. Start and end are in 1-based format, strand: 1 for plus strand and -1 for minus.
The table can be found in the guide folder: lovis4u_data/guide/locus_annotation_table_demo_1.tsv
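The coordinates format described above (comma-separated start:end:strand regions, 1-based, strand 1 or -1) can be checked before a run with a small parser. This is a minimal sketch of the format as documented here; parse_coordinates is a hypothetical helper and not part of LoVis4u.

```python
def parse_coordinates(value):
    """Parse a 'start:end:strand[,start:end:strand...]' string into a list of tuples.

    Hypothetical helper that only validates the format described in this guide.
    """
    regions = []
    for region in value.split(","):
        # Unpacking fails with ValueError if a region does not have exactly three fields.
        start, end, strand = (int(field) for field in region.strip().split(":"))
        if start < 1 or end < 1:
            raise ValueError(f"coordinates are 1-based, got {region!r}")
        if strand not in (1, -1):
            raise ValueError(f"strand must be 1 or -1, got {region!r}")
        regions.append((start, end, strand))
    return regions


# First row of the demo table above.
print(parse_coordinates("18320:24320:1,31684:32684:1,1:9360:1"))
```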
Important to note: in order to define classes of proteins ("conserved" (core) or "variable" (accessory)) LoVis4u clusters encoded proteins using MMseqs2 as described above. When full-length sequences are plotted, all proteins are used. However, when you visualise only subsequences of contigs you have two options: 1) cluster only proteins that overlap the selected region; 2) cluster all proteins, including those outside the selected regions. By default the second option is used in order to avoid a situation where an ORF is labelled "variable" when in reality it is encoded in each proteome, but outside of the coordinates shown. However, if you need to consider only proteins overlapping the visualisation window, you can restrict clustering to them by using -cl-owp, --cluster-only-window-proteins or by setting the cluster_all_proteins config parameter to "False".
Now we can run:
lovis4u -gff lovis4u_data/guide/gff_files -hl -o lovis4u_output \
--locus-annotation-file lovis4u_data/guide/locus_annotation_table_demo_1.tsv -cl-off
Using the feature annotation table
Similarly to the locus annotation table, we can use the feature annotation table to adjust the feature visualisation parameters. Here, a feature usually refers to an open reading frame, each of which has its own ID. After each run, a feature_annotation_table.tsv file is saved in your output folder. Below you can see the first rows of the table created by a run with default parameters.
feature_id | locus_id | coordinates | feature_type | name | group | group_type | category | fill_colour | stroke_colour | show_label |
---|---|---|---|---|---|---|---|---|---|---|
NC_001895.1_CDS_0001 | NC_001895.1 | 64:2286:1 | CDS | nicking at origin of replication | NC_005056.1_CDS_0001 | conserved | "DNA, RNA and nucleotide metabolism" | default | default | 0 |
NC_001895.1_CDS_0002 | NC_001895.1 | 2283:2612:1 | CDS | hypothetical protein | NC_005056.1_CDS_0002 | intermediate | unknown function | default | default | 0 |
NC_001895.1_CDS_0003 | NC_001895.1 | 2651:3412:-1 | CDS | hypothetical protein | NC_001895.1_CDS_0003 | variable | unknown function | #f165db | default | 0 |
NC_001895.1_CDS_0004 | NC_001895.1 | 3587:5347:-1 | CDS | hypothetical protein | NC_001895.1_CDS_0004 | variable | unknown function | #95a134 | default | 0 |
NC_001895.1_CDS_0005 | NC_001895.1 | 5347:5517:-1 | CDS | hypothetical protein | NC_001895.1_CDS_0005 | variable | unknown function | #9e9e34 | default | 0 |
As with the locus annotation table, we can tweak the parameters and use this table as input in subsequent runs. This is done with the -faf, --feature-annotation-file parameter. The input table can contain only the subset of annotation columns that are of interest. For example, you can specify a new label and colour for those particular coding sequences (CDS) that you are interested in.
feature_id | name | fill_colour | show_label |
---|---|---|---|
NC_001895.1_CDS_0001 | Updated Name :) | #e54e86 | 1 |
This table can be found in the guide folder: lovis4u_data/guide/feature_annotation_table_one_row.tsv
lovis4u -gff lovis4u_data/guide/gff_files -hl -o lovis4u_output \
--locus-annotation-file lovis4u_data/guide/locus_annotation_table_demo_1.tsv -cl-off \
--feature-annotation-file lovis4u_data/guide/feature_annotation_table_one_row.tsv
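A partial annotation table like the one above can also be generated programmatically. The sketch below uses Python's standard csv module and assumes, based on the .tsv extension, that LoVis4u expects tab-separated columns; build_annotation_table is a hypothetical helper name.

```python
import csv
import io


def build_annotation_table(rows, columns):
    """Serialise a list of dicts into tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Same single-row example as in the guide folder table shown above.
rows = [{"feature_id": "NC_001895.1_CDS_0001", "name": "Updated Name :)",
         "fill_colour": "#e54e86", "show_label": 1}]
table_text = build_annotation_table(
    rows, ["feature_id", "name", "fill_colour", "show_label"])
print(table_text, end="")
```

The returned text can be written to a file and passed with --feature-annotation-file; the same helper works for a locus annotation table with columns such as sequence_id, coordinates, order and group.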
Functional annotation of proteins with pyhmmer
Starting with version 0.0.11, LoVis4u has a module for functional annotation of proteins with pyhmmer hmmscan. By default, the following set of HMM model databases is available: defence (DefenseFinder), defence (PADLOC), virulence (VFDB), anti-defence (dbAPIS), AMR (AMRFinderPlus). They can be downloaded with the lovis4u --get-hmms command.
LoVis4u runs pyhmmer hmmscan against the provided set of HMM models if the -hmmscan, --run-hmmscan option is specified:
lovis4u -gff lovis4u_data/guide/gff_files -hl --set-category-colour -c A4p2 --run-hmmscan -o lovis4u_hmmscan
In addition to visualisation, the hmmscan folder with search results is saved to the output directory. LoVis4u replaces the category and name attributes of CDSs that have hits. You can keep the default names (labels) using the -kdn, --keep-default-name parameter and the default category with the -kdc, --keep-default-category option. Also, if you want to show all labels for the proteins with hits (for instance, the DruM2 label is shown only for the first occurrence in the figure above) you can use the -salq, --show-all-labels-for-query parameter.
Selecting defence system database
Since we have two defence system databases, PADLOC and DefenseFinder, it is possible to specify which to use for annotation (by default both are used). This is done with the -dm, --defence-models parameter with one of the three options: PADLOC, DefenseFinder or both. If a protein has a hit in both databases, the target with the lowest e-value is kept.
The P2 phage and its Tin/Old hotspot defence island is a good example for demonstrating this parameter, since the Tin protein model is only in the PADLOC database, while the Old protein has the lowest e-value for the DefenseFinder database model. To choose only PADLOC database models for the search you can use -dm PADLOC:
lovis4u -gff lovis4u_data/guide/gff_files/NC_001895.1.gff --set-category-colour -c A4p2 \
--run-hmmscan -dm PADLOC -o lovis4u_hmmscan_PADLOC
Similarly, you can choose to use DefenseFinder models only with -dm DefenseFinder. As mentioned above, in that case the Tin annotation is absent.
lovis4u -gff lovis4u_data/guide/gff_files/NC_001895.1.gff --set-category-colour -c A4p2 \
--run-hmmscan -dm DefenseFinder -o lovis4u_hmmscan_DF
How to use your own HMM models
LoVis4u also allows the use of your own HMM models. You can specify these using -hmm, --add-hmm-models <folder_path [name]>. The folder should contain files in HMMER format (one file per model). Usage: -hmm path [name]. Specifying the name is optional; by default it will be taken from the folder name. If you want to add multiple HMM databases you can use this argument multiple times: -hmm path1 [name1] -hmm path2 [name2] ...
Finally, if you want to search only against your own models, excluding the default set, you can add the -omh, --only-mine-hmms parameter in addition to the -hmm option.
Visualisation of genomic signal tracks from sequencing experiments
Basic usage
Starting with version 0.1.0, LoVis4u supports visualisation of genomic signal tracks from sequencing experiments, using one bedGraph file as input for each track. We will demonstrate this functionality using three DNA sequencing coverage profiles for the BASEL collection phage Bas01. To specify paths to bedGraph files you can use the -bg, --bedgraphs <bedgraph_file1 [bedgraph_file2 ...]> argument, which takes a space-separated list of file paths as input.
lovis4u -gff lovis4u_data/guide/BaselCollection/Bas01.gff -c A4p2 \
-bg lovis4u_data/guide/bedgraphs/Bas01_DNA-Seq_WT.bedGraph lovis4u_data/guide/bedgraphs/Bas01_DNA-Seq_1.bedGraph lovis4u_data/guide/bedgraphs/Bas01_DNA-Seq_2.bedGraph
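If you only have per-position coverage values, they can be converted to bedGraph lines before being passed with -bg. The sketch below follows the general bedGraph convention (sequence name, 0-based start, end-exclusive end, value) and merges runs of equal values. It assumes that the name in the first column should match the sequence id of the locus; that, and the helper name coverage_to_bedgraph, are assumptions and not taken from the LoVis4u documentation.

```python
def coverage_to_bedgraph(seq_id, coverage):
    """Run-length encode per-position coverage into bedGraph lines.

    bedGraph intervals are 0-based and half-open: [start, end).
    """
    lines = []
    start = 0
    for pos in range(1, len(coverage) + 1):
        # Close the current run at the end of the list or when the value changes.
        if pos == len(coverage) or coverage[pos] != coverage[start]:
            lines.append(f"{seq_id}\t{start}\t{pos}\t{coverage[start]}")
            start = pos
    return lines


# Toy coverage vector for six positions.
for line in coverage_to_bedgraph("Bas01", [5, 5, 7, 7, 7, 0]):
    print(line)
```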
User-defined labels and colours and GC content tracks
By default, the basename of the corresponding file is used as the track label, but you can manually specify labels using -bgl, --bedgraph-labels <bedgraph_label1 [bedgraph_label2 ...]>, which takes as input a space-separated list of labels in the same order as the input files. In addition, it can be useful to visualise GC content and GC skew in parallel with signal tracks. You can add GC content and GC skew tracks by using the parameters -gc and -gc_skew. Finally, we will add visualisation of functional categories using the -scc argument.
lovis4u -gff lovis4u_data/guide/BaselCollection/Bas01.gff -c A4p2 -scc -gc -gc_skew \
-bg lovis4u_data/guide/bedgraphs/Bas01_DNA-Seq_WT.bedGraph lovis4u_data/guide/bedgraphs/Bas01_DNA-Seq_1.bedGraph lovis4u_data/guide/bedgraphs/Bas01_DNA-Seq_2.bedGraph \
-bgl "Bas01 WT | DNA-Seq" "Bas01 E1 | DNA-Seq" "Bas01 E2 | DNA-Seq"
You can also specify colours for your coverage profiles using the -bgc, --bedgraph-colours <bedgraph_colour1 [bedgraph_colour2 ...]> argument, which acts similarly to the bedgraph label argument. Here you can specify colours either by HEX code or by a name that is present in the palette file. If the colour list is shorter than the number of bedgraph profiles, the colours will be used circularly (in particular, if one colour is specified then only it will be used). Below we demonstrate usage with HEX colour codes: -bgc "#CDEE40" "#E96C0C" "#8BD4F7" (the codes are quoted because an unquoted # starts a comment in most shells).
lovis4u -gff lovis4u_data/guide/BaselCollection/Bas01.gff -c A4p2 -scc -gc -gc_skew \
-bg lovis4u_data/guide/bedgraphs/Bas01_DNA-Seq_WT.bedGraph lovis4u_data/guide/bedgraphs/Bas01_DNA-Seq_1.bedGraph lovis4u_data/guide/bedgraphs/Bas01_DNA-Seq_2.bedGraph \
-bgl "Bas01 WT | DNA-Seq" "Bas01 E1 | DNA-Seq" "Bas01 E2 | DNA-Seq" \
-bgc "#CDEE40" "#E96C0C" "#8BD4F7"
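The circular use of colours described above can be pictured with a short Python sketch. It only illustrates the documented behaviour (a short colour list is repeated until every track has a colour); assign_track_colours is a hypothetical name, not a LoVis4u function.

```python
from itertools import cycle, islice


def assign_track_colours(n_tracks, colours):
    """Give each of n_tracks a colour, reusing the list circularly if it is too short."""
    if not colours:
        raise ValueError("at least one colour is required")
    return list(islice(cycle(colours), n_tracks))


# Two colours for three bedGraph tracks: the first colour is reused for the third track.
print(assign_track_colours(3, ["#CDEE40", "#E96C0C"]))
```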
User-defined window
In addition to the locus annotation table, you can specify window(s) for visualisation using the command-line argument -w, --windows. Format: -w, --windows <locus_id1:start1:end1:strand [locus_id1:start1:end1:strand ...]>. For instance, if you want to show a single region of our locus you can use: -w Bas01:1:15000:1. If you want to show multiple regions of this locus you can specify them within one comma-separated string; for instance: -w Bas01:1:15000:1,Bas01:16000:31000:1. Space separation in this argument is used when you have multiple loci in the visualisation and want to define windows for several of them.
lovis4u -gff lovis4u_data/guide/BaselCollection/Bas01.gff -c A4p2 -scc -gc -gc_skew \
-bg lovis4u_data/guide/bedgraphs/Bas01_DNA-Seq_WT.bedGraph lovis4u_data/guide/bedgraphs/Bas01_DNA-Seq_1.bedGraph lovis4u_data/guide/bedgraphs/Bas01_DNA-Seq_2.bedGraph \
-bgl "Bas01 WT | DNA-Seq" "Bas01 E1 | DNA-Seq" "Bas01 E2 | DNA-Seq" \
-w Bas01:1:15000:1,Bas01:16000:31000:1
Other LoVis4u features
Category colour and annotation
Using the parameter --set-category-colour you can make use of the functional category annotation of features. The functional category is shown by coloured lines below each ORF. It was designed to parse the PHROGs category annotation of proteins, which is part of the default pharokka output, and by default it retrieves the category from the "function" qualifier in Genbank or GFF files (the qualifiers used can be changed in the config file). However, you can also set the category for each CDS using the feature annotation table described above and its "category" column. Additionally, you can set the colour codes for your categories using --category-colour-table. For categories that are not found in the table, a random colour will be set. By default, lovis4u uses a pre-made colour table which can be found in lovis4u_data/category_colours.tsv.
Individual x-axis and locus label position
In most configuration files (except A4L for the A4 landscape layout) LoVis4u indicates the coordinates for visualisation in line with the locus label. However, if you prefer the locus label to be placed on the left side you can use the following parameter: -llp left or --locus-label-position left. When the left position for the label is selected, you can choose to draw an individual x-axis line for each locus track using the parameter -sxa, --show-x-axis.
Alternatively, if you use, for instance, the A4L configuration file for the A4 landscape layout, in which individual x-axes are shown by default, you can change that. Using the parameter --hide-x-axis you can deactivate visualisation of the x-axis for each locus track and instead, if you wish, draw a scale line track below. This is done with -slt, --scale-line-track. Similarly, you can change the locus label position to the bottom with the -llp bottom parameter.
Highlighting conserved genes instead of variable
For many analysis purposes (e.g. conserved neighbourhood visualisation), it is more useful to colour conserved gene clusters instead of the variable genes. This can be easily switched in lovis4u using the --set-group-colour-for parameter. The default is to colour variable genes, but using --set-group-colour-for conserved will change it to the opposite mode.
Note that if you have other feature groups set in your feature annotation table and you want to set auto-colouring for them as well, you can specify them in a space-separated list with this argument (e.g. --set-group-colour-for conserved your_group_1 your_group_2).
Note: By default, the colours for groups are randomly set for each group using the seaborn husl palette. In the config file you can change to a more intense hsl palette or change the desaturation parameter.
Visualisation of a single sequence
LoVis4u can also visualise a single sequence. In that case no special parameters are required. However, if you want to highlight each group of homologous proteins with a particular colour (protein sequences will still be clustered) you can use the following parameter: --set-group-colour-for undefined, which will highlight all genes of a single sequence. The default value of the protein group class is "undefined"; it is only changed after clustering and calculating the fraction of loci for each gene group within a cluster of proteomes, which requires multiple loci as input.
lovis4u -gff lovis4u_data/guide/gff_files/NC_001895.1.gff -hl --set-category-colour -c A4p2 \
--set-group-colour-for undefined
Starting with version 0.1.0, new tracks for visualisation of GC content and GC skew were added. They can be added simply by using the parameters -gc and -gc_skew.
lovis4u -gff lovis4u_data/guide/gff_files/NC_001895.1.gff -hl --set-category-colour -c A4p2 \
-gc -gc_skew
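For readers unfamiliar with these two quantities, the sketch below computes them with the standard definitions: GC content is (G + C) / window length and GC skew is (G - C) / (G + C). The window size and step that LoVis4u uses internally are not described in this guide, so the values in the example are arbitrary and all function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq):
    """(G - C) / (G + C); 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if g + c else 0.0


def sliding_windows(seq, window, step):
    """Yield (0-based start, subsequence) pairs along the sequence."""
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        yield start, seq[start:start + window]


# Arbitrary toy sequence and window settings, for illustration only.
for start, sub in sliding_windows("ATGCGCGGTA", window=4, step=3):
    print(start, sub, gc_content(sub), gc_skew(sub))
```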
Specifying window for visualisation (coordinates)
In addition to the locus annotation table, you can specify window(s) for visualisation using the command-line argument -w, --windows. Format: -w, --windows <locus_id1:start1:end1:strand [locus_id1:start1:end1:strand ...]>. For instance, if you want to show a single region of our locus from the last example you can use: -w NC_001895.1:1:15000:1. If you want to show multiple regions of this locus you can specify them within one comma-separated string. For instance: -w NC_001895.1:1:15000:1,NC_001895.1:16000:30000:1
lovis4u -gff lovis4u_data/guide/gff_files/NC_001895.1.gff -hl --set-category-colour -c A4p2 \
-gc -gc_skew \
-w NC_001895.1:1:15000:1,NC_001895.1:16000:30000:1
This argument also works if you use multiple loci. In that case you specify the windows for different loci as separate, space-separated arguments. For instance: -w NC_001895.1:1:15000:1 Locus_id_1:1:20000:1 Locus_id_2:1000:20000:1 ...
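When windows are produced by a script, the argument values can be assembled from a mapping of locus ids to regions. The sketch below follows the format described above (comma-separated regions within one locus, one space-separated argument per locus); build_window_args is a hypothetical helper and Locus_id_1 is the same placeholder id used in the text.

```python
def build_window_args(windows):
    """Turn {locus_id: [(start, end, strand), ...]} into a list of -w argument values.

    Each returned item holds all regions of one locus joined by commas.
    """
    args = []
    for locus_id, regions in windows.items():
        args.append(",".join(f"{locus_id}:{start}:{end}:{strand}"
                             for start, end, strand in regions))
    return args


window_args = build_window_args({
    "NC_001895.1": [(1, 15000, 1), (16000, 30000, 1)],
    "Locus_id_1": [(1, 20000, 1)],  # placeholder locus id
})
# Joined with spaces, this is what would follow -w on the command line.
print("-w", " ".join(window_args))
```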
Visualisation of non-coding features and control of their labels
Starting with version 0.0.10, LoVis4u is able to parse non-coding features such as tRNAs, tmRNAs, and pseudogenes. They do not contribute to the sequence clustering; however, tRNAs with the same anticodon are considered members of the same group and can be connected by homology lines. Non-coding features are shown as a CDS-like polygon with no fill colour, embedded in a box, and by default they are not labelled.
To demonstrate this feature, in the example below we show the tRNA-encoding regions of the Bas01 and Bas03 phages.
lovis4u -gff lovis4u_data/guide/BaselSubset -laf lovis4u_data/guide/locus_annotation_table_trnas.tsv \
-hl --set-category-colour -c A4p1
By default we do not show tRNA name labels since these genes are often found in arrays and visualisation of many labels on a very short locus is challenging. Instead, there is a category colour for each type of non-coding feature. However, if required, tRNA name labels can be turned on. Firstly, if you want to show labels for all non-coding features you can use the -snl, --show-noncoding-labels parameter.
lovis4u -gff lovis4u_data/guide/BaselSubset -laf lovis4u_data/guide/locus_annotation_table_trnas.tsv \
-hl --set-category-colour -c A4p1 --show-noncoding-labels
Both phages have tRNAs with the same anticodon, which therefore have the same "group" (the analogue of a homologous group for proteins). In this or similar situations it can be logical to show labels only on the first occurrence, as is done for conserved proteins. To do that, you can use -sfnl, --show-first-noncoding-label
lovis4u -gff lovis4u_data/guide/BaselSubset -laf lovis4u_data/guide/locus_annotation_table_trnas.tsv \
-hl --set-category-colour -c A4p1 --show-first-noncoding-label
Specifying figure width and more compact visualisation
For the "standard" configuration file LoVis4u tries to set an optimal figure width taking into account the nucleotide size of the visualisation window. At the same time, LoVis4u has a set of adjusted configuration files with fixed figure widths for one-column, two-column, and landscape A4 page layout figures. However, in any case you can still adjust the figure width if needed. You can do it in two ways:
1) Using the --mm-per-nt <float value> argument, which changes the scale defining the space given to each nt cell on the canvas. Default: 0.005.
2) With the -fw, --figure-width <float value [mm]> parameter, which defines the total output figure width in mm.
Here we demonstrate usage by plotting a compact visualisation of full loci together with the --show-first-feature-label-for argument given an empty list, so that the first occurrence of conserved genes is not labelled. Note that most of the coloured, variable genes are not labelled in this figure as they are annotated as 'hypothetical protein', which LoVis4u ignores by default to make figures more compact and informative. You can deactivate ignoring of these labels by using the -ifl, --ignored-feature-labels argument, similarly providing an empty list as its argument.
lovis4u -gff lovis4u_data/guide/gff_files -hl -o width_test --show-first-feature-label-for --figure-width 70
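The two width controls above are related through the length of the visualisation window. The sketch below shows the simple conversion between them under the assumption that the locus track width is the window length multiplied by the scale; it ignores label margins and any minimal width that LoVis4u may apply, and both function names are hypothetical.

```python
def track_width_mm(window_length_nt, mm_per_nt):
    """Approximate width of a locus track for a given scale (margins ignored)."""
    return window_length_nt * mm_per_nt


def mm_per_nt_for_width(window_length_nt, target_width_mm):
    """Scale needed for a window of the given length to span the target width."""
    if window_length_nt <= 0:
        raise ValueError("window length must be positive")
    return target_width_mm / window_length_nt


# Longest locus in the example set (NC_049457.1, 33807 nt).
print(track_width_mm(33807, 0.005))
print(mm_per_nt_for_width(33807, 70))
```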