Command-line VEP via Docker, Demo
Running command-line VEP via Docker
If you don’t have root privileges, you can run VEP in a virtualised container via Docker. You can download and install Docker on Mac, Windows and Linux.
Preparing and running Docker
If you are on a Mac with Apple silicon, you may need to install or update Rosetta 2 to run Docker:
softwareupdate --install-rosetta
Create a directory for the course:
mkdir -p $HOME/Desktop/VEP/Plants
cd $HOME/Desktop/VEP/Plants
Start Docker (on macOS):
open -a Docker
Download the Ensembl Plants VEP Docker image:
docker pull csicunam/bioinformatics_iamz
Create a Docker working directory to mount to the Docker image (you can save your input files in here and all outputs will be written in this directory):
mkdir vep_data
On some operating systems you may need to make the directory readable and writable for all users so that the container can write to it:
chmod 777 vep_data
Run the Docker image and mount the directory you just created:
docker run -t -i -v $HOME/Desktop/VEP/Plants/vep_data:/data csicunam/bioinformatics_iamz:latest
Exit the Docker session:
exit
Docker options
The following options are available for docker run:
--interactive or -i: Keep STDIN open even if not attached
--tty or -t: Allocate a pseudo-TTY
--volume or -v: Bind mount a volume (this is your local working directory)
--env or -e: Set environment variables
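To make the options above easier to read, here is a minimal sketch that prints the same docker run command used in this demo with the long option names. The helper name vep_docker_cmd and the VEP_DATA variable are made up for illustration; the function only prints the command and does not start a container.

```shell
#!/bin/sh
# Host directory shared with the container (the path used earlier in this demo).
# VEP_DATA is a hypothetical variable name, not something Docker or VEP defines.
VEP_DATA="${VEP_DATA:-$HOME/Desktop/VEP/Plants/vep_data}"
IMAGE="csicunam/bioinformatics_iamz:latest"

# Hypothetical helper: print the long-form docker run command.
# Any extra arguments (e.g. a vep command) are appended after the image name.
vep_docker_cmd() {
    printf 'docker run --interactive --tty --volume %s:/data %s' "$VEP_DATA" "$IMAGE"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Print the command for an interactive session.
vep_docker_cmd
```

Copy the printed line into your terminal to start the session. It is equivalent to the short form docker run -t -i -v ... shown above.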
Running VEP via Docker
Download the indexed cache file into your Docker working directory (vep_data) and unpack it there. You can find all available VEP index files on the Ensembl Genomes FTP site:
curl -O http://ftp.ensemblgenomes.org/pub/plants/current/variation/indexed_vep_cache/oryza_sativa_vep_57_IRGSP-1.0.tar.gz
tar xzf oryza_sativa_vep_57_IRGSP-1.0.tar.gz
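If you want to fetch a cache for another species, the file name above suggests a pattern of species, release and assembly. The sketch below rebuilds the rice URL from those three parts. The helper name vep_cache_url is hypothetical, and it is an assumption that other caches on the FTP site follow the same naming pattern, so check the directory listing before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: build a cache URL from species, release and assembly.
# Assumes other files follow the same naming pattern as the rice cache.
vep_cache_url() {
    species="$1"
    release="$2"
    assembly="$3"
    base="http://ftp.ensemblgenomes.org/pub/plants/current/variation/indexed_vep_cache"
    printf '%s/%s_vep_%s_%s.tar.gz\n' "$base" "$species" "$release" "$assembly"
}

# Prints the rice cache URL used in this demo.
vep_cache_url oryza_sativa 57 IRGSP-1.0
```

You could then download with curl -O "$(vep_cache_url oryza_sativa 57 IRGSP-1.0)".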
Run VEP within your Docker image. The directory /data within your Docker image is equivalent to your local working directory:
docker run -t -i -v $HOME/Desktop/VEP/Plants/vep_data:/data csicunam/bioinformatics_iamz:latest \
  vep -i variant_data/rice_variants.vcf -o /data/output.txt --dir /data --cache \
  --cache_version 57 --genomes --species oryza_sativa --force_overwrite --check_existing --offline
View output.txt here.
If you are already within a Docker session, only run the vep code (see an example below). The directory /data within your Docker image is equivalent to your local working directory:
vep -i variant_data/rice_variants.vcf -o /data/output.txt --dir /data --cache \
  --cache_version 57 --genomes --species oryza_sativa --force_overwrite --check_existing --offline
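Once output.txt has been written, a quick summary helps to check that the run worked. The sketch below counts how often each consequence term occurs. It assumes the default tab-delimited VEP output, with "##" metadata lines followed by a single "#"-prefixed column header line that contains a Consequence column. The function name count_consequences is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count consequence terms in a default-format VEP output.
# Rows with several comma-separated terms are counted as one combined term.
count_consequences() {
    awk -F '\t' '
        /^##/ { next }    # skip metadata lines
        /^#/  {           # column header line: locate the Consequence column
            for (i = 1; i <= NF; i++) if ($i == "Consequence") col = i
            next
        }
        col { counts[$col]++ }    # tally data rows once the column is known
        END { for (c in counts) printf "%d\t%s\n", counts[c], c }
    ' "$1" | sort -k1,1nr -k2,2
}

# Summarise the output of this demo if it is present in the current directory.
if [ -f output.txt ]; then
    count_consequences output.txt
fi
```

Run it from your local vep_data directory, where output.txt is written.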
Basic VEP options
You can view a list of all available VEP options here.
Annotation source options (select one):
--cache: Use local data (uses database connections for certain functions)
--offline: Use local data only (forbids external database connections)
--database: Use remote database (default is ensembldb.ensembl.org)
Input / output options:
--input_file or -i: Will try to read from STDIN if absent
--output_file or -o: Defaults to variant_effect_output.txt
--force_overwrite: Overwrite existing output file
--tab, --vcf, --json: Different output formats, customise with --fields
Known variants:
--check_existing: Checks for known variants that are co-located with your input variants
--filter_common: Excludes variants that have a co-located existing variant with global allele frequency > 0.01
Pathogenicity predictions:
--sift b: Predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids (b outputs both the prediction term and the score)
VEP plugins
Plugins extend VEP functionality by allowing you to:
- run algorithms
- fetch Ensembl data
- modify parameters
Simply add the --plugin option to your script. You can find a list of all available VEP plugins in this documentation page and on GitHub.
VEP filtering
VEP comes with its own filtering tool, filter_vep, which works with the default and VCF output formats. It uses a simple query notation of the form [field] [operator] [value], e.g.:
filter_vep -i output.txt --filter "consequence is missense_variant"
Queries can be combined with and / or and nested with parentheses. With --ontology (or -y), consequence terms are matched using the Sequence Ontology, so a term also matches its child terms:
[...] -y -f "consequence is coding_sequence_variant or (EXON is 1 and BIOTYPE is protein_coding)"
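When a query has several conditions, it can be easier to keep them as separate strings and join them just before calling filter_vep. The sketch below joins conditions with and. The helper name join_filters is hypothetical and not part of VEP. A condition that itself contains or should be wrapped in parentheses by the caller.

```shell
#!/bin/sh
# Hypothetical helper: join filter conditions with "and" into one query string.
join_filters() {
    query=""
    for cond in "$@"; do
        if [ -z "$query" ]; then
            query="$cond"
        else
            query="$query and $cond"
        fi
    done
    printf '%s\n' "$query"
}

# Build a query from two conditions that appear in the examples above.
join_filters "consequence is missense_variant" "BIOTYPE is protein_coding"
# prints: consequence is missense_variant and BIOTYPE is protein_coding
```

The result can be passed on as --filter "$(join_filters ...)".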
Let’s filter the variants from the previous output. Run VEP’s filter_vep tool via Docker, then open the filtered file on your local machine:
docker run -t -i -v $HOME/Desktop/VEP/Plants/vep_data:/data csicunam/bioinformatics_iamz:latest \
  filter_vep -i /data/output.txt -o /data/output_filtered.txt \
  --filter "consequence is missense_variant"
View output_filtered.txt here.