
Command-line VEP via Docker, Demo

Running command-line VEP via Docker

If you don’t have root privileges, you can run VEP in a virtualised container via Docker. You can download and install Docker on Mac, Windows and Linux.


Preparing and running Docker

On a Mac with Apple silicon, you may need to install or update Rosetta 2 to run Docker (skip this step on other systems):

softwareupdate --install-rosetta

Create a directory for the course:

mkdir -p $HOME/Desktop/VEP/Plants
cd $HOME/Desktop/VEP/Plants

Start Docker:

open -a Docker

Download the Ensembl Plants VEP Docker image:

docker pull csicunam/bioinformatics_iamz

Create a working directory to mount into the Docker container (save your input files here; all outputs will be written to this directory):

mkdir vep_data

On some operating systems you may need to make the directory readable and writable by all users so that the container can write to it:

chmod 777 vep_data

Run the Docker image and mount the directory you just created:

docker run -t -i -v $HOME/Desktop/VEP/Plants/vep_data:/data csicunam/bioinformatics_iamz:latest

Exit the Docker session:

exit


Docker options

The following options are available for docker run:

--interactive or -i   Keep STDIN open even if not attached
--tty or -t   Allocate a pseudo-TTY
--volume or -v   Bind mount a volume (this is your local working directory)
--env or -e   Set environment variables
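
To see how these options fit together, the sketch below assembles a docker run command using the long form of each option and prints it rather than running it. MY_VAR=demo is a hypothetical environment variable, included only to illustrate --env; nothing in this demo needs it.

```shell
# Host directory to mount and the image name, as used elsewhere in this demo.
HOST_DIR="$HOME/Desktop/VEP/Plants/vep_data"
IMAGE="csicunam/bioinformatics_iamz:latest"

# Store the command in the positional parameters so that each argument
# stays a single word, even if the path contains spaces.
# MY_VAR=demo is a hypothetical variable, shown only to illustrate --env.
set -- docker run \
    --interactive --tty \
    --volume "$HOST_DIR:/data" \
    --env MY_VAR=demo \
    "$IMAGE"

# Dry run: print the assembled command instead of executing it.
printf '%s ' "$@"
printf '\n'

# To start the container for real, run: "$@"
```

Printing first makes it easy to check the mount path before a container is started.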


Running VEP via Docker

From inside your working directory (vep_data), download the indexed cache file and unpack it there; VEP will then find it under /data in the container. You can find all available VEP index files on the Ensembl Genomes FTP site:

curl -O http://ftp.ensemblgenomes.org/pub/plants/current/variation/indexed_vep_cache/oryza_sativa_vep_57_IRGSP-1.0.tar.gz
tar xzf oryza_sativa_vep_57_IRGSP-1.0.tar.gz
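
The cache file name appears to follow the pattern <species>_vep_<version>_<assembly>.tar.gz, and the species and version parts match the values passed to --species and --cache_version when running VEP. A minimal sketch, assuming that pattern holds for the file you downloaded, that splits the name with shell parameter expansion:

```shell
cache_file="oryza_sativa_vep_57_IRGSP-1.0.tar.gz"

base="${cache_file%.tar.gz}"   # strip the extension
species="${base%%_vep_*}"      # text before "_vep_"
rest="${base#*_vep_}"          # text after "_vep_"
version="${rest%%_*}"          # text before the next underscore
assembly="${rest#*_}"          # everything after that underscore

printf '%s\n' "--species $species --cache_version $version (assembly $assembly)"
# prints: --species oryza_sativa --cache_version 57 (assembly IRGSP-1.0)
```

If you download a cache for a different species or release, the same expansion gives you the values to substitute in the vep command.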

Run VEP within your Docker image. The directory /data within your Docker image is equivalent to your local working directory:

docker run -t -i -v $HOME/Desktop/VEP/Plants/vep_data:/data csicunam/bioinformatics_iamz:latest \
    vep -i variant_data/rice_variants.vcf -o /data/output.txt --dir /data --cache \
    --cache_version 57 --genomes --species oryza_sativa --force_overwrite --check_existing --offline

View output.txt here.

If you are already inside a Docker session, run only the vep command, as in the example below. As before, /data inside the container corresponds to your local working directory:

vep -i variant_data/rice_variants.vcf -o /data/output.txt --dir /data --cache \
    --cache_version 57 --genomes --species oryza_sativa --force_overwrite --check_existing --offline
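
Once output.txt has been written, a quick tally of the predicted consequences is a useful sanity check. The sketch below assumes the file is in VEP's default tab-delimited format, where metadata lines start with ## and the column header line starts with #. count_consequences is a hypothetical helper name, not part of VEP. Rows with several comma-separated consequences are counted as one combined string.

```shell
# Tally the values in the Consequence column of a default-format VEP output file.
count_consequences() {
    awk -F '\t' '
        /^##/ { next }                      # skip metadata lines
        /^#/  {                             # column header line
            for (i = 1; i <= NF; i++) if ($i == "Consequence") col = i
            next
        }
        col   { n[$col]++ }                 # count each consequence string
        END   { for (c in n) printf "%d\t%s\n", n[c], c }
    ' "$1" | sort -k1,1nr                   # most frequent first
}

# Usage, from the directory containing output.txt:
# count_consequences output.txt
```

The column is located by its header name rather than by position, so the function still works if the output was written with a customised field list that includes Consequence.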


Basic VEP options

You can view a list of all available VEP options here.

Annotation source options (select one):

--cache   Use local data (uses database connections for certain functions)
--offline   Use local data only (forbids external database connections)
--database   Use remote database (default is ensembldb.ensembl.org)

Input / output options:

--input_file or -i   Will try to read from STDIN if absent
--output_file or -o   Defaults to variant_effect_output.txt
--force_overwrite   Overwrite existing output file
--tab, --vcf, --json   Different output formats, customise with --fields

Known variants:

--check_existing   Checks for known variants that are co-located with your input variants
--filter_common   Excludes variants that have a co-located existing variant with global allele frequency > 0.01

Pathogenicity predictions:

--sift b   Predicts whether an amino acid substitution affects protein function, based on sequence homology and the physical properties of amino acids (b reports both the prediction term and the score)


VEP plugins

Plugins extend VEP functionality by allowing you to:

  • run algorithms
  • fetch Ensembl data
  • modify parameters

To use a plugin, add the --plugin option to your VEP command. You can find a list of all available VEP plugins on this documentation page and on GitHub.


VEP filtering

VEP comes with its own filtering tool, filter_vep, which works with the default and VCF output formats. It uses a simple query notation, [field] [operator] [value], for example:

filter_vep -i output.txt --filter "consequence is missense_variant"

Queries can be combined with and / or, and nested with parentheses. With --ontology (or -y), the is operator also matches child terms of the given consequence type in the Sequence Ontology:

[...] -y -f "consequence is coding_sequence_variant or (EXON is 1 and BIOTYPE is protein_coding)"
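
When a query lists several consequence types, typing the or-clauses by hand is error-prone. The sketch below joins any number of terms into a single query string. build_filter is a hypothetical helper name, not part of VEP, and stop_gained is used only as a second example term.

```shell
# Join consequence terms into one filter_vep query with "or".
# build_filter is a hypothetical helper name, not part of VEP.
# The body runs in a subshell so its variables do not leak out.
build_filter() (
    query=""
    for term in "$@"; do
        clause="Consequence is $term"
        if [ -z "$query" ]; then
            query="$clause"
        else
            query="$query or $clause"
        fi
    done
    printf '%s\n' "$query"
)

build_filter missense_variant stop_gained
# prints: Consequence is missense_variant or Consequence is stop_gained
```

The result can be passed straight to the tool, for example filter_vep -i output.txt --filter "$(build_filter missense_variant stop_gained)".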

Let’s filter the variants from the previous output. Run filter_vep on output.txt, then open the filtered file on your local machine:

docker run -t -i -v $HOME/Desktop/VEP/Plants/vep_data:/data csicunam/bioinformatics_iamz:latest \
    filter_vep -i /data/output.txt -o /data/output_filtered.txt \
    --filter "consequence is missense_variant"

View output_filtered.txt here.
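
To confirm that the filter removed something, you can compare the number of data lines before and after. This is a minimal sketch, assuming both files are in the default output format, where header lines start with #. count_records is a hypothetical helper name. In the default format each data line is one variant and transcript pair, so the counts are lines, not distinct variants.

```shell
# Count data lines (those not starting with '#') in a VEP output file.
count_records() {
    # -v inverts the match to exclude '#' lines; -c prints the count.
    grep -vc '^#' "$1"
}

# Usage, from the directory containing both files:
# echo "before: $(count_records output.txt)  after: $(count_records output_filtered.txt)"
```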