Variant Effect Predictor
About
Ensembl provides the facility to predict the functional consequences of variants
using the Variant Effect Predictor (VEP). There are three primary ways to use
the VEP: the web version, the Perl script and the API.
The web version is suitable for users with small volumes of data or those who
prefer not to use command-line utilities. The script version is the most
flexible way to use the VEP, and allows users to process large volumes of data
using their own compute resources. The API is suitable for Perl programmers
looking to incorporate features of the VEP into their own code.
The web and script versions of the VEP use the same input and output data
formats. By default, both input and output use a simple tab-delimited format.
The popular VCF (version 4.0) and pileup formats are also supported as input.
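As an illustration of the VCF input mentioned above, the following minimal Python sketch reads the first five fixed columns of a VCF 4.0 data line (CHROM, POS, ID, REF, ALT). The names `VcfRecord` and `parse_vcf_line` are hypothetical helpers for this example and are not part of the VEP; the sketch does not reproduce how the VEP itself parses input.

```python
from typing import List, NamedTuple


class VcfRecord(NamedTuple):
    """First five fixed columns of a VCF data line (hypothetical helper type)."""
    chrom: str
    pos: int
    var_id: str
    ref: str
    alts: List[str]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse CHROM, POS, ID, REF and ALT from one tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("expected at least 5 tab-separated columns")
    chrom, pos, var_id, ref, alt = fields[:5]
    # ALT may list several alleles separated by commas
    return VcfRecord(chrom, int(pos), var_id, ref, alt.split(","))


record = parse_vcf_line("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3")
print(record.chrom, record.pos, record.ref, record.alts)
```

Header lines beginning with `#` are not handled here and would need to be skipped before calling the function.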
Web version
The web version of the VEP can be accessed via the Tools link at the top of each Ensembl web page, or via
the "Manage your data" link on any species-specific page.
Upload form
When you reach the VEP web interface, you will be presented with a form to
enter your data. Data can be uploaded in one of three
ways:
- File upload - click the "Choose file" button and locate the file on your system
- Paste file - simply copy and paste the contents of your file into the large text box
- File URL - point the VEP to a file hosted at a publicly
accessible address. This can be either an 'http://' or an 'ftp://' address.
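Before submitting a file URL, it can be useful to check that it uses one of the two address types named above. The sketch below is a local pre-check only, under the assumption that nothing other than 'http://' and 'ftp://' is accepted; the function name `is_supported_file_url` is hypothetical, and the check does not confirm that the file is actually reachable.

```python
from urllib.parse import urlparse

# The two schemes named in the upload instructions
ALLOWED_SCHEMES = {"http", "ftp"}


def is_supported_file_url(url: str) -> bool:
    """Return True if the URL has an http or ftp scheme and a host part."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ALLOWED_SCHEMES and bool(parsed.netloc)


print(is_supported_file_url("http://example.org/variants.txt"))
print(is_supported_file_url("/home/user/variants.txt"))
```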
It is then possible to configure other options. The VEP will run fine with the
default options; click the blue "Next" button at the bottom of the panel to
continue when you are happy with the options.
- Species - ensure you have selected the correct species for your data!
- Get regulatory region consequences - the VEP can check for
overlaps with known regulatory features, and also check whether a variation
falls in a high-information part of a transcription factor binding site.
- Type of consequences to display - the VEP can output consequences
as described by Ensembl, the
Sequence Ontology (SO) or the
NCBI.
- Check for existing co-located variants - in species with an
Ensembl Variation database, the VEP can check for existing variants
co-located with your input; the identifiers of these variants appear in the
output.
- Return results in coding regions only - when this checkbox is
selected, the VEP filters out any results that do not fall in a
protein-coding region of a transcript, for example those in introns or
upstream of a transcript.
- Show HGNC identifier for genes where available - adds the HGNC
identifier for the overlapping gene to the Extra column of the output (the
default output shows only the Ensembl Gene ID, e.g. ENSG00000000345).
- Show Ensembl protein identifiers where available - adds the
Ensembl protein identifier for the transcript (e.g. ENSP00000411206) to the
Extra column of the output.
- Show HGVS identifiers for variants where available - adds HGVS nomenclature
based on Ensembl stable identifiers to the output. Coding and/or protein
sequence names can be added where appropriate.
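Several of the options above add identifiers to the Extra column of the output. The sketch below shows one way to read that column in Python. It assumes the column holds semicolon-separated KEY=value pairs and a single '-' when empty; this layout, the key names `HGNC` and `ENSP`, and the placeholder gene symbol `GENE1` are assumptions for illustration, since the column format is not described in this section.

```python
from typing import Dict


def parse_extra(extra: str) -> Dict[str, str]:
    """Split an Extra column value into a dict of key/value pairs.

    Assumes 'KEY=value;KEY=value' and '-' for an empty column.
    """
    if extra.strip() in ("", "-"):
        return {}
    pairs = {}
    for item in extra.strip().split(";"):
        key, sep, value = item.partition("=")
        if sep:  # skip fragments that contain no '='
            pairs[key] = value
    return pairs


# GENE1 is a placeholder symbol, not a real annotation
extras = parse_extra("HGNC=GENE1;ENSP=ENSP00000411206")
print(extras.get("ENSP"))
```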
The following options are currently available for human only.
- For non-synonymous SNPs, the VEP can provide additional predictions on
protein products using the following external tools (each tool can output the
prediction term, the score or both).
- SIFT predictions - SIFT predicts whether an amino acid substitution
affects protein function based on sequence homology and the physical
properties of amino acids.
- PolyPhen predictions - PolyPhen is a tool which predicts possible
impact of an amino acid substitution on the structure and function
of a human protein using straightforward physical and comparative
considerations.
- Condel consensus predictions - Condel computes a
weighted average of the scores (WAS) of several computational tools aimed
at classifying missense mutations as likely deleterious or likely
neutral. The VEP currently presents a Condel WAS from SIFT and
PolyPhen.
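To make the idea of a weighted average of scores concrete, here is a generic Python sketch. It is not Condel's method: how Condel derives its weights and brings the SIFT and PolyPhen scores onto a common scale is not described here, so the function `weighted_average` and the example weights are purely hypothetical.

```python
from typing import Sequence


def weighted_average(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum(score * weight) / sum(weight) for paired scores and weights."""
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


# Hypothetical, already-normalised scores from two tools with equal weights
combined = weighted_average([0.2, 0.8], [1.0, 1.0])
print(combined)
```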
Select output format
After clicking "Next", you are then asked to select either HTML or Text output.
Both formats contain the same information:
- Text format is useful if you wish to use the output as the input for any
other tools.
- HTML presents the same information as the Text format, but formatted for
the web, with links to genes, transcripts, locations and co-located
variations in the Ensembl browser. SIFT, PolyPhen and Condel predictions are
coloured according to severity: red represents high severity, green low
severity and blue unknown.
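The colour scheme described above can be sketched as a simple lookup in Python. The grouping of prediction terms into severities below is an assumption made for illustration; the source names only the three colours, so any term the table does not list (including intermediate ones) falls back to blue in this sketch. The names `TERM_SEVERITY` and `colour_for` are hypothetical.

```python
# Colours as described for the HTML output
SEVERITY_COLOURS = {"high": "red", "low": "green", "unknown": "blue"}

# Assumed grouping of prediction terms; not taken from the VEP itself
TERM_SEVERITY = {
    "deleterious": "high",
    "probably damaging": "high",
    "tolerated": "low",
    "benign": "low",
}


def colour_for(term: str) -> str:
    """Map a prediction term to a display colour, defaulting to 'unknown'."""
    normalised = term.strip().lower().replace("_", " ")
    severity = TERM_SEVERITY.get(normalised, "unknown")
    return SEVERITY_COLOURS[severity]


print(colour_for("deleterious"), colour_for("benign"), colour_for("unknown"))
```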
Viewing your results in the browser
Any data uploaded via the VEP web tool can be viewed on the Ensembl location
view; to view your data, either click a link in the Location column of the HTML
output, or switch on the track on location view (click "Configure this page",
then "User attached data" on Region in detail view to see uploaded tracks).