Ensembl Core Schema Documentation
Introduction
This document gives a high-level description of the tables that
make up the EnsEMBL core schema. Tables are grouped into logical
groups, and the purpose of each table is explained. It is intended to
allow people to familiarise themselves with the schema when
encountering it for the first time, or when they need to use some
tables that they've not used before. Note that while some of the more
important columns in some of the tables are discussed, this document
makes no attempt to enumerate all of the names, types and contents of
every single table. Some concepts which are referred to in the table
descriptions are given at the end of this document; these are linked
to from the table description where appropriate.
Different tables are populated throughout the gene build process:
Step | Process |
0 | Create empty schema, populate meta table |
1 | Load DNA - populates dna, clone, contig, chromosome, assembly tables |
2 | Analyze DNA (raw computes) - populates genomic feature/analysis tables |
3 | Build genes - populates exon, transcript and other gene-related tables |
4a | Analyze genes - populates protein_feature, xref tables, interpro |
4b | ID mapping |
This document refers to version 63 of the EnsEMBL
core schema.
List of the tables:
Fundamental Tables
Features and Analyses
ID Mapping
External References
Miscellaneous
Fundamental Tables
A PDF document of the schema is available here.
assembly
The assembly table states which parts of seq_regions are exactly equal. It enables coordinates to be transformed between seq_regions. Typically it describes how chromosomes are made of contigs, clones out of contigs, and chromosomes out of supercontigs. It also allows you to artificially chunk chromosome sequence into smaller parts. The data in this table defines the "static golden path", i.e. the best effort draft full genome sequence as determined by the UCSC or NCBI (depending on which assembly you are using). Each row represents a component, e.g. a contig (cmp_seq_region_id, FK from the seq_region table), at least part of which is present in the golden path. The part of the component that is in the path is delimited by the fields cmp_start and cmp_end (cmp_start < cmp_end), and the absolute position within the golden path chromosome (or other appropriate assembled structure) (asm_seq_region_id) is given by asm_start and asm_end.
Column | Type | Default value | Description | Index |
asm_seq_region_id | INT(10) | | Assembly sequence region id. Primary key, internal identifier. Foreign key references to the seq_region table. | key: asm_seq_region_idx unique key: all_idx |
cmp_seq_region_id | INT(10) | | Component sequence region id. Foreign key references to the seq_region table. | key: cmp_seq_region_idx unique key: all_idx |
asm_start | INT(10) | | Start absolute position within the golden path chromosome. | key: asm_seq_region_idx unique key: all_idx |
asm_end | INT(10) | | End absolute position within the golden path chromosome. | unique key: all_idx |
cmp_start | INT(10) | | Start position, within the component, of the part that is in the golden path. | unique key: all_idx |
cmp_end | INT(10) | | End position, within the component, of the part that is in the golden path. | unique key: all_idx |
ori | TINYINT | | Orientation: 1 - sense; -1 - antisense. | unique key: all_idx |
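As an illustration of how one assembly row transforms coordinates, here is a minimal Python sketch. The AssemblyRow class and the cmp_to_asm function are hypothetical names used only for this example and are not part of any Ensembl API. The sketch assumes 1-based inclusive coordinates and that, when ori is -1, the component runs in reverse along the assembled region.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssemblyRow:
    """One row of the assembly table (region ids omitted for brevity)."""
    asm_start: int
    asm_end: int
    cmp_start: int
    cmp_end: int
    ori: int  # 1 = sense, -1 = antisense


def cmp_to_asm(row: AssemblyRow, pos: int) -> int:
    """Map a position on the component to the assembled region."""
    if not row.cmp_start <= pos <= row.cmp_end:
        raise ValueError("position lies outside the mapped part of the component")
    offset = pos - row.cmp_start
    if row.ori == 1:
        return row.asm_start + offset
    # Antisense (assumed): the component is reversed, so count back from asm_end.
    return row.asm_end - offset


row = AssemblyRow(asm_start=1001, asm_end=1500, cmp_start=1, cmp_end=500, ori=1)
print(cmp_to_asm(row, 10))
```

With the sense-oriented row above, position 10 on the component corresponds to position 1010 on the assembled region.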
assembly_exception
Allows multiple sequence regions to point to the same sequence, analogous to a symbolic link in a filesystem pointing to the actual file. This mechanism has been implemented specifically to support haplotypes and PARs, but may be useful for other similar structures in the future.
Column | Type | Default value | Description | Index |
assembly_exception_id | INT(10) | | Assembly exception sequence region id. Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | | Sequence region id. Foreign key references to the seq_region table. | key: sr_idx |
seq_region_start | INT(10) | | Sequence start position. | key: sr_idx |
seq_region_end | INT(10) | | Sequence end position. | |
exc_type | ENUM('HAP', 'PAR', 'PATCH_FIX', 'PATCH_NOVEL') | | Exception type, e.g. PAR, HAP - haplotype. | |
exc_seq_region_id | INT(10) | | Exception sequence region id. Foreign key references to the seq_region table. | key: ex_idx |
exc_seq_region_start | INT(10) | | Exception sequence start position. | key: ex_idx |
exc_seq_region_end | INT(10) | | Exception sequence end position. | |
ori | INT | | Orientation: 1 - sense; -1 - antisense. | |
attrib_type
Provides codes, names and descriptions of attribute types.
Column | Type | Default value | Description | Index |
attrib_type_id | SMALLINT(5) | | Primary key, internal identifier. | primary key |
code | VARCHAR(15) | '' | Attribute code, e.g. 'GapExons'. | unique key: code_idx |
name | VARCHAR(255) | '' | Attribute name, e.g. 'gap exons'. | |
description | TEXT | | Attribute description, e.g. 'number of gap exons'. | |
coord_system
Stores information about the available co-ordinate systems for the species identified through the species_id field. Note that for each species, there must be one co-ordinate system that has the attribute "top_level" and one that has the attribute "sequence_level".
Column | Type | Default value | Description | Index |
coord_system_id | INT(10) | | Primary key, internal identifier. | |
species_id | INT(10) | 1 | Identifies the species for multi-species databases. | key: species_idx |
name | VARCHAR(40) | | Co-ordinate system name, e.g. 'chromosome', 'contig', 'scaffold' etc. | |
version | VARCHAR(255) | NULL | Assembly. | |
rank | INT | | Co-ordinate system rank. | |
attrib | SET('default_version', 'sequence_level') | | Co-ordinate system attrib (e.g. "top_level", "sequence_level"). | |
dna
Contains DNA sequence. This table has a 1:1 relationship with the contig table.
Column | Type | Default value | Description | Index |
seq_region_id | INT(10) | | Primary key, internal identifier. Foreign key references to the seq_region table. | primary key |
sequence | LONGTEXT | | DNA sequence. | |
dnac
Contains data equivalent to the dna table, but four letters of DNA code are packed into a single binary character (byte) using a 2-bit encoding.
Column | Type | Default value | Description | Index |
seq_region_id | INT(10) | | Primary key, internal identifier. Foreign key references to the seq_region table. | primary key |
sequence | MEDIUMBLOB | | Compressed DNA sequence. | |
n_line | TEXT | | Contains start-end pairs of coordinates in the string that are Ns. | |
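The idea of a 2-bit encoding can be sketched in Python as follows. This is only an illustration: the bit codes assigned to each base and the packing order within a byte are assumptions, and the actual dnac encoding may differ. Ns are not handled here, since the table records them separately in n_line.

```python
# Assumed 2-bit codes; the real dnac encoding may assign them differently.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so unpacking is uniform.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


packed = pack("ACGTAC")
print(len(packed), unpack(packed, 6))
```

Six bases fit into two bytes here; the original length has to be known (for example from seq_region.length) to discard the padding in the final byte.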
exon
Stores data about exons. Associated with transcripts via exon_transcript. Allows access to contig seq_regions. Note that seq_region_start is always less than seq_region_end, i.e. when the exon is on the reverse strand, seq_region_start specifies the 3' end of the exon.
Column | Type | Default value | Description | Index |
exon_id | INT(10) | | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | | Sequence end position. | |
seq_region_strand | TINYINT(2) | | Sequence region strand: 1 - forward; -1 - reverse. | |
phase | TINYINT(2) | | The place where the intron lands inside the codon: 0 - between codons; 1 - between the 1st and 2nd base; 2 - between the 2nd and 3rd base. Exons therefore have a start phase and an end phase, but introns have just one phase. | |
end_phase | TINYINT(2) | | Usually, end_phase = (phase + exon_length) % 3, but end_phase can be -1 if the exon is half-coding and its 3' end is UTR. | |
is_current | BOOLEAN | 1 | | |
is_constitutive | BOOLEAN | 0 | | |
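The usual relationship between phase, exon length and end_phase can be written as a short Python sketch. The function names are hypothetical, the exon length assumes 1-based inclusive coordinates, and the -1 case for exons whose 3' end is UTR is deliberately not covered.

```python
def exon_length(seq_region_start: int, seq_region_end: int) -> int:
    """Exon length, assuming 1-based inclusive coordinates."""
    return seq_region_end - seq_region_start + 1


def end_phase(phase: int, length: int) -> int:
    """Return the usual end phase of an exon that is coding at both ends."""
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2 for a coding start")
    return (phase + length) % 3


length = exon_length(100, 199)
print(length, end_phase(0, length), end_phase(2, length))
```

For an exon of 100 bases, a start phase of 0 gives an end phase of 1, and a start phase of 2 gives an end phase of 0.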
exon_stable_id
Relates exon IDs in this release to release-independent stable identifiers.
Column | Type | Default value | Description | Index |
exon_id | INT(10) | | Primary key, internal identifier. | primary key |
stable_id | VARCHAR(128) | | Stable identifier. | key: stable_id_idx |
version | INT(10) | 1 | Version number. | key: stable_id_idx |
created_date | DATETIME | | Date created. | |
modified_date | DATETIME | | Date modified. | |
exon_transcript
Relationship table linking exons with transcripts. The rank column indicates the 5' to 3' position of the exon within the transcript, i.e. a rank of 1 means the exon is the 5' most within this transcript.
Column | Type | Default value | Description | Index |
exon_id | INT(10) | | Composite key. Foreign key references to the exon table. | primary key key: exon |
transcript_id | INT(10) | | Composite key. Foreign key references to the transcript table. | primary key key: transcript |
rank | INT(10) | | Composite key. | primary key |
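A small Python sketch of how the rank column orders exons within a transcript is given below. The rows and the exons_in_order function are made up for illustration; note that the same exon (here exon 11) can belong to more than one transcript.

```python
# (exon_id, transcript_id, rank) rows as they might appear in exon_transcript.
rows = [
    (12, 7, 2),
    (15, 7, 3),
    (11, 7, 1),
    (11, 8, 1),
]


def exons_in_order(rows, transcript_id):
    """Return the exon ids of one transcript, sorted 5' to 3' by rank."""
    selected = [(rank, exon_id)
                for exon_id, t_id, rank in rows
                if t_id == transcript_id]
    return [exon_id for rank, exon_id in sorted(selected)]


print(exons_in_order(rows, 7))
```

For transcript 7 this yields exons 11, 12 and 15, with exon 11 being the 5' most.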
gene
Allows transcripts to be related to genes.
Column | Type | Default value | Description | Index |
gene_id | INT(10) | | Primary key, internal identifier. | primary key |
biotype | VARCHAR(40) | | Biotype, e.g. protein_coding. | |
analysis_id | SMALLINT | | Foreign key references to the analysis table. | key: analysis_idx |
seq_region_id | INT(10) | | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | | Sequence end position. | |
seq_region_strand | TINYINT(2) | | Sequence region strand: 1 - forward; -1 - reverse. | |
display_xref_id | INT(10) | | External reference for EnsEMBL web site. Foreign key references to the xref table. | key: xref_id_index |
source | VARCHAR(20) | | Source, e.g. ensembl, havana etc. | |
status | ENUM('KNOWN', 'NOVEL', 'PUTATIVE', 'PREDICTED', 'KNOWN_BY_PROJECTION', 'UNKNOWN') | | Status, e.g. 'KNOWN', 'NOVEL', 'PUTATIVE', 'PREDICTED', 'KNOWN_BY_PROJECTION', 'UNKNOWN'. | |
description | TEXT | | Gene description. | |
is_current | BOOLEAN | 1 | | |
canonical_transcript_id | INT(10) | | Foreign key references to the transcript table. | |
canonical_annotation | VARCHAR(255) | NULL | Canonical annotation. | |
gene_attrib
Enables storage of attributes that relate to genes.
Column | Type | Default value | Description | Index |
gene_id | INT(10) | '0' | Foreign key references to the gene table. | key: gene_idx |
attrib_type_id | SMALLINT(5) | '0' | Foreign key references to the attrib_type table. | key: type_val_idx |
value | TEXT | | Attribute value. | key: type_val_idx key: val_only_idx |
gene_stable_id
Relates gene IDs in this release to release-independent stable identifiers.
Column | Type | Default value | Description | Index |
gene_id | INT | | Primary key, internal identifier. Foreign key references to the gene table. | primary key |
stable_id | VARCHAR(128) | | Stable identifier. | key: stable_id_idx |
version | INT(10) | 1 | Version number. | key: stable_id_idx |
created_date | DATETIME | | Date created. | |
modified_date | DATETIME | | Date modified. | |
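The link between a stable identifier and the internal gene_id can be illustrated with a small Python sketch using an in-memory SQLite database. The tables are cut down to a few columns, the stable identifier shown is a made-up placeholder, and the internal_gene_id function is a hypothetical helper, not part of any Ensembl API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, biotype TEXT);
    CREATE TABLE gene_stable_id (
        gene_id   INTEGER PRIMARY KEY,
        stable_id TEXT,
        version   INTEGER DEFAULT 1
    );
    INSERT INTO gene VALUES (42, 'protein_coding');
    INSERT INTO gene_stable_id (gene_id, stable_id, version)
        VALUES (42, 'ENSG_EXAMPLE_0001', 3);
""")


def internal_gene_id(conn, stable_id):
    """Look up the release-specific gene_id for a stable identifier."""
    row = conn.execute(
        "SELECT g.gene_id FROM gene g "
        "JOIN gene_stable_id s ON s.gene_id = g.gene_id "
        "WHERE s.stable_id = ?",
        (stable_id,),
    ).fetchone()
    return row[0] if row else None


print(internal_gene_id(conn, "ENSG_EXAMPLE_0001"))
```

The internal gene_id may change between releases, whereas the stable identifier is intended to stay the same, so external references are best kept against the stable identifier.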
karyotype
Describes bands that can be stained on the chromosome.
Column | Type | Default value | Description | Index |
karyotype_id | INT(10) | | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | | Foreign key references to the seq_region table. | key: region_band_idx |
seq_region_start | INT(10) | | Sequence start position. | |
seq_region_end | INT(10) | | Sequence end position. | |
band | VARCHAR(40) | | Band. | key: region_band_idx |
stain | VARCHAR(40) | | Stain. | |
meta
Stores data about the data in the current schema. Taxonomy information, version information and the default value for the type column in the assembly table are stored here. Unlike other tables, data in the meta table is stored as key-value pairs. It also stores (via assembly.mapping keys) the relationships between co-ordinate systems in the assembly table. The species_id field of the meta table is used in multi-species databases and makes it possible to have species-specific meta key-value pairs. The species-specific meta key-value pairs need to be repeated for each species_id. Entries in the meta table that are not specific to any one species, such as the schema.version key and any other schema-related information, must have their species_id field set to NULL. The default species_id, and the only species_id value allowed in single-species databases, is 1.
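The species_id rules for key-value pairs can be sketched in Python with an in-memory SQLite database. The column names meta_key and meta_value, the example.species_key key and the meta_values function are assumptions made for this illustration; only the schema.version key and the NULL convention come from the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meta (
        species_id INTEGER DEFAULT 1,
        meta_key   TEXT NOT NULL,
        meta_value TEXT NOT NULL
    );
    INSERT INTO meta VALUES (NULL, 'schema.version', '63');
    INSERT INTO meta VALUES (1, 'example.species_key', 'value for species 1');
    INSERT INTO meta VALUES (2, 'example.species_key', 'value for species 2');
""")


def meta_values(conn, key, species_id=None):
    """Return values for a key; species_id=None selects schema-wide entries."""
    if species_id is None:
        sql = ("SELECT meta_value FROM meta "
               "WHERE meta_key = ? AND species_id IS NULL")
        params = (key,)
    else:
        sql = ("SELECT meta_value FROM meta "
               "WHERE meta_key = ? AND species_id = ?")
        params = (key, species_id)
    return [row[0] for row in conn.execute(sql, params)]


print(meta_values(conn, "schema.version"))
print(meta_values(conn, "example.species_key", species_id=2))
```

Schema-wide entries have to be matched with IS NULL rather than an equality test, which is why the two cases use different queries.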
meta_coord
Describes which co-ordinate systems the different feature tables use.
seq_region
Stores information about sequence regions. The primary key is used as a pointer into the dna table so that actual sequence can be obtained, and the coord_system_id allows sequence regions of multiple types to be stored. Clones, contigs and chromosomes are all now stored in the seq_region table. Contigs are stored with the co-ordinate system 'contig'. The relationships between contigs and clones, between contigs and chromosomes, and between contigs and supercontigs are all stored in the assembly table.
Column | Type | Default value | Description | Index |
seq_region_id | INT(10) | | Primary key, internal identifier. | primary key |
name | VARCHAR(40) | | Sequence region name. | unique key: name_cs_idx |
coord_system_id | INT(10) | | Foreign key references to the coord_system table. | unique key: name_cs_idx key: cs_idx |
length | INT(10) | | Sequence length. | |
seq_region_attrib
Allows "attributes" to be defined for certain seq_regions. Provides a way of storing extra information about particular seq_regions without adding extra columns to the seq_region table. e.g.
Column | Type | Default value | Description | Index |
seq_region_id | INT(10) | '0' | Foreign key references to the seq_region table. | key: seq_region_idx |
attrib_type_id | SMALLINT(5) | '0' | Foreign key references to the attrib_type table. | key: type_val_idx |
value | TEXT | | Attribute value. | key: type_val_idx key: val_only_idx |
transcript
Stores information about transcripts. Has seq_region_start, seq_region_end and seq_region_strand for faster retrieval and to allow storage independently of genes and exons. Note that a transcript is usually associated with a translation, but may not be, e.g. in the case of pseudogenes and RNA genes (those that code for RNA molecules).
Column | Type | Default value | Description | Index |
transcript_id | INT(10) | | Primary key, internal identifier. | primary key |
gene_id | INT(10) | | Foreign key references to the gene table. | key: gene_index |
analysis_id | SMALLINT | | Foreign key references to the analysis table. | key: analysis_idx |
seq_region_id | INT(10) | | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | | Sequence end position. | |
seq_region_strand | TINYINT(2) | | Sequence region strand: 1 - forward; -1 - reverse. | |
display_xref_id | INT(10) | | External reference for EnsEMBL web site. Foreign key references to the xref table. | key: xref_id_index |
biotype | VARCHAR(40) | | Biotype, e.g. protein_coding. | |
status | ENUM('KNOWN', 'NOVEL', 'PUTATIVE', 'PREDICTED', 'KNOWN_BY_PROJECTION', 'UNKNOWN') | | Status, e.g. 'KNOWN', 'NOVEL', 'PUTATIVE', 'PREDICTED', 'KNOWN_BY_PROJECTION', 'UNKNOWN'. | |
description | TEXT | | Transcript description. | |
is_current | BOOLEAN | 1 | | |
canonical_translation_id | INT(10) | | Foreign key references to the translation table. | |
transcript_attrib
Enables storage of attributes that relate to transcripts.
Column | Type | Default value | Description | Index |
transcript_id | INT(10) | '0' | Foreign key references to the transcript table. | key: transcript_idx |
attrib_type_id | SMALLINT(5) | '0' | Foreign key references to the attrib_type table. | key: type_val_idx |
value | TEXT | | Attribute value. | key: type_val_idx key: val_only_idx |
transcript_stable_id
Relates transcript IDs in this release to release-independent stable identifiers.
Column | Type | Default value | Description | Index |
transcript_id | INT(10) | | Primary key, internal identifier. Foreign key references to the transcript table. | primary key |
stable_id | VARCHAR(128) | | Stable identifier. | key: stable_id_idx |
version | INT(10) | 1 | Version number. | key: stable_id_idx |
created_date | DATETIME | | Date created. | |
modified_date | DATETIME | | Date modified. | |
translation
Describes which parts of which exons are used in translation. The seq_start and seq_end columns are 1-based offsets into the relative coordinate system of start_exon_id and end_exon_id, i.e. if the translation starts at the first base of the exon, seq_start would be 1. Transcripts are related to translations by the transcript_id key in this table.
Column | Type | Default value | Description | Index |
translation_id | INT(10) | | Primary key, internal identifier. | primary key |
transcript_id | INT(10) | | Foreign key references to the transcript table. | key: transcript_idx |
seq_start | INT(10) | | 1-based offset into the relative coordinate system of start_exon_id. | |
start_exon_id | INT(10) | | Foreign key references to the exon table. | |
seq_end | INT(10) | | 1-based offset into the relative coordinate system of end_exon_id. | |
end_exon_id | INT(10) | | Foreign key references to the exon table. | |
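To show how the 1-based offsets combine with the exons of a transcript, here is a minimal Python sketch that counts the coding bases described by one translation row. The coding_length function and its arguments are hypothetical; the sketch assumes the exon lengths are supplied in rank order and that both offsets are counted from the 5' end of their exon.

```python
def coding_length(exon_lengths, start_index, seq_start, end_index, seq_end):
    """Number of coding bases described by one translation row.

    exon_lengths: exon lengths in rank order (5' to 3').
    start_index, end_index: positions in that list of the exons named by
        start_exon_id and end_exon_id.
    seq_start, seq_end: 1-based offsets within those exons.
    """
    if start_index == end_index:
        return seq_end - seq_start + 1
    first = exon_lengths[start_index] - seq_start + 1
    middle = sum(exon_lengths[start_index + 1:end_index])
    return first + middle + seq_end


print(coding_length([100, 50, 80], 0, 41, 2, 30))
```

In this example the translation starts at base 41 of the first exon (60 coding bases), spans the whole 50-base middle exon and ends at base 30 of the last exon, giving 140 coding bases.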
translation_attrib
Enables storage of attributes that relate to translations.
Column | Type | Default value | Description | Index |
translation_id | INT(10) | '0' | Foreign key references to the translation table. | key: translation_idx |
attrib_type_id | SMALLINT(5) | '0' | Foreign key references to the attrib_type table. | key: type_val_idx |
value | TEXT | | Attribute value. | key: type_val_idx key: val_only_idx |
translation_stable_id
Relates translation IDs in this release to release-independent stable identifiers.
Column | Type | Default value | Description | Index |
translation_id | INT(10) | | Primary key, internal identifier. Foreign key references to the translation table. | primary key |
stable_id | VARCHAR(128) | | Stable identifier. | key: stable_id_idx |
version | INT(10) | 1 | Version number. | key: stable_id_idx |
created_date | DATETIME | | Date created. | |
modified_date | DATETIME | | Date modified. | |
unconventional_transcript_association
Describes transcripts that do not link to a single gene in the normal way.
Column | Type | Default value | Description | Index |
transcript_id | INT(10) | | Foreign key references to the transcript table. | key: transcript_idx |
gene_id | INT(10) | | Foreign key references to the gene table. | key: gene_idx |
interaction_type | ENUM("antisense","sense_intronic","sense_overlaping_exonic","chimeric_sense_exonic") | | Type of interaction: 'antisense','sense_intronic','sense_overlaping_exonic','chimeric_sense_exonic'. | |
Features and Analyses
A PDF document of the schema is available here.
alt_allele
Stores information about genes on haplotypes that may be orthologous.
Column | Type | Default value | Description | Index |
alt_allele_id | INT(10) | | Primary key, internal identifier. | unique key: allele_idx |
gene_id | INT(10) | | Foreign key references to the gene table. | unique key: gene_idx unique key: allele_idx |
analysis
Usually describes a program and a database that together are used to create a feature on a piece of sequence. Each feature is marked with an analysis_id. The most important column is logic_name, which is used by the web team to render a feature correctly on ContigView (or even to retrieve the right feature). The logic_name is also used in the pipeline to identify the analysis that has to run at a given stage of the pipeline. The module column tells the pipeline which Perl module performs the whole analysis, typically a RunnableDB module.
Column | Type | Default value | Description | Index |
analysis_id | SMALLINT | | Primary key, internal identifier. | primary key |
created | DATETIME | '0000-00-00 | Date to distinguish newer and older versions of the same analysis. | |
logic_name | VARCHAR(128) | | String to identify the analysis. Used mainly inside pipeline. | unique key: logic_name_idx |
db | VARCHAR(120) | | Database name. | |
db_version | VARCHAR(40) | | Database version. | |
db_file | VARCHAR(120) | | File system location of the database. | |
program | VARCHAR(80) | | The binary used to create a feature. | |
program_version | VARCHAR(40) | | The binary version. | |
program_file | VARCHAR(80) | | File system location of the binary. | |
parameters | TEXT | | A parameter string which is processed by the perl module. | |
module | VARCHAR(80) | | Perl module names (RunnableDBS usually) executing this analysis. | |
module_version | VARCHAR(40) | | Perl module version. | |
gff_source | VARCHAR(40) | | How to make a gff dump from features with this analysis. | |
gff_feature | VARCHAR(40) | | How to make a gff dump from features with this analysis. | |
analysis_description
Allows the storage of a textual description of the analysis, as well as a "display label", primarily for the EnsEMBL web site.
Column | Type | Default value | Description | Index |
analysis_id | SMALLINT | | Primary key, internal identifier. Foreign key references to the analysis table. | unique key: analysis_idx |
description | TEXT | | Textual description of the analysis. | |
display_label | VARCHAR(255) | | Display label for the EnsEMBL web site. | |
displayable | BOOLEAN | 1 | Flag indicating if the analysis description is to be displayed on the EnsEMBL web site. | |
web_data | TEXT | | Other data used by the EnsEMBL web site. | |
density_feature
Describes features representing a density, or percentage coverage etc. in a given region.
Column | Type | Default value | Description | Index |
density_feature_id | INT(10) | | Primary key, internal identifier. | primary key |
density_type_id | INT(10) | | Density type. Foreign key references to the density_type table. | key: seq_region_idx |
seq_region_id | INT(10) | | Sequence region. Foreign key references to the seq_region table. | key: seq_region_idx key: seq_region_id_idx |
seq_region_start | INT(10) | | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | | Sequence end position. | |
density_value | FLOAT | | Density value. | |
density_type
Describes the type of density, percentage coverage etc. that density features represent in a given region.
Column | Type | Default value | Description | Index |
density_type_id | INT(10) | | Primary key, internal identifier. | primary key |
analysis_id | SMALLINT | | Foreign key references to the analysis table. | unique key: analysis_idx |
block_size | INT | | Block size. | unique key: analysis_idx |
region_features | INT | | The number of features per sequence region inside this density type. | unique key: analysis_idx |
value_type | ENUM('sum','ratio') | | Value type, e.g. 'sum', 'ratio'. | |
ditag
Represents a ditag object in the EnsEMBL database. Corresponds to original tag containing the full sequence. This can be a single piece of sequence like CAGE tags or a ditag with concatenated sequence from 5' and 3' end like GIS or GSC tags. This data is available as a DAS track in ContigView on the EnsEMBL web site.
Column | Type | Default value | Description | Index |
ditag_id | INT(10) | | Primary key, internal identifier. | primary key |
name | VARCHAR(30) | | Ditag name. | |
type | VARCHAR(30) | | Ditag type. | |
tag_count | SMALLINT(6) | 1 | Tag count. | |
sequence | TINYTEXT | | Sequence. | |
ditag_feature
Describes where ditags hit on the genome. Represents a mapped ditag object in the EnsEMBL database. These are the original tags separated into start ("L") and end ("R") parts if applicable, successfully aligned to the genome. Two DitagFeatures usually relate to one parent Ditag. Alternatively there are tags, e.g. CAGE tags, which only have a 5' tag ("F").
Column | Type | Default value | Description | Index |
ditag_feature_id | INT(10) | | Primary key, internal identifier. | primary key |
ditag_id | INT(10) | '0' | Foreign key references to the ditag table. | key: ditag_idx |
ditag_pair_id | INT(10) | '0' | Ditag pair id. | key: ditag_pair_idx |
seq_region_id | INT(10) | '0' | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | '0' | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | '0' | Sequence end position. | key: seq_region_idx |
seq_region_strand | TINYINT(1) | '0' | Sequence region strand: 1 - forward; -1 - reverse. | |
analysis_id | SMALLINT | '0' | Foreign key references to the analysis table. | |
hit_start | INT(10) | '0' | Alignment hit start position. | |
hit_end | INT(10) | '0' | Alignment hit end position. | |
hit_strand | TINYINT(1) | '0' | Alignment hit strand: 1 - forward; -1 - reverse. | |
cigar_line | TINYTEXT | | Used to encode gapped alignments. | |
ditag_side | ENUM('F', 'L', 'R') | | Ditag side: L - start, R - end, F - 5' tag only. | |
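How the start and end parts of a ditag might be brought back together can be sketched in Python as below. The rows and the pair_features function are made up for illustration, and the sketch assumes that ditag_pair_id distinguishes the separate mapped pairs belonging to one ditag.

```python
from collections import defaultdict

# (ditag_feature_id, ditag_id, ditag_pair_id, ditag_side) - illustrative rows.
features = [
    (1, 10, 1, "L"),
    (2, 10, 1, "R"),
    (3, 11, 1, "F"),
    (4, 10, 2, "R"),
    (5, 10, 2, "L"),
]


def pair_features(features):
    """Group features by (ditag_id, ditag_pair_id) and key them by side."""
    groups = defaultdict(dict)
    for feature_id, ditag_id, pair_id, side in features:
        groups[(ditag_id, pair_id)][side] = feature_id
    return dict(groups)


pairs = pair_features(features)
print(pairs[(10, 1)])
print(pairs[(11, 1)])
```

Ditag 10 has two mapped pairs, each with an "L" and an "R" feature, while the CAGE-like tag 11 has a single "F" feature.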
dna_align_feature
Stores DNA sequence alignments generated from Blast (or Blast-like) comparisons.
Column | Type | Default value | Description | Index |
dna_align_feature_id | INT(10) | | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | | Foreign key references to the seq_region table. | key: seq_region_idx key: seq_region_idx_2 |
seq_region_start | INT(10) | | Sequence start position. | key: seq_region_idx key: seq_region_idx_2 |
seq_region_end | INT(10) | | Sequence end position. | |
seq_region_strand | TINYINT(1) | | Sequence region strand: 1 - forward; -1 - reverse. | |
hit_start | INT | | Alignment hit start position. | |
hit_end | INT | | Alignment hit end position. | |
hit_strand | TINYINT(1) | | Alignment hit strand: 1 - forward; -1 - reverse. | |
hit_name | VARCHAR(40) | | Alignment hit name. | key: hit_idx |
analysis_id | SMALLINT | | Foreign key references to the analysis table. | key: seq_region_idx key: analysis_idx |
score | DOUBLE | | Alignment score. | key: seq_region_idx |
evalue | DOUBLE | | Alignment e-value. | |
perc_ident | FLOAT | | Alignment percentage identity. | |
cigar_line | TEXT | | Used to encode gapped alignments. | |
external_db_id | SMALLINT | | Foreign key references to the external_db table. | key: external_db_idx |
hcoverage | DOUBLE | | Hit coverage. | |
external_data | TEXT | | External data. | |
pair_dna_align_feature_id | INT(10) | | The id of the dna feature aligned. | key: pair_idx |
map
Stores the names of different genetic or radiation hybrid maps, for which there is marker map information.
Column | Type | Default value | Description | Index |
map_id | INT(10) | | Primary key, internal identifier. | primary key |
map_name | VARCHAR(30) | | Map name. | |
marker
Stores data about the marker itself. A marker in Ensembl consists of a pair of primer sequences, an expected product size and a set of associated identifiers known as synonyms.
Column | Type | Default value | Description | Index |
marker_id | INT(10) | | Primary key, internal identifier. | primary key key: marker_idx |
display_marker_synonym_id | INT(10) | | Marker synonym. | key: display_idx |
left_primer | VARCHAR(100) | | Left primer sequence. | |
right_primer | VARCHAR(100) | | Right primer sequence. | |
min_primer_dist | INT(10) | | Minimum primer distance. | |
max_primer_dist | INT(10) | | Maximum primer distance. | |
priority | INT | | Priority. | key: marker_idx |
type | ENUM('est', 'microsatellite') | | Type, e.g. 'est', 'microsatellite'. | |
marker_feature
Used to describe positions of markers on the assembly. Markers are placed on the genome electronically using an analysis program.
Column | Type | Default value | Description | Index |
marker_feature_id | INT(10) | | Primary key, internal identifier. | primary key |
marker_id | INT(10) | | Foreign key references to the marker table. | |
seq_region_id | INT(10) | | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | | Sequence end position. | |
analysis_id | SMALLINT | | Foreign key references to the analysis table. | key: analysis_idx |
map_weight | INT(10) | | The number of times that this marker has been mapped to the genome, e.g. a marker with map weight 3 has been mapped to 3 locations in the genome. | |
marker_map_location
Stores map locations (genetic, radiation hybrid and in situ hybridization) for markers obtained from experimental evidence.
Column | Type | Default value | Description | Index |
marker_id | INT(10) | | Primary key, internal identifier. | primary key |
map_id | INT(10) | | Foreign key references to the map table. | primary key key: map_idx |
chromosome_name | VARCHAR(15) | | Chromosome name. | key: map_idx |
marker_synonym_id | INT(10) | | Foreign key references to the marker_synonym table. | |
position | VARCHAR(15) | | Position of the map location. | key: map_idx |
lod_score | DOUBLE | | LOD score for map location. | |
marker_synonym
Stores alternative names for markers, as well as their sources.
Column | Type | Default value | Description | Index |
marker_synonym_id | INT(10) | | Primary key, internal identifier. | primary key key: marker_synonym_idx |
marker_id | INT(10) | | Foreign key references to the marker table. | key: marker_idx |
source | VARCHAR(20) | | Marker source. | |
name | VARCHAR(50) | | Alternative name for marker. | key: marker_synonym_idx |
misc_attrib
Stores arbitrary attributes about the features in the misc_feature table.
Column | Type | Default value | Description | Index |
misc_feature_id | INT(10) | '0' | Foreign key references to the misc_feature table. | key: misc_feature_idx |
attrib_type_id | SMALLINT(5) | '0' | Foreign key references to the attrib_type table. | key: type_val_idx |
value | TEXT | | Attribute value. | key: type_val_idx key: val_only_idx |
misc_feature
Allows for storage of arbitrary features.
Column | Type | Default value | Description | Index |
misc_feature_id | INT(10) | | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | '0' | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | '0' | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | '0' | Sequence end position. | |
seq_region_strand | TINYINT(4) | '0' | Sequence region strand: 1 - forward; -1 - reverse. | |
misc_feature_misc_set
This table classifies features into distinct sets.
Column | Type | Default value | Description | Index |
misc_feature_id | INT(10) | '0' | Primary key, internal identifier. Foreign key references to the misc_feature table. | primary key key: reverse_idx |
misc_set_id | SMALLINT(5) | '0' | Primary key, internal identifier. Foreign key references to the misc_set table. | primary key key: reverse_idx |
misc_set
Defines "sets" that the features held in the misc_feature table can be grouped into.
Column | Type | Default value | Description | Index |
misc_set_id | SMALLINT(5) | | Primary key, internal identifier. | primary key |
code | VARCHAR(25) | '' | Set code, e.g. bac_map | unique key: code_idx |
name | VARCHAR(255) | '' | Code name, e.g. BAC map | |
description | TEXT | | Code description, e.g. Full list of mapped BAC clones | |
max_length | INT | | Longest feature, e.g. 500000 | |
prediction_exon
Stores exons that are predicted by ab initio gene finder programs. Unlike EnsEMBL exons they are not supported by any evidence.
Column | Type | Default value | Description | Index |
prediction_exon_id | INT(10) | | Primary key, internal identifier. | primary key |
prediction_transcript_id | INT(10) | | Foreign key references to the prediction_transcript table. | key: transcript_idx |
exon_rank | SMALLINT | | Rank (position) of the exon within its prediction transcript. | |
seq_region_id | INT(10) | | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | | Sequence end position. | |
seq_region_strand | TINYINT | | Sequence region strand: 1 - forward; -1 - reverse. | |
start_phase | TINYINT | | Exon start phase. | |
score | DOUBLE | | Prediction score. | |
p_value | DOUBLE | | Prediction p-value. | |
prediction_transcript |
Stores transcripts that are predicted by ab initio gene finder programs (e.g. genscan, SNAP). Unlike EnsEMBL transcripts they are not supported by any evidence.
Column | Type | Default value | Description | Index |
prediction_transcript_id | INT(10) | | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | | Sequence end position. | |
seq_region_strand | TINYINT | | Sequence region strand: 1 - forward; -1 - reverse. | |
analysis_id | SMALLINT | | Foreign key references to the analysis table. | key: analysis_idx |
display_label | VARCHAR(255) | | Display label for the EnsEMBL web site. | |
protein_align_feature |
Stores translation alignments generated from Blast (or Blast-like) comparisons.
Column | Type | Default value | Description | Index |
protein_align_feature_id | INT(10) | | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | | Foreign key references to the seq_region table. | key: seq_region_idx key: seq_region_idx_2 |
seq_region_start | INT(10) | | Sequence start position. | key: seq_region_idx key: seq_region_idx_2 |
seq_region_end | INT(10) | | Sequence end position. | |
seq_region_strand | TINYINT(1) | '1' | Sequence region strand: 1 - forward; -1 - reverse. | |
hit_start | INT(10) | | Alignment hit start position. | |
hit_end | INT(10) | | Alignment hit end position. | |
hit_name | VARCHAR(40) | | Alignment hit name. | key: hit_idx |
analysis_id | SMALLINT | | Foreign key references to the analysis table. | key: seq_region_idx key: analysis_idx |
score | DOUBLE | | Alignment score. | key: seq_region_idx |
evalue | DOUBLE | | Alignment e-value. | |
perc_ident | FLOAT | | Alignment percentage identity. | |
cigar_line | TEXT | | Used to encode gapped alignments. | |
external_db_id | SMALLINT | | Foreign key references to the external_db table. | key: external_db_idx |
hcoverage | DOUBLE | | | |
protein_feature |
Describes features on the translations (as opposed to the DNA sequence itself), i.e. parts of the peptide. Positions are given in peptide co-ordinates rather than contig co-ordinates.
Column | Type | Default value | Description | Index |
protein_feature_id | INT(10) | | Primary key, internal identifier. | primary key |
translation_id | INT(10) | | Foreign key references to the translation table. | key: translation_idx |
seq_start | INT(10) | | Sequence start position. | |
seq_end | INT(10) | | Sequence end position. | |
hit_start | INT(10) | | Alignment hit start position. | |
hit_end | INT(10) | | Alignment hit end position. | |
hit_name | VARCHAR(40) | | Alignment hit name. | key: hitname_idx |
analysis_id | SMALLINT | | Foreign key references to the analysis table. | key: analysis_idx |
score | DOUBLE | | Alignment score. | |
evalue | DOUBLE | | Alignment E-value. | |
perc_ident | FLOAT | | Alignment percentage identity. | |
external_data | TEXT | | External data for protein feature. | |
qtl |
Describes the markers (of which there may be up to three) which define Quantitative Trait Loci (QTLs). QTL mapping is a statistical technique used to find links between certain expressed traits and regions in a genetic map. A QTL is defined by up to three markers: two flanking markers and one optional peak marker. It is a region (or more often a group of regions) which is likely to affect the phenotype (trait) described in this QTL.
Column | Type | Default value | Description | Index |
qtl_id | INT(10) | | Primary key, internal identifier. | primary key |
trait | VARCHAR(255) | | Expressed trait. | key: trait_idx |
lod_score | FLOAT | | LOD score for QTL. | |
flank_marker_id_1 | INT(10) | | Flanking marker 1. | |
flank_marker_id_2 | INT(10) | | Flanking marker 2. | |
peak_marker_id | INT(10) | | Peak marker. | |
qtl_feature |
Describes Quantitative Trait Loci (QTL) positions as obtained from inbreeding experiments. Note that the values in this table are in chromosomal co-ordinates. Also, this table is not populated for all schemas.
Column | Type | Default value | Description | Index |
seq_region_id | INT(10) | | Foreign key references to the seq_region table. | key: loc_idx |
seq_region_start | INT(10) | | Sequence start position. | key: loc_idx |
seq_region_end | INT(10) | | Sequence end position. | |
qtl_id | INT(10) | | Foreign key references to the qtl table. | key: qtl_idx |
analysis_id | SMALLINT | | Foreign key references to the analysis table. | key: analysis_idx |
qtl_synonym |
Describes alternative names for Quantitative Trait Loci (QTLs).
Column | Type | Default value | Description | Index |
qtl_synonym_id | INT(10) | | Primary key, internal identifier. | primary key |
qtl_id | INT(10) | | Foreign key references to the qtl table. | key: qtl_idx |
source_database | ENUM("rat genome database", "ratmap") | | Synonym source database. | |
source_primary_id | VARCHAR(255) | | Source database primary ID. | |
repeat_consensus |
Stores consensus sequences obtained from analysing repeat features.
Column | Type | Default value | Description | Index |
repeat_consensus_id | INT(10) | | Primary key, internal identifier. | primary key |
repeat_name | VARCHAR(255) | | Repeat name. | key: name |
repeat_class | VARCHAR(100) | | E.g. 'Satellite', 'tRNA', 'LTR'. | key: class |
repeat_type | VARCHAR(40) | | E.g. 'Satellite repeats', 'Tandem repeats', 'Low complexity regions'. | key: type |
repeat_consensus | TEXT | | Repeat consensus sequence. | key: consensus |
repeat_feature |
Describes sequence repeat regions.
Column | Type | Default value | Description | Index |
repeat_feature_id | INT(10) | | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | | Sequence end position. | |
seq_region_strand | TINYINT(1) | '1' | Sequence region strand: 1 - forward; -1 - reverse. | |
repeat_start | INT(10) | | Repeat sequence start. | |
repeat_end | INT(10) | | Repeat sequence end | |
repeat_consensus_id | INT(10) | | Foreign key references to the repeat_consensus table. | key: repeat_idx |
analysis_id | SMALLINT | | Foreign key references to the analysis table. | key: analysis_idx |
score | DOUBLE | | Analysis score. | |
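A typical use of repeat_feature is to find the repeats that overlap a given interval and look up their consensus via repeat_consensus_id. The following is a minimal sketch against an in-memory SQLite stand-in; the rows are made up, only a subset of the columns is modelled, and the function name repeats_overlapping is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repeat_consensus (repeat_consensus_id INTEGER PRIMARY KEY,
                               repeat_name TEXT, repeat_class TEXT);
CREATE TABLE repeat_feature (repeat_feature_id INTEGER PRIMARY KEY, seq_region_id INTEGER,
                             seq_region_start INTEGER, seq_region_end INTEGER,
                             repeat_consensus_id INTEGER);
-- Made-up rows, for illustration only.
INSERT INTO repeat_consensus VALUES (1, 'example_ltr', 'LTR'), (2, 'example_sat', 'Satellite');
INSERT INTO repeat_feature VALUES (1, 7, 100, 200, 1), (2, 7, 500, 650, 2), (3, 8, 150, 180, 1);
""")


def repeats_overlapping(conn, seq_region_id, start, end):
    """Return repeats on one seq_region that overlap the inclusive interval [start, end]."""
    return conn.execute(
        """SELECT rf.repeat_feature_id, rc.repeat_name, rc.repeat_class
           FROM repeat_feature rf
           JOIN repeat_consensus rc ON rc.repeat_consensus_id = rf.repeat_consensus_id
           WHERE rf.seq_region_id = ?
             AND rf.seq_region_start <= ?   -- feature starts before the query ends
             AND rf.seq_region_end >= ?     -- feature ends after the query starts
           ORDER BY rf.seq_region_start""",
        (seq_region_id, end, start),
    ).fetchall()


overlaps = repeats_overlapping(conn, 7, 180, 520)
```

The same two-sided overlap condition applies to any of the feature tables that carry seq_region_id, seq_region_start and seq_region_end.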
simple_feature |
Describes general genomic features that don't fit into any of the more specific feature tables.
Column | Type | Default value | Description | Index |
simple_feature_id | INT(10) | | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | | Sequence end position. | |
seq_region_strand | TINYINT(1) | | Sequence region strand: 1 - forward; -1 - reverse. | |
display_label | VARCHAR(255) | | Display label for the EnsEMBL web site. | key: hit_idx |
analysis_id | SMALLINT | | Foreign key references to the analysis table. | key: analysis_idx |
score | DOUBLE | | Analysis score. | |
splicing_event |
The splicing event table contains alternative splicing events and constitutive splicing events as reported by the AltSpliceFinder program. Multiple alternative splicing events can be observed on a gene. The location of the splicing event on the seq_region is reported. The type of event is stored in the attrib_type table.
Column | Type | Default value | Description | Index |
splicing_event_id | INT(10) | | Primary key, internal identifier. | primary key |
name | VARCHAR(134) | | Splicing event name. | |
gene_id | INT(10) | | Foreign key references to the gene table. | key: gene_idx |
seq_region_id | INT(10) | | Foreign key references to the seq_region table. | key: seq_region_idx |
seq_region_start | INT(10) | | Sequence start position. | key: seq_region_idx |
seq_region_end | INT(10) | | Sequence end position. | |
seq_region_strand | TINYINT(2) | | Sequence region strand: 1 - forward; -1 - reverse. | |
attrib_type_id | SMALLINT(5) | 0 | Foreign key references to the attrib_type table. | |
splicing_event_feature |
Represents alternative splicing event features. If the event is a constitutive exon, the constitutive exon and the transcript it belongs to are reported in this table. If the event is a cassette exon, the cassette exon and the transcript it belongs to are represented in this table. The transcript association field associates a sequence number with a transcript id. Thus, several exons skipped in an event can be attached to the same transcript. The features are ordered according to their genomic location and this is reflected in the feature order field value.
Column | Type | Default value | Description | Index |
splicing_event_feature_id | INT(10) | | Primary key, internal identifier. | primary key |
splicing_event_id | INT(10) | | Foreign key references to the splicing_event table. | key: se_idx |
exon_id | INT(10) | | Foreign key references to the exon table. | primary key |
transcript_id | INT(10) | | Foreign key references to the transcript table. | primary key key: transcript_idx |
feature_order | INT(10) | | Feature order number according to genomic location. | |
transcript_association | INT(10) | | Sequence number associating the feature with a transcript. | |
type | ENUM('constitutive_exon','exon','flanking_exon') | | E.g. 'constitutive_exon','exon','flanking_exon'. | |
start | INT(10) | | Sequence start. | |
end | INT(10) | | Sequence end. | |
splicing_transcript_pair |
Describes a pair of spliced transcripts in a splicing event. A splicing event is an observation of a change of splice sites between two isoforms. To avoid redundancy, some events, such as a skipped exon observed between different pairs of transcripts, are reported only once. The splicing_transcript_pair table contains a list of all the combinations of two isoforms relating to the same event.
Column | Type | Default value | Description | Index |
splicing_transcript_pair_id | INT(10) | | Primary key, internal identifier. | primary key |
splicing_event_id | INT(10) | | Foreign key references to the splicing_event table. | key: se_idx |
transcript_id_1 | INT(10) | | Foreign key references to the transcript table. | |
transcript_id_2 | INT(10) | | Foreign key references to the transcript table. | |
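"All the combinations of two isoforms" can be read as every unordered pair of distinct transcripts. The sketch below illustrates that reading only; how AltSpliceFinder actually selects and orders the pairs is not described here, and the function name transcript_pairs and the IDs are hypothetical.

```python
from itertools import combinations


def transcript_pairs(transcript_ids):
    """Return every unordered pair of distinct transcript IDs, smaller ID first."""
    return list(combinations(sorted(set(transcript_ids)), 2))


# Three isoforms involved in the same (made-up) event give three pairs.
pairs = transcript_pairs([103, 101, 102])
```

Each resulting tuple would correspond to one row, with the two values stored in transcript_id_1 and transcript_id_2.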
supporting_feature |
Describes the exon prediction process by linking exons to DNA or protein alignment features. As in several other tables, the feature_id column is a foreign key; the feature_type column specifies which table feature_id refers to.
Column | Type | Default value | Description | Index |
exon_id | INT(10) | '0' | Foreign key references to the exon table. | unique key: all_idx |
feature_type | ENUM('dna_align_feature','protein_align_feature') | | Feature type: 'dna_align_feature' or 'protein_align_feature' | unique key: all_idx key: feature_idx |
feature_id | INT(10) | '0' | Foreign key references to the dna_align_feature or protein_align_feature table, depending on the feature type. | unique key: all_idx key: feature_idx |
transcript_supporting_feature |
Describes the exon prediction process by linking transcripts to DNA or protein alignment features. As in several other tables, the feature_id column is a foreign key; the feature_type column specifies which table feature_id refers to.
Column | Type | Default value | Description | Index |
transcript_id | INT(10) | '0' | Foreign key references to the transcript table. | unique key: all_idx |
feature_type | ENUM('dna_align_feature','protein_align_feature') | | Feature type: 'dna_align_feature' or 'protein_align_feature' | unique key: all_idx key: feature_idx |
feature_id | INT(10) | '0' | Foreign key references to the dna_align_feature or protein_align_feature table, depending on the feature type. | unique key: all_idx key: feature_idx |
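Because feature_id can point at either of two tables, a lookup has to branch on feature_type. The following sketch shows one way to do that against an in-memory SQLite stand-in. It assumes that dna_align_feature has a dna_align_feature_id primary key and a hit_name column (that table is described elsewhere in this document); the rows and the function name supporting_hits are made up for illustration.

```python
import sqlite3

# Allowed feature_type values mapped to the primary key column of the target table.
FEATURE_TABLES = {
    "dna_align_feature": "dna_align_feature_id",          # assumed column name
    "protein_align_feature": "protein_align_feature_id",
}


def supporting_hits(conn, transcript_id):
    """Return (feature_type, hit_name) pairs for the evidence supporting one transcript."""
    rows = conn.execute(
        "SELECT feature_type, feature_id FROM transcript_supporting_feature "
        "WHERE transcript_id = ? ORDER BY feature_type, feature_id",
        (transcript_id,),
    ).fetchall()
    hits = []
    for feature_type, feature_id in rows:
        pk = FEATURE_TABLES[feature_type]  # KeyError if the type is not whitelisted
        row = conn.execute(
            f"SELECT hit_name FROM {feature_type} WHERE {pk} = ?", (feature_id,)
        ).fetchone()
        if row is not None:
            hits.append((feature_type, row[0]))
    return hits


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dna_align_feature (dna_align_feature_id INTEGER PRIMARY KEY, hit_name TEXT);
CREATE TABLE protein_align_feature (protein_align_feature_id INTEGER PRIMARY KEY, hit_name TEXT);
CREATE TABLE transcript_supporting_feature (transcript_id INTEGER, feature_type TEXT,
                                            feature_id INTEGER,
                                            UNIQUE (transcript_id, feature_type, feature_id));
-- Made-up rows: the same feature_id value exists in both feature tables.
INSERT INTO dna_align_feature VALUES (5, 'cdna_example');
INSERT INTO protein_align_feature VALUES (5, 'protein_example');
INSERT INTO transcript_supporting_feature VALUES (1, 'dna_align_feature', 5),
                                                 (1, 'protein_align_feature', 5);
""")
hits = supporting_hits(conn, 1)
```

The example deliberately reuses feature_id 5 in both tables to show that feature_id alone is ambiguous without feature_type. The same pattern applies to supporting_feature, with exon_id in place of transcript_id.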
ID Mapping
A PDF document of the schema is available here.
gene_archive |
Contains a snapshot of the stable IDs associated with genes deleted or changed between releases. Includes gene, transcript and translation stable IDs.
Column | Type | Default value | Description | Index |
gene_stable_id | VARCHAR(128) | | Foreign key references to the gene_stable_id table. | key: gene_idx |
gene_version | SMALLINT | 1 | Gene version. | key: gene_idx |
transcript_stable_id | VARCHAR(128) | | Foreign key references to the transcript_stable_id table. | key: transcript_idx |
transcript_version | SMALLINT | 1 | Transcript version. | key: transcript_idx |
translation_stable_id | VARCHAR(128) | | Foreign key references to the translation_stable_id table. | key: translation_idx |
translation_version | SMALLINT | 1 | Translation version. | key: translation_idx |
peptide_archive_id | INT(10) | | Foreign key references to the peptide_archive table. | key: peptide_archive_id_idx |
mapping_session_id | INT(10) | | Foreign key references to the mapping_session table. | |
mapping_session |
Stores details of ID mapping sessions. A mapping session represents the session when stable IDs were mapped from one database to another. Details of the "old" and "new" databases are stored.
Column | Type | Default value | Description | Index |
mapping_session_id | INT(10) | | Primary key, internal identifier. | primary key |
old_db_name | VARCHAR(80) | '' | Old Ensembl database name. | |
new_db_name | VARCHAR(80) | '' | New Ensembl database name. | |
old_release | VARCHAR(5) | '' | Old Ensembl database release. | |
new_release | VARCHAR(5) | '' | New Ensembl database release. | |
old_assembly | VARCHAR(20) | '' | Old assembly. | |
new_assembly | VARCHAR(20) | '' | New assembly. | |
created | DATETIME | | Date created. | |
mapping_set |
Table structure for seq_region mapping between releases.
Column | Type | Default value | Description | Index |
mapping_set_id | INT(10) | | Primary key, internal identifier. | |
schema_build | VARCHAR(20) | | E.g. 61_37f | primary key |
stable_id_event |
Represents what happened to all gene, transcript and translation stable IDs during a mapping session. This includes which IDs were deleted, created and related to each other. Each event is represented by one or more rows in the table.
Column | Type | Default value | Description | Index |
old_stable_id | VARCHAR(128) | | Gene/transcript/translation stable id for the previous release. | unique key: uni_idx key: old_idx |
old_version | SMALLINT | | Stable id version. | |
new_stable_id | VARCHAR(128) | | Gene/transcript/translation stable id for the current release. | unique key: uni_idx key: new_idx |
new_version | SMALLINT | | Stable id version. | |
mapping_session_id | INT(10) | '0' | Foreign key references to the mapping_session table. | unique key: uni_idx |
type | ENUM('gene', 'transcript', 'translation') | | Stable ID type: 'gene', 'transcript' or 'translation'. | unique key: uni_idx |
score | FLOAT | 0 | Combined mapping score. | |
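The kinds of event mentioned above can be told apart from the old_stable_id and new_stable_id columns. The sketch below assumes that a creation is stored with a NULL old_stable_id and a deletion with a NULL new_stable_id; that convention is an assumption and is not stated in this document. The labels, the function name classify_event and the identifiers are hypothetical.

```python
def classify_event(old_stable_id, new_stable_id):
    """Label one stable_id_event row; None stands in for SQL NULL (assumed convention)."""
    if old_stable_id is None and new_stable_id is None:
        raise ValueError("at least one stable ID must be set")
    if old_stable_id is None:
        return "created"
    if new_stable_id is None:
        return "deleted"
    if old_stable_id == new_stable_id:
        return "retained"
    return "remapped"


# Made-up identifiers, for illustration only.
events = [
    ("GENE0001", "GENE0001"),
    ("GENE0002", None),
    (None, "GENE0003"),
    ("GENE0004", "GENE0005"),
]
labels = [classify_event(old, new) for old, new in events]
```

A split or merge would appear as several rows sharing the same old or new ID, which is why one event can span more than one row.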
peptide_archive |
Contains the peptides for deleted or changed translations.
Column | Type | Default value | Description | Index |
peptide_archive_id | INT(10) | | Primary key, internal identifier. | primary key |
md5_checksum | VARCHAR(32) | | MD5 checksum hexadecimal digest of the sequence. | key: checksum |
peptide_seq | MEDIUMTEXT | | Peptide sequence. | |
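The md5_checksum column holds a 32-character hexadecimal digest, which matches the VARCHAR(32) type. A minimal sketch of computing such a digest follows; whether the sequence is normalised (for example upper-cased or stripped) before hashing is not stated here, so this version hashes the string as given. The function name peptide_md5 is hypothetical.

```python
import hashlib


def peptide_md5(peptide_seq):
    """Return the hexadecimal MD5 digest of a peptide sequence, hashed as given."""
    return hashlib.md5(peptide_seq.encode("ascii")).hexdigest()


# Made-up peptide fragment, for illustration only.
checksum = peptide_md5("MKTAYIAKQR")
```

Storing the checksum alongside peptide_seq lets identical peptides be found through the checksum index without comparing full sequences.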
seq_region_mapping |
Describes how the core seq_region_ids have changed from release to release.
Column | Type | Default value | Description | Index |
external_seq_region_id | INT(10) | | Foreign key references to the seq_region table. | |
internal_seq_region_id | INT(10) | | Foreign key references to the seq_region table. | |
mapping_set_id | INT(10) | | Foreign key references to the mapping_set table. | key: mapping_set_idx |
External References
A PDF document of the schema is available here.
dependent_xref |
Describes dependent external references which can't be directly mapped to Ensembl entities. They are linked to primary external references instead.
Column | Type | Default value | Description | Index |
object_xref_id | INT(10) | | Primary key, internal identifier. Foreign key references to the object_xref table. | |
master_xref_id | INT(10) | | Foreign key references to the xref table. | |
dependent_xref_id | INT(10) | | Foreign key references to the xref table. | |
external_db |
Stores data about the external databases in which the objects described in the xref table are stored.
Column | Type | Default value | Description | Index |
external_db_id | SMALLINT | | Primary key, internal identifier. | primary key |
db_name | VARCHAR(100) | | Database name. | |
db_release | VARCHAR(255) | | Database release. | |
status | ENUM('KNOWNXREF','KNOWN','XREF','PRED','ORTH', 'PSEUDO') | | Status, e.g. 'KNOWNXREF','KNOWN','XREF','PRED','ORTH','PSEUDO'. | |
priority | INT | | Determines which one of the xrefs will be used as the gene name. | |
db_display_name | VARCHAR(255) | | Database display name. | |
type | ENUM('ARRAY', 'ALT_TRANS', 'ALT_GENE', 'MISC', 'LIT', 'PRIMARY_DB_SYNONYM', 'ENSEMBL') | | Type, e.g. 'ARRAY', 'ALT_TRANS', 'ALT_GENE', 'MISC', 'LIT', 'PRIMARY_DB_SYNONYM', 'ENSEMBL'. | |
secondary_db_name | VARCHAR(255) | NULL | Secondary database name. | |
secondary_db_table | VARCHAR(255) | NULL | Secondary database table. | |
description | TEXT | | Description. | |
external_synonym |
Some xref objects can be referred to by more than one name. This table relates names to xref IDs.
Column | Type | Default value | Description | Index |
xref_id | INT(10) | | Part of the primary key. Foreign key references to the xref table. | primary key |
synonym | VARCHAR(100) | | Synonym. | primary key key: name_index |
identity_xref |
Describes how well a particular xref object matches the EnsEMBL object.
Column | Type | Default value | Description | Index |
object_xref_id | INT(10) | | Primary key, internal identifier. Foreign key references to the object_xref table. | primary key |
xref_identity | INT(5) | | Percentage identity. | |
ensembl_identity | INT(5) | | Percentage identity. | |
xref_start | INT | | Xref sequence start. | |
xref_end | INT | | Xref sequence end. | |
ensembl_start | INT | | Ensembl sequence start. | |
ensembl_end | INT | | Ensembl sequence end. | |
cigar_line | TEXT | | Used to encode gapped alignments. | |
score | DOUBLE | | Match score. | |
evalue | DOUBLE | | Match evalue. | |
object_xref |
Describes links between EnsEMBL objects and objects held in external databases. The EnsEMBL object can be one of several types; the type is held in the ensembl_object_type column. The ID of the particular EnsEMBL object (gene, transcript, translation, etc.) is given in the ensembl_id column. The xref_id points to the entry in the xref table that holds data about the external object. Each EnsEMBL object can be associated with zero or more xrefs. An xref object can be associated with one or more EnsEMBL objects.
Column | Type | Default value | Description | Index |
object_xref_id | INT(10) | | Primary key, internal identifier. | primary key |
ensembl_id | INT(10) | | Foreign key references to the seq_region, transcript, gene or translation table, depending on ensembl_object_type. | key: ensembl_idx |
ensembl_object_type | ENUM('RawContig', 'Transcript', 'Gene', 'Translation') | | Ensembl object type: 'RawContig', 'Transcript', 'Gene', 'Translation'. | key: ensembl_idx |
xref_id | INT(10) | | Foreign key references to the xref table. | |
linkage_annotation | VARCHAR(255) | NULL | Additional annotation on the linkage. | |
analysis_id | SMALLINT | NULL | Foreign key references to the analysis table. | key: analysis_idx |
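Retrieving the external references of an EnsEMBL object means filtering object_xref on both ensembl_id and ensembl_object_type, then following xref_id to xref and external_db_id to external_db. The sketch below uses an in-memory SQLite stand-in with a subset of the columns; the database name, accessions, rows and the function name xrefs_for are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE external_db (external_db_id INTEGER PRIMARY KEY, db_name TEXT);
CREATE TABLE xref (xref_id INTEGER PRIMARY KEY, external_db_id INTEGER,
                   dbprimary_acc TEXT, display_label TEXT);
CREATE TABLE object_xref (object_xref_id INTEGER PRIMARY KEY, ensembl_id INTEGER,
                          ensembl_object_type TEXT, xref_id INTEGER);
-- Made-up rows: internal ID 42 is used by both a gene and a transcript.
INSERT INTO external_db VALUES (1, 'ExampleDB');
INSERT INTO xref VALUES (10, 1, 'ACC0001', 'example label'), (11, 1, 'ACC0002', 'other label');
INSERT INTO object_xref VALUES (100, 42, 'Gene', 10), (101, 42, 'Transcript', 11);
""")


def xrefs_for(conn, ensembl_id, ensembl_object_type):
    """Return (db_name, dbprimary_acc, display_label) for one EnsEMBL object."""
    return conn.execute(
        """SELECT edb.db_name, x.dbprimary_acc, x.display_label
           FROM object_xref ox
           JOIN xref x ON x.xref_id = ox.xref_id
           JOIN external_db edb ON edb.external_db_id = x.external_db_id
           WHERE ox.ensembl_id = ? AND ox.ensembl_object_type = ?
           ORDER BY x.dbprimary_acc""",
        (ensembl_id, ensembl_object_type),
    ).fetchall()


gene_xrefs = xrefs_for(conn, 42, "Gene")
```

Internal IDs are only unique within their own table, so omitting the ensembl_object_type filter would mix the gene's xrefs with those of the transcript that happens to share the same ID.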
ontology_xref |
This table associates Evidence Tags to the relationship between EnsEMBL objects and ontology accessions (primarily GO accessions). The relationship to GO that is stored in the database is actually derived through the relationship of EnsEMBL peptides to SwissProt peptides, i.e. the relationship is derived like this:
ENSP -> SWISSPROT -> GO
and the evidence tag describes the relationship between the SwissProt peptide and the GO entry. In reality, however, we store this in the database like this:
ENSP -> SWISSPROT
ENSP -> GO
and the evidence tag hangs off of the relationship between the ENSP and the GO identifier. Some ENSPs are associated with multiple closely related SwissProt entries which may both be associated with the same GO identifier but with different evidence tags. For this reason a single Ensembl - external db object relationship in the object_xref table can be associated with multiple evidence tags in the ontology_xref table.
Column | Type | Default value | Description | Index |
object_xref_id | INT(10) | '0' | Composite key. Foreign key references to the object_xref table. | unique key: object_source_type_idx |
source_xref_id | INT(10) | NULL | Composite key. Foreign key references to the xref table. | key: source_idx unique key: object_source_type_idx |
linkage_type | ENUM('IC', 'IDA', 'IEA', 'IEP', 'IGI', 'IMP', 'IPI', 'ISS', 'NAS', 'ND' , 'TAS', 'NR' , 'RCA', 'EXP', 'ISO', 'ISA', 'ISM', 'IGC') | | Composite key. Evidence tags | unique key: object_source_type_idx |
seq_region_synonym |
Allows for storing multiple names for sequence regions.
Column | Type | Default value | Description | Index |
seq_region_synonym_id | INT | | Primary key, internal identifier. | primary key |
seq_region_id | INT(10) | | Foreign key references to the seq_region table. | key: seq_region_idx |
synonym | VARCHAR(40) | | Alternative name for sequence region. | unique key: syn_idx |
external_db_id | SMALLINT | | Foreign key references to the external_db table. | |
unmapped_object |
Describes why a particular external entity was not mapped to an Ensembl one.
Column | Type | Default value | Description | Index |
unmapped_object_id | INT(10) | | Primary key, internal identifier. | primary key |
type | ENUM('xref', 'cDNA', 'Marker') | | Object type: 'xref', 'cDNA', 'Marker'. | |
analysis_id | SMALLINT | | Foreign key references to the analysis table. | key: anal_exdb_idx |
external_db_id | SMALLINT | | Foreign key references to the external_db table. | unique key: unique_unmapped_obj_idx key: anal_exdb_idx key: ext_db_identifier_idx |
identifier | VARCHAR(255) | | External database identifier. | unique key: unique_unmapped_obj_idx key: id_idx key: ext_db_identifier_idx |
unmapped_reason_id | SMALLINT(5) | | Foreign key references to the unmapped_reason table. | unique key: unique_unmapped_obj_idx |
query_score | DOUBLE | | Actual mapping query score. | |
target_score | DOUBLE | | Target mapping query score. | |
ensembl_id | INT(10) | '0' | Foreign key references to the seq_region, transcript, gene or translation table, depending on ensembl_object_type. | unique key: unique_unmapped_obj_idx |
ensembl_object_type | ENUM('RawContig','Transcript','Gene','Translation') | | Ensembl object type: 'RawContig', 'Transcript', 'Gene', 'Translation'. | unique key: unique_unmapped_obj_idx |
parent | VARCHAR(255) | NULL | Foreign key references to the dependent_xref table, in case the unmapped object is dependent on a primary external reference which wasn't mapped to an Ensembl one. | unique key: unique_unmapped_obj_idx |
unmapped_reason |
Describes the reason why a mapping failed.
Column | Type | Default value | Description | Index |
unmapped_reason_id | SMALLINT(5) | | Primary key, internal identifier. | primary key |
summary_description | VARCHAR(255) | | Summarised description. | |
full_description | VARCHAR(255) | | Full description. | |
xref |
Holds data about objects which are external to EnsEMBL, but need to be associated with EnsEMBL objects. Information about the database that the external object is stored in is held in the external_db table entry referred to by the external_db_id column.
Column | Type | Default value | Description | Index |
xref_id | INT(10) | | Primary key, internal identifier. | primary key |
external_db_id | SMALLINT | | Foreign key references to the external_db table. | unique key: id_index |
dbprimary_acc | VARCHAR(40) | | Primary accession number. | unique key: id_index |
display_label | VARCHAR(128) | | Display label for the EnsEMBL web site. | key: display_index |
version | VARCHAR(10) | '0' | Object version. | |
description | TEXT | | Object description. | |
info_type | ENUM( 'PROJECTION', 'MISC', 'DEPENDENT', 'DIRECT', 'SEQUENCE_MATCH', 'INFERRED_PAIR', 'PROBE', 'UNMAPPED', 'COORDINATE_OVERLAP' ) | | Xref information type: 'PROJECTION', 'MISC', 'DEPENDENT', 'DIRECT', 'SEQUENCE_MATCH', 'INFERRED_PAIR', 'PROBE', 'UNMAPPED', 'COORDINATE_OVERLAP'. | unique key: id_index key: info_type_idx |
info_text | VARCHAR(255) | | Text | unique key: id_index |
Miscellaneous
interpro |
Column | Type | Default value | Description | Index |
interpro_ac | VARCHAR(40) | | InterPro protein accession number. | unique key: accession_idx |
id | VARCHAR(40) | | InterPro protein id. | unique key: accession_idx key: id_idx |
Concepts
- co-ordinates
There are several different co-ordinate systems used in the EnsEMBL database and API. For every co-ordinate system, the fundamental
unit is one base. The differences between co-ordinate systems lie in where a particular numbered base lies, and the start
position it is relative to. CONTIG co-ordinates, also called 'raw contig' co-ordinates or 'clone fragments', are relative to
the first base of the first contig of a clone. Note that the numbering is from 1, i.e. the very first base of the first contig
of a clone is numbered 1, not 0. In CHROMOSOMAL co-ordinates, the co-ordinates are relative to the first base of the chromosome.
Again, numbering is from 1. The seq_region table can store sequence regions in any of the co-ordinate systems defined in the
coord_system table.
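Since numbering starts at 1, start and end values need adjusting before they are used with 0-based string slicing. The sketch below also assumes that the end position is inclusive, which is not stated explicitly above; the function names subsequence and feature_length are hypothetical.

```python
def subsequence(seq, start, end):
    """Return bases start..end of seq, treating both positions as 1-based and inclusive."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"invalid range {start}..{end} for sequence of length {len(seq)}")
    # Shift the start down by one; the exclusive Python stop equals the inclusive end.
    return seq[start - 1:end]


def feature_length(start, end):
    """Number of bases covered by a 1-based, inclusive start..end range."""
    return end - start + 1


first_three = subsequence("ACGTAC", 1, 3)
```

Under these assumptions a feature with equal start and end positions covers exactly one base.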
- supercontigs
A supercontig is made up of a group of adjacent or overlapping contigs.
- sticky_rank
The sticky_rank differentiates between fragments of the same exon; i.e. for exons that span multiple contigs, all the fragments
would have the same ID, but different sticky_rank values.
- stable_id
Gene predictions have changed over the various releases of the EnsEMBL databases. To allow the user to track particular gene
predictions over changing co-ordinates, each gene-related prediction is given a 'stable identifier'. If a prediction looks
similar between two releases, we try to give it the same name, even though it may have changed position and/or had some sequence
changes.
- cigar_line
This allows the compact storage of gapped alignments by storing the maximum extent of the matches and then a text string which
encodes the placement of gaps inside the alignment. Colloquially inside EnsEMBL this is called a cigar line, and its adoption has shrunk
the number of rows in the feature table around 4-fold.
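The exact encoding is not spelled out here. The sketch below assumes a common form in which the string is a sequence of operations M (match), I (insertion) and D (deletion), each optionally preceded by a count, with a missing count meaning 1. The function names parse_cigar and alignment_length are hypothetical.

```python
import re

# One token: an optional run of digits followed by a single operation letter.
_CIGAR_TOKEN = re.compile(r"(\d*)([MID])")


def parse_cigar(cigar_line):
    """Split a cigar string into (count, operation) tuples, e.g. '10M2D5M'."""
    ops = []
    pos = 0
    for match in _CIGAR_TOKEN.finditer(cigar_line):
        if match.start() != pos:  # unparsed characters between tokens
            raise ValueError(f"invalid cigar line: {cigar_line!r}")
        count = int(match.group(1)) if match.group(1) else 1
        ops.append((count, match.group(2)))
        pos = match.end()
    if pos != len(cigar_line):  # trailing characters that are not a token
        raise ValueError(f"invalid cigar line: {cigar_line!r}")
    return ops


def alignment_length(cigar_line):
    """Total number of alignment columns described by the cigar string."""
    return sum(count for count, _ in parse_cigar(cigar_line))


ops = parse_cigar("10M2D5M")
```

A single row with such a string can stand in for what would otherwise be one row per ungapped block, which is where the reduction in row count comes from.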