Ensembl families are determined by clustering all Ensembl proteins together with metazoan sequences from UniProtKB. They therefore provide a way of exploring orthologues and closely related homologues across a range of animal species.
The protein family database is generated by running the Markov Clustering (MCL) algorithm [1, 2, 3, 4], as initially proposed for protein family detection by A.J. Enright, S. van Dongen and C.A. Ouzounis [5]. Before clustering, an all-against-all BLASTP sequence similarity search is run on the superset of all Ensembl protein predictions from all species, together with all metazoan sequences from UniProt/Swiss-Prot and UniProt/TrEMBL, to establish similarities. The MCL algorithm is then run on these similarities to establish the protein family clusters.
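The clustering step can be illustrated with a minimal sketch of the MCL iteration, which alternates expansion (squaring a column-stochastic matrix) with inflation (raising entries to a power and renormalizing). This is a toy, dense, pure-Python version and not the Ensembl pipeline: the function names, the inflation value of 2.0, the thresholds, and the assumption that BLASTP hits have already been turned into a symmetric matrix of non-negative similarity scores are all illustrative choices that the text above does not specify.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion: square the matrix, i.e. take a two-step random walk."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation: raise each entry to `power`, then renormalize the columns."""
    return normalize_columns([[x ** power for x in row] for row in m])


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a symmetric, non-negative similarity matrix (hypothetical input).

    Returns a sorted list of clusters, each a sorted list of sequence indices.
    """
    n = len(similarity)
    # Add self-loops, a common MCL convention, then make columns stochastic.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:  # stop once the matrix no longer changes
            break
    # Each row that still carries weight lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy input: sequences 0-2 are mutually similar, as are 3-4, with no hits
# between the two groups (scores are made up for illustration).
toy_similarity = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
families = mcl(toy_similarity)
print(families)  # [[0, 1, 2], [3, 4]]
```

At the scale described above, a dense matrix like this would not be practical; the sketch is only meant to show how similarities are turned into clusters.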
1. Stijn van Dongen. Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, May 2000.
2. Stijn van Dongen. A cluster algorithm for graphs. Technical Report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000. urn:NBN:nl:ui:18-4463
3. Stijn van Dongen. A stochastic uncoupling process for graphs. Technical Report INS-R0011, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000. urn:NBN:nl:ui:18-4462
4. Stijn van Dongen. Performance criteria for graph clustering and Markov cluster experiments. Technical Report INS-R0012, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000. urn:NBN:nl:ui:18-4461
5. Anton J. Enright, Stijn van Dongen and Christos A. Ouzounis. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 2002 Apr 1;30(7):1575-1584.