Training data sets

Training data set              Exonomy  GeneSplicer  GlimmerM  Unveil
Arabidopsis.thaliana.tar.gz    Yes      Yes          Yes       Yes
Aspergillus.fumigatus.tar.gz   Yes      -            Yes       Yes
Aspergillus.spp.tar.gz         Yes      -            Yes       Yes
Homo.sapiens.tar.gz            Yes      Yes          -         Yes
Mus.musculus.tar.gz            Yes      -            -         Yes
Plasmodium.falciparum.tar.gz   -        Yes          Yes       -
chrom-coords-build31.tar.gz    Yes      -            -         Yes
train.tar.gz                   Yes      -            -         Yes
test.tar.gz                    Yes      -            -         Yes

chrom-coords-build31.tar.gz : (Human). Chromosome coordinates of 7709 high-confidence CDSs obtained through EnsMart with the generous help of Ewan Birney. GFF format. Coordinates are relative to the Human build 31 chromosome files.

train.tar.gz : (Human). One multi-FASTA file containing 1500 high-confidence CDSs from EnsMart plus 2000 bp on each side, and one GFF file containing exon coordinates for all 1500 CDSs within those sequences.

test.tar.gz : (Human). 483 FASTA files and 483 GFF files containing the coordinates of CDSs and their exons in those FASTA files. This set is disjoint from train.tar.gz.
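
For readers who want to check the human train and test sets before using them, the following is a minimal Python sketch of how exon sequences could be pulled out of a FASTA file using the matching GFF coordinates. It is not part of the distributed data or of any of the gene finders listed above. It assumes standard tab-separated GFF (sequence ID in column 1, 1-based inclusive start and end in columns 4 and 5, strand in column 7), assumes the GFF sequence IDs match the first word of the FASTA header lines, and treats every GFF row as an exon. The helper names and the inline example records are hypothetical, and reverse-strand features are sliced as-is without reverse complementing.

```python
from collections import defaultdict


def read_fasta(lines):
    """Parse (multi-)FASTA lines into a dict {sequence_id: sequence}."""
    seqs = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Assumption: the ID is the first word of the header line.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def read_gff_exons(lines):
    """Collect sorted (start, end, strand) tuples per sequence ID."""
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # skip rows that are not standard GFF records
        seqid, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        exons[seqid].append((start, end, strand))
    return {seqid: sorted(coords) for seqid, coords in exons.items()}


def exon_sequences(seqs, exons):
    """Slice each exon out of its sequence (GFF is 1-based, inclusive)."""
    return {
        seqid: [seqs[seqid][start - 1:end] for start, end, _ in coords]
        for seqid, coords in exons.items()
        if seqid in seqs
    }


# Hypothetical records, standing in for lines read from the real files.
example_fasta = [">seq1 example record", "AAACCCGGG", "TTTAAACCC"]
example_gff = [
    "seq1\tEnsMart\texon\t4\t9\t.\t+\t0\tgene1",
    "seq1\tEnsMart\texon\t13\t15\t.\t+\t0\tgene1",
]

seqs = read_fasta(example_fasta)
exons = read_gff_exons(example_gff)
print(exon_sequences(seqs, exons))
```

With the real data, the same functions could be fed the lines of an opened FASTA file and its GFF file after unpacking the archive. The file names inside the archives are not stated here, so they would need to be checked first.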