
Data

ThaleMine integrates biological data from a wide array of public sources into a data warehouse. This page lists the datasets that are included in the current release.

Please contact us if there are any particular data you would like to suggest.

Arabidopsis thaliana Col-0 genome

Data Category | Feature Count | Data Source | Data Set | Data Set Description | PubMed
Chromosomes | 7 | TAIR | Genome Assembly - TAIR9 | TAIR9 Genome assembly (5 chromosomes plus chloroplast and mitochondrial assemblies) | Arabidopsis Genome Initiative, 2000 (PubMed: 11130711)
Genes | 27655 | Araport | Genome Annotation - Araport11 (06/2016) | Protein-coding genes | Cheng et al., 2016 (http://dx.doi.org/10.1101/047308)
Genes | 952 | Araport | Genome Annotation - Araport11 (06/2016) | Pseudogenes | Cheng et al., 2016 (http://dx.doi.org/10.1101/047308)
Genes | 508 | Araport | Genome Annotation - Araport11 (06/2016) | Novel Transcribed Regions | Cheng et al., 2016 (http://dx.doi.org/10.1101/047308)
Genes | 5178 | Araport | Genome Annotation - Araport11 (06/2016) | Non-coding genes | Cheng et al., 2016 (http://dx.doi.org/10.1101/047308)
Genes | 3901 | TAIR | Genome Annotation - TAIR10 | Transposable Element genes | Lamesch et al., 2012 (PubMed: 22140109)
Genomic Features | 34856 | TAIR | Genome Annotation - TAIR10 | Transposable Elements | Lamesch et al., 2012 (PubMed: 22140109)
Genomic Features | 111 | Araport | Genome Annotation - Araport11 (06/2016) | Upstream Open Reading Frames | Cheng et al., 2016 (http://dx.doi.org/10.1101/047308)

Public datasets

Data Category | Gene Count | Feature Count | Data Source | Data Set | Data Set Description | PubMed
Proteins | 12295 | 14952 proteins | UniProt | TrEMBL data set - 2016_07 | Computationally analysed records, enriched with automatic annotation | UniProt Consortium, 2014 (PubMed: 24253303)
Proteins | 14475 | 18187 proteins | UniProt | Swiss-Prot data set - 2016_07 | High-quality, manually annotated, non-redundant protein sequence database | UniProt Consortium, 2014 (PubMed: 24253303)
Protein Domains | 22524 | 6857 protein domains | InterPro | InterPro data set - v58.0 | Protein family and domain assignments to proteins | Mitchell et al., 2015 (PubMed: 25428371)
Homology | 23236 | 32800 homologs | Panther | Panther data set - 11.0 | PANTHER paralogs from Arabidopsis | Mi et al., 2013 (PubMed: 23193289)
Homology | real-time | real-time | Phytozome | Phytozome Orthologs - real-time | Gene families representing clade-specific orthology/paralogy relationships via PhytoMine web services | Goodstein et al., 2012 (PubMed: 22110026)
Gene Ontology | 24784 | 165877 GO annotations | GO | GO - 8/01/2016 | Gene Ontology Consortium | Gene Ontology Consortium, 2015 (PubMed: 25428369)
Plant Ontology | 22975 | 529800 PO annotations | TAIR | PO Annotation from TAIR - 06/30/2015 | PO Annotation from TAIR | Berardini et al., 2015 (PubMed: 26201819)
Interactions | 5312 | 38378 interactions | IntAct | IntAct interactions data set - 8/02/2016 | Curated binary and complex protein-protein interactions for Arabidopsis thaliana | Kerrien et al., 2012 (PubMed: 22121220)
Interactions | 9256 | 78171 interactions | BioGRID | BioGRID interaction data set - 3.4.139 | Curated set of genetic and physical interactions for Arabidopsis thaliana | Chatr-Aryamontri et al., 2015 (PubMed: 25428363)
Expression | 24005 | 123 experiments | BAR | The Bio-Analytic Resource for Plant Biology | The Bio-Analytic Resource for Plant Biology | Winter et al., 2007 (PubMed: 17684564)
Expression | real-time | real-time | BAR | Arabidopsis eFP - real-time | Electronic Fluorescent Pictographic representations of gene expression patterns via BAR web services | Winter et al., 2007 (PubMed: 17684564)
Expression | 38042 | 113 datasets from 11 tissues | Araport | RNA-seq expression - Araport11 (06/2016) | RNA-seq based gene expression levels (Transcripts per Million, TPM) quantified by Salmon | Cheng et al., 2016 (http://biorxiv.org/content/early/2016/04/05/047308)
Co-Expression | real-time | real-time | ATTED-II | ATTED-II Co-expression - real-time | Co-regulated gene relationships deduced from microarray and RNA-seq data via ATTED-II web services | Obayashi et al., 2014 (PubMed: 24334350)
Publications | 27500 | 30356 publications | NCBI; TAIR; UniProt | Gene to PubMed - 8/12/2016; TAIR Publications - 09/30/2015; UniProt Publications - 2015_09 | Curated associations between publications and genes | Maglott et al., 2007 (PubMed: 17148475)
GeneRIF | 5747 | 15650 GeneRIF Annotations | NCBI | GeneRIF - 8/12/2016 | Concise phrase describing gene function and publication associated with NCBI Gene records | Maglott et al., 2007 (PubMed: 17148475)
Pathways | 4646 | 132 pathways | GenomeNet | KEGG pathways data set - 79.0 | Wiring diagrams of molecular interactions, reactions, and relations | Kanehisa et al., 2014 (PubMed: 24214961)
Germplasms/Seed Stocks | 28740 | 583245 stocks | TAIR | Germplasm/Seed Stock - 10/2013 | ABRC Germplasm/Seed Stock |
Phenotypes | 10243 | 12025 phenotypes | TAIR | Germplasm/Stock Phenotypes - 10/2013 | Germplasm/Stock Phenotypes |
Mutant Alleles | 28750 | 265030 alleles | TAIR | Alleles/Polymorphisms - 10/2013 | Alleles/Polymorphisms |
TDNA-Seq | 38772 | 175329 insertions | SIGnAL | TDNA-Seq Insertions data set | 146,740 insertions identified by TDNA-Seq linked to 86,262 segregating SALK, SAIL and WiscDsLox lines | Alonso et al., 2003 (PubMed: 12893945)
TAIR 3185 TAIR GO Annotation from InterPro
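
The RNA-seq expression data set listed above reports expression levels in Transcripts per Million (TPM). For readers unfamiliar with the unit, the sketch below shows the standard TPM normalization: each read count is divided by the transcript's effective length, and the resulting rates are rescaled to sum to one million. It illustrates the unit only and is not Salmon's estimation procedure or Araport's pipeline; the function name `tpm` and the counts and lengths in the usage lines are hypothetical.

```python
def tpm(counts, effective_lengths):
    """Convert per-transcript read counts to Transcripts per Million.

    counts            -- estimated reads assigned to each transcript
    effective_lengths -- effective length of each transcript (same order)
    """
    if len(counts) != len(effective_lengths):
        raise ValueError("counts and effective_lengths must have equal length")
    if any(length <= 0 for length in effective_lengths):
        raise ValueError("effective lengths must be positive")

    # Reads per base: corrects for longer transcripts attracting more reads.
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        # No reads anywhere: every transcript has zero expression.
        return [0.0] * len(rates)

    # Rescale so that the values across all transcripts sum to one million.
    return [rate / total * 1e6 for rate in rates]


# Hypothetical example: three transcripts with made-up counts and lengths.
values = tpm([100, 200, 300], [1000, 2000, 1000])
print(values)
```

Because the values always sum to one million within a sample, TPM for a given gene can be compared across the datasets more directly than raw read counts.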