Gene Data In MaizeGDB
   Simple gene/gene model search
   Advanced gene/gene model search
   Search for Gene Models by sequence
   Translate Gene Model IDs
   Download Files
   Download By Region
   Download Sequence for Gene Model List
   Gene Model Annotations and Orthologs
   Gene Models with Associated Genes
   Gene Models with UniformMu insertions
   Complete cross reference
   Older gene model downloads
   About the current gene model set
   Image of Gene Model frequency
   Gene symbol list
   NCBI B73_v4 annotation release 101
   Classical Maize Genes
   Nomenclature guidelines
   Gene Model Terms

Simple Search: This search form allows you to enter basic information (locus name, Gene Model ID, Transcript ID, Translation ID, Gene symbol, Gene name), including partial names, to search for a gene and/or gene model.

See a sample gene model query or locus query.


More Examples: lg1, liguleless1, Zm00001d002005, GRMZM2G036297, DAA35605, Zm00001d002005_T001

Advanced Search

Check the boxes next to the fields you want to search. To find records that have any value for an attribute, check its box and leave the criteria unchanged.

Show only genes:
from :
of :
on :

Search for Gene Models by Sequence

Enter sequence, Genbank IDs or gene model names (Zmdddddadddddd, ZEAMMB73_xxxx or GRMZMxxxxxx): Sample
Amino acid
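
The name placeholders above can be read as simple patterns. The following is a minimal sketch, assuming that in `Zmdddddadddddd` each `d` stands for a digit and `a` for a lowercase letter, and that the `x` runs are alphanumeric characters of unspecified length. The function name `gene_model_family` is hypothetical and is not part of MaizeGDB.

```python
import re

# Patterns inferred from the placeholders on this page (assumption):
#   Zmdddddadddddd -> "Zm", 5 digits, one lowercase letter, 6 digits
#   ZEAMMB73_xxxx  -> "ZEAMMB73_" followed by alphanumeric characters
#   GRMZMxxxxxx    -> "GRMZM" followed by alphanumeric characters
GENE_MODEL_PATTERNS = {
    "Zm": re.compile(r"Zm\d{5}[a-z]\d{6}"),
    "ZEAMMB73": re.compile(r"ZEAMMB73_[0-9A-Za-z]+"),
    "GRMZM": re.compile(r"GRMZM[0-9A-Za-z]+"),
}


def gene_model_family(name):
    """Return the naming family of a gene model name, or None if unrecognized."""
    name = name.strip()
    for family, pattern in GENE_MODEL_PATTERNS.items():
        # fullmatch rejects names with extra suffixes such as "_T001".
        if pattern.fullmatch(name):
            return family
    return None
```

Under these assumptions, `gene_model_family("Zm00001d002005")` returns `"Zm"`, while a transcript ID such as `Zm00001d002005_T001` is not treated as a gene model name.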

Translate Gene Model IDs

   Enter list - 8,000 gene model limit: (Example list)
Translate to:
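
Because each submission is limited to 8,000 gene models, a longer list has to be split before it is entered. A minimal sketch of that splitting step follows; the helper name `batches` is hypothetical, and how the batches are submitted is left out because the form's interface is not described here.

```python
def batches(ids, limit=8000):
    """Yield consecutive slices of `ids`, each holding at most `limit` entries."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    for start in range(0, len(ids), limit):
        yield ids[start:start + limit]


def unique_in_order(ids):
    """Drop repeated IDs while keeping the first occurrence of each."""
    return list(dict.fromkeys(ids))
```

Removing duplicates with `unique_in_order` before batching keeps repeated IDs from counting against the limit.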


Download By Region and Gene Model Set

Gene model set: Chromosome:
Model type: Data type:
Start position: End position:
(enter positions without commas or spaces, or leave both empty for the entire chromosome)
   Enter one marker to get gene models within its possible range, or enter two markers to get gene models within the span between them. Pairs must be on the same chromosome.
Enter a list of Gene Models, Transcripts, and/or Proteins to retrieve their positions on a given assembly in a tab-delimited format.
Output type:       Submit
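
Positions copied from spreadsheets or genome browsers often contain thousands separators, which the form does not accept. The sketch below cleans such values before entry. The function names are hypothetical, and treating a single empty position as an error is an assumption, since the form only documents the case where both are empty.

```python
def parse_position(text):
    """Turn '1,234,567' or '1 234 567' into 1234567; return None for empty input."""
    cleaned = text.replace(",", "").replace(" ", "")
    if not cleaned:
        return None
    if not cleaned.isdecimal():
        raise ValueError("not a valid position: %r" % text)
    return int(cleaned)


def parse_region(start_text, end_text):
    """Return (start, end), or None when both are empty (entire chromosome)."""
    start = parse_position(start_text)
    end = parse_position(end_text)
    if start is None and end is None:
        return None  # both empty: the entire chromosome
    if start is None or end is None:
        # Assumption: the form expects both positions or neither.
        raise ValueError("enter both positions or leave both empty")
    return start, end
```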

Download Sequence for Gene Model List

When downloading sequence, specify which type of input you are entering. For genomic sequence, use the gene model name (e.g. GRMZM2G165390). For cDNA, CDS, and mRNA, use the transcript ID (e.g. GRMZM2G165390_T01). For protein, use the translation ID (e.g. GRMZM2G165390_P01). If you enter only the gene model ID, choose whether you want to see all transcripts or only the canonical transcripts.

   Enter list - 8,000 gene model limit: (Example list)
Input type:
Output type:
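
Since the input type has to match the identifiers entered, a mixed list can be sorted by ID suffix first. This is a minimal sketch based only on the suffixes shown in the examples on this page (`_T01` or `_T001` for transcripts, `_P01` for translations). The function names are hypothetical, and anything without such a suffix is simply assumed to be a gene model name.

```python
import re

# Suffix conventions taken from the examples on this page; other forms
# are not handled (assumption).
_TRANSCRIPT_SUFFIX = re.compile(r"_T\d+$")
_TRANSLATION_SUFFIX = re.compile(r"_P\d+$")


def input_type(identifier):
    """Classify an identifier as 'transcript', 'translation', or 'gene model'."""
    identifier = identifier.strip()
    if _TRANSCRIPT_SUFFIX.search(identifier):
        return "transcript"
    if _TRANSLATION_SUFFIX.search(identifier):
        return "translation"
    # Fallback: no recognized suffix, so treat it as a gene model name.
    return "gene model"


def group_by_input_type(identifiers):
    """Group identifiers so each group can be entered with one input type."""
    groups = {"gene model": [], "transcript": [], "translation": []}
    for identifier in identifiers:
        groups[input_type(identifier)].append(identifier.strip())
    return groups
```

Each resulting group can then be entered separately with the matching input type.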


About The Current Gene Model Set

The current gene model set (i.e. structural assembly annotation) is Zm00001d.1.

See the 2016 Whole-Genome Assembly and Annotation nomenclature document for an explanation of the assembly and annotation identifiers. This nomenclature has been adopted for the Zm-B73-REFERENCE-GRAMENE-4.0 / Zm00001d.1 assembly and structural annotation, and for subsequent assemblies and annotations of B73 and other accessions.

The Zm00001d.1 gene model set for Zm-B73-REFERENCE-GRAMENE-4.0 is the current recommended set. Other gene model sets are provided for comparison. Due to the difficulty of determining when two gene models are the same (or when one represents an alternative splicing of the same genomic material), there are no plans to merge the sets. The next release of the B73 genome (RefGen_v4) is anticipated in 2016 and the new official gene model set will be based on MAKER-P annotations.

Gene model sets and assemblies:
Set          Assembly                       Gramene version
Zm00001d.2   Zm-B73-REFERENCE-GRAMENE-4.0   36
Zm00001d.1   Zm-B73-REFERENCE-GRAMENE-4.0   32
5b+          B73 RefGen_v3                  18-31
5b           B73 RefGen_v2                  7-17
4a           B73 RefGen_v1

Gene Model Annotations and Orthologs (B73 RefGen_v3)

Functional Annotations
Phytozome: Functional Annotations (log-in required)
Freeling Lab: Syntenic Orthologs (mapped to RefGen_v2)

Gene Models with Associated Genes (B73 RefGen_v3)

Classical Genes:   table   tab delimited
MaizeGDB curated genes:   table   tab delimited
Combined set:   table   tab delimited

Gene Models with UniformMu insertions (B73 RefGen_v3; version 7)

List of gene models from the RefGen_v2 Filtered Gene Set that have UniformMu insertions within the gene model or within 100 bp upstream or downstream: Excel spreadsheet

List of gene models from the RefGen_v2 Filtered Gene Set that have UniformMu insertions in exons: Excel spreadsheet

Zm-B73-REFERENCE-GRAMENE-4.0 / Zm00001d.2 Information

In-depth information about Zm-B73-REFERENCE-GRAMENE-4.0 is available here.

Counts for each chromosome.
Chromosome Accession Protein Coding miRNA Transposable Element Low Confidence
Chromosome 1 5905 14 2209
Chromosome 2 4737 22 2209
Chromosome 3 4737 16 1571
Chromosome 4 4115 20 1826
Chromosome 5 4480 24 1681
Chromosome 6 3290 11 1223
Chromosome 7 3108 10 1193
Chromosome 8 3561 13 1288
Chromosome 9 2973 7 1191
Chromosome 10 2684 17 1034
Unmapped 319 0 357
Nuclear Total 39,324 154 15,516

Zm-B73-REFERENCE-GRAMENE-4.0 / Zm00001d Stats

Gene Feature Value
Average protein-coding transcript size 7638 bp
Average low confidence transcript size 6981 bp
Average transposable element size unavailable
Average Exon size 156 bp
Average Number of exons per gene 4 exons
Maximum exons per gene 81 exons (Zm00001d040166)
Average Intron size 578 bp
Average Coding region size 207 bp

NCBI B73_v4 annotation release 101

The NCBI B73_v4 annotation release 101 was developed independently at NCBI using the NCBI Eukaryotic Genome Annotation Pipeline on B73 RefGen_v4. The final set of annotated features comprises, in order of preference, pre-existing RefSeq sequences and a subset of well-supported Gnomon-predicted models. It is built by evaluating together, at each locus, the known RefSeq transcripts, the features projected from curated RefSeq genomic alignments, and the models predicted by Gnomon.

The NCBI B73_v3 annotation release 100 is available here.

Image of gene model frequency for Zm-B73-REFERENCE-GRAMENE-4.0

Gene Model Terms

Associated Genes: Associated Genes are genes that have been linked to a gene model by hand curation.

Canonical: The canonical transcript is defined as either the longest CDS, if the gene has translated transcripts, or the longest cDNA. Note: a canonical transcript is not always the first transcript (T01) or the longest transcript.

Non-canonical: All transcripts for a gene model other than the canonical transcript.
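
The canonical rule can be illustrated with a short sketch. The `Transcript` record and the function names are hypothetical, a CDS length of 0 is assumed to mean "not translated", and ties are broken by input order, which the definition above does not specify.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    transcript_id: str
    cdna_length: int
    cds_length: int = 0  # 0 is taken to mean "not translated" (assumption)


def canonical(transcripts):
    """Pick the longest CDS if any transcript is translated, else the longest cDNA."""
    translated = [t for t in transcripts if t.cds_length > 0]
    if translated:
        return max(translated, key=lambda t: t.cds_length)
    return max(transcripts, key=lambda t: t.cdna_length)


def non_canonical(transcripts):
    """Return every transcript of the gene model except the canonical one."""
    chosen = canonical(transcripts)
    return [t for t in transcripts if t is not chosen]
```

For a gene model whose first transcript has the longer cDNA but the shorter CDS, `canonical` returns the second transcript, which is why the canonical transcript is not always T01 or the longest transcript.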

Evidence Type: The source of evidence to support the gene model.

Model Types:

Protein Coding: A gene model with supporting evidence.
miRNA: Small, non-coding RNA.
TE: Transposable elements.
Low Confidence: A gene model with little or no supporting evidence.
WGS: (Versions 5a.59 and earlier) Working Gene Set. This set merges new annotations performed on RefGen_v2 with RefGen_v1 4a gene models mapped onto RefGen_v2. New annotations were produced by an evidence-based method (Gramene GeneBuilder) and complemented with de novo Fgenesh models predicted on masked DNA.
FGS: (Versions 5b.60 and earlier) Filtered Gene Set. The filtered set was generated by screening the working set to remove pseudogenes, TE-encoded genes, and low-confidence hypothetical models.

Transcript Classes:

WH. With homology to a known non-transposable element in the NR (non-redundant) database at GenBank. Protein-coding gene.
NH. No homology in the NR (non-redundant) database at GenBank. Hypothetical gene or pseudogene.
TE. With homology to a known transposable element (TE) in the NR (non-redundant) database at GenBank. Transposable element.
