About The Current Gene Model Set

The current gene model set (i.e. structural assembly annotation) is Zm00001d.1.

See the 2016 Whole-Genome Assembly and Annotation nomenclature document for an explanation of the assembly and annotation identifiers. This nomenclature has been adopted for the Zm-B73-REFERENCE-GRAMENE-4.0 / Zm00001d.1 assembly and structural annotation, and for subsequent assemblies and annotations of B73 and other accessions.

The Zm00001d.1 gene model set for Zm-B73-REFERENCE-GRAMENE-4.0 is the current recommended set. Other gene model sets are provided for comparison. Because it is difficult to determine when two gene models are the same (or when one represents an alternative splicing of the same genomic material), there are no plans to merge the sets. The next release of the B73 genome (RefGen_v4) is anticipated in 2016, and the new official gene model set will be based on MAKER-P annotations.

Gene model sets and assemblies:
Set           Assembly                        Gramene version
Zm00001d.2    Zm-B73-REFERENCE-GRAMENE-4.0    36
Zm00001d.1    Zm-B73-REFERENCE-GRAMENE-4.0    32
5b+           B73 RefGen_v3                   18-31
5b            B73 RefGen_v2                   7-17
4a            B73 RefGen_v1

Gene Model Annotations and Orthologs (B73 RefGen_v3) Functional Annotations
Phytozome: Functional Annotations (log-in required)
Freeling Lab: Syntenic Orthologs (mapped to RefGen_v2)

Gene Models with Associated Genes (B73 RefGen_v3)

Classical Genes:   table   tab delimited
MaizeGDB curated genes:   table   tab delimited
Combined set:   table   tab delimited

Gene Models with UniformMu insertions (B73 RefGen_v3; version 7)

List of gene models from the RefGen_v2 Filtered Gene Set that have UniformMu insertions including 100 bp upstream or downstream: Excel spreadsheet

List of gene models from the RefGen_v2 Filtered Gene Set that have UniformMu insertions in exons: Excel spreadsheet

Zm-B73-REFERENCE-GRAMENE-4.0 / Zm00001d.2 Information

In-depth information about Zm-B73-REFERENCE-GRAMENE-4.0 is available here.

Counts for each chromosome.
Chromosome       Accession    Protein Coding    miRNA    Transposable Element    Low Confidence
Chromosome 1                  5905              14                               2209
Chromosome 2                  4737              22                               2209
Chromosome 3                  4737              16                               1571
Chromosome 4                  4115              20                               1826
Chromosome 5                  4480              24                               1681
Chromosome 6                  3290              11                               1223
Chromosome 7                  3108              10                               1193
Chromosome 8                  3561              13                               1288
Chromosome 9                  2973              7                                1191
Chromosome 10                 2684              17                               1034
Unmapped                      319               0                                357
Nuclear Total                 39,324            154                              15,516
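
The per-chromosome counts can be cross-checked against the Nuclear Total row with a short script. The following is a minimal sketch, not part of any MaizeGDB tooling. It assumes that the three numeric values in each row are the Protein Coding, miRNA, and Low Confidence counts, in that order, and that the Accession and Transposable Element columns are empty. The names COUNTS, NUCLEAR_TOTAL, column_sums, and compare_to_total are hypothetical.

```python
# Per-chromosome counts copied from the table above.
# Assumption: each row holds (Protein Coding, miRNA, Low Confidence).
COLUMNS = ("Protein Coding", "miRNA", "Low Confidence")

COUNTS = {
    "Chromosome 1": (5905, 14, 2209),
    "Chromosome 2": (4737, 22, 2209),
    "Chromosome 3": (4737, 16, 1571),
    "Chromosome 4": (4115, 20, 1826),
    "Chromosome 5": (4480, 24, 1681),
    "Chromosome 6": (3290, 11, 1223),
    "Chromosome 7": (3108, 10, 1193),
    "Chromosome 8": (3561, 13, 1288),
    "Chromosome 9": (2973, 7, 1191),
    "Chromosome 10": (2684, 17, 1034),
    "Unmapped": (319, 0, 357),
}

# Values stated in the "Nuclear Total" row.
NUCLEAR_TOTAL = (39324, 154, 15516)


def column_sums(counts):
    """Sum each column across all rows."""
    return tuple(sum(column) for column in zip(*counts.values()))


def compare_to_total(counts, total):
    """Return {column name: (summed, stated, matches)} for each column."""
    sums = column_sums(counts)
    return {
        name: (summed, stated, summed == stated)
        for name, summed, stated in zip(COLUMNS, sums, total)
    }


if __name__ == "__main__":
    for name, (summed, stated, ok) in compare_to_total(COUNTS, NUCLEAR_TOTAL).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: summed={summed} stated={stated} {status}")
```

Any column whose sum differs from the Nuclear Total row is worth re-checking against the source data before the per-chromosome values are reused.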

Zm-B73-REFERENCE-GRAMENE-4.0 / Zm00001d Stats

Gene Feature                              Value
Average protein-coding transcript size    7638 bp
Average low confidence transcript size    6981 bp
Average transposable element size         unavailable
Average exon size                         156 bp
Average number of exons per gene          4 exons
Maximum exons per gene                    81 exons (Zm00001d040166)
Average intron size                       578 bp
Average coding region size                207 bp

NCBI B73_v4 annotation release 101

The NCBI B73_v4 annotation release 101 was developed independently at NCBI using the NCBI Eukaryotic Genome Annotation Pipeline on B73 RefGen_v4. The final set of annotated features comprises, in order of preference, pre-existing RefSeq sequences and a subset of well-supported Gnomon-predicted models. It is built by evaluating together, at each locus, the known RefSeq transcripts, the features projected from curated RefSeq genomic alignments, and the models predicted by Gnomon.

The NCBI B73_v3 annotation release 100 is available here.

Image of gene model frequency for Zm-B73-REFERENCE-GRAMENE-4.0

Gene Model Terms

Associated Genes: Associated Genes are genes that have been linked to a gene model by hand curation.

Canonical: The canonical transcript is defined as either the longest CDS, if the gene has translated transcripts, or the longest cDNA. Note: a canonical transcript is not always the first transcript (T01) or the longest transcript.

Non-canonical: All transcripts of a gene model other than the canonical transcript.
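
The canonical rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used to build the gene model set: the Transcript class, its field names, and the transcript identifiers are hypothetical, a CDS length of 0 is assumed to mean the transcript is not translated, and ties are broken by taking the first transcript encountered.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Transcript:
    transcript_id: str  # hypothetical identifier, e.g. "geneA_T01"
    cdna_length: int    # spliced transcript length in bp
    cds_length: int     # coding sequence length in bp; 0 if not translated


def canonical_transcript(transcripts: Sequence[Transcript]) -> Transcript:
    """Pick the longest CDS if any transcript is translated, else the longest cDNA."""
    if not transcripts:
        raise ValueError("a gene model needs at least one transcript")
    translated = [t for t in transcripts if t.cds_length > 0]
    if translated:
        # max() returns the first maximal item, which is the assumed tie-break.
        return max(translated, key=lambda t: t.cds_length)
    return max(transcripts, key=lambda t: t.cdna_length)


def non_canonical_transcripts(transcripts: Sequence[Transcript]) -> List[Transcript]:
    """Return every transcript of the gene model except the canonical one."""
    canonical = canonical_transcript(transcripts)
    return [t for t in transcripts if t is not canonical]


# Hypothetical gene model: T01 is first and has the longest cDNA,
# but T02 has the longest CDS and is therefore canonical.
gene_a = [
    Transcript("geneA_T01", cdna_length=2400, cds_length=900),
    Transcript("geneA_T02", cdna_length=1800, cds_length=1200),
]

# Hypothetical non-coding gene model: no CDS, so the longest cDNA wins.
gene_b = [
    Transcript("geneB_T01", cdna_length=500, cds_length=0),
    Transcript("geneB_T02", cdna_length=750, cds_length=0),
]
```

The first example shows why the canonical transcript is not necessarily T01 or the longest transcript: the choice is driven by CDS length whenever a translated transcript exists.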

Evidence Type: The source of evidence to support the gene model.

Model Types:

Protein Coding: A gene model with supporting evidence.
miRNA: A small, non-coding RNA.
TE: Transposable element.
Low Confidence: A gene model with little or no supporting evidence.
WGS: (Versions 5a.59 and earlier) Working Gene Set. This set merges new annotations made on RefGen_v2 with RefGen_v1 4a gene models mapped onto RefGen_v2. The new annotations were produced by an evidence-based method (Gramene GeneBuilder) and complemented with de novo Fgenesh models predicted on masked DNA.
FGS: (Versions 5b.60 and earlier) Filtered Gene Set. The filtered set was generated by screening the working set to remove pseudogenes, TE-encoded genes, and low-confidence hypothetical models.

Transcript Classes:

WH: With homology to a known non-transposable element in the NR (non-redundant) database at GenBank. Protein-coding gene.
NH: No homology in the NR (non-redundant) database at GenBank. Hypothetical gene or pseudogene.
TE: With homology to a known transposable element (TE) in the NR (non-redundant) database at GenBank. Transposable element.
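
The three transcript classes amount to a two-step decision on the NR homology search result. The sketch below only illustrates that decision. The function name and its boolean inputs are hypothetical, and how a homology hit is called (search tool, score cutoffs) is not specified here and is left to the caller.

```python
# Interpretations of each class, taken from the definitions above.
CLASS_DESCRIPTIONS = {
    "WH": "Protein-coding gene",
    "NH": "Hypothetical gene or pseudogene",
    "TE": "Transposable element",
}


def classify_transcript(has_nr_homology: bool, homolog_is_te: bool = False) -> str:
    """Map an NR homology result to a transcript class code.

    has_nr_homology: True if the transcript has any homolog in NR.
    homolog_is_te:   True if that homolog is a known transposable element.
    Both inputs are assumed to be computed upstream.
    """
    if not has_nr_homology:
        return "NH"
    return "TE" if homolog_is_te else "WH"


if __name__ == "__main__":
    for args in [(True, False), (True, True), (False, False)]:
        code = classify_transcript(*args)
        print(args, code, CLASS_DESCRIPTIONS[code])
```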

