
The Zea mays ssp. mays cv. B73 Reference Genome

B73 Reference Genome Assembly Status
B73 Reference Genome Assembly Details
Genome Assembly Manifesto
B73 Reference Gene Models and Nomenclature
The Nomenclature Standards
Genome Assembly and Gene Model Issues
B73 Stock Information
Downloads
Genome Tools
FAQs


The Maize B73 Reference Genome

The maize B73 reference genome has been revised three times since its initial release as a BAC-by-BAC assembly in 2009. As of 2015, the maize nomenclature committee has adopted naming standards that accommodate multiple Zea species, multiple accessions, and multiple versions; the recommendation is described in the Nomenclature Standards. The B73 reference assemblies have been known by these names:
Assembly | Assembly aka | Annotation | Annotation aka | Gramene/Ensembl Plants version
Zm-B73-REFERENCE-GRAMENE-4.0 | B73 RefGen_v4, AGPv4 | Zm00001d.2 | AGPv4 | 32/50 - 36/54
B73 RefGen_v3 | AGPv3 | 5b+ | AGPv3 | 18/36 - 31/49
B73 RefGen_v2 | AGPv2 | 5b | 5b.60, AGPv2 | 7/25 - 17/37
B73 RefGen_v1 | AGPv1 | 4a | 4a.53, AGPv1 |

   

B73 Reference Genome Assembly Status

The current Reference Genome for maize is B73 RefGen_v4 (also known as Zm-B73-REFERENCE-GRAMENE-4.0).


The three previous assemblies, B73 RefGen_v1, B73 RefGen_v2, and B73 RefGen_v3, were all based on a BAC (bacterial artificial chromosome) sequencing strategy. The B73 RefGen_v4 assembly took a new approach: PacBio Single Molecule Real Time (SMRT) sequencing at Cold Spring Harbor to 60X coverage, with scaffolds built with the help of whole-genome restriction mapping (also called optical mapping). Error correction of the PacBio sequences was aided by Illumina short-read DNA sequencing performed at Washington University. Annotation was carried out in the Ware laboratory at Cold Spring Harbor using the MAKER pipeline (Campbell, 2014) and ~111,000 long-read PacBio transcripts from six maize tissues. More complete details on the B73 RefGen_v4 assembly can be found at Gramene or in the published paper.


See the Assembly Manifesto for more information.




Next assembly version: The release date of the next assembly update is not known at this time (October 2016). Release dates will be posted here and elsewhere at MaizeGDB as they become available.

   

B73 Reference Genome Assembly Details

Chromosomes
The assembly sequence includes all 10 chromosomes, along with the mitochondrial and plastid genomes.
The sequence can be downloaded here.

Gaps
Gaps within BACs are indicated by runs of 100 N's. Gaps between contigs are indicated by runs of 1000 N's.
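Because gaps are encoded purely as runs of N, they can be located and classified by length. The Python sketch below assumes the downloaded sequence follows the 100-N / 1000-N convention exactly; runs of any other length are reported as "other" rather than guessed at. The function name and the toy sequence are illustrative only.

```python
import re
from typing import List, Tuple

# Gap lengths as described above (assumes the sequence follows this convention exactly).
WITHIN_BAC_GAP = 100
BETWEEN_CONTIG_GAP = 1000


def find_gaps(sequence: str) -> List[Tuple[int, int, str]]:
    """Return (start, length, kind) for every run of N or n in `sequence`.

    `start` is a 0-based offset. Runs whose length matches neither
    convention are reported as "other" (e.g. a few ambiguous bases).
    """
    gaps = []
    for match in re.finditer(r"[Nn]+", sequence):
        length = match.end() - match.start()
        if length == WITHIN_BAC_GAP:
            kind = "within-BAC"
        elif length == BETWEEN_CONTIG_GAP:
            kind = "between-contig"
        else:
            kind = "other"
        gaps.append((match.start(), length, kind))
    return gaps


if __name__ == "__main__":
    # Toy sequence built for illustration, not real maize data.
    toy = "ACGT" + "N" * 100 + "GGCC" + "N" * 1000 + "TTAA" + "NNN" + "A"
    for start, length, kind in find_gaps(toy):
        print(start, length, kind)
    # prints:
    # 4 100 within-BAC
    # 108 1000 between-contig
    # 1112 3 other
```

Lowercase n is matched too, in case a file is soft-masked; drop it from the pattern if that is not wanted.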


Current reference genome assemblies

Information and stats for Zm-B73-REFERENCE-GRAMENE-4.0.


Zm-B73-REFERENCE-GRAMENE-4.0 / Zm00001d.2 Information

In-depth information about Zm-B73-REFERENCE-GRAMENE-4.0 is available here.

Counts for each chromosome.
Chromosome Accession Protein Coding miRNA Transposable Element Low Confidence
Chromosome 1 5905 14 2209
Chromosome 2 4737 22 2209
Chromosome 3 4737 16 1571
Chromosome 4 4115 20 1826
Chromosome 5 4480 24 1681
Chromosome 6 3290 11 1223
Chromosome 7 3108 10 1193
Chromosome 8 3561 13 1288
Chromosome 9 2973 7 1191
Chromosome 10 2684 17 1034
Unmapped 319 0 357
Nuclear Total 39,324 154 15,516
Source: Zm-B73-REFERENCE-GRAMENE-4.0
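As a quick consistency check, the per-chromosome rows can be summed and compared with the Nuclear Total row. The Python sketch below copies the counts exactly as they appear in the table; because the header names more categories than each row has values, the third count column is left unnamed. Summed as transcribed here, the miRNA column gives the stated 154, while the protein-coding and third columns give 39,909 and 15,782 rather than 39,324 and 15,516, so at least one row (possibly one of the adjacent rows with identical values) may differ from the source data.

```python
from typing import Dict, Tuple

# Counts copied from the per-chromosome table above:
# (protein coding, miRNA, third count column).
ROWS: Dict[str, Tuple[int, int, int]] = {
    "Chromosome 1": (5905, 14, 2209),
    "Chromosome 2": (4737, 22, 2209),
    "Chromosome 3": (4737, 16, 1571),
    "Chromosome 4": (4115, 20, 1826),
    "Chromosome 5": (4480, 24, 1681),
    "Chromosome 6": (3290, 11, 1223),
    "Chromosome 7": (3108, 10, 1193),
    "Chromosome 8": (3561, 13, 1288),
    "Chromosome 9": (2973, 7, 1191),
    "Chromosome 10": (2684, 17, 1034),
    "Unmapped": (319, 0, 357),
}
STATED_TOTALS = (39324, 154, 15516)  # the "Nuclear Total" row
LABELS = ("protein coding", "miRNA", "third column")


def column_sums(rows: Dict[str, Tuple[int, int, int]]) -> Tuple[int, ...]:
    """Sum each count column across all rows."""
    return tuple(sum(column) for column in zip(*rows.values()))


if __name__ == "__main__":
    for label, computed, stated in zip(LABELS, column_sums(ROWS), STATED_TOTALS):
        verdict = "matches" if computed == stated else "does not match"
        print(f"{label}: sum {computed} {verdict} stated total {stated}")
```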


Zm-B73-REFERENCE-GRAMENE-4.0 / Zm00001d Stats


Gene feature | Value
Average protein-coding transcript size | 7638 bp
Average low-confidence transcript size | 6981 bp
Average transposable element size | unavailable
Average exon size | 156 bp
Average number of exons per gene | 4 exons
Maximum exons per gene | 81 exons (Zm00001d040166)
Average intron size | 578 bp
Average coding region size | 207 bp
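The page does not say exactly how these averages were computed (for example per gene or per transcript, or which transcript represents a gene). The Python sketch below shows one plausible approach, assuming GFF3-style exon coordinates (1-based, inclusive) grouped by transcript; the transcript names and coordinates are invented, so its output is unrelated to the values in the table.

```python
from statistics import mean
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based and inclusive as in GFF3

# Hypothetical transcripts with invented coordinates, for illustration only.
EXONS_BY_TRANSCRIPT: Dict[str, List[Exon]] = {
    "tx_A": [(1, 150), (401, 560), (901, 1100)],
    "tx_B": [(5000, 5299), (5800, 5899)],
}


def exon_and_intron_lengths(exons: List[Exon]) -> Tuple[List[int], List[int]]:
    """Return exon lengths and the lengths of the introns between them."""
    ordered = sorted(exons)
    exon_lengths = [end - start + 1 for start, end in ordered]
    # An intron spans the bases strictly between one exon's end and the next exon's start.
    intron_lengths = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return exon_lengths, intron_lengths


def summarize(transcripts: Dict[str, List[Exon]]) -> Dict[str, float]:
    """Pool exon and intron lengths across transcripts and average them."""
    all_exons: List[int] = []
    all_introns: List[int] = []
    exon_counts: List[int] = []
    for exons in transcripts.values():
        exon_lengths, intron_lengths = exon_and_intron_lengths(exons)
        all_exons.extend(exon_lengths)
        all_introns.extend(intron_lengths)
        exon_counts.append(len(exon_lengths))
    return {
        "average exon size (bp)": mean(all_exons),
        "average intron size (bp)": mean(all_introns) if all_introns else 0.0,
        "average exons per transcript": mean(exon_counts),
        "maximum exons per transcript": max(exon_counts),
    }


if __name__ == "__main__":
    for name, value in summarize(EXONS_BY_TRANSCRIPT).items():
        print(f"{name}: {value:.1f}")
```

Pooling all exons before averaging weights long genes more heavily than averaging per-transcript means would; either choice is defensible, which is one reason published statistics can differ.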


Previous reference genome assemblies

Information and stats for B73 RefGen_v3.
Information and stats for B73 RefGen_v2.
   

B73 Reference Gene Models and Nomenclature

As more complete reference genomes with structural annotations become available, naming standards that span genomes and versions have become necessary. The recommendation is available here.
Important note: Sequential gene IDs do not imply relative position. Although the Zm00001d.2 gene models were called and numbered sequentially, they may move as the assembly is improved.

The current reference gene model set is named Zm00001d.2. Gene models within this set are prefixed with "Zm00001d". Associations between the new gene models and the 5b+ gene models are available here.
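Because every gene in the current set shares the "Zm00001d" prefix, a simple pattern can validate IDs before they are used in lookups. This Python sketch assumes the format shown by the example ID Zm00001d040166 on this page (the prefix followed by exactly six digits) and rejects anything with extra characters; the function name is hypothetical. The extracted number is an identifier only and, as the note above explains, says nothing about genomic position.

```python
import re
from typing import Optional

# Assumed format, based on the example ID Zm00001d040166 on this page:
# the set prefix "Zm00001d" followed by a six-digit number.
ZM00001D_GENE_ID = re.compile(r"Zm00001d(\d{6})")


def gene_number(gene_id: str) -> Optional[int]:
    """Return the numeric part of a Zm00001d gene ID, or None if the ID does not fit.

    The number is an identifier only; do not sort or compare genes by it
    as a proxy for location.
    """
    match = ZM00001D_GENE_ID.fullmatch(gene_id.strip())
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    for candidate in ("Zm00001d040166", "Zm0001d040166", "GRMZM2G000964"):
        print(candidate, gene_number(candidate))
    # prints:
    # Zm00001d040166 40166
    # Zm0001d040166 None
    # GRMZM2G000964 None
```

The second candidate shows why the check is useful: a prefix with one zero missing looks plausible but does not match.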


Gene model sets (annotations) by reference assembly version:

Gene model set | Description | Assembly version | Gramene version | Cross reference
Zm00001d.2 | Filtered Gene Set | Zm-B73-REFERENCE-GRAMENE-4.0 | 36 | xref
Zm00001d.1 | Filtered Gene Set | Zm-B73-REFERENCE-GRAMENE-4.0 | 32-33 | xref
5b+ | Filtered Gene Set, mostly projections of 5b | RefGen_v3 | 18-31 | xref
5a | Working Gene Set (WGS) | RefGen_v2 | 7-17 |
5b | Filtered Gene Set (FGS), subset of WGS | RefGen_v2 | |
4a.53 | Filtered and Working Gene Sets | RefGen_v1 | 7-17 |

The Zm00001d.2 gene model set is the recommended gene model set for Zm-B73-REFERENCE-GRAMENE-4.0.

For RefGen_v3, the 5b+ gene model set is recommended. Other gene model sets for RefGen_v3 are provided for comparison. Due to the difficulty of determining when two gene models are the same (or when one represents an alternative splicing of the same genomic material), there are no plans to merge the sets.


For more information, see the Nomenclature Standards.


   

B73 Reference Genome Assembly and Gene Model Issues

Report an assembly or gene model structure problem. This includes misassembled regions, evidence for closing gaps, gene models that should be merged or split, evidence supporting low-confidence gene models, et cetera.


   

B73 Stock Information

The B73 seed source for Zm-B73-REFERENCE-GRAMENE-4.0 was, like the BAC library source described below, descended from PI 550473, but it was maintained for several generations before being used as the source seed. The seeds closest to those used for sequencing v4 were deposited at the NCRPIS (accession number: PI 677128).

The B73 source for the BAC libraries (BACs with prefix "b" prepared in Rod Wing's lab; BACs with prefix "c" prepared in Peter deJong's lab) was PI 550473. When requesting seed from the North Central Regional Plant Introduction Station, ask for any lot descended from the Coe PI 550473 lines.

The stock was received directly by the North Central Regional Plant Introduction Station from Arnel Hallauer and has been maintained by the quality-maintenance procedures at the PI Station. Ed Coe reports that, "The results of QC lab checks for constancy in PI 550473 have been excellent."

The same source was used for the IBM mapping population. Maps produced at Missouri used 302 lines of this population, providing unmatched precision (resolution is at the intra-BAC level). These maps anchor the fingerprint-based contig assemblies to chromosome location.

High-molecular-weight DNA was prepared by Jack Gardiner in the lab at Missouri and shipped to Clemson (Wing's lab at the time) and to deJong's lab (just as his lab was moving to California) for BAC preparation.

NSF grant reports have documented the details. Specifics of the materials, preparation, characterization, and final assembly of the contig framework can be found in Coe E, Schaeffer ML (2005) Genetic, physical, maps, and database resources for maize. Maydica 50:285-303. Ed Coe has made a copy of that paper available here.


   

B73 Reference Downloads

Assembly | Dataset | Gramene version(s)
Zm-B73-REFERENCE-GRAMENE-4.0 | assembly | 32-36
Zm-B73-REFERENCE-GRAMENE-4.0 | gene model cDNA FASTA | 36
Zm-B73-REFERENCE-GRAMENE-4.0 | gene model ncRNA FASTA | 36
Zm-B73-REFERENCE-GRAMENE-4.0 | gene model CDS FASTA | 36
Zm-B73-REFERENCE-GRAMENE-4.0 | gene model translations FASTA | 36
Zm-B73-REFERENCE-GRAMENE-4.0 | gene model GFF3 | 36
B73 RefGen_v3 | assembly | 18-31
B73 RefGen_v3 | gene model cDNA FASTA | 18-31
B73 RefGen_v3 | gene model ncRNA FASTA | 18-31
B73 RefGen_v3 | gene model translations FASTA | 18-31
B73 RefGen_v3 | gene model GFF3 | 18-31
B73 RefGen_v3 | MAKER-P gene models | n/a
B73 RefGen_v2 | Functional annotation from Gramene | n/a



Functional annotation from Phytozome 10 (login required)


Cross reference for GRMZM and ZEAMMB73 IDs
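Most of the downloads above are FASTA files. The minimal Python reader below yields one (ID, sequence) pair per record, keeping only the first word of each header line, which is enough to compute sequence lengths or to pass sequences to the gap check described earlier on this page. It assumes downloaded files may be gzip-compressed and end in ".gz"; that is a guess about file naming, so check the actual names. The toy records are invented.

```python
import gzip
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs; record_id is the first word of the header."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            words = line[1:].split()
            record_id = words[0] if words else ""
            chunks = []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def open_fasta(path: str) -> TextIO:
    """Open a plain or gzip-compressed (.gz) FASTA file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__":
    # Toy records standing in for a downloaded file.
    toy = io.StringIO(">1 toy chromosome\nACGTNN\nACG\n>Mt\nGGCC\n")
    lengths = {record_id: len(seq) for record_id, seq in read_fasta(toy)}
    print(lengths)  # {'1': 9, 'Mt': 4}
```

For a real file, `with open_fasta(path) as handle:` followed by `read_fasta(handle)` streams records one at a time, so a whole chromosome is held in memory only while it is being processed.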


   

Genome Tools for B73 Reference Assembly

BLAST
Genome browser
Map V3 data to V4


   

FAQs

What is a Reference Genome?
What are the main changes between RefGen_v2 and RefGen_v3?
How can I map positions between the v2 and v3 assemblies?
Where can I find legacy resources from MaizeSequence.Org?
How can I identify the Filtered Gene Set (FGS) in RefGen_v3?
Where can I download a GFF dump of the FGS for maize genes in v3 (5b+)?


What is a Reference Genome?

A Reference Genome is a haploid representation of a genome as DNA sequence, with a defined coordinate system and accession and version identification. A Reference Genome is usually assembled de novo, rather than relying on a related genome to assemble small DNA fragments (which would be a reference-guided assembly). A Reference Genome usually includes the structural annotations, or gene models, derived from the sequence assembly. A Reference Genome is almost always a work in progress that improves as new data are added over time. Data for improvement are collected continually, and periodically new Reference Genome versions are released that incorporate these data. B73 RefGen_v3 is such an updated version.


What are the main changes between RefGen_v2 and RefGen_v3?

Changes to the assembly include:

  • v3 recovered gene space missing from v2 using whole-genome shotgun (WGS) reads (v2 had improved the initial BAC assembly using the minimum tiling path, MTP)
  • X contigs were moved or flipped.

Changes to the v3 gene models include:
  • 251 improved gene models
  • Among the improved models, the following Fgenesh models were improved and given GRMZM IDs:
    AC147602.5_FG004 -> GRMZM6G741210
    AC190882.3_FG003 -> GRMZM6G961377
    AC192244.3_FG001 -> GRMZM6G869379
    AC194389.3_FG001 -> GRMZM6G399977
    AC204604.3_FG008 -> GRMZM6G220418
    AC210529.3_FG004 -> GRMZM6G945840
    AC232289.2_FG005 -> GRMZM6G404540
    AC233893.1_FG001 -> GRMZM6G310687
    AC233910.1_FG005 -> GRMZM6G729818
    AC235534.1_FG001 -> GRMZM6G798998
  • 213 novel gene models
  • 10 gene models were merged into new models:
    GRMZM2G000964, GRMZM2G103315 -> GRMZM2G000964
    GRMZM2G045892, GRMZM2G452386 -> GRMZM2G045892
    GRMZM2G119720, GRMZM2G518717 -> GRMZM2G119720
    GRMZM2G142383, GRMZM2G020429 -> GRMZM2G142383
    GRMZM2G319465, GRMZM2G439578 -> GRMZM2G319465
    GRMZM2G338693, GRMZM2G117517 -> GRMZM2G338693
    GRMZM5G861997, GRMZM5G864178 -> GRMZM5G861997
    GRMZM5G872800, GRMZM2G143862 -> GRMZM5G872800
    GRMZM5G891969, GRMZM5G823855 -> GRMZM5G891969
  • The 39,656 FGS gene models in 5b are now 39,475 protein-coding gene models in 5b+ (the reduction is due to merging, to non-protein-coding gene models now indicated as low confidence, and to transposable elements)
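If you keep gene lists built against the older IDs, the renames and merges listed above can be applied mechanically. This Python sketch encodes only the IDs shown on this page (for each merged pair, the first ID listed is the one that survives); it is not a substitute for the full cross-reference files, and the function name is illustrative.

```python
from typing import Dict, Iterable, List

# Fgenesh models that were improved and given GRMZM IDs in v3 (from the list above).
RENAMED: Dict[str, str] = {
    "AC147602.5_FG004": "GRMZM6G741210",
    "AC190882.3_FG003": "GRMZM6G961377",
    "AC192244.3_FG001": "GRMZM6G869379",
    "AC194389.3_FG001": "GRMZM6G399977",
    "AC204604.3_FG008": "GRMZM6G220418",
    "AC210529.3_FG004": "GRMZM6G945840",
    "AC232289.2_FG005": "GRMZM6G404540",
    "AC233893.1_FG001": "GRMZM6G310687",
    "AC233910.1_FG005": "GRMZM6G729818",
    "AC235534.1_FG001": "GRMZM6G798998",
}

# Merged models: the second ID of each listed pair now resolves to the first.
MERGED: Dict[str, str] = {
    "GRMZM2G103315": "GRMZM2G000964",
    "GRMZM2G452386": "GRMZM2G045892",
    "GRMZM2G518717": "GRMZM2G119720",
    "GRMZM2G020429": "GRMZM2G142383",
    "GRMZM2G439578": "GRMZM2G319465",
    "GRMZM2G117517": "GRMZM2G338693",
    "GRMZM5G864178": "GRMZM5G861997",
    "GRMZM2G143862": "GRMZM5G872800",
    "GRMZM5G823855": "GRMZM5G891969",
}


def update_ids(gene_ids: Iterable[str]) -> List[str]:
    """Translate IDs to their v3 (5b+) equivalents, dropping duplicates created by merges."""
    updated: List[str] = []
    seen = set()
    for gene_id in gene_ids:
        new_id = RENAMED.get(gene_id, MERGED.get(gene_id, gene_id))
        if new_id not in seen:  # two merged inputs collapse to one output
            seen.add(new_id)
            updated.append(new_id)
    return updated


if __name__ == "__main__":
    print(update_ids(["GRMZM2G103315", "GRMZM2G000964", "AC147602.5_FG004"]))
    # prints ['GRMZM2G000964', 'GRMZM6G741210']
```

IDs not found in either table pass through unchanged, so the output keeps every gene the input named while collapsing merged pairs.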


How can I map positions between the v2 and v3 assemblies?

Use the Ensembl assembly converter tool at Gramene.
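Tools such as the assembly converter rely on alignments between two assembly versions. The Python sketch below is a simplified illustration of that general idea, not the converter's actual implementation: each hypothetical Block records an ungapped segment of the old assembly and where it lies in the new one, including segments that were flipped (as some contigs were between v2 and v3). A position outside every block has no counterpart and maps to None. All block coordinates are invented.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional, Tuple


class Block(NamedTuple):
    """An ungapped aligned segment between two assembly versions (hypothetical data)."""
    chrom_old: str
    old_start: int  # 0-based start on the old assembly
    length: int
    chrom_new: str
    new_start: int  # 0-based start on the new assembly
    strand: int     # +1 same orientation, -1 segment flipped in the new assembly


def map_position(blocks: List[Block], chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a 0-based old-assembly position, or return None if no block covers it.

    Assumes blocks on the same old chromosome do not overlap.
    """
    on_chrom = sorted((b for b in blocks if b.chrom_old == chrom), key=lambda b: b.old_start)
    starts = [b.old_start for b in on_chrom]
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    block = on_chrom[i]
    offset = pos - block.old_start
    if offset >= block.length:
        return None  # falls in a gap between aligned blocks
    if block.strand == 1:
        return block.chrom_new, block.new_start + offset
    # For a flipped segment, count back from the end of the new block.
    return block.chrom_new, block.new_start + block.length - 1 - offset


# Invented blocks: one segment shifted, one segment moved and flipped.
EXAMPLE_BLOCKS = [
    Block("1", 0, 1000, "1", 500, 1),
    Block("1", 1000, 200, "1", 2000, -1),
]

if __name__ == "__main__":
    for pos in (10, 1000, 1199, 1200):
        print(pos, map_position(EXAMPLE_BLOCKS, "1", pos))
    # prints:
    # 10 ('1', 510)
    # 1000 ('1', 2199)
    # 1199 ('1', 2000)
    # 1200 None
```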


Where can I find legacy resources from MaizeSequence.Org?

At the Gramene ftp archive.


How can I identify the Filtered Gene Set (FGS) in RefGen_v3?

In the 5b+ gene build, the former FGS gene models are indicated as protein-coding.
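In practice, selecting the former FGS models means filtering gene features on their protein-coding label. The Python sketch below assumes an Ensembl-style GFF3 in which each gene line carries a "biotype" attribute in column 9; the attribute name, the "ID=gene:..." form, and the toy records are assumptions, so confirm them against the actual 5b+ file before relying on the result.

```python
import io
from typing import Dict, Iterator, TextIO
from urllib.parse import unquote


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse a GFF3 attribute column such as 'ID=gene:X;biotype=protein_coding'."""
    attributes: Dict[str, str] = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key] = unquote(value)  # GFF3 percent-encodes special characters
    return attributes


def protein_coding_gene_ids(handle: TextIO) -> Iterator[str]:
    """Yield the ID of each 'gene' feature whose biotype attribute is protein_coding."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9 or columns[2] != "gene":
            continue  # skip non-gene features and malformed lines
        attributes = parse_attributes(columns[8])
        if attributes.get("biotype") == "protein_coding":
            yield attributes.get("ID", "")


# Toy GFF3 lines with invented IDs; a real 5b+ file is much larger.
TOY_GFF3 = (
    "##gff-version 3\n"
    "1\ttoy\tgene\t100\t900\t.\t+\t.\tID=gene:GENE_A;biotype=protein_coding\n"
    "1\ttoy\tgene\t1000\t1500\t.\t-\t.\tID=gene:GENE_B;biotype=transposable_element\n"
    "1\ttoy\tmRNA\t100\t900\t.\t+\t.\tID=transcript:TX_A;Parent=gene:GENE_A;biotype=protein_coding\n"
)

if __name__ == "__main__":
    print(list(protein_coding_gene_ids(io.StringIO(TOY_GFF3))))  # ['gene:GENE_A']
```

Only lines whose feature type is exactly "gene" are considered, so transcript lines that repeat the biotype do not produce duplicate IDs.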


Where can I download a GFF dump of the FGS for maize genes in v3 (5b+)?

From the Gramene 5b+ ftp folder.