
Information about assembly Zm-B73-REFERENCE-GRAMENE-4.0 (also known as AGPv4 and B73 RefGen_v4)

Genome Sequencing Project Information

   The reference genome of Zea mays ssp. mays inbred line B73 was completely resequenced using PacBio Single Molecule Real-Time (SMRT) sequencing and a high-resolution genome map. Seed for the sequenced accession is available from NCRPIS (PI 677128).
   Project PI   Doreen Ware
   Project start date   2015
   Release date   2016-09-16
   Browse Genome   Genome browser at MaizeGDB
   Data download   ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/plant/Zea_mays/latest_assembly_versions/GCA_000005005.6_B73_RefGen_v4
   Publication status   Submitted
   Project reference   The complex sequence landscape of maize revealed by single molecule technologies. Yinping Jiao; Paul Peluso; Jinghua Shi; Tiffany Liang; Michelle C. Stitzer; Bo Wang; Michael Campbell; Joshua C. Stein; Xuehong Wei; Chen-Shan Chin; Katherine Guill; Michael Regulski; Sunita Kumari; Andrew Olson; Jonathan Gent; Kevin L. Schneider; Thomas K. Wolfgruber; Michael R. May; Nathan M. Springer; Eric Antoniou; Richard McCombie; Gernot G. Presting; Michael McMullen; Jeffrey Ross-Ibarra; Kelly Dawe; Alex Hastie; David R. Rank; Doreen Ware

Stock and Biosample Information

Stock information
   Stock name   PI 677128 (maize inbred line B73 from NCRPIS (PI 550473), grown at the University of Missouri)
   Stock details   PI 677128 (maize inbred line B73 from NCRPIS (PI 550473), grown at the University of Missouri)
   Stock provided by   University of Missouri
Biosample information
   GenBank BioSample   SAMN04296295  
   Sample description   The seeds used for sequencing were deposited at NCRPIS (PI 677128). Kernels were planted in flats of Pro-Mix and grown for 4–6 days in the dark at 37°C. To eliminate chloroplast DNA, etiolated tissue was harvested.
   Collection date   2015
   Collected by   University of Missouri
   Age   4–6 days
   Plant structure   whole plant
   Developmental stage   seedling

Sequencing and Assembly Information

   Assembly name   Zm-B73-REFERENCE-GRAMENE-4.0
   Sequencing description   Sequence service provider: Pacific Biosciences. Sequencing technology and method: PacBio Single Molecule Real-Time sequencing.
   Assembly description   Assembly methods: Celera Assembler PBcR–MHAP pipeline and Falcon. Contig base calls were polished with Quiver from SMRT Analysis v2.3.0.
Construction of pseudomolecules: Sequences from the BACs used for the v3 pseudomolecules were aligned to the PacBio contigs using MUMmer, and the scaffolds were then ordered and oriented into pseudochromosomes using the BAC order as a guide. Gaps were filled with PBJelly, and the pseudomolecules were polished again with Quiver from SMRT Analysis v2.3.0. Illumina 2500 Rapid reads were then used to improve base-call accuracy: the reads were aligned to the assembly with BWA-MEM, SAMtools was used to produce BAM alignments, and these alignments were used as input to the Pilon pipeline.
   Sequencing method   PacBio Single Molecule Real-Time sequencing
   Finishing strategy   Complete genome, 65X coverage. Among the assemblies produced, the PBcR–MHAP assembly had the fewest contigs (3,303) and was adopted for the B73 RefGen_v4 genome.
   Genome coverage   65X
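The pseudomolecule construction in the assembly description (order and orient scaffolds by the v3 BAC order, then join them) can be illustrated with a toy sketch. It assumes a hypothetical input in which each BAC-to-contig alignment (for example, parsed from MUMmer output) has already been reduced to the BAC's rank in the v3 chromosome order and an alignment strand. The `BacHit` type, the median-rank ordering, the majority-vote orientation, and the fixed 100 bp N gaps are illustrative choices, not the project's actual procedure, and the PBJelly gap filling and polishing steps are not modeled.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BacHit:
    """One alignment of a guide BAC to a contig (hypothetical input format)."""
    contig: str
    bac_rank: int  # position of the BAC in the v3 chromosome order
    strand: int    # +1 if BAC and contig align in the same orientation, -1 if reversed


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def build_pseudomolecule(
    contigs: Dict[str, str], hits: List[BacHit], gap_len: int = 100
) -> Tuple[str, List[Tuple[str, str]]]:
    """Order contigs by the median rank of their BAC hits, orient each by the
    majority strand of its hits, and join them with runs of N.

    Contigs with no BAC hits are left out (they would remain unplaced).
    Returns the joined sequence and the (contig, orientation) layout.
    """
    ranks: Dict[str, List[int]] = {}
    strand_votes: Dict[str, int] = {}
    for hit in hits:
        ranks.setdefault(hit.contig, []).append(hit.bac_rank)
        strand_votes[hit.contig] = strand_votes.get(hit.contig, 0) + hit.strand

    layout: List[Tuple[str, str]] = []
    pieces: List[str] = []
    for name in sorted(ranks, key=lambda c: median(ranks[c])):
        # Ties (equal + and - votes) default to the forward orientation.
        if strand_votes[name] < 0:
            pieces.append(reverse_complement(contigs[name]))
            layout.append((name, "-"))
        else:
            pieces.append(contigs[name])
            layout.append((name, "+"))
    # Fixed-length N runs stand in for gaps of unknown size.
    return ("N" * gap_len).join(pieces), layout


contigs = {"ctgA": "AACG", "ctgB": "GGTT", "ctgC": "ACGT"}
hits = [BacHit("ctgB", 1, 1), BacHit("ctgB", 2, 1),
        BacHit("ctgA", 5, -1), BacHit("ctgA", 6, -1)]
seq, layout = build_pseudomolecule(contigs, hits, gap_len=3)
print(seq)     # GGTTNNNCGTT
print(layout)  # [('ctgB', '+'), ('ctgA', '-')]
```

Using the median rather than the mean keeps a single misplaced BAC alignment from dragging a contig far out of order; `ctgC`, which has no guide hits, stays unplaced.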
Assembly statistics
   Scaffold count   356
   Sequence in scaffolded contigs   99%
   Sequence in unscaffolded contigs   1%
   Total scaffold length   2,075,000,000 bp
   N50 scaffold length   9,730,000 bp
   N50 scaffold count   79
   N90 scaffold length   595,319 bp
   N90 scaffold count   356
   Total contig length   2,104,000,000 bp
   N50 contig length   1,180,000 bp
Definitions
   Scaffold count: total number of scaffolds in the assembly.
   Sequence in scaffolded contigs: percentage of the assembly in scaffolded contigs.
   Sequence in unscaffolded contigs: percentage of the assembly in unscaffolded contigs.
   Total scaffold length: total sequence length represented by scaffolds.
   N50 scaffold length: the length of the scaffold at which the running sum of scaffold lengths, summed from longest to shortest, passes 50% of the total assembly size.
   N50 scaffold count: the number of scaffolds counted in reaching the N50 threshold.
   N90 scaffold length: the length of the scaffold at which the running sum of scaffold lengths, summed from longest to shortest, passes 90% of the total assembly size.
   N90 scaffold count: the number of scaffolds counted in reaching the N90 threshold.
   Total contig length: total sequence length represented by contigs.
   N50 contig length: the length of the contig at which the running sum of contig lengths, summed from longest to shortest, passes 50% of the total assembly size.
   Contig: a contiguous consensus sequence derived from a collection of overlapping reads.
   Scaffold: a set of ordered and oriented contigs linked to one another by evidence such as mate pairs of sequencing reads.
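The N50 and N90 definitions translate directly into a short function. This is a minimal sketch, assuming "total assembly size" means the summed length of the sequences passed in (some tools instead use an estimated genome size, which gives NG50) and that "past" the threshold means the running sum first reaches or exceeds it. The function name `nx` is hypothetical.

```python
from typing import List, Tuple


def nx(lengths: List[int], percent: int) -> Tuple[int, int]:
    """Return (Nx length, number of sequences counted) for x = percent.

    Sequences are summed from longest to shortest; the Nx length is the
    length of the sequence at which the running total first reaches
    percent% of the total length of all sequences supplied.
    """
    if not lengths or not 0 < percent <= 100:
        raise ValueError("need at least one length and 0 < percent <= 100")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids floating-point rounding in the threshold.
        if running * 100 >= percent * total:
            return length, count
    raise AssertionError("unreachable: the running total always reaches 100%")


scaffolds = [40, 30, 20, 10]
print(nx(scaffolds, 50))  # (30, 2)
print(nx(scaffolds, 90))  # (20, 3)
```

Applied to the full list of scaffold or contig lengths from the assembly FASTA, the two returned values correspond to the "length" and "count" rows in the statistics above.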

Annotation

   Annotation Identifier   Zm00001d.1
   Annotation Provider   Doreen Ware, Cold Spring Harbor
   Annotation Date   9-12-2016
   Annotation Software   MAKER-P, Genome Assembly Converter, MUMmer, CrossMap, Augustus, FGENESH
   Annotation Description   Gene annotation based on GenBank cDNAs and PacBio Iso-Seq RNA; MAKER-P gene model annotation using Arabidopsis, rice, sorghum, Setaria, and Brachypodium; Augustus and FGENESH gene model prediction; and mapping of known v3 gene models with Genome Assembly Converter, MUMmer, and CrossMap.
   Annotation Download   ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/plant/Zea_mays/latest_assembly_versions/GCF_000005005.2_B73_RefGen_v4
   Annotation Identifier   Zm00001d.2
   Annotation Provider   Doreen Ware, Cold Spring Harbor
   Annotation Date   6-7-2017
   Annotation Software   MAKER-P, Genome Assembly Converter, MUMmer, CrossMap, Augustus, FGENESH
   Annotation Description   Gene annotation based on GenBank cDNAs and PacBio Iso-Seq RNA; MAKER-P gene model annotation using Arabidopsis, rice, sorghum, Setaria, and Brachypodium; Augustus and FGENESH gene model prediction; and mapping of known v3 gene models with Genome Assembly Converter, MUMmer, and CrossMap. Corresponds to Gramene version 36.
   Annotation Download   ftp://ftp.ensemblgenomes.org/pub/plants/release-36/fasta/zea_mays
   Annotation Identifier   NCBI 101
   Annotation Provider   NCBI
   Annotation Date   2017-03-20
   Annotation Software   NCBI Eukaryote Annotation
   Annotation Description   Annotated by the NCBI Eukaryotic Genome Annotation Pipeline.
   Annotation Download   ftp://ftp.ncbi.nlm.nih.gov/genomes/Zea_mays/protein/
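The Zm00001d annotations list Genome Assembly Converter, MUMmer, and CrossMap for carrying known v3 gene models onto the v4 assembly. A minimal sketch of the coordinate-mapping idea, assuming the whole-genome alignment has already been reduced to ungapped, same-strand blocks (roughly what a UCSC chain file describes; CrossMap reads chain files). The `Block` type and the functions are hypothetical; this is not CrossMap's or Genome Assembly Converter's implementation, and it ignores chromosome names, strand flips, and features that span more than one block.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """Ungapped alignment block: old[old_start, old_start + size) maps to
    new[new_start, new_start + size) on the same strand (0-based coordinates)."""
    old_start: int
    new_start: int
    size: int


def _block_index(pos: int, blocks: List[Block]) -> Optional[int]:
    """Index of the block containing pos, or None. Blocks must be sorted by
    old_start and must not overlap."""
    i = bisect_right([b.old_start for b in blocks], pos) - 1
    if i >= 0 and pos < blocks[i].old_start + blocks[i].size:
        return i
    return None


def lift_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map one old-assembly position to the new assembly, or None if unaligned."""
    i = _block_index(pos, blocks)
    if i is None:
        return None
    b = blocks[i]
    return b.new_start + (pos - b.old_start)


def lift_interval(start: int, end: int, blocks: List[Block]) -> Optional[Tuple[int, int]]:
    """Lift half-open [start, end) only if it lies inside a single block;
    real liftover tools can also split features across blocks."""
    i = _block_index(start, blocks)
    if i is None:
        return None
    b = blocks[i]
    if end > b.old_start + b.size:
        return None
    offset = b.new_start - b.old_start
    return start + offset, end + offset


blocks = [Block(0, 100, 50), Block(60, 200, 40)]
print(lift_position(10, blocks))     # 110
print(lift_position(55, blocks))     # None (falls in the unaligned gap)
print(lift_interval(5, 20, blocks))  # (105, 120)
```

Features that fail to lift in a sketch like this (for example, those overlapping an unaligned gap) are the ones a real pipeline would need to re-predict or review, which is one reason the annotation also draws on MAKER-P, Augustus, and FGENESH.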