Exceptional intra-specific gene order and gene structural variations between maize B73 and Mo17 genome.
Silong Sun, Yingsi Zhou, Jian Chen, Junpeng Shi, Haiming Zhao, Hainan Zhao, Weibin Song, Mei Zhang, Yang Cui, Xiaomei Dong, Han Liu, Xuxu Ma, Yinping Jiao, Xuehong Wei, Joshua C. Stein, Jeff C. Glaubitz, Fei Lu, Guoliang Yu, Chengzhi Liang, Kevin Fengler, Bailin Li, Antoni Rafalski, Patrick S. Schnable, Doreen H. Ware, Edward S. Buckler, Jinsheng Lai
Scaffold N50: the length of the scaffold at which the cumulative length (summing from the longest to the shortest scaffold) passes 50% of the total assembly size.
Total sequence length represented by contigs.
The longest contig.
The shortest contig.
Contig N50: the length of the contig at which the cumulative length (summing from the longest to the shortest contig) passes 50% of the total assembly size.
A contig is a contiguous consensus sequence that is derived from a collection of overlapping reads.
A scaffold is a set of ordered and oriented contigs that are linked to one another by mate pairs of sequencing reads.
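The N50 definition above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the assembly statistics: the function names `n50` and `assembly_stats` are hypothetical, and the sketch assumes the common convention that the N50 is reached as soon as the cumulative length is at or above half of the total.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are summed from longest to shortest; the N50 is the length
    of the sequence at which the running sum reaches half of the total.
    Assumption: "reaches" is taken as >= 50% of the total.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must be non-empty")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if running * 2 >= total:
            return length


def assembly_stats(lengths):
    """Summarize total length, longest, shortest, and N50 (hypothetical helper)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must be non-empty")
    return {
        "total_length": sum(ordered),
        "longest": ordered[0],
        "shortest": ordered[-1],
        "n50": n50(ordered),
    }
```

For five toy sequences of lengths 10, 8, 6, 4, and 2, the total is 30; the running sum is 10 after the first sequence and 18 after the second, so the N50 is 8.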
Doreen Ware Laboratory, Gramene, Cold Spring Harbor
Genes in the Mo17 genome were annotated with MAKER-P version 3.1, using a comprehensive strategy that combines the results of protein homology-based prediction, RNA-seq-based prediction, and ab initio prediction. We used the same evidence that was used for previous B73 gene annotations, with the addition of Mo17-specific RNA-seq datasets. All annotated proteins from Sorghum bicolor, Oryza sativa, Setaria italica, Brachypodium distachyon and Arabidopsis thaliana were downloaded from Gramene.org release 48 and used for protein homology-based prediction. Transcript evidence comprised 74,471 transcripts assembled from multiple Mo17 tissues, full-length transcripts from B73 Iso-Seq, 69,163 publicly available full-length cDNAs from B73 deposited in GenBank, 1,574,442 Trinity-assembled transcripts from 94 B73 RNA-seq experiments, and 112,963 transcripts assembled from deep sequencing of a B73 seedling. Augustus and FGENESH were used for ab initio prediction of gene models in the TE-masked Mo17 genome. In total, 44,747 genes (53,021 transcripts) were identified in the Mo17 genome and are referred to as the working gene set. This working set is expected to contain TEs that were not masked prior to annotation, as well as annotations with poor supporting evidence. We therefore filtered the working set using the annotation edit distance (AED) scores produced by MAKER-P, confirmed splice sites, and transposon screening. Finally, 38,620 high-confidence genes were defined as the filtered gene set.
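The AED-based part of this filtering can be illustrated with a short Python sketch. It is an assumption-laden example rather than the procedure actually applied: the text does not state the AED cutoff, so `max_aed` is a hypothetical parameter; the sketch assumes MAKER-style GFF3 output in which each mRNA feature carries an `_AED` attribute in column 9; and it does not cover the splice-site confirmation or transposon screening steps. The function names are illustrative only.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column ("key=value;key=value") into a dict."""
    pairs = (
        field.split("=", 1)
        for field in column9.strip().split(";")
        if "=" in field
    )
    return {key.strip(): value.strip() for key, value in pairs}


def filter_by_aed(gff_lines, max_aed):
    """Yield IDs of mRNA features whose _AED is at or below max_aed.

    AED ranges from 0 (annotation fully agrees with the evidence) to 1
    (no supporting evidence), so lower values mean better support.
    max_aed is a hypothetical cutoff; the source does not give one.
    """
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue  # only transcript-level features are scored here
        attrs = parse_attributes(cols[8])
        if "_AED" in attrs and float(attrs["_AED"]) <= max_aed:
            yield attrs.get("ID", "")
```

A call such as `list(filter_by_aed(open("annotations.gff3"), max_aed=0.5))` would return the IDs of transcripts passing the cutoff; both the file name and the value 0.5 are placeholders.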