Maize Genetics Nomenclature

A Standard For Maize Genetics Nomenclature

From MNL 69:182-184 (1995), as updated Sep 1996; Apr 2000; Apr 2002; Oct 2006.

Gene model identifiers updated in 2015 and 2021.

Index:

Preamble
1. Definitions
2. Anonymous Transcripts — Sep 1996 update
3. Standard Nomenclature and Symbols
4. Loci with the Same Gene Name
5. Allelic Designations — Sep 1996 update

5.1. Alleles of Independent Mutational Origin
5.2. Designation of Nonmutant Alleles
5.3. RFLPs and RAPDs as Alleles

6. Naming Deletions
7. Mutations Resulting From Transposable Element Insertions
8. Naming RFLPs and RAPDs
9. Chromosome Rearrangements
10. Organellar Genes
11. Transcription Factors (Oct 2006 addition)
12. Gene Model Identifiers (2015 addition)
Whole-Genome Assembly and Annotation Nomenclature Document (PDF) (2021)
Updates
Clearing House
Nomenclature Committee
Appendix: RFLP Acronyms in Use

PREAMBLE: We wish to have a system that is consistent, compatible with the historical background of maize genetics (insofar as these two goals can be reconciled), is easily understood by plant geneticists working with other species, and forms the basis for the importation of maize data into a general plant genetics data base so that the basic knowledge concerning maize genes is available to researchers with other species and vice versa. We believe that this goal is best implemented by the researchers in each species having their own working vocabulary, while the identification of genes that catalyze the same functions in all species should rely on entry into a relational data base of the genes' function as an E.C. number (2.4.1.13), trivial name (sucrose synthase), and systematic name (UDPglucose:D-fructose 2-glucosyltransferase). The situation can be less completely categorized for genes whose products are transcription factors, structural proteins, storage proteins, etc.

If one accepts the premise outlined above that the common ground between species need not reside in the working vocabulary of geneticists using any species as a model system but in the manner in which their data are expressed in the data base, then the previously adopted names for maize genes can be retained. It will not be necessary to rename the genes previously named on the basis of the mutant phenotype produced as soon as the function of the nonmutant alleles becomes known, but we should proceed to define more precisely words or terms whose meanings need clarification and to decide how we wish to deal with the new information becoming available.

1. DEFINITIONS: The words "locus" and "gene" should not be treated as synonymous. A locus can be defined as "a chromosomal site of variable size at or within which is located a gene, a restriction site, a knob, a breakpoint, an insertion, or other distinguishable feature". This necessitates specifying whether we mean a gene locus or an RFLP locus, etc. We can then define a plant gene as "a DNA sequence of which a segment is regularly or conditionally transcribed at some time in either or both generations of the plant. The DNA is understood to include not only the exons and introns of the structural gene but the cis 5' and 3' regions in which a sequence change can affect gene expression". This treats the gene as a functionally defined entity that is not circumscribed by the transcribed region or other fixed limits.

2. ANONYMOUS TRANSCRIPTS: For most of the history of genetics, the existence of a gene was recognized when a mutation occurred, and the gene was then named by a word/term that was descriptive of the mutant phenotype. That will continue to be the practice except with isozyme markers, for which the designation will be the enzyme in question, or the instances in which the biochemical lesion responsible for the mutant phenotype is identified before the locus is reported. The loci of these genes have then been placed on chromosome maps in relation to other mapped loci. However, we now have the possibility of recognizing genes in which no mutation has been detected through the construction of cDNA libraries. These anonymous cDNAs are often used as probes in RFLP mapping. When such a probe hybridizes to a single band, it is clear that the RFLP loci circumscribe the transcriptional unit that encodes the message represented by the cDNA, and these RFLP loci with other RFLP loci can be used as the basis for mapping the gene. Mapping a locus in this fashion is encouraged as a means of obtaining maximum coverage of the genome. As long as the locus retains an anonymous status (unknown function or no mutant phenotype), the symbol for the locus should be assigned according to the convention used for RFLP loci (as umc148, see Section 8). Further information about the probe and its derivation is best provided in tabular or data base form rather than in the symbol itself.

A gene name identifying function for a locus detected with a cloned sequence should be given only when there is unambiguous evidence that this is the site by which that function is encoded. Particular caution should be taken in identifying genes (and their function) from several RFLPs hybridizing to a gene-specific probe from another organism. Until a sequence has been shown to encode the function in question, the gene designation should be that of an RFLP locus (see Section 8).

The decision was made to not utilize the parenthetic 'gfu' designation for "gene, function unknown". RATIONALE: in common usage, the 'gfu' suffix has proven confusing, implying 'known function', especially to researchers from other species. The confusion arises from the practice in RFLP naming to include parenthetic acronyms where sites are detected by probes with an assigned or putative identity with a particular gene product.

3. STANDARD NOMENCLATURE AND SYMBOLS: The names and symbols that have been used for maize genes should be retained. The name and symbol of a gene locus should be represented with lower-case, italic characters (defective kernel12, dek12). Note that no hyphen separates the gene name from a numerical suffix, which is a change from previous usage. We use a hyphen in the case of mutant alleles to separate the allele designation from a suffix specifying the particular allele (see Section 5). We advocate strongly that all genes identified in the future be given a three letter symbol. Newly detected maize genes that have been previously identified in other plant species should be named where appropriate (see the last paragraph in Section 2) with reference to the list of generic names compiled by the Commission on Plant Gene Nomenclature.

When designating homozygous genotypes with two or more unlinked genes, the genes are separated by semicolons, e.g. a1;a2;c1;c2;r. If linked, the genes are separated by spaces, e.g. C1 sh1 bz1 Wx1. Heterozygous genotypes should be written with a slash separating the sets of linked genes, e.g. C1 Bz1/c1 bz1. If the genes are unlinked, the proper designation is Sh2/sh2; Bt2/bt2.
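The genotype conventions above can be sketched in a few lines of Python. This is only an illustration: the function names and the list-of-lists input layout are assumptions made here and are not part of the standard. The separators follow the examples as printed (no space after the semicolon in the homozygous case, a space after it in the heterozygous case).

```python
def format_homozygous(linkage_groups):
    """Join linked genes with spaces and unlinked groups with semicolons.

    linkage_groups is a hypothetical input layout: one list of gene
    symbols per linkage group.
    """
    return ";".join(" ".join(group) for group in linkage_groups)


def format_heterozygous(linkage_groups):
    """Write each linkage group as haplotype1/haplotype2.

    Each group is a (haplotype1, haplotype2) pair of allele-symbol lists;
    unlinked groups are separated by "; " as in the example Sh2/sh2; Bt2/bt2.
    """
    return "; ".join(
        " ".join(hap1) + "/" + " ".join(hap2) for hap1, hap2 in linkage_groups
    )


print(format_homozygous([["a1"], ["a2"], ["c1"], ["c2"], ["r"]]))
print(format_homozygous([["C1", "sh1", "bz1", "Wx1"]]))
print(format_heterozygous([(["C1", "Bz1"], ["c1", "bz1"])]))
print(format_heterozygous([(["Sh2"], ["sh2"]), (["Bt2"], ["bt2"])]))
```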

4. LOCI WITH THE SAME GENE NAME: Where we have more than one nonallelic mutant with the same gene name, the earlier recommendation was that the first one to receive that name should not have a numerical suffix but the second has 2 as a suffix. Thus we have shrunken (sh), shrunken2 (sh2), and shrunken4 (sh4) mutants. Geneticists outside the maize community are apt to misinterpret this convention. We recommend that we be consistent and write shrunken1 or sh1 and advocate that even if a new locus is identified and given a unique name, it be designated as 1. This has the definite advantage in maintaining data bases and indices that no retrospective correction would be necessary if a second gene locus receives the same designation.
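As a small illustration of the recommendation that the first locus carry an explicit 1, the following Python sketch appends the suffix to any symbol or name that lacks a number. The function name is hypothetical, and the rule is applied mechanically; whether a given historical symbol should be changed remains a matter for the database curators.

```python
import re


def with_locus_number(symbol):
    """Return the symbol unchanged if it ends in digits, else append '1'.

    Hypothetical helper: it only illustrates the sh -> sh1 convention.
    """
    return symbol if re.search(r"\d+$", symbol) else symbol + "1"


print(with_locus_number("sh"))        # first shrunken locus gets an explicit 1
print(with_locus_number("sh2"))       # already numbered, left as is
print(with_locus_number("shrunken"))  # works on full names as well
```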

5. ALLELIC DESIGNATIONS: Where a mutant allele is recessive, it should be designated by an italicized symbol (lower case) as dek12, which is the same as the symbol of the locus. Since it is unlikely that any two mutant or nonmutant alleles in a highly polymorphic species such as maize have identical sequences, maize geneticists are encouraged to specify the particular allele with which they are working (see in this Section, Alleles of Independent Mutational Origin and Designation of Nonmutant Alleles). The symbol for dominant, nonmutant (i.e., conditioning a normal phenotype) alleles will be the same italicized three letter symbol as the mutant alleles but with the first letter capitalized (Dek12). The symbol of the gene product should not be italicized and should be written with all letters capitalized (e.g., ADH1). The name of the gene product (alcohol dehydrogenase) should neither be capitalized nor italicized.

When the mutant alleles of a gene are dominant, the first letter of the mutant symbol is capitalized. The nonmutant symbol has all the letters lower case. For example, the corn grass1 (cg1) gene locus has several dominant mutant (Cg1) alleles as well as nonmutant (cg1) alleles. The reference mutant allele is designated as Cg1-R or -1.

Codominant alleles, such as isozymes where the variants are functional and distinguished from each other by electrophoretic mobility, should be designated by symbols with the first letter capitalized and identified by allelic specifications as Pgm2-5 or Pgm2-7.
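The capitalization rules in this section reduce to three simple string operations, sketched below in Python. The function names are assumptions made for illustration; italics cannot be represented in plain strings and are ignored here.

```python
def recessive_symbol(symbol):
    """Recessive allele (and locus) symbol: first letter lower case."""
    return symbol[0].lower() + symbol[1:]


def dominant_symbol(symbol):
    """Dominant or codominant allele symbol: first letter capitalized."""
    return symbol[0].upper() + symbol[1:]


def product_symbol(symbol):
    """Gene product symbol: all letters capitalized, not italicized."""
    return symbol.upper()


print(recessive_symbol("Dek12"))  # dek12
print(dominant_symbol("cg1"))     # Cg1
print(product_symbol("adh1"))     # ADH1
```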

The decision was made to use '-', rather than '+', in designations of non-mutant alleles. RATIONALE: use of '+' has met with resistance by journal editors; definition of non-mutant alleles can be a grey area.

5.1. ALLELES OF INDEPENDENT MUTATIONAL ORIGIN: The unambiguous designation of mutant alleles that have arisen as independent mutational events is increasingly important. It is generally understood that a gene symbol followed by a hyphen plus a letter or number(s) specifies a particular recessive allele at that gene locus. We have referred to the mutation by which the gene was identified as the reference allele; e.g. bz1-Ref or bz1-R. It is equally appropriate to refer to that allele as bz1-1. The mutations in any gene that were identified subsequently have been categorized in various idiosyncratic ways. Alleles that have arisen by independent mutational events have been designated by letters, numbers, a letter plus numbers, the name of the inbred in which the mutation occurred, and sometimes all of these applied to a group of alleles at a gene locus. While all of these designations served the purpose of indicating that these alleles had independent mutational origins, there is a clear advantage to greater standardization. As in the 1973 Nomenclature Standard, it is recommended that new alleles be identified by a laboratory number that might indicate the year of isolation as sh2-6801. This has the definite advantage that two laboratories are unlikely to designate two new mutations of the same gene by the same number. However, if two laboratories are targeting the same locus in mutagenesis experiments, they should consult before naming their new alleles to avoid giving the same designation to different alleles. Also recommended is the convention of referring to a new mutation of a given phenotype by a provisional designation as bt*-lab number until it is ascertained whether the mutant is a new allele of a known gene or identifies a previously unidentified gene. In the first instance, the proper gene symbol (bt1 or sh2) replaces bt*, but the lab number is retained (e.g., bt1-8711). 
In the second instance (a previously unidentified locus), a new gene name and symbol would be selected, and this mutant would become the reference allele (-R or -1).

When mutant alleles are referred to in the generic sense without specification of their origin, a hyphen without further designation (e.g., bz1-, dek12-) is desirable to make it clear that one is referring to an allele or alleles, not the gene locus.
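The allele designations described in Section 5.1 follow a regular enough pattern to be split mechanically. The Python sketch below is one possible reading, not an official parser: the regular expression, the function name, and the returned field names are all assumptions introduced here. It treats a gene symbol ending in an asterisk as provisional, a designator of R, Ref, or 1 as the reference allele, and an empty designator as the generic form.

```python
import re

# Assumed pattern: letters, then either a locus number or '*', a hyphen,
# and an optional alphanumeric allele designator.
ALLELE_RE = re.compile(
    r"^(?P<gene>[A-Za-z]+(?:\d+|\*)?)-(?P<designator>[A-Za-z0-9]*)$"
)


def parse_allele(text):
    """Split an allele designation into gene symbol and designator."""
    match = ALLELE_RE.match(text)
    if match is None:
        raise ValueError(f"not an allele designation: {text!r}")
    gene = match.group("gene")
    designator = match.group("designator")
    return {
        "gene": gene,
        "designator": designator,
        "provisional": gene.endswith("*"),           # e.g. bt*-8711
        "reference": designator in {"R", "Ref", "1"},  # e.g. bz1-Ref
        "generic": designator == "",                 # e.g. bz1-
    }


for example in ["bz1-Ref", "sh2-6801", "bt*-8711", "bt1-8711", "dek12-"]:
    print(example, parse_allele(example))
```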

5.2. DESIGNATION OF NONMUTANT ALLELES: Since it is now apparent that in a species as polymorphic as maize, nonmutant alleles from different sources are apt to have a number of sequence differences one from the other, and these differences can be reflected in gene action (nonmutant isoalleles), it is desirable to specify the nonmutant allele being investigated or used as a control. Incorporating the name of the inbred as part of the allelic designation, Bz1-W22, is an appropriate method of doing this. However, mutant alleles should not be designated by the inbred in which they arose (e.g., bz1-W22) to avoid confusion with the progenitor allele. Also, there may eventually be numerous mutant alleles of a particular gene isolated in that inbred if a researcher uses that inbred in a mutagenesis experiment. A particular nonmutant allele may be found in an exotic race or other accession that is not an inbred. A unique designator (e.g., a PI number or Bolivia #) should be part of the allelic designation.

5.3. RFLPs AND RAPDs AS ALLELES: The presence or absence of a restriction site or a primer-amplifiable sequence at a particular locus represent Mendelian alternatives. They fall under the broadest definition of an allele, and it is appropriate to refer to these alternatives as alleles as has already been done in some reports.

6. NAMING DELETIONS: When it is clear that a mutation results from a deletion that has removed all or part of two gene loci, it would be appropriate to indicate this in the following manner. For an1-6923, this would be def(an1..bz2)-6923, and for sh-bz-X2, def(bz1..sh1)-X2. When molecular evidence indicates that a deletion has removed all of the structural portion of a gene as is true of wx1-C34, it should be indicated in the same manner; i.e., def(wx1)-C34.
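A deletion designation of this form can be assembled from the affected loci and the allele designator. The following Python sketch assumes a hypothetical helper that accepts one locus (a deletion of a single gene) or two loci (the ends of a deletion spanning two gene loci); it is an illustration of the notation only.

```python
def deletion_name(loci, designator):
    """Build a designation such as def(an1..bz2)-6923 or def(wx1)-C34.

    loci is a list of one or two locus symbols (an assumption of this
    sketch); designator is the allele designation that follows the hyphen.
    """
    if not 1 <= len(loci) <= 2:
        raise ValueError("expected one or two locus symbols")
    return "def(" + "..".join(loci) + ")-" + designator


print(deletion_name(["an1", "bz2"], "6923"))
print(deletion_name(["bz1", "sh1"], "X2"))
print(deletion_name(["wx1"], "C34"))
```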

7. MUTATIONS RESULTING FROM TRANSPOSABLE ELEMENT INSERTIONS: There is one further point concerning allelic specification. Maize in particular has many mutable alleles resulting from the insertion of a transposable element. These have been designated by the mutant symbol, a hyphen, a lower case "m", and an isolation number; e.g., wx-m1. When the transposable element insertion [Ac, Ds, Spm(En), dSpm(I), Mu1..MuX, etc.] is known, it is suggested that this be indicated by a double colon following the allele as wx-m1::Ds1. Since a maize stock may have more than one transposable element family active at the same time, firm genetic and/or molecular evidence is necessary to ascribe mutability to a particular transposable element family. Further, mutable alleles generate both stable nonmutant and stable mutant alleles when the transposable element excises from the gene locus. Since the mutant derivatives are certain to differ in sequence from the nonmutant progenitor allele around the site of the transposable element insertion and the nonmutant derivatives are very likely to differ at that site, researchers should be certain to indicate the origin of such alleles in their reports. One means of doing this is to indicate such an origin by an apostrophe following the locus symbol as Bz1'-7801 or bz1'-8905. The specifics of its origin including the transposable element involved could then be included in the text and entered in the Maize Genome Data Base. Since transpositions of a transposable element from a site within a gene often insert in locations where they have no phenotypic effect but can be useful markers, it is desirable to have a standard to refer to such insertions. Designate them as RFLPs would be designated (see Section 8), but follow the institutional symbol and number with a double colon and the symbol of the transposable element (e.g., dnap2094::Ac).
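The three notations in this section (the -m isolation number, the double colon before the element, and the apostrophe marking a derivative of a mutable allele) can be recognized with one pattern. The Python sketch below is an assumption-laden illustration: the regular expression, the function name, and the output keys are invented here, and element symbols containing parentheses such as Spm(En) are not handled.

```python
import re

# Assumed pattern: locus symbol, optional apostrophe, optional
# "-designator", optional "::element".
INSERTION_RE = re.compile(
    r"^(?P<locus>[A-Za-z]+\d*)(?P<prime>')?"
    r"(?:-(?P<designator>[A-Za-z0-9]+))?"
    r"(?:::(?P<element>[A-Za-z0-9]+))?$"
)


def parse_insertion_allele(text):
    """Split designations such as wx-m1::Ds1, Bz1'-7801, or dnap2094::Ac."""
    match = INSERTION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized designation: {text!r}")
    designator = match.group("designator")
    return {
        "locus": match.group("locus"),
        "designator": designator,
        "element": match.group("element"),         # None when not stated
        "derivative": match.group("prime") is not None,
        "mutable": bool(designator and re.fullmatch(r"m\d+", designator)),
    }


for example in ["wx-m1", "wx-m1::Ds1", "Bz1'-7801", "dnap2094::Ac"]:
    print(example, parse_insertion_allele(example))
```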

8. NAMING RFLPs AND RAPDs: In naming RFLPs and RAPDs, use a lower case three or four letter code designating the originating university or company followed by a laboratory number (no space between the code and the number). When the probe used is a cDNA or a subclone of a gene, the gene symbol should be added in parentheses after the RFLP locus designation, as umc000(a1). Since a probe not infrequently recognizes RFLPs on two or more chromosomes, these should be designated by the same institutional code, number, and probe followed immediately by A, or B, or C. Insofar as possible, the locus with the strongest hybridization should be designated A and the more weakly hybridizing loci be designated B, C, etc. in descending order of signal strength.
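The naming rule above can be illustrated with two small Python helpers: one that splits a locus name into code, number, and optional gene symbol, and one that assigns the letters A, B, C in descending order of hybridization signal. Both function names, the dictionary input, and the numeric signal values are assumptions made for this sketch. The pattern accepts two to four letters because the appendix lists a few two-letter codes, although the text specifies three or four. Where exactly the letter is attached to the name is left out, since the text does not show a worked example.

```python
import re
from string import ascii_uppercase

# Assumed pattern: lower-case code, laboratory number, optional (gene).
RFLP_RE = re.compile(
    r"^(?P<code>[a-z]{2,4})(?P<number>\d+)(?:\((?P<gene>[^()]+)\))?$"
)


def parse_rflp(name):
    """Split a locus name such as umc148 or umc000(a1) into its parts."""
    match = RFLP_RE.match(name)
    if match is None:
        raise ValueError(f"not an RFLP locus name: {name!r}")
    return match.groupdict()


def assign_suffixes(signal_by_locus):
    """Map each locus label to A, B, C, ... with the strongest signal first.

    signal_by_locus is a hypothetical {label: signal strength} mapping.
    """
    ranked = sorted(signal_by_locus, key=signal_by_locus.get, reverse=True)
    return {label: ascii_uppercase[i] for i, label in enumerate(ranked)}


print(parse_rflp("umc000(a1)"))
print(assign_suffixes({"1L": 0.4, "5S": 0.9, "9L": 0.1}))
```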

9. CHROMOSOME REARRANGEMENTS: The conventions for dealing with chromosomal rearrangements are well established and adequate for the purpose. To designate particular reciprocal translocations as T1-2a or T1-9(4995) etc. with the breakpoints noted parenthetically or in a table of supporting information is explicit and sufficient. Additional information (the fact that the translocation stock is homozygous for wx1) can be incorporated by prefacing the translocation number with the gene symbol as the Co-op does in its stock lists (e.g., wx1 T1-9c). Translocations with B chromosomes have designations that indicate the arm of the A chromosome involved (L or S) as well as a lower case letter distinguishing that translocation from any others involving that particular chromosome arm, as TB-5Sc. The cytological breakpoint in the A chromosome as well as the loci uncovered when the TB translocation is used as a male parent can be noted in the text or in a table of supplementary information. The designations for inversions (e.g., Inv9b again with the breakpoints, 9S.05-L.87, listed in a supporting table) are succinct and convey the necessary information.
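For readers who need to sort stock lists, the designations quoted in this section can be classified with three patterns. The Python sketch below covers only the forms shown in the examples (T1-2a, T1-9(4995), TB-5Sc, Inv9b); the patterns, the function name, and the field names are assumptions introduced here, and prefixed gene symbols such as wx1 T1-9c would need to be split off beforehand.

```python
import re

# Assumed patterns, derived only from the examples given in the text.
PATTERNS = {
    "reciprocal translocation": re.compile(
        r"^T(?P<chr1>\d+)-(?P<chr2>\d+)(?:(?P<letter>[a-z])|\((?P<number>\d+)\))$"
    ),
    "B-A translocation": re.compile(
        r"^TB-(?P<chr>\d+)(?P<arm>[LS])(?P<letter>[a-z])$"
    ),
    "inversion": re.compile(r"^Inv(?P<chr>\d+)(?P<letter>[a-z])$"),
}


def classify_rearrangement(name):
    """Return (kind, fields) for a rearrangement designation."""
    for kind, pattern in PATTERNS.items():
        match = pattern.match(name)
        if match:
            # Drop alternatives that did not take part in the match.
            fields = {k: v for k, v in match.groupdict().items() if v is not None}
            return kind, fields
    raise ValueError(f"unrecognized rearrangement: {name!r}")


for example in ["T1-2a", "T1-9(4995)", "TB-5Sc", "Inv9b"]:
    print(example, classify_rearrangement(example))
```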

10. ORGANELLAR GENES: For chloroplast and mitochondrial genes, we accept for the present the proposals already in place. For chloroplast genes, this is Hallick and Bottomley, 1983. Plant Mol. Biol. Rep. 1(4): 38-43, as updated at SwissProt or by the Chloroplast working group for the Commission on Plant Gene Nomenclature. For mitochondrial genes, this is Lonsdale and Leaver, 1988. Ibid. 6(2):14-21, updated by the Mitochondrion working group for the Commission on Plant Gene Nomenclature. For brevity's sake, these are not summarized here.

11. TRANSCRIPTION FACTORS: (Oct 2006 addition) We define here TFs as proteins that contain a DNA-binding domain and that fall within one of the families described in http://arabidopsis.med.ohio-state.edu/AtTFDB/.

There is currently no coherent effort in maize for a rational and organized naming of transcription factors (TFs). The use of GenBank accession numbers, EST names or locus identifiers provides an impractical mechanism, which often leads to ambiguities, for example because of multiple entries in GenBank or of several ESTs for the same protein. Thus, we propose here to create a uniform nomenclature for maize TFs, following the lead from Arabidopsis. A similar proposal is being adopted by the TIGR rice annotation group and by the SUCEST-FUN sugarcane annotation group.

Recommendation
Gene products - Each transcription factor will have an organism identifier (Zm) to be used only in the context of other organisms, followed by letters that represent the TF family (e.g., MYB, bHLH, HD, bZIP) and by a number that will start with '1'. A similar strategy is currently being applied to other maize gene families (e.g., the kinesins, see 276102). Since we realize that many TFs are known by their genetic names, this nomenclature will permit the use of synonyms. For example, KNOTTED could be named HD1(KN) (or ZmHD1(KN) when being compared to HDs of other species) and C1 would be MYB1(C1) (or ZmMYB1(C1)). In addition, whenever possible, we will try to have the numbers provide a historic perspective of which TFs have been first identified. In that regard, since KN and C1 correspond to the founding members of their respective families in maize, they are assigned the number '1'. Prior genetic nomenclature will be incorporated in the database.

Genes - Existing names for genes encoding TFs will not be altered. If necessary, and only as a way to provide coherence with the naming of the gene products, the synonym strategy described above would be used. In that regard, c1 would continue to be c1 but could also be cross-referenced as c1(myb1). New genes will be named according to their products. If mutant phenotypes are identified at a later date, gene names derived from mutant phenotypes will be added as synonyms, but the original name will not be changed. As indicated for the gene products, the use of the prefix Zm in front of the gene's name will only be used when comparing maize genes with related genes from other species (e.g., Zm myb1).
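The product and gene naming rules in the recommendation can be sketched as two formatting functions. The Python below is illustrative only; the function names and parameters are hypothetical and do not correspond to any MaizeGDB tool. The Zm prefix is added only when a cross-species comparison is requested, as the recommendation specifies.

```python
def tf_product_name(family, number, synonym=None, cross_species=False):
    """Build a product name such as MYB1(C1) or ZmMYB1(C1)."""
    name = f"{family}{number}"
    if synonym:
        name += f"({synonym})"  # prior genetic name kept as a synonym
    return "Zm" + name if cross_species else name


def tf_gene_crossref(gene_symbol, family, number):
    """Cross-reference an existing gene symbol, e.g. c1 -> c1(myb1)."""
    return f"{gene_symbol}({family.lower()}{number})"


print(tf_product_name("MYB", 1, "C1"))
print(tf_product_name("HD", 1, "KN", cross_species=True))
print(tf_gene_crossref("c1", "MYB", 1))
```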

Note that for generating a position for transcription factors, Erich Grotewold served on the Nomenclature Committee in an ad hoc capacity.

12. GENE MODEL IDENTIFIERS: MaizeGDB, the Maize Genetics COOP Stock Center, Gramene, and the Maize Nomenclature Committee recognize the need to formulate a method for naming assemblies and structural annotations (gene models) across the subspecies such that the nomenclature would do the following:

Assembly names will reflect both accession (e.g., inbred) specific information and project-specific information that allows linking to available germplasm and associated metadata.
All identifiers should meet the following criteria: be concise, human readable, bioinformatics friendly, and scalable to millions of unique assemblies and versions.
Gene model identifiers do not contain any biological information including accession name, chromosome location, or chromosomal order. Some annotation pipelines (e.g. Maker-P) may sequentially order gene models along a reference sequence (e.g. Zm00004a019013, Zm00004a019014, Zm00004a019015, Zm00004a019016) but order should not be assumed. Current order and orientation of gene models within BACs that make up the pseudomolecule may not represent their correct order and orientation on the chromosome.
Allow the unique genetic variation among maize lines to be accounted for. Order and orientation (indeed presence/absence and copy number) are not conserved among lines [Wang Q and Dooner H 2006 PNAS 103:17644-17649; Springer NM et al 2009 PLoS Genet 5:e1000734]. Nomenclature of genes based on the order in B73 would likely be in conflict among lines, and could unnecessarily imply or confound the order of genes in other lines. Therefore each assembly should be named and annotated independently of B73.

Assembly names will consist of 5 parts: the species identifier, a specific cultivar descriptor, the assembly quality, a project-specific identifier, and version number (e.g. "Zm-B73-REFERENCE-GRAMENE-4.0" for "Zea mays B73 cultivar of reference quality from the Gramene project; version 4.0").
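As an illustration, the Python sketch below splits an assembly name into the five hyphen-separated fields seen in the example. The class and function names are assumptions made here, and the sketch assumes that no individual field contains a hyphen, which the text does not state.

```python
from typing import NamedTuple


class AssemblyName(NamedTuple):
    """Fields of an assembly name (hypothetical container for this sketch)."""
    species: str
    cultivar: str
    quality: str
    project: str
    version: str


def parse_assembly_name(name):
    """Split e.g. Zm-B73-REFERENCE-GRAMENE-4.0 on hyphens.

    Assumes that no field itself contains a hyphen.
    """
    parts = name.split("-")
    if len(parts) != 5:
        raise ValueError(f"expected 5 hyphen-separated fields: {name!r}")
    return AssemblyName(*parts)


print(parse_assembly_name("Zm-B73-REFERENCE-GRAMENE-4.0"))
```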

Assembly version codes create a short unique identifier for assembly versions. Each code consists of 2 parts: the assembly code and an alphabetic version code (e.g. 00001d - Zm-B73-REFERENCE-GRAMENE-4.0).

Gene models will consist of 3 parts: the species ID, the assembly version code, and a random six-digit number (e.g. Zm00001d459384; Zea mays, Zm-B73-REFERENCE-GRAMENE-4.0, gene model 459384).
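A gene model identifier of this shape can be taken apart with a single regular expression. The Python sketch below infers the field widths from the examples in this section (a two-letter species ID, a five-digit assembly code, one lower-case version letter, and six digits); those widths, the function name, and the key names are assumptions of the sketch and are not stated as rules in the text.

```python
import re

# Field widths are inferred from examples such as Zm00001d459384.
GENE_MODEL_RE = re.compile(
    r"^(?P<species>[A-Z][a-z])(?P<assembly_code>\d{5})"
    r"(?P<version>[a-z])(?P<gene_number>\d{6})$"
)


def parse_gene_model_id(identifier):
    """Split a gene model identifier into its component fields."""
    match = GENE_MODEL_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a gene model identifier: {identifier!r}")
    fields = match.groupdict()
    # The assembly version code is the assembly code plus the version letter.
    fields["assembly_version_code"] = fields["assembly_code"] + fields["version"]
    return fields


print(parse_gene_model_id("Zm00001d459384"))
print(parse_gene_model_id("Zm00004a019013"))
```

Because the gene number carries no biological information, the parsed fields should be used only for lookups and bookkeeping and never to infer chromosome position or gene order.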

The new nomenclature will be applied to B73 RefGen_v4 (Zm-B73-REFERENCE-GRAMENE-4.0) and all assemblies released after June 2015. For B73, previous identifiers (e.g. GRMZM and ZEAMMB73) are retained as associated gene models and can be searched.

A full description of the assembly/gene model nomenclature is available for download from MaizeGDB.

CLEARING HOUSE FOR NOMENCLATURE: We also believe that it is desirable to initiate a clearing house for maize nomenclature so that a researcher wishing to name a recently identified gene can ascertain almost immediately that no one has used the proposed designation and symbol. This clearing house can, in principle, function through the MaizeGDB website, which will be refereed by a cooperator. The same facility could be used to ensure that allelic designations are not duplicated or to answer questions concerning nomenclature.

Submitted Sep 10, 1996 by the Nomenclature Subcommittee.

1996 UPDATES:

ANONYMOUS TRANSCRIPTS: decision made not to utilize the parenthetic 'gfu' designation for "gene, function unknown". RATIONALE: in common usage, the 'gfu' suffix has proven confusing, implying 'known function', especially to researchers from other species. The confusion arises from the practice in RFLP naming to include parenthetic acronyms where sites are detected by probes with an assigned or putative identity with a particular gene product.
ALLELIC DESIGNATIONS: decision made to use '-', rather than '+', in designations of non-mutant alleles. RATIONALE: use of '+' has met with resistance by journal editors; definition of non-mutant alleles can be a grey area.

NOMENCLATURE COMMITTEE:
Carlos Arbizu
Ethy Cannon
Elli Cryan
Elizabeth Kellogg
Samuel Leiboff
Caroline Marcon
Josh Strable
Rachel Wang

Past members:
William Beavis
Mary Berlyn
Ed Buckler
Benjamin Burr
Vicki Chandler
Ed Coe
Hugo Dooner (past chair)
Charles (Chunguang) Du
Christiane Fauron
Curt Hannah
Lisa Harper
Carolyn Lawrence Dill
Mark Mikel
Oliver Nelson (past chair)
Mary Schaeffer (Polacco)
Steve Rodermel
Marty Sachs (past chair)
Mike Scanlon
Phil Stinard
Carolyn Wetze

APPENDIX: PROBE ACRONYMS IN USE

Updated May 2000:

       agr    Agrigenetics                                       
       asg    Asgrow Seed                                        
       ast    Academia Sinica, Taiwan
       bcd    barley cDNA, Cornell University                    
       bnl    Brookhaven National Laboratory 
       bnlg   Brookhaven National Laboratory, SSR probes                    
       cdo    oat leaf cDNA, Cornell University                  
       crc    Carlsberg Research Center                          
       csh    Cold Spring Harbor                                 
       csic   Centro de Investigacion y Desarrollo, Barcelona
       csu    California State University, Hayward
       cuny   City University of New York                        
       dnap   DNA Plant Technology Corp
       dup    Dupont 
       fco    Colorado State U. Fort Collins
       fmi    Friedrich Miescher-Institut                                            
       gii    Genetics Institute Inc.                            
       ias    Iowa State University
       iger   Institute of Grassland and Environmental Research
       inra   Institut National de la Recherche Agronomique
       isc    Ist Sper Cereal
       isu    Iowa State University
       klp    Universitat Hohenheim, Stuttgart                                    
       koln   University of Koln 
       ksu    Kansas State University
       lim    Limagrain
       mmc    Maize Microsatellite Consortium (UK) 
       mmp    Missouri Maize Project                               
       mpik   Max-Planck-Institute, Koln 
       mps    Mycogen Plant Sciences
       nc     North Carolina                        
       ncr    North Carolina Raleigh                             
       ncsu   North Carolina State University                    
       niu    Northern Illinois University                       
       npi    Native Plants Incorporated
       op     Operon Technologies
       osu    Ohio State University
       pbs    Purdue Biological Sciences                         
       pge    Plant Gene Expression Center 
       pgs    Plant Genetic Systems
       phi    Pioneer Hi-Bred International (SSR)                      
       php    Pioneer Hi-Bred International 
       pic    Plant Industry Canberra                     
       psu    Penn State University                              
       rg     rice genomic, Cornell University 
       rgp    Rice Genome Program, Japan                  
       rny    Rockefeller University                             
       rpa    Rhone Poulenc                                      
       rz     rice cDNA, Cornell University
       sb     Sorghum bicolor
       scri   Scottish Crop Research Institute
       std    Stanford University
       tda    Tripsacum dactyloides
       tjp    University of Tokyo, Japan
       ttu    Texas Tech University
       tum    Technische Universitat Munchen
       uat    University of Arizona - Tucson                               
       uaz    University of Arizona                              
       ucb    University of California - Berkeley
       ucd    University of California - Davis
       ucla   University of California - Los Angeles               
       ucr    University of California - Riverside                 
       ucsd   University of California - San Diego                
       ufg    University of Florida - Gainesville                  
       uiu    University of Illinois - Urbana 
       ukd    University of Copenhagen
       uky    University of Kentucky                     
       umc    University of Missouri - Columbia                    
       umn    University of Minnesota
       umsl   University of Missouri - St. Louis
       uob    University of Barcelona
       uom    University of Manitoba
       uor    University of Oregon                            
       uox    University of Oxford
       usu    Utah State University                               
       uwo    University of Western Ontario
       uzh    University of Zurich                      
       wsu    Washington State University                        
       wusl   Washington University, St. Louis                   
       ynh    Yale University
