The Chromosome 7 Annotation Project

 

About the Project

Overview

 

The objective of this project is to generate the most comprehensive description of human chromosome 7 to facilitate biological discovery, disease gene research and medical genetic applications. In our vision, the DNA sequence of chromosome 7 should be made available in a user-friendly manner, with every biologically and medically relevant feature annotated along its length. We have established this website and database as one step towards this goal. We foresee this site serving not only as a primary data source but also as a "weighing station" for testing community ideas and information, producing highly curated data to be submitted to other databases such as NCBI, Ensembl, and UCSC. Therefore, any useful data submitted to us will be curated and shown in this database. For data sharing and submissions, please send any enquiry to tcag-chr7@sickkids.ca. A major challenge ahead will be to represent chromosome alterations, variants, and polymorphisms and their related phenotypes (or lack thereof) in an accessible way. This is our first attempt to do so and, as will continue to be the case with the DNA sequence and gene annotation, improvements will occur incrementally. The project will be considered a success when an equal number of molecular biologists, medical geneticists, and physicians utilize the information.

 

 

I. Chromosome 7 Features (March 2003)

 

Chromosome type: Submetacentric

 

Physical size

 

Overall: 157,953,789 bp
Short arm (7p): 58,344,712 bp
Long arm (7q): 96,909,027 bp
Centromere: 2,700,000 bp

 

Two different polymorphic alpha satellite arrays are known with an average size of 2,580 kb for one (D7Z1) and 265 kb for the other (D7Z2), as determined by pulsed-field gel electrophoresis. D7Z2 lies adjacent to 7p and D7Z2 adjacent to 7q. Therefore, to represent the 'average' chromosome we have inserted 2,700,000 bp for the centromere.

 

 

Genetic size

 

Overall average: 181 cM
Male average: 133 cM
Female average: 230 cM
Average recombination rate: 1.15 cM/Mb
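
The average recombination figure follows from dividing the genetic size by the physical size. The short Python sketch below reproduces that arithmetic from the values listed above. It assumes the rate is taken over the full physical length, centromere included, which the page does not state explicitly. The male and female rates it prints are derived here for illustration and are not reported in the tables.

```python
# Values copied from the "Physical size" and "Genetic size" tables.
PHYSICAL_SIZE_BP = 157_953_789
GENETIC_SIZE_CM = {"overall": 181, "male": 133, "female": 230}


def cm_per_mb(genetic_cm: float, physical_bp: int) -> float:
    """Average recombination rate in centimorgans per megabase."""
    return genetic_cm / (physical_bp / 1_000_000)


# Assumption: the whole chromosome length is used as the denominator.
rates = {
    label: round(cm_per_mb(cm, PHYSICAL_SIZE_BP), 2)
    for label, cm in GENETIC_SIZE_CM.items()
}

for label, rate in rates.items():
    print(f"{label}: {rate} cM/Mb")
```

The overall value rounds to 1.15 cM/Mb, in agreement with the listed figure.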

 

 

Euchromatin

 

G+C content: 40.75%
SINEs: 13.50% (89,168 copies)
LINEs: 19.55% (46,111 copies)

 

Segmental duplication (>5 kb and >90% identity):

Intra-chromosomal: 8.3 Mb (5.3%)
Trans-chromosomal: 3.0 Mb (1.9%)

 

Genes

 

Gene structure number: 1,917
Known genes: 863
Novel genes: 71
Partial genes: 40
Predicted genes: 481
Putative and non-coding: 213
TCR gene segments: 81
TCR pseudogenes: 24
Pseudogenes: 144
Alternative splicing: 55% (474 of 863 known genes)
Gene density: 10.7/Mb
Average size known gene: 69.9 kb
Intergenic distance: 42.4 kb
Largest gene: 2,300 kb (CNTNAP2)
Smallest gene: 174 bp (hY3 cytoplasmic Ro RNA)
Largest mRNA: 14,977 bp (MLL3)
Overlapping genes: 100
% chromosome transcribed: 72.9 Mb (46.5%)
CpG islands: 1,335 (marking 541 of 863 or 63% of known genes)
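
Several of the gene figures above can be cross-checked against each other. The Python sketch below does this with the numbers as listed. It treats the gene structure number as the sum of the eight category counts beneath it; that reading is an assumption drawn from the arithmetic, not something the page states. The dictionary keys are illustrative names only.

```python
# Counts copied from the "Genes" table.
GENE_STRUCTURES_TOTAL = 1_917
CATEGORY_COUNTS = {
    "known": 863,
    "novel": 71,
    "partial": 40,
    "predicted": 481,
    "putative_and_non_coding": 213,
    "tcr_gene_segments": 81,
    "tcr_pseudogenes": 24,
    "pseudogenes": 144,
}


def percent(part: int, whole: int) -> int:
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


# Assumption: the total is the sum of the eight categories.
category_sum = sum(CATEGORY_COUNTS.values())
known = CATEGORY_COUNTS["known"]

alt_splicing_pct = percent(474, known)  # alternatively spliced known genes
cpg_marked_pct = percent(541, known)    # known genes marked by a CpG island

print(category_sum == GENE_STRUCTURES_TOTAL)
print(alt_splicing_pct, cpg_marked_pct)
```

Under that assumption the categories sum to 1,917, and the two percentages round to the 55% and 63% given in the table.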

 

Fragile sites:

 

FRA7A (7p11.2) is a rare fragile site. FRA7B (7p22), FRA7C (7p14.2), FRA7D (7p13), FRA7E (7q21.2), FRA7F (7q22), FRA7G (7q31.2) and FRA7H (7q32.3) are all common fragile sites. FRA7E, FRA7G, FRA7H and FRA7I are characterized at the molecular level.

 

Chromosome heteromorphisms: 

 

Inversion polymorphism detected in some Williams-Beuren patients and carrier parents. Naturally occurring heteromorphism at centromeres observed in the normal population. Centromeric heteromorphism found in spurious monosomy 7 in leukemia.

 

Synteny:

 

Six murine chromosomes (5, 6, 9, 11, 12 and 13) grouped into 36 blocks.

 

Imprinted genes and regions:

 

MEG1/GRB10 at 7p12 shows isoform- and tissue-specific imprinting in humans. In fetal brain, most isoforms are expressed from the paternal allele. In skeletal muscle, one isoform of GRB10 (gamma1) is expressed from the maternal allele alone, whereas in other fetal tissues all GRB10 spliced isoforms are expressed from both parental alleles. Mouse Grb10 was found to be expressed from the maternal allele only. GRB10 functions as a growth suppressor via inhibitory interactions with IGF-1, the growth hormone receptor and insulin.

 

SGCE at 7q21 is a component of the dystrophin-sarcoglycan complex. Imprinting of human SGCE has not been experimentally confirmed, but the mouse gene is expressed from the paternal allele only. Moreover, heterozygous loss-of-function mutations in SGCE have been identified in myoclonus-dystonia syndrome (MDS), which shows a marked difference in penetrance depending on the parental origin of the disease allele.

 

PEG10 at 7q21, located just distal to the SGCE gene, contains two long open reading frames with homology to the gag and pol proteins of some vertebrate retrotransposons. Paternal expression of PEG10 has been demonstrated in placental villi.

 

PEG1/MEST at 7q32 is maternally imprinted in many human and mouse fetal tissues. The function of the protein is unknown but it shows sequence similarity with the alpha/beta hydrolase fold family. Two isoforms, distinguished by having unique first exons, have been identified. In lymphoblastoid cells only isoform 1 is imprinted, whereas isoform 2 is biallelically expressed. Mest-deficient mice show growth retardation and abnormal maternal behavior.

 

COPG2 at 7q32 encodes coatomer protein complex subunit gamma 2. It is located adjacent to PEG1/MEST and the two genes have overlapping 3’-UTRs. Initially, COPG2 was determined to be maternally imprinted. Subsequent studies could not reproduce this result but instead indicated that the observation was due to overlap with the PEG1/MEST gene.

 

CIT1 at 7q32 is an antisense transcript located within intron 20 of COPG2. It is maternally imprinted in all fetal tissues examined. A DNA marker, Mit1/Lb9, located within intron 20 of mouse Copg2, is also maternally imprinted. The mRNAs for CIT1 and Mit1/Lb9 are only partially known and, so far, they appear to be non-coding RNAs. The CPA4 gene has also recently been shown to be imprinted in some tissues.

 

 

II. Cytogenetic characteristics:

 

Disease related:

 

-7q11.23 microdeletion in Williams-Beuren Syndrome

-maternal uniparental disomy (UPD) of chromosome 7 observed in Russell-Silver syndrome (paternal UPD has also been detected but no defined phenotype is associated)

-various deletions of 7p associated with craniosynostosis

 

Cancer related:

 

-t(7;12) involving the ETV6 gene at 12p13 in myeloid disorders of children and other hematologic malignancies

-t(7;11)(p15;p15) involving fusion of HOXA9 and NUP98 in human myeloid leukemia

-t(2;7)(p12;q21) involving Ig kappa sequence with CDK6 in splenic marginal zone lymphoma

-t(7;17)(p15;q21) involving JAZF1 and JJAZ1 in endometrial stromal tumors

-t(3;7)(q27;p12) involving BCL6 and Ikaros in diffuse large B-cell lymphoma

Cytogenetic evidence suggesting that a disease-causing cancer gene resides on chromosome 7 (gene not yet identified):

 

Disease: Chromosome 7 abnormality

Acute lymphoblastic leukemia: Deletion 7q11
Acute lymphoblastic leukemia: Deletion 7q22
Acute myelogenous leukemia: Deletion 7q33-q35
Acute myeloid leukemia: Deletion 7q22
Acute non-lymphoblastic leukemia: Monosomy 7
B-cell chronic lymphoproliferative disorder: Monosomy 7
Bladder carcinoma: Trisomy 7
Brain tumors: Trisomy 7
Breast carcinoma: Deletion 7q11
Breast carcinoma: Deletion 7q31-q32
Chronic myeloid leukemia: Deletion 7q11
Colorectal carcinoma: Trisomy 7
Gastric MALT lymphoma: Deletion of 7p
Head and neck squamous cell carcinoma: Deletion 7q
Intestinal polyps: Trisomy 7
Lung adenocarcinoma: Deletion 7q22
Malignant andrological neoplasias: Deletion 7q31.1-q32
Malignant melanoma: Deletion 7q
Mesothelioma: Monosomy 7
Myelodysplastic syndrome: Monosomy 7
Myelodysplastic syndrome: Deletion 7q22
Myeloid disorders: Deletion 7q31-q32
Neurofibromatosis: Monosomy 7
Non-Hodgkin’s lymphoma: Monosomy 7
Non-Hodgkin’s lymphoma: Deletion 7q22
Ovarian carcinoma: Deletion 7q31-q32
Papillary renal carcinoma: Trisomy 7
Primary lung carcinoma: Trisomy 7
Primary prostate carcinoma: Trisomy 7
Primary prostate carcinoma: Deletion 7q
Skin carcinoma in situ: Deletion 7q31-q32
Small lymphocytic lymphoma: Deletion 7q32
Thyroid carcinoma: Deletion 7q22
Thyroid tumors: Trisomy 7
Uterine leiomyoma: Deletion 7q22-q31.1

 

 

III. General History and Discussion:

 

Human Gene Mapping: Chromosome 7 has been one of the more extensively studied chromosomes because of the many interesting genes and diseases that map to it. In the 1980s, the T-cell receptor gene families (TCRB and TCRG), erythropoietin (EPO), the multi-drug resistance genes (PGY1 and PGY3) and the homeobox A (HOXA) gene family were localized to chromosome 7, placing it in the spotlight of genetics research. In 1985, the cystic fibrosis locus was mapped to chromosome 7 by Lap-Chee Tsui and colleagues in Toronto, setting off a race to find the causative gene (CFTR), which was eventually identified in 1989 by the same group. As a result of this intensive search, many genetic and physical mapping resources were generated for studies of other disease genes on this chromosome. Moreover, the positional cloning strategies developed in the search for the cystic fibrosis gene provided the prototypic example for moving from a linked genetic marker to a disease gene.

 

Key figures in the subsequent study of chromosome 7 and the human genome project emerged from the cystic fibrosis gene hunt. Helen Donis-Keller, then at Collaborative Research, shared reagents with Tsui in the mapping of the cystic fibrosis gene to chromosome 7. Donis-Keller, members of Collaborative Research, and others later produced the first genetic map of the human genome using restriction fragment length polymorphisms (RFLPs). On a side note, the CEO of Collaborative Research, Orrie Friedman, shocked the scientific community by stating that the company “owned chromosome 7”. Some consider this statement a foreshadowing of the events around the human genome sequencing race that would occur between the public and private sectors a decade later. Francis Collins, who would lead the U.S. National Institutes of Health (NIH) genome initiative, collaborated with Tsui in the cloning of the cystic fibrosis gene. The laboratories of Tsui, Donis-Keller, and Karl-Heinz Grzeschik (Marburg, Germany) continued to generate chromosome 7-specific mapping information and were also actively involved in the curation of genetic information deposited into the Genome Database. International workshops on chromosome 7 were held in Marburg and Toronto in 1993 and 1994, respectively, and reports summarizing the gene mapping efforts were published.

 

Genetic and Physical Maps: With the advent of the human genome initiative in the early 1990s, many of the initial NIH pilot study grants also focussed efforts on studying chromosome 7 at a higher resolution. A group led by Donis-Keller, Maynard Olson and David Schlessinger at Washington University in St. Louis set out to make chromosome-specific genetic and physical maps based on the sequence-tagged site (STS) concept. Donis-Keller and Olson were working on chromosome 7 and Schlessinger on the X chromosome. In 1994, a member of this group, Eric Green, moved the chromosome 7 project to NIH, operating within Francis Collins’ newly formed intramural National Human Genome Research Institute (NHGRI) in Bethesda. His group constructed important STS-based physical maps of chromosome 7 (and is currently performing large-scale multi-species comparative DNA sequencing projects of targeted regions of chromosome 7). In addition, Leroy Hood’s laboratory (with Lee Rowen and Ben Koop) sequenced 685 kb encompassing the entire TCRB locus, representing the first large-scale human DNA sequencing project. In the late 1980s and early 1990s in Canada, Tsui and his then student, Stephen Scherer, constructed the first chromosome 7-specific yeast artificial chromosome (YAC) library. This clone resource was used to generate a number of high-resolution physical, cytogenetic, and gene maps, providing a foundation for numerous disease gene discoveries, as well as structure and function analysis of chromosome 7. In the mid-1990s, many other valuable resources and reagents were also generated by groups constructing whole genome genetic, physical, radiation hybrid, and gene maps, including those initiatives at Genethon in France, Stanford University, and the MIT Whitehead Institute. As a result, there are now over 6,500 DNA markers and 12,000 clones for chromosome 7 described in the databases.

 

DNA Sequence Maps: In the late 1990s, chromosome 7 once again took center stage when it was marked down to be one of the first large human chromosomes to be sequenced. The bulk of the publicly funded work would be completed at Washington University in St. Louis by Robert Waterston, using primarily bacterial artificial chromosome (BAC) clone maps assembled by John McPherson, Marco Marra and co-workers using a 'fingerprinting' and 'oligonucleotide-hybridization' strategy. Maynard Olson and Leroy Hood’s laboratories (both now in Seattle) also sequenced defined regions of chromosome 7. Moreover, other sequencing projects in disease gene regions were completed by Ben Koop (Victoria, Canada) and Andre Rosenthal (Jena, Germany), using clone maps generated by Scherer and Tsui. All of the genetic and physical mapping, gene, and DNA sequence information (primarily WashU) from the genome centers was combined with data in public repositories to form the basis of the draft sequence and annotation of human chromosome 7 published in February 2001 (as part of the Nature human genome draft paper). Concurrently, Celera Genomics published in Science their draft version of the human genome sequence (and annotation) using the whole-genome shotgun sequencing and assembly strategy. In the project described here and deposited in the Genome Browser at this website, a comprehensive DNA sequence assembly covering the vast majority of chromosome 7 was compiled using 85% Celera whole-genome scaffolds combined with data from public databases. The annotation includes information from 90 collaborators worldwide (unpublished and published data), combined with data from other databases.

 

Annotation of Chromosome 7 DNA Sequence: With the majority of the DNA sequence of chromosome 7 finished, efforts will now include properly cataloguing all of the genes and their functions, identifying the genes and DNA sequence variations that either directly cause or are associated with disease, and defining all of the structural and functional features of the chromosome. Chromosome 7 contains many important genes for development and maintenance including Sonic hedgehog, the homeobox genes PAX4, HLXB9, GBX1, the HOXA cluster, DLX5 and DLX6, a cluster of cytochrome P450 oxidases, and the leptin gene, to name just a few.

 

There are over 360 disease-associated genes or loci on chromosome 7. The first description of the phenomenon of uniparental disomy (UPD) in humans was in a female cystic fibrosis patient found to have inherited two copies of chromosome 7, both of maternal origin (with no paternal contribution). Many individuals with maternal UPD have a growth deficiency disorder called Russell-Silver syndrome. Paternal uniparental disomy of chromosome 7 has been observed in a small number of individuals, but no consistent disease condition is associated with it. Recently, FOXP2 was identified by Anthony Monaco's group as the first gene found to be causative in a speech and language disorder. This discovery captured public attention, and the pursuit of the disease gene was featured in Matt Ridley’s book Genome: The Autobiography of a Species in 23 Chapters. Media attention focussed on the fact that the identification of FOXP2 provided yet another example of how some human behavioral characteristics can be reduced to gene function, or lack thereof. In fact, one of the first examples of a genetic lesion underlying a behavioral condition was the finding that a 1.5 million base pair deletion (and in some cases inversion) of 7q11.23 in Williams-Beuren syndrome leaves affected individuals with a specific cognitive profile with deficits in visuo-spatial reconstruction. There are also putative disease loci for other neuropsychiatric diseases on chromosome 7, including autism, alcoholism, bipolar affective disorder, panic disorder and schizophrenia. The mapping of the first autism locus to 7q (by Monaco and his group) and of a gene for asthma to 7p (Juha Kere, Finland) promises that both arms of the chromosome will continue to be studied intensely in the years to come.

 

Monosomy 7 is one of the most frequent chromosomal abnormalities observed in myelodysplasia and acute myelogenous leukemia (AML). The incidence of monosomy 7 is particularly high in those individuals who have been previously exposed to drugs, radiotherapy, or toxins. Monosomy 7 is also observed in constitutional disorders such as Fanconi's anaemia, congenital neutropenia and familial monosomy 7. These congenital bone marrow disorders predispose to leukaemia, usually through a myelodysplastic phase. Other cytogenetic abnormalities of chromosome 7 are found in many different types of human neoplasia, some of which present consistent patterns of genetic alteration (see above). It has been postulated that the long arm of chromosome 7 may contain multiple tumor suppressor genes and oncogenes, and even an anti-senescence gene. So far, the ST7 tumor suppressor gene in colon cancer, the MET protooncogene in hereditary papillary renal cell carcinoma, the CDK6 gene in splenic lymphoma, and a few fusion proteins have been identified. Currently, intensive searches are ongoing for a putative tumor suppressor gene at 7q22 and another at 7q34-q35 involved in AML. All of these studies will benefit from the vast wealth of resources, data, and literature available for human chromosome 7.