Track Descriptions & Citations

Tracks Built into this Server

No additional information available.

Cytogenetic Bands
Chromosome 7 cytogenetic bands as determined by FISH mapping experiments.

Genetic Markers
A list of 521 genetic markers was manually curated from numerous databases and publications, and includes
genetic mapping information from DeCODE for 240 markers.

FISH Clones
BAC, PAC and cosmid clones mapped to chromosome 7 by fluorescence in situ hybridization (FISH). All
Chromosome 7 genomic clones (cosmids, BACs, YACs) listed in GBrowser and in other data tables are
freely distributed. Visit The Centre for Applied Genomics (TCAG) for more details.

Breakpoints (Trans/Inv)
Manually curated data set of chromosomal rearrangement breakpoints for translocations and inversions.
To search the entire data set by phenotype or Pubmed ID, go to

Breakpoints (Del/Others)
Manually curated data set of chromosomal rearrangement breakpoints for deletions and other cytogenetic
rearrangements (including duplications, insertions, UPD, LOH and Ring chromosomes). To search the entire
data set by phenotype or Pubmed ID, go to

TCAG Assembly
Comprehensive DNA sequence assembly of human chromosome 7 encompassing 158,329,839 nucleotides. The
assembly was generated utilizing all available sequence and mapping data including Celera whole genome shotgun
sequences, sequence data from Genbank and targeted DNA sequence data generated at The Centre for Applied Genomics.

Clone Coverage
This track contains mapping information for all chromosome 7 fully sequenced clones from Genbank.
Please note that clones in HTGS phase of sequencing were not included. Also, please note that the
sequenced segment of the clones doesn't necessarily encompass the entire insert. Clones may be
longer than they appear in the genome browser.

The locations of 6 physical gaps (including the centromere) and a number of small "Celera intra-scaffold"
gaps are mapped to the chromosome 7 sequence assembly. Gap size estimates are based on Celera mate-pair
information and comparison to the chimpanzee (Pan troglodytes) genomic sequence, while physical gap sizes
were estimated using Fibre FISH (SeeDNA).

Microsatellite Markers
Chromosome 7 specific subset of microsatellite markers described by Tamiya et al., "Whole genome association study of rheumatoid arthritis using 27 039 microsatellites". Hum. Mol. Genet. 14 (16), 2305-2321 (2005).

Structural Features
The structural features data track includes a number of DNA sequence features mapped to chromosome 7.
The features include imprinted genes, fragile sites, gene deserts, sequence gaps, assembly and sequence
variations, mouse imprinted genes, RIKEN putative mouse imprinted genes, SHH regulatory elements,
centromeric and telomeric boundaries, SV-40 integration sites, sites of recurrent chromatid breaks,
Williams-Beuren related segmental duplications and the SHSF (Split Hand Split Foot) critical region.

TCAG Annotations
Chromosome 7 project gene annotation. The TCAG annotations are a hand-curated dataset of all known, partial,
putative, predicted, pseudogene, gene segments, pseudogene segments, ncRNA (non-coding RNA) and novel genes.
The annotations are derived from all available EST, mRNA and gene sequence data, from all public, Celera and TCAG data sources.

Promoter sequences were obtained from the Eukaryotic Promoter Database, an annotated non-redundant
collection of eukaryotic POL II promoters for which the transcription start site has been determined
experimentally. Promoter sequences from "Identification and Functional Analysis of Human
Transcriptional Promoters", Genome Res. 2003 13: 308-312, were also added from the supplementary data
provided at .

TCAG Annotations (without variants)
Chromosome 7 project gene annotations without displaying transcript variants.

CpG Islands
This track displays regions of genomic sequence that are rich in CpG dinucleotides. CpG refers to a
C nucleotide immediately followed by a G. The 'p' in 'CpG' refers to the phosphate group linking the two
bases. The regions were identified using CpGReport (EMBOSS), with a minimum length cutoff of 200 nt
and default parameters. These regions are functionally important as they are resistant to methylation,
and tend to be associated with the 5' transcription start site of housekeeping genes which are frequently
turned on. For more information visit the European Bioinformatics Institute at
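
The definition above (a C immediately followed by a G) can be illustrated with a short Python sketch. It is not the CpGReport scoring method, which is not described here. It only counts CpG dinucleotides and computes an observed/expected ratio, a statistic often quoted for CpG-rich regions. The function name `cpg_stats` is a hypothetical helper chosen for this example.

```python
def cpg_stats(seq):
    """Return (CpG count, GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    # "CG" cannot overlap itself, so a plain substring count is exact.
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G occurred independently: (C * G) / length.
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return cpg, gc_fraction, obs_exp


count, gc, ratio = cpg_stats("ACGTCGCGGCAT")
print(count, round(gc, 2), round(ratio, 2))
```

A region that is rich in CpG dinucleotides would show both a high GC fraction and a ratio well above what is typical for bulk genomic sequence.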

Pseudogenes (Gerstein Lab)
Pseudogenes are sequences of genomic DNA with such similarity to normal genes that they are
regarded as non-functional copies or close relatives of genes. This data track was obtained from
the Gerstein Lab at Yale University. For more information visit the Gerstein lab website at, and see the relevant publication listed below.

Zhang Z, Harrison PM, Liu Y, Gerstein M.
Millions of years of evolution preserved: a comprehensive catalog of the processed pseudogenes in the human genome.
Genome Res. 13:2541-58 (2003).

NCBI RefSeq Genes
This track displays the current release (RefSeq Release 3) of curated gene entries from the NCBI RefSeq Project.
RefSeq records are derived from primary GenBank submissions with varying levels of validation, additional
annotation, and manual curation. For additional information visit the RefSeq Project website at

UniGene is an experimental system for automatically partitioning GenBank sequences into a non-redundant
set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene,
as well as related information such as the tissue types in which the gene has been expressed and map location.

Visit the UniGene website for more information.

Human mRNA sequences were obtained from GenBank at NCBI and mapped to TCAG chromosome 7.

Ensembl Genes
This track displays the current release of gene entries from Ensembl. Ensembl human genes (ENS*) are
generated automatically by the Ensembl gene builder.

H-Invitational cDNAs
H-Invitational Database (H-InvDB) is a human gene database, with integrative annotation
of 41,118 full-length cDNA clones currently available from six high throughput cDNA sequencing
projects. This database represents 21,037 cDNA clusters describing their gene structures,
functions, novel alternative splicing isoforms, non-coding functional RNAs, functional domains,
sub-cellular localizations, mapping of SNPs and microsatellite repeat motifs in relation with orphan
diseases, gene expression profiling, and comparative results with mouse full-length cDNAs in the
context of molecular evolution.

For more information, visit the H-invitational website at
and see the relevant publication listed below.

Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones.
PLoS Biol. 2004 Apr 20 [Epub ahead of print]
PMID: 15103394

TIGR EST Clusters
EST cluster sequences from The Human Gene Index (Release 16.0; Feb 22, 2005) from The Institute for
Genomic Research (TIGR) were aligned to the TCAG chromosome 7 sequence. For more information about the
Human Gene Index, please visit TIGR at

Spliced dbESTs
No additional information available.

Chromosome 7 expressed sequence tags (ESTs) from dbEST (Expressed Sequence Tags database), a division of
GenBank that contains sequence data and other information on 'single-pass' cDNA sequences.

non-Human mRNAs
Non-human vertebrate and invertebrate mRNAs were taken from GenBank and mapped to chromosome 7 using translated BLAT, with cutoffs of 50% identity and 30% coverage.
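
As a rough illustration of how such cutoffs might be applied, the Python sketch below filters a single tab-separated BLAT PSL record. The exact definitions of identity and coverage used for this track are not stated, so the formulas here are assumptions: identity is taken as matching bases over aligned bases, and coverage as aligned bases over query length. The function name `passes_cutoffs` is hypothetical.

```python
def passes_cutoffs(psl_line, min_identity=0.50, min_coverage=0.30):
    """Return True if a PSL alignment record meets both cutoffs (assumed definitions)."""
    fields = psl_line.rstrip("\n").split("\t")
    matches = int(fields[0])       # PSL column 1: matches
    mismatches = int(fields[1])    # PSL column 2: misMatches
    rep_matches = int(fields[2])   # PSL column 3: repMatches
    q_size = int(fields[10])       # PSL column 11: qSize
    aligned = matches + mismatches + rep_matches
    if aligned == 0 or q_size == 0:
        return False
    identity = (matches + rep_matches) / aligned   # assumed definition
    coverage = aligned / q_size                    # assumed definition
    return identity >= min_identity and coverage >= min_coverage


# A made-up 21-column PSL record: 600 matches, 200 mismatches, query length 2000.
example = "\t".join([
    "600", "200", "0", "0", "0", "0", "0", "0", "+",
    "hypothetical_mRNA", "2000", "0", "800",
    "chr7", "158329839", "1000", "1800",
    "1", "800,", "0,", "1000,",
])
print(passes_cutoffs(example))
```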

RIKEN DIGIT Gene Predictions
DIGIT gene predictions were obtained from RIKEN Genome Sciences Centre and aligned to TCAG chromosome 7
sequence by BLAT. A total of 211 gene predictions mapped to chromosome 7. DIGIT employs existing
gene-finders (HMMgene, GENSCAN, FGENESH) to search the input sequence. Next, DIGIT produces all possible
exons from the results of gene-finders, and assigns them their reading frames and scores. Finally, DIGIT
searches for the set of exons whose additive score is maximized under their reading frame constraints. A Bayesian
procedure and a hidden Markov model are used to infer the scores and to search for the exon set, respectively. For more
information please visit DIGIT at
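
The idea of choosing a set of exons whose additive score is maximal under reading frame constraints can be sketched in Python as follows. This is not DIGIT's implementation, which uses a hidden Markov model for the search. It is a plain dynamic programming illustration over a hypothetical `Exon` record, assuming that two exons can be chained when they do not overlap and the outgoing phase of the first equals the incoming phase of the second.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int       # first base of the candidate exon
    end: int         # last base of the candidate exon
    phase_in: int    # reading frame phase required at the exon start (0, 1 or 2)
    phase_out: int   # reading frame phase left at the exon end (0, 1 or 2)
    score: float     # score assigned to the candidate exon


def best_exon_chain(exons):
    """Return (total score, chain) for the highest-scoring frame-compatible chain."""
    ordered = sorted(exons, key=lambda e: (e.start, e.end))
    best = []  # best[i] = (best total for a chain ending at ordered[i], previous index)
    for i, exon in enumerate(ordered):
        total, prev = exon.score, None
        for j in range(i):
            cand = ordered[j]
            compatible = cand.end < exon.start and cand.phase_out == exon.phase_in
            if compatible and best[j][0] + exon.score > total:
                total, prev = best[j][0] + exon.score, j
        best.append((total, prev))
    if not best:
        return 0.0, []
    i = max(range(len(best)), key=lambda k: best[k][0])
    total = best[i][0]
    chain = []
    while i is not None:  # walk the back-pointers to recover the chain
        chain.append(ordered[i])
        i = best[i][1]
    chain.reverse()
    return total, chain


candidates = [
    Exon(1, 100, 0, 1, 5.0),
    Exon(150, 250, 1, 0, 4.0),
    Exon(150, 260, 2, 0, 6.0),
    Exon(300, 400, 0, 0, 3.0),
]
total, chain = best_exon_chain(candidates)
print(total, [(e.start, e.end) for e in chain])
```

In this example the exon with the highest individual score in the middle position is not selected, because its incoming phase does not match the first exon.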

Gene Predictions
This data track contains gene predictions using various ab initio gene prediction software programs.

UniSTS Markers
STS markers were mapped to TCAG chromosome 7 by e-PCR using the March 30th STS database from NCBI.

BAC Ends
BAC End sequences were obtained from Genbank at NCBI (
and mapped to TCAG chromosome 7 sequence by BLAT.

Fosmid Ends
Fosmid end sequences were obtained from Genbank at NCBI (
and mapped to TCAG chromosome 7 sequence by BLAT.

Clones on Spectral Genomics Chip
Mapping information for chromosome 7 clones included on the Spectral Genomics CGH array. The Spectral
Genomics 1 Mb resolution BAC arrays are used for the CGH studies. This technology co-hybridizes a "test"
genomic DNA sample labeled in one fluor and a control genomic DNA sample labeled in a different fluor
to a BAC genomic DNA microarray. For a complete overview of the Spectral Genomics array-based CGH go to

Segmental Duplications
This track contains data from the detection and analysis of segmental duplications in the human genome.
Segments of DNA with near-identical sequence (segmental duplications or duplicons) in the human genome
can be hot spots or predisposition sites for the occurrence of non-allelic homologous recombination or
unequal crossing-over leading to genomic mutations such as deletion, duplication, inversion or
translocation. These structural alterations, in turn, can cause dosage imbalance of genetic material or
lead to the generation of new gene products resulting in diseases defined as genomic disorders. For more
information on how the data was generated, see:

Cheung J, Estivill X, Khaja R, MacDonald JR, Lau K, Tsui LC, Scherer SW.
Genome-wide detection of segmental duplications and potential assembly errors in the human
genome sequence. Genome Biol. 2003;4(4):R25. Epub 2003 Mar 17. [PMID: 12702206]

Chimp Clone Mapping
Chimpanzee sequences were mapped to TCAG chromosome 7 to provide a comparative species sequence analysis
and mapping resource. Chimp clone sequences were downloaded from GenBank (April 6, 2004) using the
following query string: "Pan troglodytes"[Organism] AND "chromosome 7"[All Fields] AND genomic[All Fields].
Only clones with completed sequence were then mapped to chr7 by BLAT. Note that although chimpanzee
chromosome 6 is syntenic to human chromosome 7, syntenic chimpanzee clones in GenBank are labelled
chromosome 7.
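
For readers who want to rerun a query of this kind programmatically, the Python sketch below builds a search URL for the NCBI E-utilities `esearch` service from the query string quoted above. How the original download was performed is not stated, so the use of E-utilities, the `nuccore` database name and the `retmax` value are assumptions made for this example. The sketch only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Query string quoted in the track description.
QUERY = '"Pan troglodytes"[Organism] AND "chromosome 7"[All Fields] AND genomic[All Fields]'

ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="nuccore", retmax=20):
    """Build an E-utilities esearch URL for the given Entrez query term."""
    # urlencode percent-encodes quotes and brackets and turns spaces into '+'.
    return ESEARCH_BASE + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


print(esearch_url(QUERY))
```

Results retrieved today would differ from the April 6, 2004 download, since GenBank content has changed since then.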

Marshfield InDel Polymorphism
Human insertion/deletion polymorphism data were obtained from the Marshfield Center for Medical Genetics.
The best current estimate is that 20% of all human polymorphisms are of the insertion/deletion (indel) type. Indels can be broken down into a roughly 50:50 mix of multiallelic and diallelic polymorphisms. Multiallelic indels include the minisatellites (also called VNTRs) and the short tandem repeat polymorphisms (STRPs), also called microsatellites or simple sequence length polymorphisms (SSLPs). Minisatellites are relatively rare and typically have repeat lengths of a few tens of nucleotides with tandem repeat copy numbers in the hundreds to thousands. STRPs are abundant and have repeat lengths of 1-6 nucleotides with tandem repeat copy numbers mostly below 30. Diallelic indels are also common, but are only now beginning to be studied in detail (see below). All of the diallelic indels and most of the STRPs can be analyzed simply by PCR followed by gel electrophoresis.

The Marshfield group has confirmed and characterized well over 2,000 human diallelic short insertion/deletion (indel) polymorphisms (Weber et al., AJHG 71: 2002). The complete chromosome 7 dataset is available in the Table Section. Allele frequencies are available at the Marshfield website in the MIDAlleleFreqs table, which contains allele frequencies in five populations for 2000 of the Marshfield insertion/deletion polymorphisms. The five populations are Europeans, Japanese, African Pygmies, Native Americans from the Amazon, and the Polymorphism Discovery Resource (PDR). Average values for all populations are also included. In addition to the allele frequency information, the MIDdata table contains other information about the markers, such as chromosomal location, inserted sequence, chimp/gorilla results, and data mining sources.

For more information, please visit the Marshfield website at

Mouse Syntenic Anchors
Mouse syntenic anchors were obtained from Celera Genomics and were mapped to TCAG chromosome 7
sequence by BLAT. Mouse syntenic anchors represent best reciprocal sequence matches between
human and mouse genomes.
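
The notion of a best reciprocal match can be sketched in Python as follows. This is an illustration of the general idea only, not the procedure Celera used to generate the anchors. The score dictionaries and sequence names are made up for the example.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target. scores: {(query, target): score}."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best(human_to_mouse, mouse_to_human):
    """Return sorted (human, mouse) pairs that are each other's best match."""
    h_best = best_hits(human_to_mouse)
    m_best = best_hits(mouse_to_human)
    return sorted((h, m) for h, m in h_best.items() if m_best.get(m) == h)


# Hypothetical alignment scores in each search direction.
human_to_mouse = {("h1", "m1"): 90, ("h1", "m2"): 80, ("h2", "m2"): 70, ("h3", "m1"): 60}
mouse_to_human = {("m1", "h1"): 88, ("m2", "h1"): 85, ("m2", "h2"): 60}
print(reciprocal_best(human_to_mouse, mouse_to_human))
```

Here `h2` has `m2` as its best mouse match, but `m2` matches `h1` better, so that pair is not reported as an anchor.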

This track contains single nucleotide polymorphism (SNP) data from NCBI's SNP database (dbSNP).
Sequences were downloaded from (Mar. 14, 2004).
The sequences were aligned using BLAT, and the results were parsed using a Perl script to pinpoint the
exact SNP position.
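
The original Perl script is not shown, but the coordinate arithmetic involved can be sketched in Python. The example below assumes a nucleotide (untranslated) BLAT PSL record and a known 0-based offset of the SNP within the aligned flanking sequence. The function name and arguments are hypothetical. In PSL output, the query block starts of a minus-strand alignment refer to the reverse-complemented query, so the offset is flipped first.

```python
def snp_genome_position(snp_offset, strand, q_size, block_sizes, q_starts, t_starts):
    """Return the 0-based target position of the SNP base, or None if it is unaligned."""
    # For '-' strand records, qStarts are given on the reverse-complemented query.
    q_pos = snp_offset if strand == "+" else q_size - 1 - snp_offset
    for size, q_start, t_start in zip(block_sizes, q_starts, t_starts):
        if q_start <= q_pos < q_start + size:
            # The SNP lies inside this ungapped block: shift by the same distance.
            return t_start + (q_pos - q_start)
    return None  # the SNP falls in a gap between aligned blocks


# Made-up alignment: a 201 bp flanking sequence aligned in two blocks.
print(snp_genome_position(100, "+", 201, [90, 111], [0, 90], [1000, 1095]))
```

The returned coordinate is on the forward strand of the target sequence, regardless of the alignment strand.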

Affymetrix 10K SNP Chip
The SNP data was obtained from Affymetrix and mapped to the TCAG chromosome 7 sequence by
BLAT. All available genotype and supplementary data are available at

For more information on the Human 10K SNP mapping array see
Affymetrix 10K SNP Information.

Affymetrix 120K SNP Chip
The SNP data was obtained from Affymetrix and mapped to the TCAG chromosome 7 sequence by
BLAT. All available genotype and supplementary data are available at

For more information on the Human 120K SNP mapping array see
Affymetrix 120K SNP Information.

Affymetrix 500K SNP Chip
No additional information available.

HapMap SNPs
This data track contains information from the International HapMap project, and displays SNPs
currently being genotyped and the relative allele frequencies for each.

The International HapMap Project

The goal of the International HapMap Project is to develop a haplotype map of the human genome,
the HapMap, which will describe the common patterns of human DNA sequence variation. The HapMap
is expected to be a key resource for researchers to use to find genes affecting health, disease,
and responses to drugs and environmental factors. The information produced by the Project will be
made freely available.

The Project is a collaboration among scientists in Japan, the U.K., Canada, China, Nigeria, and
the U.S. The Project officially started with a meeting on October 27-29,
2002, and is expected to take about three years.

Genetic variation and use of the HapMap

Most common diseases, such as diabetes, cancer, stroke, heart disease, depression, and asthma,
are affected by many genes and environmental factors. Although any two unrelated people are the
same at about 99.9% of their DNA sequences, the remaining 0.1% is important because it contains
the genetic variants that influence how people differ in their risk of disease or their response
to drugs. Discovering the DNA sequence variants that contribute to common disease risk offers one
of the best opportunities for understanding the complex causes of disease in humans.

Currently only genotypes for one population (90 individuals from
30 CEPH family trios) are available, but samples from two other
populations (Africa & Asia) will also be genotyped in the project.
See project website for further information
on samples and overall scientific goals.

This track contains data for interspersed and low complexity repeats identified using Arian Smit's
RepeatMasker. We used the RepBase library v7.2 update (2002/03/23) for RepeatMasker (ver 2002/07/13)
to identify all repetitive sequences. The data was generated using the sensitive mode option (-s) and run
with the WU-BLAST/MaskerAid option (-w).

This track displays the GC content of the DNA sequence calculated in 100 bp windows, and the DNA sequence
can be viewed in windows less than 100 bp in size. DNA sequence can be obtained from gbrowse for the
current window by selecting "Dump Decorated FASTA file" and then clicking on "GO" beside Dumps, Searches and
Other operations.
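
A GC content calculation of this kind can be sketched in Python as follows. The description does not say whether the 100 bp windows overlap, so this example assumes consecutive non-overlapping windows. The function name `gc_content_windows` is hypothetical.

```python
def gc_content_windows(seq, window=100):
    """Return a list of (window start, GC fraction) for consecutive windows."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]  # the last window may be shorter
        gc = chunk.count("G") + chunk.count("C")
        result.append((start, gc / len(chunk)))
    return result


sequence = "GC" * 50 + "AT" * 50 + "GGCC"
for start, fraction in gc_content_windows(sequence):
    print(start, fraction)
```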


Generic genome browser version 1.70