SNiPs: Single Nucleotide Polymorphism
Single nucleotide polymorphism or SNP
(pronounced snip) is a DNA sequence variation occurring when a single nucleotide
- A, T, C, or G - in the genome (or other shared sequence) differs between
members of a species (or between paired chromosomes in an individual). For
example, two sequenced DNA fragments from different individuals, AAGCCTA and
AAGCTTA, differ at a single nucleotide. In this case we say that there are
two alleles: C and T. Almost all common SNPs have only two alleles.
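The comparison in this example can be sketched in a few lines of Python. The function name `find_snps` is hypothetical, and the sketch assumes the two fragments are already aligned and of equal length, which real sequencing data would not guarantee.

```python
def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (position, allele_a, allele_b) for each mismatch.

    Positions are 1-based. Assumes the fragments are already aligned
    and have the same length (an illustrative simplification).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("fragments must be aligned to the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


print(find_snps("AAGCCTA", "AAGCTTA"))  # [(5, 'C', 'T')]
```

The single reported difference, C versus T at position 5, corresponds to the two alleles named in the text.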
Within a population, SNPs can be assigned a minor allele frequency: the
proportion of chromosomes in the population that carry the less common
variant. It is important to note that there are variations
between human populations, so a SNP allele that is common in one geographical or
ethnic group may be much rarer in another. In the past, only single nucleotide
polymorphisms with a minor allele frequency of at least 1% (or 0.5% etc.) were
given the title "SNP", an unwieldy definition. With the advent of modern
bioinformatics and a better understanding of evolution this definition is no
longer necessary.
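As a minimal sketch, the minor allele frequency can be computed from a list of observed alleles. It is calculated here as the share of all sampled chromosomes that carry the less common allele, which is the usual convention. The function name and the sample data are hypothetical, and the site is assumed to have exactly two alleles.

```python
from collections import Counter


def minor_allele_frequency(alleles: list[str]) -> float:
    """Share of sampled chromosomes carrying the less common allele.

    `alleles` holds one allele per sampled chromosome (two entries per
    diploid individual). Assumes a biallelic site.
    """
    counts = Counter(alleles)
    if len(counts) != 2:
        raise ValueError("expected exactly two alleles at this site")
    return min(counts.values()) / len(alleles)


# Hypothetical sample: 10 chromosomes, 7 carrying C and 3 carrying T.
sample = ["C"] * 7 + ["T"] * 3
print(minor_allele_frequency(sample))  # 0.3
```

Running the same function on samples from different populations would show how an allele that is common in one group can be much rarer in another.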
Single nucleotide polymorphisms may fall within coding sequences of genes,
non-coding regions of genes, or in the intergenic regions between genes. SNPs
within a coding sequence will not necessarily change the amino acid sequence of
the protein that is produced, due to degeneracy of the genetic code. A SNP in
which both forms lead to the same polypeptide sequence is termed synonymous
(sometimes called a silent mutation); if a different polypeptide sequence is
produced, the SNP is non-synonymous. SNPs that are not in protein-coding regions
may still have consequences for gene splicing, transcription factor binding, or
the sequence of non-coding RNA.
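The distinction between synonymous and non-synonymous SNPs can be illustrated with the standard genetic code. The sketch below is only illustrative: the function name `classify_snp` is hypothetical, codons are written as DNA (T rather than U), and a real annotation tool would also have to handle reading frames, strands, and stop codons.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon_ref: str, codon_alt: str) -> str:
    """Label a single-codon change as synonymous or non-synonymous."""
    if CODON_TABLE[codon_ref] == CODON_TABLE[codon_alt]:
        return "synonymous"
    return "non-synonymous"


print(classify_snp("GCT", "GCC"))  # synonymous (both encode alanine)
print(classify_snp("GAG", "GTG"))  # non-synonymous (glutamate vs. valine)
```

The first change is silent because several codons encode the same amino acid, which is the degeneracy described above.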
As a rule of thumb, a variation is counted as a SNP when it occurs in more than
1 percent of the human population.
Because only about 3 to 5 percent of a person's DNA sequence codes for the
production of proteins, most SNPs are found outside of "coding sequences". SNPs
found within a coding sequence are of particular interest to researchers because
they are more likely to alter the biological function of a protein. Advances in
technology, coupled with the unique ability of these
genetic variations to facilitate gene identification, have led to a recent
flurry of SNP discovery and detection.
As a result of recent advances in SNP research, diagnostics for many
diseases may improve. Finding single nucleotide changes in the human
genome seems like a daunting prospect. But over the last 20 years, biomedical
researchers have developed a number of techniques that make it possible to do
just that. Each technique uses a different method to compare selected regions of
a DNA sequence obtained from multiple individuals who share a common trait. In
each test, the result shows a physical difference in the DNA samples only when a
SNP is detected in one individual and not in the other.
Many common diseases in humans are not caused by a genetic variation within a
single gene but are influenced by complex interactions among multiple genes as
well as environmental and lifestyle factors. Although both environmental and
lifestyle factors add tremendously to the uncertainty of developing a disease,
it is currently difficult to measure and evaluate their overall effect on a
disease process. Therefore, we refer here mainly to a person's genetic
predisposition, or the potential of an individual to develop a disease based on
genes and hereditary factors.
Genetic factors may also confer susceptibility or resistance to a disease and
determine the severity or progression of disease. Because we do not yet know all
of the factors involved in these intricate pathways, researchers have found it
difficult to develop screening tests for most diseases and disorders. By
studying stretches of DNA that have been found to harbor a SNP associated with a
disease trait, researchers may begin to reveal relevant genes associated with a
disease. Defining and understanding the role of genetic factors in disease will
also allow researchers to better evaluate the role non-genetic factors (such as
behavior, diet, lifestyle, and physical activity) have on disease.
Because genetic factors also affect a person's response to drug therapy, DNA
polymorphisms such as SNPs will be useful in helping researchers determine and
understand why individuals differ in their abilities to absorb or clear certain
drugs, as well as to determine why an individual may experience an adverse side
effect to a particular drug. Therefore, the recent discovery of SNPs promises to
revolutionize not only the process of disease detection but the practice of
preventative and curative medicine.
It will only be a matter of time before physicians can screen patients for
susceptibility to a disease by analyzing their DNA for specific SNP profiles.
Each person's genetic material contains a unique SNP pattern that is made up of
many different genetic variations. Researchers have found that most SNPs are not
responsible for a disease state. Instead, they serve as biological markers for
pinpointing a disease on the human genome map, because they are usually located
near a gene found to be associated with a certain disease. Occasionally, a SNP
may actually cause a disease and, therefore, can be used to search for and
isolate the disease-causing gene.
To create a genetic test that will screen for a disease in which the
disease-causing gene has already been identified, scientists collect blood
samples from a group of individuals affected by the disease and analyze their
DNA for SNP patterns. Next, researchers compare these patterns to patterns
obtained by analyzing the DNA from a group of individuals unaffected by the
disease. This type of comparison, called an "association study", can detect
differences between the SNP patterns of the two groups, thereby indicating which
pattern is most likely associated with the disease-causing gene. Eventually, SNP
profiles that are characteristic of a variety of diseases will be established.
Then, it will only be a matter of time before physicians can screen individuals
for susceptibility to a disease just by analyzing their DNA samples for specific
SNP patterns.
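At a single SNP, the core of such an association study is a comparison of allele counts between the affected and unaffected groups. The sketch below uses a chi-square test on a 2x2 table together with an odds ratio. This is a common textbook approach rather than the specific method of any study cited here. The function name and all counts are hypothetical.

```python
import math


def allele_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """Chi-square test (1 degree of freedom) and odds ratio for a 2x2 table."""
    n = case_minor + case_major + ctrl_minor + ctrl_major
    observed = [[case_minor, case_major], [ctrl_minor, ctrl_major]]
    row_totals = [sum(row) for row in observed]
    col_totals = [case_minor + ctrl_minor, case_major + ctrl_major]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the chi-square tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_minor * ctrl_major) / (case_major * ctrl_minor)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts: 400 chromosomes sampled in each group.
chi2, p_value, odds_ratio = allele_association(120, 280, 80, 320)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}, odds ratio = {odds_ratio:.2f}")
```

A small p-value with an odds ratio above 1 would suggest that the minor allele is more common among affected individuals. A real study would also need to correct for testing many SNPs at once and for differences in ancestry between the groups.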
Using SNPs to study the genetics of drug response will help in the
creation of "personalized" medicine. As mentioned earlier, SNPs may also
be associated with the absorption and clearance of therapeutic agents.
Currently, there is no simple way to determine how a patient will respond to a
particular medication. A treatment proven effective in one patient may be
ineffective in others. Worse yet, some patients may experience an adverse
immunologic reaction to a particular drug. Today, pharmaceutical companies are
limited to developing agents to which the "average" patient will respond. As a
result, many drugs that might benefit a small number of patients never make it
to market.
In the future, the most appropriate drug for an individual could be determined
in advance of treatment by analyzing a patient's SNP profile. The ability to
target a drug to those individuals most likely to benefit, referred to as
"personalized medicine", would allow pharmaceutical companies to bring many more
drugs to market and allow doctors to prescribe individualized therapies specific
to a patient's needs.
Most SNPs are not responsible for a disease state. Instead, they serve as
biological markers for pinpointing a disease on the human genome map. Because
SNPs occur frequently throughout the genome and tend to be relatively stable
genetically, they serve as excellent biological markers. Biological
markers are segments of DNA with an identifiable physical location
that can be easily tracked and used for constructing a chromosome
map that shows the positions of known genes,
or other markers, relative to each other. These maps allow
researchers to study and pinpoint traits resulting from the interaction of more
than one gene. NCBI plays a major role in facilitating the identification and
cataloging of SNPs through its creation and maintenance of the public SNP
database (dbSNP). This powerful genetic tool may be accessed by the biomedical
community worldwide and is intended to stimulate many areas of biological
research, including the identification of the genetic components of disease.
Thus, SNPs are small genetic changes, differences in a single
nucleotide base of DNA (an individual A, T, G, or C), that vary among individuals.
Human populations are estimated to be 99 percent identical at the level
of genetic sequence. Diversity arises from the remaining 1 percent
variation, most of which is accounted for by SNPs (although a small
percentage is due to deletions or insertions of DNA). There are estimated to be
approximately 10 million SNPs in the human genome. They are found, on average,
every 100 to 300 base pairs in the 3-billion-base pair genome, although their
density varies between regions. SNPs are found in both coding and non-coding
regions, and the majority of them (two thirds) are substitutions of thymine (T)
for cytosine (C). They are relatively stable evolutionarily, and are therefore
useful in population studies. Also, because SNPs are distributed more or less
evenly throughout the human genome, they can serve as helpful landmarks in the
construction of genetic maps.
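The density and total figures quoted above can be related with simple arithmetic. The short sketch below uses only numbers stated in the text and is a rough consistency check, not a new estimate.

```python
GENOME_SIZE_BP = 3_000_000_000  # 3-billion-base-pair genome, as stated above

for spacing_bp in (100, 300):
    estimated_snps = GENOME_SIZE_BP // spacing_bp
    print(f"one SNP per {spacing_bp} bp -> about {estimated_snps:,} SNPs")
```

One SNP per 300 base pairs corresponds to the quoted figure of about 10 million SNPs, while one per 100 base pairs would imply about 30 million.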
Most SNPs are silent -- that is, they exert no discernible effect on gene
function or phenotype. They can, however, have important consequences for
individual susceptibility to disease and reactions to medical
treatment. One of the better known associations of SNPs with disease involves
the apolipoprotein E (APOE) gene: its E4 allele is associated with a higher risk
of developing Alzheimer's disease than the E2 allele. SNPs in the genes BRCA1
(breast cancer gene 1) and BRCA2 (breast cancer gene 2) that inactivate these
tumor suppressors occur in five percent of all breast cancer cases and also put
carriers at risk for developing ovarian cancer. The lifetime breast cancer risk
for women who carry such genetic mutations is in the range of 50-80 percent.
In addition to changes in single genes that affect disease risk, it is thought
that particular combinations of SNPs located across multiple genes contribute to
a predisposition to developing medical conditions. SNPs are also believed to
underlie individual variation in response to medical treatments. An
understanding of the genetic basis for drug response, usually referred to as
pharmacogenomics, would have important clinical implications. By being able to
predict how different individuals are likely to react to different drugs, a
physician could tailor treatment to a specific patient's genetic profile, thus
maximizing therapeutic benefit and minimizing hazardous side effects.
Currently, a vast literature exists reporting possible associations between SNPs
and diseases. Some links are supported by multiple reports; other associations
are plagued by conflicting reports possibly due to false positives, false
negatives or true variations among the populations studied. Those associations
with high penetrance -- ones that confer a relatively high disease risk of 50 to
80 percent -- have been most clearly defined. The situation for those with a
lower penetrance is considerably murkier.
A recent study published in Nature examined the impact of possible false
positives on the literature. In the study, a collaborative team spearheaded by
investigators at the Whitehead Institute used a meta-analysis approach to
examine 300 published studies covering 25 reported associations. The studies
selected were follow-up reports that corroborated an association. If the initial
report had been a false positive, by chance only 5 percent, or 15, of the
follow-up studies should contain statistically significant data. In contrast,
the meta-analysis showed that a much higher number, 59 (about 20 percent), of
the replication studies were significant. Therefore, while false positives no
doubt occur, many
associations in the literature in fact can be replicated.
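The reasoning behind this comparison can be checked with a binomial calculation. The sketch below makes strong simplifying assumptions: the 300 follow-up studies are treated as independent, and each is given exactly a 5 percent chance of reaching significance if no association is real. It is not the analysis performed in the study itself, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_tail(n: int, k_min: int, p: Fraction) -> Fraction:
    """Exact P(X >= k_min) for X ~ Binomial(n, p)."""
    q = 1 - p
    return sum(comb(n, k) * p**k * q ** (n - k) for k in range(k_min, n + 1))


n_studies = 300
alpha = Fraction(1, 20)  # 5 percent significance threshold

expected_by_chance = n_studies * alpha
tail = binomial_tail(n_studies, 59, alpha)

print(f"expected significant by chance: {float(expected_by_chance):.0f}")
print(f"P(59 or more significant by chance) = {float(tail):.1e}")
```

Under these assumptions the chance of seeing 59 or more significant studies is vanishingly small, which is why the result suggests that at least some of the reported associations are real.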
Investigators in the field are advocating approaches to SNP disease association
studies that take into account function and biological plausibility and make use
of large sample sizes that will increase the power to detect associations with
small magnitude effects.