GENE SEQUENCING
Introduction
The DNA containing life's instructions is found in a set of
chromosomes within each of the cells that make up all organisms. All of
the instructions encoded in DNA are spelled out using a shockingly
simple alphabet consisting of four 'letters': A, C, G, and T. Each of
these letters represents a molecule called a nucleotide that is composed
of a sugar, a phosphate, and a base. It is the bases that differentiate
each of the four DNA nucleotides.
A = Adenine
T = Thymine
C = Cytosine
G = Guanine
If the classical DNA double helix is untwisted, much as it is
each time the molecule self-replicates, a ladder-like structure is
revealed. The sides of the ladder are formed by the repeating sugar and
phosphate units of DNA's component nucleotides. The rungs are each made
of a pair of the nucleotide bases, which can only pair up in a strictly
defined way: Adenine is always paired with Thymine, while Cytosine is
always paired with Guanine. This base pair complementarity underlies
much of DNA's remarkable behavior. It is also the trait that allows
molecular biologists to cut, clone, probe, and determine the precise
sequence of an organism's genetic material.
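The pairing rule can be written down in a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and reading the partner strand in reverse is a detail of strand direction that the text above does not go into.

```python
# Pairing rules from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the partner base for each position, in the same order."""
    return "".join(PAIRS[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Return the partner strand read in the opposite direction."""
    return complement(strand)[::-1]

print(complement("AAGCTT"))          # TTCGAA
print(reverse_complement("AAGCTT"))  # AAGCTT
```

Note that AAGCTT reads the same on both strands once direction is taken into account.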
Modern genetic sequencing, of the sort that made possible the completion
of the Human Genome Project, is a technique used by numerous
biotech research teams described on this website. It is a highly
automated, computer-controlled process, but the basic techniques can be
briefly outlined here.
Step By Step
The process begins with a sample of several cells from a human, or a
fruit fly, or a sponge, or whatever organism is being studied. Starting
with multiple cells ensures that multiple copies of all the DNA are
present. These are added to a buffer solution.
Next, restriction enzymes are added to the mix. Restriction enzymes
cut large strands of DNA into smaller fragments by cleaving the molecule
only at locations containing a specific short sequence of bases, an
ability thought to serve as a defense against invading
bacteriophages (bacterial viruses). Several dozen
restriction enzymes have now been discovered, mostly in bacteria, and
each is capable of cleaving DNA any time it encounters the particular
base sequence it was built to recognize. The restriction enzyme Hind
III, for example, cleaves DNA whenever it encounters the base sequence
AAGCTT, with the cuts being made between the two Adenine bases in every
instance. Because a simple sequence such as AAGCTT is bound to be
repeated countless times in the millions of bases in an organism's DNA,
addition of restriction enzymes is guaranteed to do plenty of chopping.
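As a rough illustration, the cutting rule for Hind III can be sketched in Python. The sample sequence and the function names are invented for this example, and the sketch treats DNA as a single string of letters rather than a double-stranded molecule.

```python
SITE = "AAGCTT"   # Hind III recognition sequence, as given in the text
CUT_OFFSET = 1    # the cut falls between the two leading A bases

def cut_positions(dna: str) -> list[int]:
    """Return every index at which the enzyme would cut the strand."""
    positions = []
    start = dna.find(SITE)
    while start != -1:
        positions.append(start + CUT_OFFSET)
        start = dna.find(SITE, start + 1)
    return positions

def digest(dna: str, cuts: list[int]) -> list[str]:
    """Split the strand at the given cut indices."""
    edges = [0, *sorted(cuts), len(dna)]
    return [dna[a:b] for a, b in zip(edges, edges[1:])]

sample = "GGAAGCTTCCCCAAGCTTGG"  # made-up sequence containing two sites
print(cut_positions(sample))                  # [3, 13]
print(digest(sample, cut_positions(sample)))  # ['GGA', 'AGCTTCCCCA', 'AGCTTGG']
```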
Only a limiting amount of the selected restriction enzyme is added
because, importantly, the DNA samples must not be cut at all possible
cleavage points (e.g., every AAGCTT run if Hind III is used). This is
because if all possible cuts were made there would be no fragment
overlap between the DNA copies that have been cleaved, and identifying
such overlaps is a key part of gene sequencing and mapping. The goal at
this stage is instead to end up with fragments that are around 150,000
bases long.
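A back-of-the-envelope calculation shows why the enzyme has to be limited. The sketch below assumes, purely for illustration, that the four bases occur equally often and independently, which real genomes do not strictly do.

```python
SITE_LENGTH = 6             # AAGCTT has six bases
TARGET_FRAGMENT = 150_000   # desired fragment length from the text

# Assumption: each base is equally likely, so a given six-base
# sequence turns up about once every 4**6 positions.
mean_site_spacing = 4 ** SITE_LENGTH
sites_per_fragment = TARGET_FRAGMENT / mean_site_spacing
fraction_cut = 1 / sites_per_fragment

print(mean_site_spacing)             # 4096
print(round(sites_per_fragment, 1))  # 36.6
print(round(fraction_cut, 3))        # 0.027
```

Under that assumption, only about one recognition site in 37 should be cut to leave fragments of roughly 150,000 bases.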
Although the process began with multiple sample cells and, hence,
several copies of starting genetic material, the sequencing process
requires copies in far, far greater abundance. To get the copies
needed, molecular biologists turn to bacteria (or sometimes yeast) to do
the hard work.
The replication begins with small, circular pieces of DNA called cloning
vectors that are capable of replicating on their own when inside a cell.
These can be made artificially, but also occur naturally in a variety of
forms such as plasmids, which are bacterial DNA segments found outside
an organism's chromosomes. These vectors are combined with restriction
enzymes that open (straighten) and cut them in such a way that the ends
are staggered. Using the same restriction enzyme with the vectors as
that used to create the sample DNA fragments ensures that both the cut
vectors and the sample fragments will have complementary ends. Once an
additional enzyme known as DNA ligase is added to the mixture, these
complementary ends will bond, creating "loaded" vectors that are once
again closed and circular.
The loaded vectors are then inserted into live bacterial cells,
typically by shocking the bacteria with a mild electrical charge, a
process called electroporation. Normally, molecules cannot pass easily
through a bacterium's cell membranes. However, the electrical shock
temporarily breaks bonds between fatty acids found in the membrane,
allowing DNA to pass through and allowing introduction of the vectors
inside the cells.
The bacteria containing the newly inserted vectors are then thinly
plated onto a suitable growth medium and incubated. As the bacteria
repeatedly divide, each daughter cell formed contains a new copy of the
inserted DNA vector. Soon, visible colonies are formed on the culture
plate. As each distinct colony arose from a single bacterial cell, each
colony contains many clones of one particular vector-incorporated target
DNA fragment.
Once the colonies have grown to contain about a million cells, a single
colony is selected and used to inoculate a liquid scale-up culture.
Cell division continues in the liquid until several billion copies of
the original cell (and inserted vector) are obtained.
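The cell counts quoted above correspond to a fairly small number of divisions. The following sketch assumes idealized growth in which every cell divides in every generation; the 4 billion figure is just one illustrative reading of "several billion".

```python
import math

def doublings_needed(target_cells: int, start_cells: int = 1) -> int:
    """Number of rounds of division to grow from start_cells to target_cells."""
    return math.ceil(math.log2(target_cells / start_cells))

# One cell growing into a colony of about a million cells.
print(doublings_needed(1_000_000))  # 20

# Illustrative assumption: a million-cell colony scaled up to 4 billion cells.
print(doublings_needed(4_000_000_000, start_cells=1_000_000))  # 12
```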
The cloned DNA fragments are recovered from the bacteria by using
detergent to rupture the cell walls. Sodium hydroxide is then added
because it degrades the relatively larger bacterial DNA while leaving
the relatively smaller DNA vectors fairly undamaged. The process yields
billions of copies of vector-incorporated target DNA all cloned from a
single starting DNA fragment.
At around 150,000 base pairs, the target DNA fragment is still too long
for current sequencing technology to handle straightaway. This fragment
must be broken down further by repeating the entire process, beginning
with the addition of more restriction enzyme. The result, again, is
billions of copies of a single vector-incorporated DNA fragment; now
however, the fragment length is in the 2,000-4,000 base pair range.
The next step in the process is to create new strands of the
target DNA sequence that are fluorescent. The first step in this part of
the gene sequencing process is to apply heat to the cloned DNA fragments
to separate the double-stranded DNA into single strands.
To promote the synthesis of new strands of DNA complementary to the
single stranded target material, sufficient amounts of free nucleotide
bases (A, T, C, G) are added to the reaction vessel along with DNA
polymerase. This is an enzyme needed for reading the single stranded
DNA templates and assembling the complementary strands from free
nucleotides. DNA primers are also added. These are short DNA chunks of
known sequence that bind to complementary sites on the single stranded
DNA vectors and then initiate construction of the complementary
strands.
Finally, a measured amount of fluorescently tagged dideoxynucleotide
bases (ddA, ddT, ddC, ddG) is added. These behave differently than the
'regular' (deoxy) nucleotides in two important ways:
First, the dideoxynucleotides have an altered molecular structure that
causes them to halt the construction of complementary DNA strands after
they have themselves been added to the strands. This ensures that the
tagged nucleotides are always the last bases on any strand they are
built into.
Second, the dideoxynucleotides are tagged with a marker that visibly
fluoresces when it absorbs energy emitted from a laser. The tags are
specifically color-coded so that each nucleotide base can be positively
identified:
ddA = green
ddG = yellow
ddC = blue
ddT = red
Recall that in a previous step it was important not to add too much
restriction enzyme so that adequate fragment overlap would be ensured.
Similarly, it is important here not to add too much dideoxynucleotide
base, relative to the amount of regular nucleotide provided. The idea
is that complementary DNA chain elongation is supposed to proceed for a
while using available free nucleotides and DNA polymerase until, by
chance, a ddA, ddG, ddC, or ddT is added to the chain and elongation
stops. Using this technique, a set of DNA fragments with the same
starting point, which is defined by the primer used, but having many
differing total base lengths is generated.
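The effect of chance termination can be imitated with a small simulation. This is a simplified sketch, not a model of the real chemistry: the template, the 20% termination chance, and the function name are all assumptions chosen for illustration, and strand direction is ignored.

```python
import random

PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def synthesize(template: str, dd_fraction: float, rng: random.Random) -> str:
    """Copy the template base by base, stopping when a dideoxy base is drawn.

    A lowercase final letter marks a tagged terminator, so 'TTCg'
    is a strand that ended on ddG. A strand with no lowercase letter
    reached the end of the template without terminating.
    """
    strand = []
    for base in template:
        partner = PAIRS[base]
        if rng.random() < dd_fraction:   # a tagged base was picked up by chance
            strand.append(partner.lower())
            break
        strand.append(partner)
    return "".join(strand)

rng = random.Random(0)                   # fixed seed so runs are repeatable
fragments = [synthesize("AAGCTTGC", 0.2, rng) for _ in range(1000)]
terminated = {f for f in fragments if f[-1].islower()}
print(sorted(terminated, key=len))
```

Every fragment starts at the same point and is a prefix of the full complementary strand; only the stopping point differs. Raising the hypothetical `dd_fraction` makes long fragments rare, which is why the amount of tagged base has to be limited.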
Heat is again used to separate the newly synthesized strands from their
vector templates. Additional free nucleotides, tagged nucleotides,
primer, and DNA polymerase can then be added to the original single-stranded
vector templates, which can be reused in this way as many as 40 times. This allows
generation of billions upon billions of strands of complementary DNA of
all possible base lengths.
These billions of DNA copies must now be sorted according to length, a
task accomplished through gel electrophoresis. This process is most
often carried out within small plastic capillary tubes in automated
sequencing systems. The entire collection of DNA fragments is placed in
one end of a gel-filled capillary, and an electric current is applied to
the gel. This causes the slightly negatively charged DNA molecules to
migrate toward the positive end of the electrical field. The natural
resistance of the gel allows smaller fragments to migrate faster than
larger fragments.
A laser is aimed at a fixed location on the capillary. As each subset
of DNA fragments of a particular length passes through the beam, energy
striking them causes the color-coded dideoxynucleotide tags to
fluoresce, revealing the identity of each successive base in the target
DNA sequence.
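The logic of reading the sequence from sorted fragments can be sketched as follows. The fragment list and function name are invented for illustration; the colors are those listed earlier in the text. Each fragment is written as its full base string, with the understanding that only its last base carries a tag.

```python
# Tag colors as listed in the text.
TAG_COLORS = {"A": "green", "G": "yellow", "C": "blue", "T": "red"}
COLOR_TO_BASE = {color: base for base, color in TAG_COLORS.items()}

def read_sequence(fragments: list[str]) -> str:
    """Rebuild the sequence from the order of colors seen at the laser.

    Shorter fragments arrive first, and the color of each fragment's
    final (tagged) base identifies the base at that position.
    """
    by_length = {len(f): f for f in fragments}   # one representative per length
    colors = [TAG_COLORS[by_length[n][-1]] for n in sorted(by_length)]
    return "".join(COLOR_TO_BASE[c] for c in colors)

fragments = ["TTCG", "T", "TTC", "TT", "TTCGA", "TTCGAA"]  # unsorted mix
print(read_sequence(fragments))  # TTCGAA
```

The sequence read this way is that of the newly synthesized strand, which is the complement of the template.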
Even though each strand is very, very small, the combined fluorescence
of many identical strand copies passing through the beam simultaneously
is intense enough to be recorded by a CCD sensor and the information sent
to an attached computer.
Current technology only allows about 500 bases at a time to be sequenced
in this manner. This is only about one-third the average length of the
coding region of a gene, and only around 3% of the average total gene
size when non-coding regions are also considered.
Overcoming this limitation is where all of those overlapping fragments
come into play. Repeating the sequencing procedure using many different
2,000-4,000 base pair target fragments (i.e., obtained from different
clonal bacterial colonies) will reveal sequences with varying degrees of
overlap. Repeating the procedure with many different 150,000-base
starting fragments reveals longer sequences and still more overlap. By
painstakingly charting where these overlaps occur, a map can be created
showing the proper positions of each fragment with respect to the
others.
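The idea of charting overlaps can be illustrated with a toy assembler. This is a minimal greedy sketch, assuming error-free reads from a sequence with no repeated stretches; the reads and function names are invented for the example, and real mapping software is far more elaborate.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest end of `left` that matches the start of `right`."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0

def assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:                     # nothing left to merge
            return contigs
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

reads = ["CCTGAAGTCA", "ATGGCGTAC", "CGTACCTGA"]  # made-up overlapping reads
print(assemble(reads))  # ['ATGGCGTACCTGAAGTCA']
```

Repeated stretches of sequence can produce misleading overlaps, which is one reason real projects sequence each region many times over and at more than one fragment size.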
Automation of the sequencing process, using robots, supercomputers, and
redundant sequencing arrays, now allows sequences to be decoded at rates
of thousands of bases per hour. Such automation was at the heart of the
Human Genome Project. Multiple highly automated sequencing
facilities around the world operated at breakneck speed to sequence
human DNA one nucleotide at a time until entire fragments, then entire
genes, entire gene regions, chromosomes, and eventually the entire human
genome (all 3 billion base pairs of it) were mapped out. Gene
sequencing is used in countless less complex ways to advance all aspects
of biotechnology, including marine biotechnology.