The sequencing of the human genome was the greatest task in the history of analytical chemistry. Four tools were required for this task. First, the basis for virtually all DNA sequencing was the dideoxy chain-terminating reaction, developed by Sanger, for which he received his second Nobel Prize. In this method, the DNA that is to be sequenced, called the template, is hybridized with a short, complementary oligonucleotide, called the primer. DNA polymerase and deoxynucleotide triphosphates are added to the mixture, and the enzyme catalyzes the synthesis of the complementary DNA strand, starting at the primer and extending in the 3' direction. Sanger incorporated a chain-terminating nucleotide into the reaction mixture. This reagent is a modified deoxynucleotide triphosphate in which the 3' hydroxyl group is replaced with a hydrogen atom, creating a dideoxynucleotide. Because the polymerase needs the 3' hydroxyl group to attach the next nucleotide, incorporation of a dideoxynucleotide stops chain extension. When the sequencing reaction proceeds in the presence of a small amount of the dideoxynucleotide, the resulting chain-extension reaction products consist of a set of molecules, all starting at the primer, all extending in the 3' direction, but terminating wherever the dideoxynucleotide happened to be incorporated, opposite its complementary nucleotide in the template. This set of reaction products is called a sequencing ladder. Four such ladders were synthesized by use of the four dideoxynucleotides in separate chain-extension reactions.
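The logic of the chain-terminating reaction can be illustrated with a short Python sketch. It is an idealized illustration, not a model of the real chemistry: it assumes the template is written 3' to 5', that the primer covers the first few bases of the new strand, and that termination occurs exactly once at every possible position. The function names and the example template are hypothetical.

```python
# Idealized sketch of Sanger sequencing ladders (illustrative assumptions only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_3to5: str) -> str:
    """Return the strand the polymerase would build, written 5'->3',
    for a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def sequencing_ladder(template_3to5: str, primer_len: int, dd_base: str) -> list[str]:
    """Return every product that ends in the dideoxy analogue of dd_base.

    Each product starts at the primer and stops at one position where
    dd_base would be incorporated into the growing strand.
    """
    new_strand = synthesized_strand(template_3to5)
    return [
        new_strand[: i + 1]
        for i in range(primer_len, len(new_strand))
        if new_strand[i] == dd_base
    ]


# Hypothetical 10-base template; the primer pairs with its first 3 bases.
template = "TACGGATCCA"
for base in "ACGT":
    print(base, sequencing_ladder(template, primer_len=3, dd_base=base))
```

Taken together, the four ladders contain one fragment for every length beyond the primer, which is what allows the sequence to be read off by fragment size.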
The second tool required for genomic sequencing was a strategy for preparing the millions of template molecules required to cover the genome. Two methods were employed. The Celera team employed a whole-genome shotgun sequencing protocol, in which the entire genome was randomly sheared into pieces several thousand bases in length. In contrast, the public sequencing effort employed a directed strategy that prepared an intermediate set of very large DNA molecules, which were cloned in bacteria. These bacterial artificial chromosomes (BACs) were mapped onto the human genome, so that their locations were precisely known. The BACs were then randomly sheared to create the sequencing templates. In both the Celera and public efforts, the templates were sequenced using Sanger's chemistry.
The third tool required for genomic sequencing was a method for rapidly analyzing the sequencing ladders prepared by Sanger's chain-terminating reaction. In the original embodiment employed by Sanger, radioactive phosphorus was enzymatically incorporated into the sequencing fragments, which were separated by gel electrophoresis, detected by autoradiography, and interpreted by a skilled technician. This method was tedious and error-prone. In 1986, Leroy Hood described an automated DNA sequencer based on three substitutions: a fluorescent tag replaced the radioactive label, a laser-based fluorescence detector that directly monitored the migration of DNA fragments through the gel replaced autoradiography, and computerized analysis replaced the skilled interpretation of the gel. Four different fluorescent labels were employed, one for each of the Sanger sequencing reactions. The products of the four reactions were pooled, separated in a single electrophoresis lane, and resolved based on differences in the spectra of the fluorescent labels.
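The computerized interpretation step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the software used in any real instrument: it assumes each detected fragment has already been reduced to a (length, dye) pair, that every fragment length appears exactly once, and that each dye maps to one terminating base. The dye names and their base assignments are hypothetical.

```python
# Simplified base calling from pooled, four-colour sequencing fragments.
# The dye-to-base assignment below is a hypothetical example.
DYE_TO_BASE = {"blue": "C", "green": "A", "yellow": "G", "red": "T"}


def call_bases(detected: list[tuple[int, str]]) -> str:
    """Read the sequence from (fragment_length, dye) observations.

    Shorter fragments migrate faster, so sorting by length reproduces the
    order in which the fragments pass the detector; the dye seen at each
    step identifies the base at which that fragment terminated.
    """
    ordered = sorted(detected, key=lambda fragment: fragment[0])
    return "".join(DYE_TO_BASE[dye] for _, dye in ordered)


# Observations listed in arbitrary order, as they might be stored after pooling.
observations = [
    (7, "green"), (4, "blue"), (5, "blue"), (6, "red"),
    (8, "yellow"), (9, "yellow"), (10, "red"),
]
print(call_bases(observations))  # prints CCTAGGT
```

Real instruments must also deal with overlapping dye spectra, uneven peak spacing, and noise, none of which is represented here.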
Although the automated electrophoresis instrument eliminated the autoradiographic and manual interpretation steps, the electrophoresis remained cumbersome and tedious. We, along with other groups, began to investigate the replacement of the conventional electrophoresis step with capillary electrophoresis, which promised higher-speed genomic analysis. As an important step in the development of capillary electrophoresis for DNA sequencing, we described in 1990 the use of a sheath-flow cuvette for laser-induced fluorescence detection in DNA sequencing by capillary electrophoresis. This instrument provided ultrasensitive detection and, as we pointed out the following year, could be extended to operate with many capillaries in parallel. We, along with a group at Hitachi, developed multiple-capillary DNA sequencers based on this sheath-flow cuvette, a design that was commercialized by Applied Biosystems as its Model 3700 DNA sequencer. This instrument was used to generate all of the data reported by Celera and the lion's share of the data generated by the public sequencing effort. We have a page devoted to our multiple-capillary DNA sequencers.
The fourth tool required for genomic sequencing was software to assemble the vast amounts of data into the finished sequence. This effort continues, and refinements to the sequence are reported regularly.
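The idea behind assembly can be conveyed with a toy Python example. It uses a greedy overlap-merge heuristic, chosen here only for illustration; it is not the algorithm used by Celera or the public effort, and it ignores sequencing errors, repeats, and reverse-complement reads. The function names, the minimum overlap of 3 bases, and the example reads are all assumptions.

```python
# Toy greedy assembler: repeatedly merge the two reads with the longest overlap.


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b
    (0 if no overlap of at least min_len exists)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def drop_contained(reads: list[str]) -> list[str]:
    """Remove duplicates and reads that lie entirely inside another read."""
    unique = list(dict.fromkeys(reads))
    return [r for r in unique if not any(r != s and r in s for s in unique)]


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads until no pair overlaps; returns the remaining contigs."""
    reads = drop_contained(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        rest = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads = drop_contained(rest + [merged])
    return reads


# Three hypothetical overlapping reads from a 17-base sequence.
print(greedy_assemble(["GTACGATTC", "ATGCCTAG", "CTAGGTAC"]))
```

In this example the three reads collapse into a single contig. On real genomes, repeated sequences make the greedy choice unreliable, which is one reason assembly software remains an area of active refinement.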
The genome can be viewed at the NIH. Celera has another site that provides access to their human and Drosophila genome sequences.
We have published dozens of papers dealing with DNA separations by capillary electrophoresis, which can be found on our publications page.