The discovery is based on the idea that as organisms evolve, sections of genetic code that do something useful for the organism change in different ways than sections that do not.
The research is reported by Adam Siepel, Cornell assistant professor of biological statistics and computational biology, Cornell postdoctoral researcher Brona Brejova and colleagues at several other institutions in the online version of the journal Genome Research, and it will appear in the December print edition.
The complete human genome was sequenced several years ago, but that simply means that the order of the 3 billion or so chemical units, called bases, that make up the genetic code is known. What remains is the identification of the exact location of all the short sections that code for proteins or perform regulatory or other functions.
More than 20,000 protein-coding genes have been identified, so the Cornell contribution, while significant, doesn't dramatically change the number of known genes. What's important, the researchers say, is that their discovery shows there still could be many more genes that have been missed using current biological methods. These methods are very effective at finding genes that are widely expressed but may miss those that are expressed only in certain tissues or at early stages of embryonic development, Siepel said.
"What's exciting is using evolution to identify these genes," Siepel said. "Evolution has been doing this experiment for millions of years. The computer is our microscope to observe the results."
Four different bases -- commonly referred to by the letters G, C, A and T -- make up DNA. Three bases in a row can code for an amino acid (the building blocks of proteins), and a string of these three-letter codes can be a gene, coding for a string of amino acids that a cell can make into a protein.
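The three-letter coding described above can be sketched in a few lines of Python. This is only an illustration, not part of the researchers' software: the `translate` function and the example sequence are hypothetical, and the table is the standard genetic code written in the conventional T, C, A, G order, with `*` marking a stop signal.

```python
# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna):
    """Read a DNA string three bases at a time and return one-letter amino acids.

    Translation stops at the first stop codon; leftover bases at the end
    that do not fill a whole codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical nine-base sequence: ATG (M), GCC (A), TGA (stop).
print(translate("ATGGCCTGA"))  # prints "MA"
```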
Siepel and colleagues set out to find genes that have been "conserved" -- that are fundamental to all life and that have stayed the same, or nearly so, over millions of years of evolution.
The researchers started with "alignments" discovered by other workers -- stretches up to several thousand bases long that are mostly alike across two or more species. Using large-scale computer clusters, including an 850-node cluster at the Cornell Center for Advanced Computing, the researchers ran three different algorithms, or computing designs -- one of which Siepel created -- to compare these alignments between human, mouse, rat and chicken in various combinations.
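To give a rough sense of what "mostly alike" means for an alignment, the sketch below computes the percentage of positions at which two aligned sequences carry the same base. It is a simplified illustration, not one of the three algorithms the researchers ran; the function name `percent_identity`, the use of `-` for a gap, and the example sequences are all assumptions.

```python
def percent_identity(seq_a, seq_b):
    """Return the percentage of aligned positions where both sequences match.

    Both sequences must already be aligned to the same length.
    A gap character ("-") never counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    if not seq_a:
        return 0.0
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y and x != "-")
    return 100.0 * matches / len(seq_a)


# Hypothetical eight-base stretch from two species, differing at one position.
print(percent_identity("ACGTACGT", "ACGTTCGT"))  # prints 87.5
```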
Over millions of years, individual bases can be swapped -- C to G, T to A, for example -- by damage or miscopying. Changes that alter the structure of a protein can kill the organism or send it down a dead-end evolutionary path. But conserved genes contain only minor changes that leave the protein able to do its job. The computer looked for regions with those sorts of changes by creating a mathematical model of how the gene might have changed, then looking for matches to this model.
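One simple kind of change that leaves a protein intact is a base swap that still codes for the same amino acid. The sketch below counts such "silent" swaps separately from swaps that alter the amino acid. It is a toy comparison under stated assumptions, not the mathematical model the researchers built: the function `compare_codons` and the two example sequences are hypothetical, and the table is the standard genetic code.

```python
# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def compare_codons(seq_a, seq_b):
    """Count codon differences between two aligned, gap-free coding sequences.

    Returns (silent, altering): codons that differ but code for the same
    amino acid, and codons that differ and code for a different one.
    """
    silent = altering = 0
    shared_length = min(len(seq_a), len(seq_b))
    for i in range(0, shared_length - 2, 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            silent += 1
        else:
            altering += 1
    return silent, altering


# Hypothetical sequences: GCC -> GCT keeps alanine, AAA -> AGA turns
# lysine into arginine.
print(compare_codons("ATGGCCAAA", "ATGGCTAGA"))  # prints (1, 1)
```

A region in which silent swaps far outnumber altering ones is behaving the way a protein-coding gene under selection would be expected to behave, which is the general intuition behind the search described above.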
After eliminating predictions that matched already known genes, the researchers tested the remainder in the laboratory, proving that many of the genes could in fact be found in samples of human tissue and could code for proteins. The researchers were sometimes able to identify the proteins by comparison with databases of known proteins. The discovered genes mainly have to do with motor activity, cell adhesion, connective tissue and central nervous system development, functions that might be expected to be common to many different creatures.
The entire project, from building and testing the mathematical models to running final laboratory tests, took about three years, Siepel said. The work was supported by the National Cancer Institute, a National Science Foundation Early Career Development Grant and a University of California graduate research fellowship.
cornell/
The researchers have now shown that the same four factors can generate iPS cells from fibroblasts taken from human skin. "From about 50,000 transfected human cells, we obtained approximately 10 iPS cell clones," Yamanaka said. "This efficiency may sound very low, but it means that from one experiment, with a single ten-centimeter dish, you can get multiple iPS cell lines."
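The efficiency Yamanaka describes can be worked out directly from the two figures he gives. The short calculation below uses only those approximate numbers; the variable names are illustrative.

```python
# Approximate figures quoted by Yamanaka.
transfected_cells = 50_000
ips_clones = 10

# Fraction of transfected cells that became iPS cell clones.
efficiency = ips_clones / transfected_cells

# Equivalent statement: roughly one clone per this many transfected cells.
cells_per_clone = transfected_cells // ips_clones

print(f"{efficiency:.2%}")  # prints "0.02%"
print(cells_per_clone)      # prints 5000
```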
The iPS cells were indistinguishable from embryonic stem cells in terms of their appearance and behavior in cell culture, they found. They also express genetic markers that are used by scientists to identify embryonic stem cells. Human embryonic stem cells and iPS cells display similar patterns of global gene activity.
They showed that the converted human cells could differentiate to form three germ layers in cell culture. Those primary germ layers in embryos eventually give rise to all the body's tissues and organs. They further showed that the human iPS cells could give rise to neurons using a method earlier demonstrated for human embryonic stem cells. The iPS cells could also be made to produce cardiac muscle cells, they found. Indeed, after 12 days of differentiation, clumps of cells in the laboratory dishes started beating.
The human iPS cells injected under the skin of mice produced tumors after nine weeks. Those tumors contained various tissues including gut-like epithelial tissue, striated muscle, cartilage and neural tissue. Finally, they showed that iPS cells can also be generated in the same way from other human cells.
"We should now be able to generate patient- and disease-specific iPS cells, and then make various cells, such as cardiac cells, liver cells and neural cells," Yamanaka said. "These cells should be extremely useful in understanding disease mechanisms and screening effective and safe drugs. If we can overcome safety issues, we may be able to use human iPS cells in cell transplantation therapies."
cellpress/