Can we simulate life's machinery?

September 8, 2000
by Alan Boyle

Projects enlist Internet users to work on life-and-death protein puzzles

Sept. 8 - For decades, researchers have been puzzling over the mysteries of protein folding - the "machinery of life" that translates DNA's instructions into action. Solving the mysteries could lead to medical breakthroughs, but progress has come slowly. Now programmers are putting regular Internet users on the case, thanks to distributed computing.

EXPERTS AGREE that solving the protein-folding puzzle would represent a milestone in biotechnology.

"It's of fundamental importance for genomics," said Sorin Istrail, senior director of informatics research at Celera Genomics, which was involved in decoding the human genome. "It's the central understanding of the machinery of life."

Living cells assemble amino acids into thousands of types of protein, which perform tasks such as carrying oxygen through the bloodstream, flexing muscles and fighting infection. Each protein molecule twists and folds automatically into just the right shape to do its work, based on complex chemical interactions.

But sometimes those interactions can go haywire - resulting in conditions associated with a long list of diseases, including Alzheimer's, mad-cow disease, cystic fibrosis and some forms of cancer. Thus, understanding the protein-folding process and how to keep it from going astray could save lives as well as unlock genetic secrets.

Distributed computing's role

Distributed-computing teams believe they can succeed where others have gotten bogged down.

"It's a big problem," said Scott Le Grand, a molecular biologist turned computer programmer who helped develop a just-released program called Folderol.. "It's not solved yet. It hasn't been solved in the 40 years since it was first discovered, and it looks like the computers are ready to solve this for us. We just have to come up with the right algorithm."

Distributed-computing projects let Internet users download scientific data, run it on their own computers using spare processing cycles, then send the results back to a central database.
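
As a rough illustration of that fetch, compute and report cycle, here is a minimal Python sketch. It is not the protocol of Folderol, Folding@home or any other project named here: the `Coordinator` class, the work-unit fields and the `simulate` placeholder are all hypothetical, and the "server" is just an in-memory queue rather than a machine on the Internet.

```python
from collections import deque
import random


class Coordinator:
    """Hypothetical in-memory stand-in for a project's central server."""

    def __init__(self, work_units):
        self.pending = deque(work_units)  # work units not yet handed out
        self.results = {}                 # unit id -> returned result

    def fetch(self):
        """Hand out the next work unit, or None when none are left."""
        return self.pending.popleft() if self.pending else None

    def submit(self, unit_id, result):
        """Record a result sent back by a volunteer machine."""
        self.results[unit_id] = result


def simulate(unit):
    """Placeholder computation: lowest of many seeded random scores.

    This is an assumption for illustration only, not real protein physics.
    """
    rng = random.Random(unit["seed"])
    return min(rng.random() for _ in range(unit["steps"]))


def volunteer_loop(server):
    """Fetch work, compute on spare cycles, send results back."""
    done = 0
    while (unit := server.fetch()) is not None:
        server.submit(unit["id"], simulate(unit))
        done += 1
    return done


server = Coordinator([{"id": i, "seed": i, "steps": 1000} for i in range(5)])
completed = volunteer_loop(server)
print(f"completed {completed} work units")
```

In a real deployment the fetch and submit steps would be network requests, and many volunteer machines would run the loop at the same time.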

Folderol is just one of several such programs focusing on the protein puzzle:

Folding@home, developed at Stanford University using Cosm's distributed-computing tools, has been chugging away for months. Server statistics indicate that the project has chalked up more than 6.6 years' worth of processing time so far.

Entropia, a distributed-computing firm based in San Diego, is talking with researchers about putting its own Internet grid to work on the problem.

"Imagine that, even while you were using your computer, 98 percent of those cycles while you were typing fell on the floor," Jim Madsen, Entropia's president and chief executive officer, told "Your PC could have been working on protein folding while you were writing this story."

United Devices is planning to take on analysis of human genome sequences as one of its first distributed-computing projects, said chief technology officer David Anderson. Anderson also happens to serve as director for the reigning champion in the distributed-computing field: the SETI@home program, with more than 2.3 million users worldwide.

Could thousands or millions of desktop computers succeed where supercomputers have failed? Istrail, who led a protein-folding simulation project at Sandia National Laboratories before moving over to Celera, said he's "extremely skeptical."

"When it comes to factoring numbers, everybody understands the problem," he told "Protein folding is so complex. The best minds in this world have been working on this problem for 40 years, and we're still somehow in mysterious territory."

Anderson, however, said the skepticism shouldn't deter people from trying.

"People are always skeptical at first," said the SETI@Home veteran.

What's the problem?

The problem is that the protein-folding process is so complex that some scientists believe cracking the code just might be impossible. By some estimates, figuring out all the possible permutations for a single protein would take billions of billions of years' worth of brute-force calculations.
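
A back-of-the-envelope calculation shows where estimates of that size come from. Every number in this Python sketch is an assumption chosen for illustration, not a figure from the researchers quoted here: three possible states per amino acid, a chain of 100 amino acids, and a hypothetical machine that checks 10 trillion shapes per second.

```python
# All constants below are illustrative assumptions, not measured values.
STATES_PER_RESIDUE = 3        # assumed shapes each amino acid can take
RESIDUES = 100                # assumed chain length
SAMPLES_PER_SECOND = 1e13     # assumed brute-force checking speed
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Every residue chooses independently, so the counts multiply.
conformations = STATES_PER_RESIDUE ** RESIDUES

# Time to enumerate every shape once at the assumed speed.
years = conformations / SAMPLES_PER_SECOND / SECONDS_PER_YEAR

print(f"{conformations:.2e} shapes, roughly {years:.1e} years to check them all")
```

Under these assumptions the total comes out on the order of 10^27 years, well beyond "billions of billions." Changing the assumed speed by a factor of a million still leaves an impossibly long wait, which is why no serious approach enumerates every shape.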

Folderol would take a different approach, said Le Grand, who wrote several papers on protein folding and edited a textbook on the subject during nine years of research. The program doesn't check every possible permutation by brute force. Instead, it farms out data on a particular protein to run on multiple computers, and eventually compares the results from millions of parallel simulations. It would take roughly two to six hours for each user to complete work on a protein with 100 amino acids, Le Grand said.

"If a million runs (of the simulation) run independently, and a thousand runs converge on what's roughly the same thing, then that is the most likely confirmation," he said.

Le Grand and his colleagues say they'd like to let other computer users modify Folderol's source code, as long as the code relating to data distribution can be protected.

"I would really like to see homebrew hackers get into this the same way they've gotten into prime factorization and encryption," Le Grand said. "If I can provide them a code base that they can work with as building blocks, then I think I can get them involved."

The Folderol team - which also includes mathematician-musician Stephanie Wukovitz and artist-engineer Doug Engel - already has some Internet cachet: The trio was involved in the creation of BattleSphere, a video game for the Atari Jaguar that has attracted a cult following.

"A lot of the algorithms for a 3-D video game are the same algorithms that one would use to simulate the folding of proteins, so there was a natural overlap," Le Grand said.

The Folderol team plans to analyze the same protein data that's used for the Critical Assessment of Techniques for Protein Structure Prediction, a biennial gathering where researchers gauge how much progress they've made on the protein puzzle. That should provide a good opportunity for judging Folderol's success.

Making it their business

Several companies already have been built around the application of distributed computing to medicine and biotechnology. Entropia offers a range of team projects for participants in 83 countries.

"If someone who was dear to you had Alzheimer's disease, and we were working on some research on ways to stall the progress of that disease, that's a valuable thing to offer," explained Tim Cusac, the company's senior marketing analyst.

Scott Kurowski, vice president for business development, indicated that Entropia's smorgasbord could include protein analysis.

"It's conceivable the genetic algorithm approach could be applied to this," he said. "We've implemented similar kinds of technologies in biotechnology solutions."

Entropia's executives said comparing families of proteins may be a better approach to the problem than trying to analyze each and every protein. Researchers estimate that millions of proteins can be found in nature, grouped into just 5,000 families that share similar structures.

Understanding the principles

Istrail, meanwhile, said more attention should be devoted to identifying and classifying proteins.

"Computing will get you only so far," he said. "What we need is to understand the principles."

He said much more data would be needed to start figuring out the principles of protein folding.

"What will be a tremendous help is an industrial as opposed to a piecemeal approach to the problem. There are 6,000 or 7,000 structures in the database ... that's not enough," he said.

Istrail would like to see 50,000 structures entered into the Protein Data Bank.

"But how do you get to them?" he asked. "I think the area is in a big deadlock. We need a phase transition to a new environment for research."
