Tuesday, October 19, 2010

Genetic Algorithms: Applying Evolutionary Biology to Computing

Principles of biological evolution are being applied to how computers solve problems in a variety of fields today, from aerospace and engineering to finance and acoustics. This approach holds the promise of turning innovation into a computational process that can be scaled. How does this work?

Replication and Randomness
Computers are good at replicating things, just as biological organisms are. The advances of biological evolution come about as minor changes (random mutations) turn out, over the course of many generations, to favor fitness for survival. So, with that in mind, a computer replicates millions of instances of a problem and then introduces an element of isolated randomness into each of those instances (just as every individual person carries the human genome, but each of us carries a slightly randomized version of that genetic code).
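To picture that in code before we get to a concrete example, here is a minimal sketch in Python of the "replicate, then randomize" step (the function name and the toy candidate, just a list of numbers, are made up for illustration):

```python
import random

def replicate_and_mutate(candidate, copies, mutation_size=0.1):
    """Make many copies of one candidate, nudging each copy at one random spot."""
    population = []
    for _ in range(copies):
        clone = list(candidate)              # replication
        spot = random.randrange(len(clone))  # pick one place to vary
        clone[spot] += random.uniform(-mutation_size, mutation_size)  # isolated randomness
        population.append(clone)
    return population

# Ten slightly different copies of the same starting point.
population = replicate_and_mutate([0.5, 0.5, 0.5], copies=10)
```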

Survival of the Fittest Solutions
Then, since computers are really good at repeating cycles, it's possible to have different "generations," or repeated cycles of the computer trying to find a solution. Benchmarks are set up to determine the viability of candidate solutions, and those candidate solutions either "live" or "die" depending on how well they meet those benchmarks. The computer then replicates the more favorable candidate solutions, introduces additional random factors for the next generation, and tries again. Here's an example of how it works.


Let's take a difficult computing problem like programming a computer to understand people talking. Understanding human speech is extremely complex, and computers still aren't all that good at it. There are so many factors and variables that scientists do a lot of guessing about how to program computers for this task. For simplicity's sake, let's say they have broken the goal down into three outcomes: 1) the ability to parse a stream of sound into separate words; 2) the ability to identify the human language being spoken; and 3) the ability to accommodate different dialects. These will be the benchmarks for any candidate solution (in real life I'm sure there are many more). Imagine using a sound recording as the test for measuring these things, such as the audio from this video clip, which is taken from the movie Louisiana Story and features a conversation in Cajun French. Could the computer hear this stream of language as a set of distinct words (test 1)? Could it tell this was French (test 2)? Could it tell this was a dialect of French from the southern USA (test 3)?
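Nobody can write those three tests in a few lines, of course, but the shape of the scoring step can be sketched. In the Python sketch below the three benchmark functions are empty placeholders, and the names and the idea of averaging the scores are my own simplification, not a real speech-recognition API:

```python
# Hypothetical benchmark tests; each would really wrap a full speech-analysis
# routine run against the Cajun French test recording.
def score_word_parsing(candidate, audio):   # test 1: parse the stream into words
    return 0.0                              # placeholder score between 0 and 1

def score_language_id(candidate, audio):    # test 2: recognize the language as French
    return 0.0                              # placeholder

def score_dialect_id(candidate, audio):     # test 3: recognize the Cajun dialect
    return 0.0                              # placeholder

def fitness(candidate, audio):
    """Combine the three benchmark scores into one overall viability number."""
    scores = (score_word_parsing(candidate, audio),
              score_language_id(candidate, audio),
              score_dialect_id(candidate, audio))
    return sum(scores) / len(scores)
```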

Okay, on the other end of things, you have all these variables in terms of possible solutions. For example, one variable could be matching sounds against dictionary databases. Another could be analyzing intonation so that intelligent guesses can be made about content (such as the way we make our voices go up at the end of questions). Another could be pitch range (I've heard that certain languages are spoken within a more restricted range of pitches). And what really makes this complex are the variables of how you combine all these variables, and in what order. For example, is it better to identify the type of content first (statement, question, exclamation) or the language being spoken? Would it work better to identify phrases before identifying words? And I haven't even touched on factors like the gender or age of the speakers, or the problem of separating speech from background noise. Obviously there are lots of variables. Let's take just the first three I mentioned and label them A, B, and C:

A - Dictionary match (for parsing words)
B - Intonation
C - Pitch range

Of course, we anticipate many, many more variables. Programmers could assign letters to a long set of very specific variables that might be part of the solution (for example, "D" could be changing the order of what's analyzed, so that the computer goes BCA instead of ABC; "E" could be identifying phrases before identifying words, etc.). Obviously, many of these variables would be at odds with one another.
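One simple way to encode all of this, sketched in Python below, is to keep a catalog of lettered techniques and treat a candidate solution as the core letters plus whatever extras it carries (the descriptions for D and E come from the paragraph above; everything else is a placeholder):

```python
# A catalog of lettered techniques (hypothetical beyond A-E).
VARIABLE_CATALOG = {
    "A": "dictionary match (for parsing words)",
    "B": "intonation analysis",
    "C": "pitch-range analysis",
    "D": "reorder the analysis as BCA instead of ABC",
    "E": "identify phrases before identifying words",
    # "F" through "Q" would continue the list of specific techniques...
}

CORE = ("A", "B", "C")

# A candidate solution is simply the core plus its extra variables.
candidate = CORE + ("D",)   # i.e. "ABC + D"
```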

Alright, so next, a computer program is written that uses A, B, and C to analyze the Cajun audio clip. But imagine that instead of being attempted just once, the analysis is attempted many times, with the computer instructed to create slightly different instances of the problem by randomly introducing one of the other specified factors (D, E, F, G, H, and so on). So, the first iteration or "generation" of the attempt to solve the human speech problem might look something like this:

1. ABC + D
2. ABC + E
3. ABC + F
4. ABC + G
5. ABC + H
6. ABC + I
7. ABC + J
8. ABC + K
9. ABC + L
10. ABC + M

In other words, you stick with some core variables but randomly try out a single additional factor. In this case, for simplicity, I set forth 10 different instances. In reality, the number of instances is limited only by the processing resources of the computer. Whether it's 10 or 10,000 instances, the computer tests each variation against the benchmarks set up above. Let's say that this is what this particular generation yielded:

GENERATION 1
1. ABC + D - met criterion #1 at 75%; criterion #2 at 50%; criterion #3 at 65% (overall average of 63%)
2. ABC + E - met criterion #1 at 65%; criterion #2 at 35%; criterion #3 at 25% (overall average of 42%)
3. ABC + F - met criterion #1 at 25%; criterion #2 at 13%; criterion #3 at 10% (overall average of 16%)
4. ABC + G - met criterion #1 at 95%; criterion #2 at 28%; criterion #3 at 45% (overall average of 56%)
5. ABC + H - met criterion #1 at 45%; criterion #2 at 55%; criterion #3 at 12% (overall average of 37%)
6. ABC + I - met criterion #1 at 55%; criterion #2 at 99%; criterion #3 at 77% (overall average of 77%)
7. ABC + J - met criterion #1 at 10%; criterion #2 at 74%; criterion #3 at 62% (overall average of 49%)
8. ABC + K - met criterion #1 at 88%; criterion #2 at 21%; criterion #3 at 21% (overall average of 43%)
9. ABC + L - met criterion #1 at 14%; criterion #2 at 14%; criterion #3 at 30% (overall average of 19%)
10. ABC + M - met criterion #1 at 1%; criterion #2 at 61%; criterion #3 at 5% (overall average of 22%)

Three of the results (#1, #4, and #6) achieved an average score over 50%. In evolutionary terms, those are the ones that are most viable, most likely to reach the goal, and therefore they get to "survive" into the next generation. The others are "killed" (not replicated).
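In code, building generation 1 and applying that 50% cutoff might look like the Python sketch below (the real benchmark tests are replaced by a random-number stand-in, so the numbers won't match the table above):

```python
import random

EXTRAS = list("DEFGHIJKLM")   # the ten extra variables tried in generation 1

def benchmark_scores(candidate):
    """Stand-in for the three real benchmark tests (random percentages here)."""
    return [random.randint(0, 100) for _ in range(3)]

# Build generation 1: the ABC core plus one extra variable per instance.
generation = [("A", "B", "C", extra) for extra in EXTRAS]

# Score every candidate and keep the ones averaging over 50%.
survivors = []
for candidate in generation:
    scores = benchmark_scores(candidate)
    if sum(scores) / len(scores) > 50:
        survivors.append(candidate)
```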

Okay, so for the next round, candidate solutions #1, #4, and #6 above (the survivors) are duplicated (whether 10 or 10,000 times, as before, depending on computing capacity), something like this:

1. ABC + D
2. ABC + D
3. ABC + D
4. ABC + G
5. ABC + G
6. ABC + G
7. ABC + I
8. ABC + I
9. ABC + I
10. ABC + I

But just as in the first generation, another random variable will be introduced to each of these "survivor" solutions:

1. ABC + D + N
2. ABC + D + O
3. ABC + D + P
4. ABC + G + N
5. ABC + G + O
6. ABC + G + P
7. ABC + I + N
8. ABC + I + O
9. ABC + I + P
10. ABC + I + Q

Can you see what's happening here? A handful of additional variables (N, O, P, and Q) are now being tested against the three "survivor" solutions from generation one (ABC + D, ABC + G, and ABC + I). Exactly what happened before is now repeated in this generation: the benchmarks are applied, each iteration is measured against them, and the survivors go on to be replicated and slightly varied.
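The replicate-and-vary step for generation 2 can be sketched the same way (again in Python, with the survivors hard-coded just for illustration):

```python
import random

POPULATION_SIZE = 10
NEW_VARIABLES = list("NOPQ")   # fresh variables available for generation 2

# The three candidates that survived generation 1.
survivors = [("A", "B", "C", "D"), ("A", "B", "C", "G"), ("A", "B", "C", "I")]

# Duplicate the survivors until the population is back to full size...
next_generation = [survivors[i % len(survivors)] for i in range(POPULATION_SIZE)]

# ...then give each copy one new random variable to test.
next_generation = [candidate + (random.choice(NEW_VARIABLES),)
                   for candidate in next_generation]
```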

So, each generation tends to perform better than the last. Instead of scientists trying just one solution or one variable at a time, they let the machine test huge numbers of candidate solutions that, over time, come closer and closer to the benchmark goals. When computers can go through literally millions of "generations" of such candidate solutions, this becomes an orderly way to use randomness. It boils down to very intelligent guessing.
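Pulled together, the whole process described above fits in a short loop. The Python sketch below is schematic only: the fitness function is a random stand-in for the real benchmarks, and the candidates simply accumulate one extra lettered variable per generation, as in the example above:

```python
import random

EXTRAS = list("DEFGHIJKLMNOPQRSTUVWXYZ")
POPULATION_SIZE = 10
GENERATIONS = 100

def fitness(candidate):
    """Stand-in for running the three speech benchmarks and averaging them."""
    return random.uniform(0, 100)

# Generation 1: the ABC core plus one random extra variable each.
population = [("A", "B", "C", random.choice(EXTRAS)) for _ in range(POPULATION_SIZE)]

for _ in range(GENERATIONS):
    # Survival of the fittest: keep candidates scoring above the cutoff.
    survivors = [c for c in population if fitness(c) > 50] or population
    # Replicate the survivors and give each copy one new random variable.
    population = [random.choice(survivors) + (random.choice(EXTRAS),)
                  for _ in range(POPULATION_SIZE)]
```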