Technical Difficulties
Technical Difficulties Evolutionary Programming Toolkit

evolutionary programming toolkit

a simple way to create evolutionary algorithms
by Zach Barth (Zachtronics Industries)

NOTE: Links to download project files are at the bottom of the page.

Technical Difficulties Evolutionary Programming Toolkit
evolution of various sequences

Using simple evolutionary models, it's easy to create algorithms that "evolve" solutions to problems, such as fitting a function to a dataset (one of my applications) or creating an optimum antenna (one of NASA's applications). The concept I used is simple: sequences of "DNA", strings that consist of the characters A, T, C, and G (from real biology), map out to some sort of algorithm. Sequences are then ranked, such that the "better" or "stronger" algorithms they represent achieve a better rank (in EPTK, lower ranks are "better"). The EPTK frontend then proceeds to cull the algorithms by rank: the top algorithms reproduce, creating a pure copy and two mutated copies (with either a letter change, insertion, or deletion), while the weaker algorithms are randomly weeded out. Thus, a strong algorithm will either continue its lineage, mutate into something weaker and die out, or mutate into something stronger and perhaps become the new dominant algorithm. In the end, you're left with a strong algorithm.

Technical Difficulties Evolutionary Programming Toolkit
interaction between EPTK frontend and DNAKit DLLs

The EPTK frontend knows NOTHING about what the sequences mean; it simply sorts, copies, mutates, and deletes sequences of arbitrary letters, along with keeping track of sequence age and removing ones that get too old. All the real "thinking" is done by DNAKit DLL files; each one is a .NET managed class library DLL that exports a class with two functions (can be programmed in C# and Visual Basic .NET, and probably C++ .NET and J# and maybe even your other favorite .NET language):

double rank (string dna);

This function returns the ranking of the DNA sequence, with a low number being better. The function below (C# code) is for an experiment in which we wish to "encourage" sequences with the greatest number of 'A' nucleotides (letters):

public double rank (string dna)
{
   double ret = 100;
   for (int i=0; i<<dna.Length; i++)
      if ( dna[i] == 'A' )
         ret--;
   return ret;
}

With every additional A, the rank number decreases and therefore designates a stronger sequence.

string decode (string dna);

This function returns a string that gives the human representation of the DNA sequence; GCTATTACC doesn't mean much to us, especially since it's different depending on what DNAKit you're currently using, but it could mean anything from a mathematical algorithm to a simple vector image to a small program. To help make it easier to understand, the decode() function returns a string that we can understand. The example below, which is from the same file as the example above, simply decodes the DNA to say "Sequence has X A nucleotides!", where X is the number of A's:

public string decode (string dna)
{
   return "Sequence has " + (100 - rank(dna)) + " A nucleotides!";
}

And that's it! A DNAKit DLL only provides these TWO FUNCTIONS, and from there is able to run fairly complex simulations. The image below shows what happens after running a DNAKit with the above two functions implemented for a few cycles:

Technical Difficulties Evolutionary Programming Toolkit
'A' nucleotide maximizing example (adenine-DNAKit.dll)

Because we weighted more A's to be better, the sequences that mutated to have more A's became dominant. Repeating this cycle over and over creates sequences with more and more A's, as would be expected. Hooray for evolutionary models!

Evolutionary Programming Toolkit v1.0

Contains the EPTK frontend and a few example DNAKits (with DLL files and Visual Studio .NET 2003 projects and sources).

The functions DNAKit uses "codons" of two letters (eg 'AA', 'GT', and 'CA') to represent one-in-one-out mathematical functions that are chained together to build up bigger functions. The sequences are then ranked according to how well a set of 10 data points in data.txt fit (in the order X Y X Y X Y X Y X Y X Y X Y X Y X Y X Y). The data file presently in the zip was created from the function 2*abs(X)+1; load functions-DNAKit.dll and run the simulation a few times and see what you come up with!

Update: I've added a taylor polynomial kit DLL that approximates the data in taylor-data.txt. Read the readme and find out!

Technical Difficulties Evolutionary Programming Toolkit
All original, non-credited content on Technical Difficulties is ©2004-2006 Igor Stolarsky