Sayed's Blog

Weasel Program

Posted June 21st, 2019

One big misconception of evolution by natural selection is the idea that it operates by random chance. People often look at the number of ways a protein, for example, could be arranged, and claim that the probability of it evolving must be miniscule.

In his book The Blind Watchmaker, Richard Dawkins addresses this using a computer program to demonstrate an analogy.

He does this by first referencing infinite monkey theorem.

This is the idea that monkeys bashing on keys randomly, given enough time, will produce all the works of Shakespeare.

He contrasts this with an evolutionary approach: starting from random gibberish, copy it with a probability of a random mutation. Select the mutation that most resembles the works of Shakespeare. Repeat until the works of Shakespeare have been reproduced.

To simplify the demonstration, instead of attempting to produce all the works of Shakespeare, he instead tries to produce just one phrase: "METHINKS IT IS LIKE A WEASEL" (from Hamlet).

28 characters. Each character can be any one of 27 characters (the alphabet and a space character). That's 272827^{28} combinations, or roughly 104010^{40} (underestimate).

If we were producing these sequences at a rate of one per second, it would take 2.2962872910222.29628729 * 10^{22} times longer than the length of the universe to go through all combinations.

This program has come to be known as "The Weasel Program".

Let's reproduce this program here. I'll be using python to write this.

How it works

  1. Generate a random sequence of 28 characters.
  2. Copy every character that in that sequence 100 times.
  3. Give every character a 5% chance of mutating to a random character.
  4. Choose the sequence that most closely resembles "METHINKS IT IS LIKE A WEASEL".
  5. Repeat the process for this sequence until the target phrase has been produced.

First we define our target phrase and the possible characters than can be used.

import random
def main():
	goal = "METHINKS IT IS LIKE A WEASEL"
	characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

Next we create a method to generate a random sequence of characters. I will pass in the goal so I don't have an extra variable for the length.

def generate_random_sequence(goal, characters):
	sequence = ""
	for i in range(len(goal)):
		sequence += characters[random.randint(0, len(characters)-1)]
	return sequence

Let's go back to the main method to use the random sequence generator.

def main():
	goal = "METHINKS IT IS LIKE A WEASEL"
	characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
	generations = 0
	current_sequence = generate_random_sequence(goal, characters)
	print("Generation:", generations)
	print(current_sequence)
	while current_sequence != goal:
		generations += 1
		print("Generation:", generations)
		print(current_sequence)

If we run this now, we'll have an infinite loop. The next step is to modify the current sequence by mutating it. We will do this by creating a method that takes in the current sequence, creates one hundred mutated copies, and returns the copy that most closely resembles the target sequence.

def get_mutated_sequence(sequence, goal, characters):
	sequence_list = get_sequence_mutations(sequence, characters)
	best_seq = sequence_list[0]
	best_similarity_factor = get_score(best_seq, goal)
	for seq in sequence_list:
		similarity_factor = get_score(seq, goal)		
		if similarity_factor > best_similarity_factor:
			best_similarity_factor = similarity_factor
			best_seq = seq
	return best_seq 

This depends on:

def get_sequence_mutations(sequence, characters):
	sequence_list = []
	for i in range(100):
		sequence_list.append(mutate_sequence(sequence, characters))
	return sequence_list	

And

def mutate_sequence(sequence, characters):
	result = ""
	for i in range(len(sequence)):
		if random.randint(0, 100) <= 5:
			result += characters[random.randint(0, len(characters)-1)]
		else:
			result += sequence[i]
	return result

To calculate the score we count the number of letters in the correct place

def get_score(sequence, goal):
	score = 0
	for i in range(len(goal)):
		if goal[i] == sequence[i]:
			score += 1
	return score

To tie it all together, we modify the current sequence in the main loop by the mutated sequence.

def main():
	goal = "METHINKS IT IS LIKE A WEASEL"
	characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "
	generations = 0
	current_sequence = generate_random_sequence(goal, characters)
	print("Generation:", generations)
	print(current_sequence)
	while current_sequence != goal:
		current_sequence = get_mutated_sequence(current_sequence, goal, characters)
		generations += 1
		print("Generation:", generations)
		print(current_sequence)

And to run it, we call main.

main()

Running this program yields this:

YBNBWOMPQYBVOC UELOZYJMU QKG
Generation: 1
YBNBWOKPQIBVOC UEJOZYKMU QKG
Generation: 2
MBNBWOKPQIBVRC UEJOUYKMU QKG
Generation: 3
MBNBWOKPQIBVRC UEJOUAKMR QKG
Generation: 4
MBNPWOKPQIBVRC UIJOUAKMR QKG
Generation: 5
MBNPWOKI IBVRC UIJOUAKMR CKM
Generation: 6
MBNPWNKI IBVRC UIJOUAKMR CKM
Generation: 7
MBNPWNKI IBVRC UIJOUA MR CKM
Generation: 8
MBNPWNKI IBVRZ UIJOUA WR CKM
Generation: 9
MBNPWNKI IBVRZ UIJOUA WR CKL
Generation: 10
MBOPXNKI ITVRZ UITOUA WR CKL
Generation: 11
MBTPXNKI ITVRZ UITOUA WR CKL
Generation: 12
MBTPXNKI ITVRZ UIKOUA WR CKL
Generation: 13
METPXNKS ITVRZ UIKOUA ZR CKL
Generation: 14
METPXNKS ITBRZ UIKOUA ZR CKL
Generation: 15
METPINKS ITBRZ UIKOUA ZR CKL
Generation: 16
METPINKS ITBRZ OIKOUA ZR CKL
Generation: 17
METBINKS ITBRZ OIKOUA ZRACKL
Generation: 18
METBINKS ITBRZ OIKOUA ZRASKL
Generation: 19
METBINKS IT RZ WIKOUA ZRASKL
Generation: 20
METBINKS IH RZ WIKO A ZGASKL
Generation: 21
METBINKS IH RZ WIKO A ZGASKL
Generation: 22
METHINKS IH RZ WIKO A ZGASKL
Generation: 23
METHINKS IH IZ WIKO A ZGASKL
Generation: 24
METHINKS IH IZ WIKF A WGASKL
Generation: 25
METHINKS IH IZ LIKF A WGASKL
Generation: 26
METHINKS IH IZ LIKF A WGASKL
Generation: 27
METHINKS IB IZ LIKE A WGASKL
Generation: 28
METHINKS IT IZ LIKE A WGASKL
Generation: 29
METHINKS IT IZ LIKE A WGASKL
Generation: 30
METHINKS IT IZ LIKE A WEASKL
Generation: 31
METHINKS IT IZ LIKE A WEASKL
Generation: 32
METHINKS IT IZ LIKE A WEASKL
Generation: 33
METHINKS IT IZ LIKE A WEASKL
Generation: 34
METHINKS IT IZ LIKE A WEASKL
Generation: 35
METHINKS IT IZ LIKE A WEASKL
Generation: 36
METHINKS IT IZ LIKE A WEASKL
Generation: 37
METHINKS IT IZ LIKE A WEASKL
Generation: 38
METHINKS IT IZ LIKE A WEASEL
Generation: 39
METHINKS IT IZ LIKE A WEASEL
Generation: 40
METHINKS IT IS LIKE A WEASEL

In just 40 generations of, as Dawkins describes it, cumulative selection, we were able to arrive at a target phrase that has a 1 in 104010^{40} chance of being arrived at by pure chance in single step selection.

Note that there are differences between this program and natural selection - here we explicity set a target phrase, in nature there is no target phrase. However, the principle is almost the same. In fact, it could be comparable to a predator not being able to see certain colours (e.g camouflage).

This is the basis for genetic algorithms. It has a starting population - the randomly generated phrase, it has a fitness funtion - the score method, and there is mutation and reproduction.

PreviousNext

Subscribe to my blog



© 2023