This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.
Tetris is a very famous computer puzzle game that invented by a Russian scientist, Alexey Pajitnov at 1984. Tetris is played on a two dimensional rectangular grid board that consists of 20x10 small squares. The game board is empty at the beginning, once the game is start different pieces will fall down from the top of the game board. There are seven different shaped pieces that look like "L", "J", "S", "Z", "I", "O", "T", each of them is build up by four squares that will filled up the game board space. When the piece is falling from the top, the player can rotate it either 90 degrees or 180 degrees and translate it either left or right until that piece hit the ground and become static. New piece will continuously enter the board once the previous piece freeze. The player's goal is to play the game as long as possible by arrange the pieces to build a completely filled horizontal line, completed line will burned and all the pieces above that line will fall down to keep the height as low as possible. The game will continue until there are no spaces available for the incoming piece. This game will become more and more difficult for human player to clear the line when at the higher level because the speed of the falling pieces will increase as well.
The first version of Tetris is on an Electronica 60 at 1984. After launched at IBM PC, Tetris spreads very quickly and become very popular throughout the world beyond the Soviet Union. Few years later, Tetris was licensed to Nintendo Company; Nintendo then launches Tetris in the Game Boy platform. This game become more and more well known until 1996, the Tetris Company is finally formed to control the quality standards and handle the licensing problem. Tetris become the first mobile phones game in the Japanese market at 2000 then rapidly conquer the mobile market in North America until become the leader in mobile games at 2002. Besides mobile phones games, Tetris also become Apple's first downloaded game for iPod. The development of Tetris not stops here, many researchers starts to build a controller for the Tetris in order to let it play without human interaction. Artificial intelligent is then introduces to Tetris to evolve it become more and more intelligent. Until today, the researcher still fights to find the best algorithm that bring the best result to Tetris.
A system is intelligent when it has the ability to adapt its behavior in the ever changing environment. In order to create an intelligent behavior, the system must first need to get through evolution process. Evolution process on a computer is stimulates by evolutionary computing. After some generations, the survivor has the learning capability to predict the changes in the environment. There are different methods of evolutionary computing that can be used to evolve a system like genetic algorithms, evolutionary strategies, genetic programming and evolutionary programming.
From the research, the first people who first apply evolutionary algorithm to Tetris are Dellacherie which fixed the feature weights by hand and he successfully removed average 66 x 104 lines. After some failure face by other researcher that used reinforcement learning to learn the weights, there are another technique appear that is using optimization techniques to learn the feature weights. The first people apply evolutionary computing to design the artificial players for Tetris are Siegel and Chaffee. They using genetic programming and get a mean score of 74 over 20 runs3, while Llima used genetic algorithm and get an average score of 5 x 104. Lastly Amine Boumaza running her experiment using evolutionary strategies and get the result of 36.3 x 106 on average. There is no researcher using evolutionary programming to Tetris player yet.
Tetris consists of a very large number of game state that approximate 1060, this limit the exploration to the game and hence extending the evaluation process. Because of this problem, some version of Tetris is introduced by researcher in order to shorten the evaluation time. There are two types of Tetris: one piece Tetris game and two piece Tetris game. One piece Tetris only know the current piece while two piece Tetris provide the next piece hint. Two piece Tetris is quite straight forward and easy if compare to one piece Tetris strategy. Beside include the hint for the next piece, some researcher also create a reduce form of Tetris where the game board reduce to 6x12 instead of 10x20 game board. This reduced version also consist fewer kinds of pieces compare to the full version standard Tetris.
2.2 Decision Making Techniques and Their Related Work
Decision making technique is acts as a structure to implement artificial intelligent. There are many different types of decision making techniques like finite state machine, decision tree, rule-based system, artificial neural network, and the others. Finite state machine is the most efficient technique in artificial intelligent among the other techniques stated above.
2.2.1 Finite State Machine
Figure 2.1 Finite State Machine's Example
Finite state machine is a behavior model that consists of a finite number of states, and state transitions. The state transition working among the states, it defines the output symbol and the next state to go depending on the current state and the current input symbol without going backward. The operation of a finite state machine start from a state that start state then goes through a number of transition until accept state is reach. Finite state machine is similar to a flow graph that show the way a logic run to meet a certain conditions. There are few types of finite state machine: deterministic finite state machine, non-deterministic finite state machine and hierarchical finite state machine.
The application of finite state machine can found in the game of prisoner's dilemma in the journal "Acquisition of General Adaptive Features by Evolution" by Dan Ashlock and John E. Mayfield. A fashion-based cellular automata with finite state machine as the representation of the state that originally modified by Lindgren and Nordhall is used. The cellular automata using is a 100x100 rectilinear cell with the finite state machine as the set of states at most 255 states using the Mealy machine architecture. A score is generated at each cell by playing a noisy iterated prisoners dilemma against its up, down, left and right neighbors and also against environment (a fixed finite state machine). After the neighbor that has the highest score is adopted, randomly select a small number of finite state machines to undergo mutation operation.
The game of Prisoner's dilemma is follows the rules if the players decide to defect at the same time, a score of D will be assigned while C will be assigned if both cooperate. The defector receives a score of H and the cooperator gets the score of L if their decision is different. In this case, L=0, D=1, C=3, H=5. The transition's input string of the finite state machine is labeled with (opponent's last action)/ (the machine current action). In this research, initially the finite state machine consists of eight states, by mutation operation; the number of state can be change from 1 to 255. The score obtained for each cell is used for updating the states of the cellular automata. A score matrix indexed for cell is computed by the finite state machine and updated at the beginning of each time step.
After update the states, mutation operation is carried out with the mutation rate of 0.4 per time step. There are three types of mutation used in this paper: basic mutation, growth mutation and contraction mutation. Basic mutation is used to transit the current state to the next state by randomly change the action, transition or a response of a finite state machine. Growth mutation is used to add a state to the machine if the total number of state is less than 255 states (the maximum number of state that a finite state machine can has in this research). Contraction mutation mainly is to delete a state from a finite state machine. The transitions that connected with the deleted state are reassign to the remaining states.
Dan Ashlock and John E. Mayfield run the experiment using seven environment strategies: null, always cooperate, always defect, Tit-for-Tat, anti-Tit-for-Tat, Tit-for-Two-Tats and Pavlov. They set that for every 10th time-step, a summary statistics (include the mean and standard deviation made by the machine) will be taken down. A comparison is made between the olden and modern populations every 30 simulations for a specific environment. The result of the experiment proved that the hypothesis (probability of a modern population out-competing an older population is 0.5) is not accurate. A probabilistic of above 0.5 is achieved.
2.2.2 Rule-based systems
A set of rules, a working memory that stores temporary data and an inference engine build up a rule-based system. The rules consists of some simple condition-action pairs while the working memory save the current state of the system every time the rules are execute that will cause some changes. Inference engine used to handle the condition when more than one rule is qualify to execute. There are two types of inference mechanisms that used by inference engines: forward chaining and backward chaining.
Forward chaining is data-driven reasoning that works with the known data and continues with that data. Forward chaining works only the topmost rule and when that rule is execute, a new fact is added to the database. Forward chaining mainly is for information gathering, it is not suitable for the system that has a goal. Backward chaining is goal-driven reasoning that fitting for system that has a goal. The inference engine searches the database to find data that match the rule. Some sub-goal is created if the goal cannot direct been proven by the rule. Stacking of sub-goal will continues until no rules are found to prove the current sub-goal. Backward chaining is more effective for a situation that one particular fact need to be inferred.
Rule that match the memory
Figure 2.3 Rule-based System that Apply on AI Game Cycle
Daniel Policarpo, Paulo Urbano and Tiago Loureiro implement rule-based behavior with the dynamic scripting technique to evolve the first person shooting (FPS) game. The journal "Dynamic Scripting Applied to a First-Person Shooter" describes the cooperation of dynamic scripting with adaptable rule-based behaviors in Non-Player Character (NPC). Dynamic scripting is a kind of reinforcement learning that keep a list of action for an opponent to choose in order to play effectively against human player.
There are four components that build the FPS AI: behavior, movement, animation, and combat. Behavior component comes out the character's objectives, state and his immediate destination. Movement component in charge on how the character moves and navigating in the game. How the character looks like in the game is under the animation component. Lastly and is the most noticeable is the combat component that in charge in the tactics selection when combat.
Given a set of rules from a rule base, dynamic scripting that is an unsupervised learning algorithm can efficiently construct a proper behavior for a particular situation. The behavior of a character is inserted in a game script. Different opponent type may have different list of rules inside their corresponding rule base. Each rule has its own rule weight that used to define the probability of a particular rule to be selected. Rule weight is subject to change accordance with the success or failure of the rules during a combat. Dynamic scripting enables the scripts to be flexible, adaptable and even forecast the game strategies.
A set of rules, script selection, rule policy, and rule value updating compose the dynamic scripting. Normally the developer will creates a set of rules with a weight rule associated with each of the individual rule. Every learning episode will update the rules in the rule bases then the script selection component will select a number of rules based on the assigned weight value from a complete set of rules based. Within the learning episode, rule policy determines how a rule is selected. Rules are ordered according to their priority, the first rule are normally being selected. Rule value updating is follows the reward function from the feedback of each of the rule on the learning episode. Rule that has good performance will giving a high reward while those having bad performance will giving low reward.
An experiment is running out to compare the dynamic scripting controlled character with the static AI controlled character. When either one of the characters was defeated, the environment will restart again as the initial situation, the rule weights for the dynamic scripting is discarded and updated at every learning episode. The result shown that the dynamic scripting is performs better than static AI because it successfully exploits all the weaknesses of the static AI agent. Implementing and testing static AI also proven to be more time consuming than the dynamic scripting.
2.2.3 Decision Tree
Decision tree is a technique that makes use of a set of instances to make decision. Decision tree structure consists of a root and leaves, leaves are the branches for the root. The non-leaf nodes are labeled as the attribute. Attributes indicate the final decision for every branch.
Typically decision tree is constructed by the "divide-and-conquer" approach. First of all, an attribute is selected and act as the root of the tree, the root is then splits up and divides into two or more path, each of them contain the different subset from the dataset. Each leaf represents a branch and continues split up until no more subset is available.
After completely construct a decision tree, convert each path in the tree from the beginning (root) until the last leaf node into a set of rule. There is a problem related to the model structure of the decision trees. Instable path choosing maybe occur when the variable is highly correlated with one another, then when multiple tests is carries out, mostly the same variable will enter the tree but in different order. The split in the tree may shift from one variable to another when a small sample data is applied; this situation is called the "variable masking".
One of the used of decision tree is at the soccer agent. In the paper "Learning Decision Trees for Action Selection in Soccer Agents", Savas Konur, Alexander Ferrein and Gerhard Lakemeyer applied the decision tree learning method to C4.5 algorithms to RoboCup's Simulation League. C4.5 is an algorithm developed by Ross Quinlan that used to generate a decision tree. There are different player in a match: defenders, midfielder, and attacker; this method learn the action selection technique for the whole team whenever a player is in ball possession. The benefits of using decision tree in this approach are a larger set of game situations can be applied and performs well if compare to the previous method. Moreover the learning method of a decision tree produces a set of qualitative features to classify the game situations that are useful in reactive decision making.
The researcher faced some challenges when encode the artificial soccer agent. Those challenges include the difficult in determined which action should be chosen since the characteristics of a game situation is unclear. If the characteristics is clear, the researcher also faced some problem on translate them into rules for decision making. Hence, decision tree learning (C4.5) is used to cope with the challenges above. When the building the decision tree, the rules for action selection is automatically encoded. Decision tree also more readability by human if compares to other decision making techniques. The expandable nodes and leaf enables the decision tree do not to decide all the relevant features beforehand and hence it can covers wider range of game situations and actions.
The reactive soccer agent in this paper performs a skill hierarchy with the highest level is the meta-level skills followed by high-level skills, intermediate skills, low-level skills and lastly is the soccer server commands. Low-level skills layer consists of the basic action for a player, intermediate skills layer defines the actions based on the action taken by low-level skills layer, while the high-level skills action comes out with the desired behavior by using the intermediate layer. Meta-level skill is used to overcome the problem that high-level skills layer cannot be called directly because lack of some arguments in order to be executes.
The different type players in a match increase the level of difficulty of choosing an appropriate attribute set. Over fitting will happen if the attributes choose is irrelevant. The researchers divide the field into different region, use those region together with the player types as the attribute in one decision tree to solve the over fitting problem. Training are generated by the supervisor monitor that storing the output (action) and the input (world model information) that comes from the soccer server together. With those input from the soccer server, it can make sure that the training and the test data done by the supervision process is almost from the same distribution. The supervisor will advised the agent how soccer is played depends on what type of player involves and the region it plays in.
The model generation phase starts right after the training phase. With those input files, the C4.5 system executes the program and then produced the decision tree model. The decision tree is passes the consult phase to select an action in play situation by an agent. Whenever a process is in the ball-kickable margin, it will starts the process by calculates the attribute values based on its world situation then the decision tree will offer an action category for the agent using those values, the consult process will halts for this moment. This process is repeated until the end of the game.
From the experiment, it shown that increase of the number of examples will cause the decreases of the error rate. The occurrence of the error is because of the mistakes done by the supervisor in giving contradictory examples and large number of continuous attributes being used. From the result also note that when the training data become bigger, the sizes of the trees will also increase. In conclusion, decision tree is suitable in the selection of relevant attributes when qualitative world descriptor is given.
2.2.4 Artificial Neural Network
Typical artificial neural network architecture is a hierarchy layers that consists of input layer, middle layer and output layer. All the neurons in the networks are arranged along these layers, neurons is a very simple and highly interconnected processor. Signal passing between the neurons is done by the weighted link that connects all the neurons together. Each neuron can receive more than one input signals but it produce only a single output signal. This output signal is then splits to a number of branches that transmit the same signal through the neuron's outgoing connection.
An artificial neural network learns by a numerical weight that associates at each links that connects all the neurons together. Weights are the long term memory in artificial neural network that deliver the strength of each neuron input. By repeat the step of changing and adjusting the weight at the memory.
Figure 2.4 Architecture of a Typical Artificial Neural Network
Neural network job on the Tetris can be found in the journal of Nicholas Lundgaard and Brian McKee in the "Reinforcement Learning and Neural Network for Tetris". Reinforcement learning is a machine learning technique that make decision based on the environment then change the environment by the decision it made and receive a reward. Neural network is a supervised learning technique, the difference between the model prediction and the expected result used to modify the weights of the neural network. Combination of reinforcement learning with neural network can uses for unsupervised learning. In this case, the output of the net is the expected reward that can determine what action to take. Weights adjustment also change from to use the difference between the expected reward and the actual reward received.
The goals of Nicholas and Brian are to research into the effectiveness of unsupervised machine learning techniques towards the game of Tetris and the compared the effectiveness of this technique with other techniques. There are few agents implemented in this paper: random agent, heuristic agents, reinforcement learning agent and neural network agent. The researcher running the experiments using neural network and reinforcement learning and the input of those experiments are the raw configuration of the Tetris board and the higher level state representation.
Random agent is the easier operations that only randomly generate a rotation and translation values for each of the falling pieces without taking care about the types of pieces falling down. Heuristic agent is the best agents to compare with the learning agents as it always chose to perform a single high-level action as the learning agents. There are two types of learning in the reinforcement learning agent: Q-Learning and the SARSA learning. Q-Learning is used in this case as it is a good fit for the neural network compared to the SARSA learning. Reinforcement learning agent evolves a function that measures the value for an action towards a particular state based on the reward function (points scored). This performance feedback (reward function) can gradually adjust the value of an action. Neural network used the input state to predict the value of a particular action. There are three types of operation integrated with the neural network to update the weights: Q-Learning and back-propagation for the neural network, low-level neural net and high-level neural net.
The experiment is running over 500 games in Tetris using the various types of the agents to test the result for each learning agents. The researcher running heuristic agents in three types of action: maximize lines, minimize holes and minimize column height, and clear board agent. Maximize lines performs badly among the heuristic agents, it simply placed the falling piece that resulted in clearing the most lines. Maximize lines agent is even worst than the random agent that can only cleared last than 10 lines. The result of minimize holes and minimize column height is quite similar where the number of lines cleared are around 9.532 lines. Clear board agent is the combination of minimize holes and minimize column height action, it managed to clear an average of 466.494 lines. The result for reinforcement learning is dissatisfactory; however the performance can be improved by playing a reasonably expert agent in a competitive environment.
Low-level neural network agent faced some problem according to the state space. The large state space makes the training become very time consuming. At the beginning of the training, action choosing is based on the greedy policy but the agent will soon explore random actions and the performance will stunt. Instead of simply rotating and translating pieces, high-level neural network agent operates on the high-level state representation to choose high-level action. Before the exploration, the result shown that clear board agent is better than high-level neural net agent as it only managed to clear 305.312 lines compare to the clear board action agent's 466.494 lines. After the exploration to the other actions, neural net agent learns a new and more effective strategy that is building a valley action to score more points per line cleared.
Over those 500 games, the winning rate of 29.75% for the neural net versus 71.25% for the opponent is obtained. Mostly the neural net agents lost to the opponent are because of their risky behavior. When the neural net played the game of Tetris alone without compete with other agents, the result seem to be improve as the number of training increased.
2.3 Evolutionary Algorithms and Their Related Work
There are some components under evolutionary algorithms: representation that contain the definition of a particular individual, evaluation function (sometimes also called as fitness function), population, parent selection mechanism, variation operators that using either mutation or recombination, and lastly is the survivor selection (also known as replacement). The different between crossover and mutation are crossover operator exchanges only parts from two single chromosomes while mutation operator changes the gene value in a location that randomly chosen. A chromosome's performance is measured by fitness function; a good chromosome will has high fitness value and will survive for some generation then gradually dominate the population.
Start2.3.1 Genetic Algorithm
Randomly generate a population of size N
Calculate the fitness of each chromosome
Termination criterion satisfied?
Select a pair of chromosome for mating
Form new population
Size of new population = N?
Replace the parent chromosome with the offspring chromosome
Figure 2.5 Basic Genetic Algorithms Major Step
According to the journal paper "Using a Genetic Algorithm to Weight an Evaluation Function for Tetris", Llima used Genitor-based genetic algorithm to search for the optimal weights to evaluate Tetris agents. The Tetris simulator created by the researcher is different from the standard Tetris rules in sense of the real-time aspect is eliminated. This experiment is using one piece Tetris which the agents know only the current piece, they use internal Tetris simulator to play then calculate the utility of the next board state by evaluation function. The agent will play exactly the same rotation and translation that get the highest utility at the actual game. The weighted linear sum of numerical features that calculate from the state is used to determine the utility of a board's state. Those features used include pile-height, the number of closed holes, the number of wells, the number of lines that were just made, and a number that show the bumpiness of the pile. The agents will reward when a line is making and penalize to everything else.
First of all, binary representation is used to weight the chromosomes, and then two parents were randomly selected from the population to recombine to make one child for that generation. Recombination also called crossover. Throughout the whole experiment, uniform crossover operation finally in used after judging among one-point, HUX, and reduced surrogate one-point crossover operation. Mutation will then take over the job by flipping each bit on the copy of the child with some probability, both original and mutated children is then evaluated and added into the population. The worst members will eliminate while the good members will remain at the population. In this experiment, all agents are played on a game that having the same sequences of pieces for every evaluation in order to get a fair game. Chromosomes were evaluated by an agent when playing Tetris while an agent's fitness was determined by the number of line that able to clear out before losing.
The result of the experiment gets an average score of 5 x 104 lines cleared. The most rewarded feature was the number of lines cleared. However this are not always true for different play in case of strategic tradeoffs occurs.
2.3.2 Evolution Strategies
Determined number of generations, N
Randomly select an initial value for each parameter to generate parent population
Calculate the solution for the parent parameters
Create offspring parameters
Calculate the solution for the offspring parameters
Is the offspring better than the parents?
Replace the parent population with the offspring population
N is reached? Or satisfactory solution is reached?
Figure 2.6 Basic Evolutionary Strategies Major Step
Evolution strategies use self adaptation strategy in illustrating some useful feature to evolve the agents. The most important steps in evolution strategies are to update the step size and the covariance matrix of the search distribution. The adaptation ability helps the algorithm adjust itself in the search space, larger step sizes is to increase exploration of the space while smaller step sizes used to increase exploitation of good solutions. Amine Boumaza using this algorithm to evolve Tetris agents is to proved that evolution strategies can compete with or even overtake the best reported player, that is Dellacherie's hand coded player at the research paper "On the Evolution of Artificial Tetris Player".
The experiment is divided into two parts: the learning process and the evaluation process. Agents will learned the weights of the player's strategy using covariance matrix adaptation evolution strategy (CMA-ES) in the learning phase, after that the player's strategy will enter the evaluation process to further judging its performance by a series of games. At every generation, the game being played is the same for all individual in order to prevent the flawed of fitness values ranking. The author runs the experiment on an average of 100 games with the confidence intervals of Â±20% and probability 0.95. At the initiation of learning process, the author assigns a zero vector and 0.5 step sizes for a fixed number of generations. This procedure is repeated for a certain number of times thus at the end a mean vector is recorded and the average of these recorded mean vectors is the player's strategy that the researcher evaluate.
There appear some problem when the players learn better strategies throughout their evolution; they can play longer before losing their game. Playing longer means the evaluation time increase at the learning phase and hence slow down the learning process. There are many different way using by different author to reduce the time, the author testing the player using both 10 x 20 and 6 x 12 game board and also increasing the probability of getting the "Z" and "S" pieces to make the game harder. From the result it shows that player that learned on bigger game board performed better compare to those learned on small game board, hence a better way to reduce the learning time is to learned harder game using 10 x 20 game boards.
The final result shows that the player learned using CMA-ES performs better that those player using hand fixed weights on 10 x 20 and also 6 x 12 game boards.
Start2.3.3 Genetic Programming
Create an S-expression of size N as initial population
Execute and calculate its fitness then assign the best result for that run
Termination criterion met?
Place the best S-expression in the new population
Genetic operator selection
Select an S-expression to produce a mutant then place into new population
Select a pair of S-expression to produce a pair of offspring then place into new population
Select an S-expression to produce the same offspring then place into new population
Size of new population = N?
Replace parent population with offspring population
Figure 2.7 Basic Genetic Programming Major Step
In the research paper "Genetically Optimizing the Speed of Programs Evolved to Play Tetris" introduce aggregate computation time ceiling to allocate the computation time among the fitness cases in order to control the execution time of an evolved program. There are two categories that used to optimizing the computation time of evolved programs: a traditional post-hoc methods and genetic approach. Optimizing computation time genetically is more popular then post-hoc methods, Siegel and Chaffee also used this approach to implement his experiments. The reason genetic approach is more favorable is because the genetic search process check up the problem domain and may discover a shot cut for it. The method used by Siegel and Chaffee to evolve Tetris is parse tree of canonical genetic programming with the addition of indexed memory. Tetris is chosen but not other game is because it is a feasible test problem for dynamic computation time allocation.
The indexed memory apply to Tetris in the research is the array of 128 integers values, READ and WRITE function are used to access these indexed memory, the return value for the parse tree is always within the range of 0 to 127. The parse tree terminals include the current piece type ("L", "J", "S", "Z", "I", "O" or "T"), the current piece's rotational and orientation (four possibility), and another five Boolean flags (one is the indicating when a new piece enter the board and the others indicating is the sensor is at the board's boundaries). The wrappers indicating what the sensor's action: move up, down, left, right, halt (immediately drop the piece), and do nothing.
The Tetris fitness case is the game play on the board, it is determined by the particular series of pieces arrangement that enter the game board. The sequence of pieces for every game is randomly generated; hence the performance of the evolved game must be well on average for a successful program. Only few fitness case perform very well is not enough to measure the fitness of an evolved program accurately, a number of fitness cases must be evaluated in order to get a better measurement since the accuracy can be improved by increasing the number of fitness cases. Increasing the number of fitness case can cause the computation time become longer; progressive fitness measure is used here to efficiently distribute time among good and bad individuals.
The evolved program start to work once a new piece is enter the game board by scanning the game board with the mobile sensor to determine where to drop the piece. Content of the board and the type of piece are the determinant of the number of times the sensor needs to be moved by the evolved program. To allocate computation time cycle among fitness cases efficiently, an aggregate computation time ceiling is applies over a series of fitness cases to strategically allocate their allotted computation time to maximize the fitness. Either spend too many or too little time on each fitness case will also get a low fitness measurement. Aggregate computation time ceiling differs from multi objective optimization in that aggregate computation time ceiling constraint the time spent for each fitness case while multi objective approach penalize those evolved program that take longer time.
There are results show that by implement aggregate time ceiling to Tetris players, the allocation of computation time become more efficient if compare to those without time constraint. With time limit maintain on both fitness and test case, the evolution of Tetris players is much more successful.
Start2.3.4 Evolutionary Programming
Randomly initialize population
Mutation operation to produce offspring
pop1 (parent) and pop2 (offspring)
Individuals compete to form next generation
new_pop (the fittest)
Figure 2.8 Basic Evolutionary Programming Major Step
Lawrence J. Fogel is the first researcher uses evolutionary programming to generate evolution as a learning process in order to achieve artificial intelligence. Evolutionary programming normally used finite state machine as their representation in the evolved system to predict the environment. The paper "An Evolutionary Programming Approach to Self-Adaptation on Finite State Machine" by Lawrence J. Fogel, Peter J. Angeline and David B. Fogel focus on the experiments in evolutionary programming with the used of self-adaptive parameters. Evolutionary programming and evolutionary strategies emphasized the behavioral link between the parents and the offspring instead of genetic link. Self-adaptive parameters are represented as finite state machine that consists of the information about the generation of offspring. These parameters combined with the mutation and selection to apply in real-valued function to optimization the problem offered.
Fogel et al. (1991) offered self-adaptive procedure for the evolutionary programming that Gaussian distribution was used to alter the standard deviations. Gaussian distribution has the ability to fine tune the current parents. The formulas corresponding are:
x'i = xi + N(0, Ïƒi )
Ïƒi = Ïƒi + . Ïƒi . N(0,1)
where x is the object variable to be assessed by the F(x), Ïƒ is the strategy parameter(standard deviation) that the value is approximately to 0.2. By preventing the strategy parameter very close to the 0, it allows the effective mutation. N(0,1) is the Gaussian distribution with zero means and standard deviation 1. While is a scaling parameters.
The experiment is start by judging each machine of the fitness function, tournament selection produce a tournament score based on the fitness function. In each competition, a "win" is assign to the parents that has the greater fitness. The parent that has the greatest number of "win" will form the population for the next generation. Each parent will create a single offspring to put into the next population. This process is repeated until the expected result is achieved.
There are two self-adaptive program examined in the finite state machine: selective self-adaptation and the multi-mutational self-adaptation. In self-adaptation method, each component of a finite state machine was combined with a mutability parameter to determine which component being chose in each mutation operation. The mutability parameter is initially set to 0.001 in the calculation of the probability for particular component to be mutated. While for the multi-mutational self-adaptation, the probability of an offspring to be chosen is independent from the other. When the uniform random variable number is greater than the offspring's mutability parameter, it will be mutated.
From the experiment, it clearly shown that with self-adaptive parameters can improved the practicality of the evolutionary optimization algorithms. Besides that, it also proved that self-adaptation is beneficial on discrete structure like finite state machine.
In this chapter, there are four decision making techniques and four evolutionary algorithms being introduced. The decision making techniques include finite state machine, rule-based system, decision tree, and artificial neural network while the evolutionary algorithms include genetic algorithm, evolution strategy, genetic programming, and evolutionary programming. From the literature reviews, it show that the Tetris game already done using the genetic algorithm, evolution strategy, genetic programming, and artificial neural network. Since there are still no research is done using the evolutionary programming, the motive of this project is to evolve the Tetris game using evolutionary programming with finite state machine as the decision making technique. Feedforward neural network is used at the testing phase for further testing the validation of a trained data set and predict is there are an error occur during the training in order to get a better evolution.