MINISTRY OF EDUCATION AND TRAINING MINISTRY OF NATIONAL DEFENCE MILITARY TECHNICAL ACADEMY CHU THI HUONG SEMANTICS-BASED SELECTION AND CODE BLOAT REDUCTION TECHNIQUES FOR GENETIC PROGRAMMING DOCTORAL DISSERTATION: MATHEMATICAL FOUNDATION FOR INFORMATICS HA NOI - 2019 luan an MINISTRY OF EDUCATION AND TRAINING MINISTRY OF NATIONAL DEFENCE MILITARY TECHNICAL ACADEMY CHU THI HUONG SEMANTICS-BASED SELECTION AND CODE BLOAT REDUCTION TECHNIQUES FOR GENETIC PROGRAMMING DOCTORAL DISSERTATION Major: Mathematical Foundations for Informatics Code: 9 46 01 10 RESEARCH SUPERVISORS: 1. Nguyen Quang Uy 2. Nguyen Xuan Hoai HA NOI - 2019 luan an ASSURANCE I certify that this dissertation is a research work done by the author under the guidance of the research supervisors. The dissertation has used citation information from many different references, and the ci- tation information is clearly stated.
Experimental results presented in the dissertation are completely honest and not published by any other author or work. Author Chu Thi Huong luan an ACKNOWLEDGEMENTS The first person I would like to thank is my supervisor, Dr Nguyen Quang Uy, the lecturer of Faculty of Information Technology, Military Technical Academy, for directly guiding me through the PhD progress. Dr Uy’s enthusiasm is the power source to motivate me to carry out this research. His guide has inspired much of the research in this dissertation.
I also wish to thank my co-supervisor, Assoc. Dr Nguyen Xuan Hoai at AI Academy. He has given and discussed a lot of new issues with me. Working with Prof Hoai, I have learnt how to do research sys- tematically.
Particularly, I would like to thank the leaders and lecturers of the Faculty of Information Technology, Military Technical Academy for supporting me with favorable conditions and cheerfully helping me in the study and research process. Last, but most important, I also would like to thank my family, my parents for always encouraging me, especially my husband, Nguyen Cong Minh for sharing a lot of happiness and difficulty in the life with me, my children, Nguyen Cong Hung and Nguyen Minh Hang for trying to grow up and study by themselves. Author Chu Thi Huong luan an CONTENTS Contents. v List of figures.
vii List of tables. Representation of Candidate Solutions. Initialising the Population. GP benchmark problems.
Some Variants of GP. Linear Genetic Programming. Cartesian Genetic Programming. Multiple Subpopulations GP.
Semantics in GP. Survey of semantic methods in GP. Semantics in selection and control of code bloat. Statistical Hypothesis Test.
TOURNAMENT SELECTION USING SEMANTICS. Tournament Selection Strategies. Tournament Selection based on Semantics. Statistics Tournament Selection with Random.
Statistics Tournament Selection with Size. Statistics Tournament Selection with Probability. Symbolic Regression Problems. Results and Discussions.
Performance Analysis of Statistics Tournament Selection 57 2. Combining Semantic Tournament Selection with Semantic Crossover. Performance Analysis on The Noisy Data. 76 ii luan an SEMANTIC APPROXIMATION FOR Chapter 3.
REDUCING CODE BLOAT. Controlling GP Code Bloat. Constraining Individual Size. Adjusting Selection Techniques.
Designing Genetic Operators. Bloat, Overfitting and Complexity Analysis. Function Complexity Analysis. Comparing with Machine Learning Algorithms.
Applying semantic methods for time series forecasting. Some other versions. Time series prediction model and parameter settings. 113 iii luan an 3.
Results and Discussion. 123 CONCLUSIONS AND FUTURE WORK. 146 iv luan an ABBREVIATIONS Abbreviation Meaning AGSX Angle-aware Geometric Semantic Crossover BMOPP Biased Multi-Objective Parsimony Pressure method CGP Cartesian Genetic Programming CM Competent Mutation CTS Competent Tournament Selection CX Competent Crossover DA Desired Approximation EA Evolutionary Algorithm Flat-OE Flat Target Distribution GA Genetic Algorithms GCSC Guaranteed Change Semantic Crossover GP Genetic Programming GSGP Geometric Semantic Genetic Programming GSGP-Red GSGP with Reduced trees KLX Krawiec and Lichocki Geometric Crossover LCSC Locality Controlled Semantic Crossover LGP Linear Genetic Programming LGX Locally Geometric Semantic Crossover LPP Lexicographic Parsimony Pressure MODO Multi-Objective Desired Operator MORSM Multi-Objective Randomized Similarity Mutation MS-GP Multiple Subpopulations GP MSSC Most Semantically Similar Crossover v luan an Abbreviation Meaning OE Operator Equalisation PC Perpendicular Crossover PP Prune and Plant PP-AT Prune and Plant based on Approximate Terminal RCL Restricted Candidate List RDO Random Desired Operator ROBDDs Reduced Ordered Binary Decision Diagrams RSM Random Segment Mutation SA Subtree Approximation SAC Semantics Aware Crossover SAS-GP Substituting a subtree with an Approximate Subprogram SAT Semantic Approximation Technique SAT-GP Substituting a subtree with an Approximate Terminal SDC Semantically-Driven Crossover SiS Semantic in Selection SSC Semantic Similarity based Crossover SS+LPE Spatial Structure with Lexicographic Parsimonious Elitism TS-P Statistics Tournament Selection with Probability TS-R Statistics Tournament Selection with Random TS-S Statistics Tournament Selection with Size vi luan an LIST OF FIGURES 1 Number of articles about GP. 2 2 Number of articles using semantics in GP .1 GP syntax tree representing max(x + x, x + 3 ∗ y).2 An example of crossover operator.3 An example of mutation operator.4 An example of LGP program.5 An example of CGP program.6 Structure of MS-GP.7 Running the program p on all fitness cases .8 An example of calculating the desired semantics of the selected node N .1 Testing error and Population size over the generations with tour-size=3.1 An example of Semantic Approximation .3 Average bloat over generations on four problems F1, F13, F17 and F25.4 Average overfitting over the generations on four problems F1, F13, F17 and F25.5 Average complexity of the best individual over the gener- ations on four problems F1, F13, F17 and F25.
108 vii luan an 3.6 An example of PP-AT.7 Plot of log(unit sale + 1) from 9/1/2016 to 12/31/2016.8 Testing error over the generations.9 Average size of population over the generations. 121 viii luan an LIST OF TABLES 1.1 Summary of Evolutionary Parameter Values .2 GP benchmark regression problems. Variable names are, in order, x, y, z, v and w. Several benchmark problems in- tentionally omit variables from the function.
In the train- ing and testing sets, U [a, b] is uniform random samples drawn from a to b inclusive, and E[a, b] is a grid of points evenly spaced from a to b inclusive .1 Problems for testing statistics tournament selection tech- niques .2 Evolutionary Parameter Values.3 Mean of best fitness with tour-size=3 (the left) and tour- size=7 (the right).4 Median of testing error with tour-size=3 (the left) and tour-size=7 (the right) .5 Average of solution’s size with tour-size=3 (the left) and tour-size=7 (the right) .6 Average semantic distance with tour size=3. Bold indi- cates the value of SiS and TS-S is greater than the value of GP.7 Average percentage of rejecting the null hypothesis in Wilcoxon test of TS-R and TS-S with tour-size=3.8 Median of testing error of TS-RDO and four other tech- niques with tour-size=3 (the left) and tour-size=7 (the right). 66 ix luan an 2.9 Average of solutions size of TS-RDO and four other tech- niques with tour-size=3 (the left) and tour-size=7 (the right) .10 Median of testing error on the noisy data with tour-size=3 (the left) and tour-size=7 (the right) .11 Average running time in seconds on noisy data with tour- size=3 (the left) and tour-size=7 (the right) .12 Average execution time of a run (shorted as Run) and average execution time of selection step (shorted as Tour) of GP and TS-S in seconds on noisy data with tour size=3.13 Median of testing error and average running time in sec- onds on noisy data with tour-size=3 when the statistical test is conducted on 100 fitness cases. The left is the median of the testing error and the right is the average running time.1 Evolutionary parameter values .2 Mean of the best fitness .3 Average percentage of better offspring .4 Median of testing error .5 Average size of solutions .6 Average running time in seconds .7 Values of the grid search for SVR, DT and RF .8 Comparison of the testing error of GP and machine learn- ing systems.
The best results are underlined.9 Mean of the best fitness .10 Median of testing errors .11 Average of solution’s size .12 Average running time in seconds .1 Mean best fitness on training noise data with tour-size=3 (the left) and tour-size=7 (the right) .2 Average of solutions size on training noise data with tour- size=3 (the left) and tour-size=7 (the right) .3 Mean of best fitness with tour size=5. The left is original data and the right is noise data.4 Median of testing error with tour size=5. The left is orig- inal data and the right is noise data.5 Average of solution’s size with tour size=5. The left is original data and the right is noise data.6 Mean of best fitness of TS-RDO and four other techniques with tour size=5.
The left is original data and the right is noise data.7 Median of fittest of TS-RDO and four other techniques with tour size=5. The left is original data and the right is noise data.8 Average of solutions size of TS-RDO and four other tech- niques with tour size=5. The left is original data and the right is noise data. 154 xi luan an INTRODUCTION Machine learning is a branch of artificial intelligence that provides the capability to automatically learn and improve from past experience to make future decisions.
The fundamental goal of machine learning is to generalize or induce an unknown rule from examples of the rule’s appli- cation. Machine learning has been studied and applied in many different fields of science and technology. It can be said that most smart systems today are the application of one or more machine learning methods. A GP system is started by initializing a population of individuals.
The population is then evolved for a number of generations using genetic operators such as crossover and mutation. At each generation, the individuals are eval- uated using a fitness function, and a selection schema is used to choose better individuals to create the next population. The evolutionary pro- cess is continued until a desired solution is found or when the maximum number of generations is reached. Since first introduced in the 1990s, GP has been successfully applied in a wide range of problems, especially with applications in classifica- tion, control and regression.
Figure 1 surveys the number of GP articles 1 luan an indexed in Scopus1 over a period of 19 years, from 2000 to 2018. The figure shows that GP studies have rapidly increased in the 2010s, about 750 articles per year, and remained roughly stable to date. Figure 1: Number of articles about GP Comparing to other machine learning methods, GP has some advan- tages. Firstly, GP has the ability to simultaneously learn models (the structure of solutions) and parameters of the models while other methods often have to pre-define models and then find parameters.
Secondly, the solutions found by GP are probably interpretable. Recently, several re- searches have shown that GP can be used to evolve both the architecture and the weights of a Deep Learning model effectively [40, 109]. Inter- estingly, in 2018, GP outperformed neural networks and deep learning machines at video games [46, 47, 121]2. However, despite such advantages, GP is not well known in the main- stream AI and Machine Learning communities.
One of the main rea- 1 https://db.vn:2088/search/form.uri?display=basic 2 https://www.com/s/611568/evolutionary-algorithm-outperforms-deep-learning-machines-at- video-games/ 2 luan an sons is that the evolutionary process is often guided by only syntactic aspects of GP representation. Consequently, there is complex, rugged genotype-phenotype mapping, and low similarity of offspring to parents. An offspring generated by changing syntax may not produce the desired result, or a small change in syntax can significantly change its output (behavior). For example, if we replace the structure x ∗ 0.001 in a tree with the structure x/0.001 that is a small structural change (replacing ‘*’ with ‘/’) but leads to a significant change in behavior.
Algorithms based solely on structure as that often do not achieve high efficiency since, from a programmer’s perspective, programs must be correct not only syntactically, but also semantically.