Benchmarking the Generalization Capabilities of a Compiling Genetic programming System using Sparse Data Sets

by

Francone, F. D., Nordin, P. and Banzhaf, W.

Literature search on Evolutionary Computation BBase © 1999-2013, Rasmus K. Ursem

Info: Genetic Programming 1996: Proceedings of the First Annual Conference (conference proceedings), 1996, pp. 72-80
Keywords: Genetic Programming, Genetic Algorithms


BibTex:
@InProceedings{francone:1996:bench,
  author =       "Frank D. Francone and Peter Nordin and Wolfgang
                 Banzhaf",
  title =        "Benchmarking the Generalization Capabilities of a
                 Compiling Genetic programming System using Sparse Data
                 Sets",
  booktitle =    "Genetic Programming 1996: Proceedings of the First
                 Annual Conference",
  editor =       "John R. Koza and David E. Goldberg and David B. Fogel
                 and Rick L. Riolo",
  year =         "1996",
  month =        "28--31 " # jul,
  keywords =     "Genetic Programming, Genetic Algorithms",
  pages =        "72--80",
  address =      "Stanford University, CA, USA",
  publisher =    "MIT Press",
  size =         "9 pages",
  notes =        "GP-96 Notes based upon version submitted to GP-96

                 Wed, 17 Apr 1996 09:20:19 PDT

                 When I read your email (Koza's), I went back and
                 checked the output on two other problems that we ran
                 as part of that paper: Gaussian 3D and Phoneme
                 Classification. Each of these was a two-output
                 problem, and the way the classification was set up,
                 one would expect less than 50% correct classification
                 from a randomly created individual.

                 In those problems, we used 10 different random seeds,
                 with 3000 individuals per run. The following are the
                 classification rates of the best individual from
                 generation 0:

                 Problem   Mean  Best  Worst
                 gauss     0.59  0.64  0.55
                 iris      0.98  0.99  0.97
                 phoneme   0.73  0.75  0.71

                 Note that these figures represent the results of a
                 random search of 30,000 individuals.

                 As Peter Nordin points out in his email to which this
                 is a reply, on the IRIS problem even the worst figure
                 is very good. In fact, it was statistically
                 indistinguishable from a highly optimized KNN
                 benchmark run on a training set twice as large. This
                 is because the IRIS problem is trivial. As pointed out
                 in the above referenced paper, IRIS should probably
                 not be used as a measure of the learning ability of
                 any ML system, notwithstanding its status as a
                 'classic' problem. It is probably better characterized
                 as a 'classic' way to make an ML system look good.

                 On the other two problems, which were much more
                 difficult, the genetic search improved on the random
                 search considerably. The individuals with the best
                 ability to generalize achieved the following
                 classification rates on the test data set:

                 Problem       Best Generalizer
                 Gaussian 3D   72%
                 Phoneme       85%

                 I report these figures here because the generation 0
                 figures are not reported in the above paper
                 directly.

                 Regards

                 Frank Francone

                 ",
}
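The generation-0 baseline described in the notes (10 random seeds, 3000 individuals per run, i.e. a random search of 30,000 individuals, reported as the mean/best/worst of each run's best classification rate) can be sketched as below. This is a minimal illustration, not the paper's method: `random_individual_score` is a hypothetical stand-in for evaluating a randomly created GP individual on a two-output problem where chance performance is below 50%; the actual evaluation used compiled GP individuals on the Gaussian 3D, IRIS, and Phoneme data sets.

```python
import random

def random_individual_score(rng):
    # Hypothetical stand-in: classification rate of one randomly
    # created individual, centered below 0.5 (chance level for a
    # two-output problem), clamped to [0, 1].
    return min(1.0, max(0.0, rng.gauss(0.45, 0.05)))

def generation0_baseline(n_runs=10, pop_size=3000, seed=1):
    # Best-of-generation-0 rate per run, as in the notes:
    # 10 seeds x 3000 individuals = 30,000 random evaluations.
    rng = random.Random(seed)
    run_bests = [max(random_individual_score(rng) for _ in range(pop_size))
                 for _ in range(n_runs)]
    mean = sum(run_bests) / len(run_bests)
    return mean, max(run_bests), min(run_bests)

mean, best, worst = generation0_baseline()
```

With a real fitness function in place of the stand-in, the three returned values correspond directly to the Mean/Best/Worst columns in the table above.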