| |
|
| Log in | New user? |
|
Benchmarking your AlgorithmBenchmarking is currently conducted over 10 "unseen" landscapes. This number may be increased to improve statistical power as more computing power becomes available. The landscapes are randomly ordered on each benchmarking run to prevent attempts to overspecialise algorithm variants on individual landscapes. On each landscape you are allowed 1000 probes (fitness evaluations) to use as you see fit. This allows a fair comparison between population based methods, local search methods (for example local search with restart), and hybrid methods. The challenge is to find the lowest values you can with a fixed number of probes. Your program will thus have an outer loop that runs 10 times. Each time it will initialise and run your algorithm until 1000 fitness evalutations have been used. Simple examples are provided in the files Test.java (which conducts a random sampling) and UniformSearch.java (which conducts a uniform or grid search). Note that you must register or login before benchmarking, and choose the Benchmarking item from the menu to initialise the run. A benchmarking run may currently take some time, possibly up to an hour depening on network transit times. We are currently working on decreasing this time and new software will be released shortly. It is intended that the benchmarking set not be used for tailoring algorithms. A series of over two billion training moons have been provided for this purpose. These can be probed as many times as you wish without human initialisation, so can be used by meta GAs and the like. Human initiation is required for the benchmarking series as we wish to avoid overspecialised training. We hope that users will agree that the purpose is to find the best algorithm for the whole class of problems. We reserve the right to change which set of moons is used in benchmarking. Finally, the scores on the league tables are calculated by mapping the minimum value achieved by the algorithm to an exponential function. This is to linearise the improvements that can be achieved at different scales. Since the minima are unknown, the scoring function for each landscape is calibrated relative to the minimum depths found for uniform searches using a 100x100 grid (corresponding to a score of 100) and a 1000x1000 grid (corresponding to a score of 102). Thus if your algorithm achieves a score of 100, it is achieving as good a result with 1000 probes as a uniform search is achieving with 1 000 000 probes. As an example, the scoring function for Moon 20_101 is shown below. More will be written on this shortly.
|
| |
| Copyright © 2005, 2006, 2007 Cara MacNish, School of Computer Science & Software Engineering The University of Western Australia Version 1.0 |
![]() |