This has been on my mind for a few days. I'd love some criticism of the arguments I outlined here: https://groups.google.com/forum/#!topic/fallible-ideas/9bcC5WN6bLs. I'll restate them here:
While studying genetic algorithms and genetic programming, I stumbled upon the concept of symbolic regression, developed by John Koza, the father of genetic programming.
The idea is that instead of specifying the form of the function in advance (as other methods of regression may require), all you do is provide some data points and the program will figure out the equation from which all points can be derived. For example, given the (artificially very small) set of x-y coordinates (-1, 1), (0, 0), (1, 1) it may conjecture the solution f(x) = |x|. (For anyone who already noticed that f(x) = x^2 would also work, hold that thought.)
Here's how the program does it: it guesses random functions and eliminates those that are the worst fits. It then applies some genetic operators (such as random mutation and crossover at random points) to the remaining population of functions to introduce changes and lets them duplicate. This process continues until either a maximum number of iterations has been reached or a function with perfect fitness is found. Koza found that with this process, perfect matches for quite complex data sets can be found rather quickly. (For anyone interested in the details, check out chapter 7 of his book "Genetic Programming".) Koza may even have done this for kicks using data from planetary motions to recreate Kepler's third law (it's not clear to me from his wording whether he actually did it; the point being that I find it easy to believe it could be done).
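To make the mechanics concrete, here is a minimal Python sketch of that loop. To be clear, this is my own toy illustration, not Koza's code: all names and parameters are my choices, and I've used total squared error as the fitness (a common choice in symbolic regression) rather than anything Koza specifically prescribes. Candidate functions are expression trees over x; the worst fits are eliminated, and the population is refilled by crossover at random points plus occasional random mutation:

import random

OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b}

def random_tree(depth=3):
    # A conjecture: either a leaf ('x' or a small integer constant)
    # or an (operator, left, right) node.
    if depth == 0 or random.random() < 0.3:
        return 'x' if random.random() < 0.7 else random.randint(-2, 2)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == 'x':
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree, data):
    # Lower is better: total squared error over the given points.
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data)

def random_subtree(tree):
    # Pick a random node out of a tree.
    if isinstance(tree, tuple) and random.random() < 0.7:
        _, left, right = tree
        return random_subtree(random.choice([left, right]))
    return tree

def graft(tree, subtree):
    # Replace a randomly chosen node of tree with subtree.
    if not isinstance(tree, tuple) or random.random() < 0.3:
        return subtree
    op, left, right = tree
    if random.random() < 0.5:
        return (op, graft(left, subtree), right)
    return (op, left, graft(right, subtree))

def evolve(data, pop_size=200, generations=100):
    population = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, data))
        if fitness(population[0], data) == 0:    # perfect fit found
            return population[0]
        survivors = population[:pop_size // 2]   # eliminate the worst fits
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = graft(a, random_subtree(b))  # crossover at a random point
            if random.random() < 0.2:            # occasional random mutation
                child = graft(child, random_tree(2))
            children.append(child)
        population = survivors + children
    return min(population, key=lambda t: fitness(t, data))

data = [(-1, 1), (0, 0), (1, 1), (2, 4)]
best = evolve(data)
print(best, fitness(best, data))  # often finds ('*', 'x', 'x'), ie x^2

Note that nothing here inspects the data to "derive" a formula from it; the program only ever conjectures candidates blindly and lets criticism (the fitness comparison) kill off the bad ones, which is the point of the analogy below.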
This process struck me as rather Popperian. I will explain what I mean with the help of Popper's "The Bucket and the Searchlight", which can be found as appendix I to his "Objective Knowledge" (to anyone familiar with Popper, this will surely all seem very familiar as well). Here Popper speaks of universal laws, specific initial conditions (which together make up the explicans, ie the explanation), and explicanda (the things to be explained, usually problematic observations). He goes on to say that we conjecture the explicans, and that "the various methods of explanation all consist of a logical deduction; a deduction whose conclusion is the explicandum", ie we need to be able to derive the explicanda from the explanation. We criticize explanations by finding explicanda which cannot be derived from them even though we would have expected them to be (these are problematic observations).
Here is why symbolic regression seems Popperian to me: the points we are given are the explicanda, and the function the program conjectures is the "universal law" (universal in the sense that the function always returns the same value for a given parameter, and the parameter plays the role of the specific initial condition). Criticism is applied based on how many of the explicanda can be derived from the explanation.
I've been told that symbolic regression seems rather inductivist, however. (I realize that inductivism itself does not exist, so what I mean by "inductivist" is that it seems reminiscent of how inductivism was thought to work.) Presumably, this is because we seem to start with data/observations and then try to construct a theory/function from them, whereas in the Popperian method, theories and problems usually precede observation. Let me first say that I completely agree with this view in general, and that I am in no way advocating inductivism. In this particular case, however, I think this is a red herring, because the problems that symbolic regression deals with are, first, the (admittedly broad) "What's the explanation from which every point can be derived?", and second, "Why can't these other points be derived from our best explanation so far?". The latter case can happen when we are given an additional explicandum, such as (2, 4), which cannot be derived from f(x) = |x|, so that we need to conjecture g(x) = x^2 as a better explanation instead (my guess is that the way humans do it, and the way you and I just did as we read this paragraph, is through some kind of symbolic regression as well). So we had to solve the problem of integrating the problematic new observation into a new, better explanation (better in the sense that g explains everything f does - it "saves the appearances" - plus it explains the problematic observation); the little sketch below spells this out. I think this also means that Popper's concept of verisimilitude is basically used as the fitness function in the genetic program.
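Here is that move in code, a self-contained toy (function names are my own) using "number of derivable explicanda" as the fitness criterion - the verisimilitude-like measure just mentioned:

def derivable(f, data):
    # Criticism as counting: how many explicanda follow from the conjecture?
    return sum(1 for x, y in data if f(x) == y)

f = abs               # f(x) = |x|
g = lambda x: x ** 2  # g(x) = x^2

old = [(-1, 1), (0, 0), (1, 1)]
new = old + [(2, 4)]  # the problematic new observation

print(derivable(f, old), derivable(g, old))  # 3 3: tied on the old data
print(derivable(f, new), derivable(g, new))  # 3 4: only g saves the appearances

On the old data the two conjectures are tied; it is the problematic observation (2, 4) that lets criticism distinguish them.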
All this is to say that symbolic regression still seems Popperian and problem driven to me, even if at first sight it appears inductivist in the sense that it is "data driven". Hypotheses/explanations have their counterpart in functions, initial conditions in x-coordinates, and explicanda in x-y coordinates, and we are interested in deriving the explicanda from our function/theory, just as in science.
I would like to know if and where I am wrong, because if symbolic regression really does mimic Popperian epistemology (albeit only for mathematical formulae), maybe it has something to teach us about how humans think, and how this could be turned into a broader algorithm that works for more than just numbers.