One way to "solve" the representation problem is to have little a priori model and let the system decide. For instance, agents are made of some pseudo-code, and this kind of code can work on games and on other agents' code as well. Maybe you put agents on different levels according to the material they work on:
Unfortunately, the fewer ad hoc representations you put into it, the more computing power you'll need to equal nature :^)
[JPN] There is always the possibility of some simple machine language where all bytes are both code and data. Admittedly, most random lists of bytes are going to be nonsense, but learning how to write meaningful programs is quite similar to learning how to make meaningful and useful moves on a board. The main difference will be in the dimension of the state space (the set of meaningful programs seems hugely greater than the set of good strategies).
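To make the idea concrete, here is a minimal sketch of such a machine: a toy stack interpreter where every byte is read as an opcode, and opcodes read further bytes as operands. The opcode set and semantics are hypothetical, chosen only to illustrate that most random byte lists do nothing useful while a few encode meaningful programs.

```python
import random

def run(program, steps=100):
    """Interpret a list of bytes on a toy stack machine (hypothetical opcodes)."""
    stack, pc = [], 0
    for _ in range(steps):
        if pc >= len(program):
            break
        op = program[pc] % 4
        if op == 0:                        # PUSH: next byte is a literal
            pc += 1
            stack.append(program[pc] if pc < len(program) else 0)
        elif op == 1 and len(stack) >= 2:  # ADD: sum the top two stack values
            stack.append(stack.pop() + stack.pop())
        elif op == 2 and stack:            # DUP: duplicate the top of stack
            stack.append(stack[-1])
        # op == 3 (or stack underflow): NOP -- most random bytes do nothing useful
        pc += 1
    return stack

# a meaningful program: push 5, push 7, add
print(run([0, 5, 0, 7, 1]))   # -> [12]
# a random program: usually nonsense
random.seed(0)
print(run([random.randrange(256) for _ in range(20)]))
```

Note that the same byte can act as code (an opcode) or data (a PUSH literal) depending only on where the program counter meets it, which is the property the discussion is after.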
Players are programs that optimize their strategies, i.e., they have their own methods to learn from the experience of self-played games or from analyzing the game rules. In this sense, players and games are structurally different!
Oh, I wasn't going that far. I was thinking of players close to your strategies, without any further goal in their short lives than winning a game. But since you had the idea of libraries and a university of knowledge, and I didn't know who would be able to isolate parts of strategies as valuable knowledge that could be reused, I thought you could add a layer of dumb genetic experimenters or scholars above the dumb players.
They would experiment with pools of players and deal with the mutation and maybe the selection. Competing scholars (and their best players) would then represent something like your evolving library of knowledge.
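The scholar layer could be as dumb as a plain tournament-plus-mutation loop. The sketch below is one possible reading of that, with numeric vectors standing in for real player strategies and a trivial `play()` function standing in for an actual match; all the names are placeholders, not the real system.

```python
import random

def play(a, b):
    # dummy match: the "stronger" strategy vector wins
    # (stands in for actually playing a game between two players)
    return a if sum(a) >= sum(b) else b

def scholar_step(pool, mutation_rate=0.1, rng=random):
    """One generation run by a dumb scholar: pair off, select winners, mutate."""
    rng.shuffle(pool)
    survivors = [play(a, b) for a, b in zip(pool[::2], pool[1::2])]
    # refill the pool with mutated copies of the winners
    children = [[g + rng.gauss(0, mutation_rate) for g in p] for p in survivors]
    return survivors + children

pool = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(8)]
for generation in range(10):
    pool = scholar_step(pool)
```

Competing scholars would each run such a loop on their own pool; comparing their champions is then the inter-scholar tournament that makes the "library" evolve.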
and the best alchemists get their way for the next turn
I would rather forget about any look-ahead so as to play quickly and go through more games and more evolutionary steps, to increase the chance of seeing something emerge that wasn't there.
I would like that also, but the point - in many games - is not the search procedure; it is the evaluation that you make of a given position. That seems to be the place where the knowledge must emerge... but I may be wrong. In fact, that is not the way to play the game of Go well, but hey, I'm not that ambitious!
I agree it rather looks hopeless, since the players are going to start, and probably stay, very dumb. But I believe it would be even worse if you added an alpha-beta skeleton, unfortunately, because of speed. And even if moves at depth 0 and moves at depth 5 were played at the same speed, I think the quantity and quality of the evolution observed might be the same. Like learning to swim versus learning to sail: it looks like you go faster on a boat, but are you smarter and more experienced?!
The actual data so far suggests that this is not true (while some specific theoretical results about pathological games show the opposite). Check Deep Blue. Also, evolutionary Checkers seemed to need depth 4 to get somewhere. Information at depth zero is too thin for the computer to see (or for most people, for that matter; even chess GMs need to see below the surface to make a decision).
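The depth point can be shown on a toy game. Below is a minimal negamax-style alpha-beta sketch over subtraction Nim (take 1-3 from a pile; taking the last object wins): with a blank depth-0 evaluation the program knows nothing, but with a few plies of search the win/loss structure surfaces. The game choice and the flat evaluation are illustrative, not the actual programs discussed.

```python
def alphabeta(pile, depth, alpha=-2, beta=2):
    """Negamax alpha-beta on subtraction Nim, from the mover's perspective."""
    if pile == 0:
        return -1        # previous player took the last object: we already lost
    if depth == 0:
        return 0         # "depth 0" evaluation: no idea who is winning
    best = -2
    for take in (1, 2, 3):
        if take <= pile:
            best = max(best, -alphabeta(pile - take, depth - 1, -beta, -alpha))
            alpha = max(alpha, best)
            if alpha >= beta:
                break    # prune: opponent won't allow this line
    return best

print(alphabeta(4, 0))   # -> 0: at depth zero the position looks neutral
print(alphabeta(4, 6))   # -> -1: with search, pile 4 is revealed as a loss
```

A pile of 4 is indeed a loss for the side to move (any take of 1-3 leaves 1-3 for the opponent), but only the search can see that; the depth-0 view is exactly the "too thin" information mentioned above.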
[Check Chaunier's page on N players Alpha-Beta]
I just finished "One Jump Ahead" by Jonathan Schaeffer, about the Chinook checkers program. He said that he liked all the "computer science" type stuff: the searching, pruning, calculating endgame databases, but he didn't like adding and tuning the "knowledge" part of the program. Devising and tuning the knowledge is exactly what I like. To me, everything else is brute force.
Precisely my point of view! It's good to have competitive game players, like it's good to have fast calculators. But to enlighten the obscure path of learning, those methods don't contribute a single bit of useful data.
One of the things that has concerned me about this kind of heuristic scheme is how you prevent the learning of strategies from going off into some dead end, ignoring effective techniques, due to the effects of random events early in the learning process.
There are no magic solutions to that, no free lunch. Just insert some mutation into the pool of solutions and hope to find a new hill to climb.
Perhaps the mutation rate could vary over time, high in the beginning, and lower toward the end, but the standard deviation of the mutations would make a difference.
A possible idea. The same one used in simulated annealing: fewer changes as the system converges.
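A sketch of that annealing idea: a mutation "temperature" that decays with the generation count, so early mutations are large (escaping dead ends) and later ones small (fine-tuning around the hill found). The exponential schedule and the decay rate are arbitrary illustrative choices.

```python
import math
import random

def mutate(genome, generation, t0=1.0, decay=0.05, rng=random):
    """Gaussian mutation whose spread shrinks as generations pass."""
    temperature = t0 * math.exp(-decay * generation)   # annealing schedule
    return [g + rng.gauss(0, temperature) for g in genome]

# early generations: large jumps; late generations: tiny refinements
print(mutate([0.0, 0.0, 0.0], generation=0))
print(mutate([0.0, 0.0, 0.0], generation=50))
```

The standard-deviation point above maps directly onto `temperature`: it is the sigma of the Gaussian, so its schedule is what decides how far the pool can still jump.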
Is the knowledge in the academy placed there by you, or by high-performing agents?
By some fast change in the winning ratio of the agent, e.g. Who does that, right now, I don't know; perhaps the agent itself, or some higher judge.
For real world applications, you'd want to initialize the system and go out drinking, and come in the next afternoon and examine the "gene pool" and the "academy" for the result. After you get it working and tuned, it might be fun to alter the environment to see how the agents respond. To take an example from chess, start with Shatranj, and then change the rules one-by-one until you get FIDE chess, and see what happens. You'd be creating a situation where knowledge in the academy was wrong, and would have to be corrected or ignored. This could have real-life applications.
Think real world. Any pure scientific study could ultimately result in a useful application. Your agent scheme could be used to make decisions where there was no human input available, like long-range space exploration. So the agents are addressing some problem with exploring Uranus while the scientific programs are gathering data. New or refined data could be discovered while the agents were still "evolving". You would want them to be able to "switch gears" based on the new data without outside help. That would be artificial intelligence!
I see! In fact, maybe the system would then learn how and when to learn: besides the facts it learns being dynamic (it will only search and evaluate a small set of all possible positions), the rules that create those facts are changing too (the rule-set changed).
I think of your agents, in real applications, as a computational device to create knowledge of complex systems. The goal is the creation of the Academy, which would then be a knowledge template for some control process that was taking action in the physical world.
Yes, and of course, a side effect is that it also gives the best-performing agent for the specified task.
An unavoidable point is that some knowledge must be inserted! The solution is to reduce it to the minimum possible. That is itself a worthy point of investigation.
But doesn't that compromise at least part of the investigation? Don't you want to see what knowledge is generated by the LUDÆ system alone, without "hints"?
I will insert vague meta-game knowledge, and the minimum possible. There will be no specific game knowledge in the system besides validMoves() and endgame(). If I get nice results, I can try to strip the meta-knowledge off, bit by bit, and see what happens! This is a curious dilemma: from what point am I inserting knowledge? As soon as I start programming, the effect is turned on; but since it's obvious that I cannot wait for a quantum effect to build a meta-player, I must do something :-) That's the whole point of the weak theory of the game domain: give tools as abstract as possible, in order to get specific information that the system could not know before the process begins.
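A sketch of what that interface boundary could look like: the only game-specific code the system ever sees is `validMoves()` and `endgame()`, and everything above that line (here, just a random player) is game-independent. The Nim rules below are a stand-in example, not part of the actual system.

```python
import random

class Nim:
    """Example game exposing only the two required methods plus play()."""
    def __init__(self, pile=10):
        self.pile = pile
    def validMoves(self):
        return [take for take in (1, 2, 3) if take <= self.pile]
    def endgame(self):
        return self.pile == 0
    def play(self, move):
        self.pile -= move

def random_player(game, rng=random):
    """Game-independent: knows nothing beyond validMoves() and endgame()."""
    moves = 0
    while not game.endgame():
        game.play(rng.choice(game.validMoves()))
        moves += 1
    return moves

print(random_player(Nim(10)))   # number of moves in one random game
```

Swapping `Nim` for any other class with the same two methods leaves `random_player` untouched, which is the "minimum possible knowledge" being claimed.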
I have no idea how a "pre-experience" evaluation could be coded without actually building in knowledge. What you are proposing is what we humans all do naturally: before we ever play we try to make a plan toward the goal, rather than play randomly.
Precisely! Of course, the planning structure could itself evolve... and if we don't stop it, endless meta-planning phases arise!
That's like the real world. Eventually you have to do something, not just think. Else it would be like those old cartoons of chess players with cobwebs on them. As a first approximation, a simple time deadline would be enough: think for two minutes, then you must start playing!
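That cutoff can be one line of control flow. The sketch below is a hypothetical deadline wrapper: evaluate candidate plans until the time budget runs out, then commit to the best one found so far, however unfinished the thinking is.

```python
import time

def plan_with_deadline(candidates, evaluate, budget_seconds=0.01):
    """Think until the clock says stop, then play the best move found so far."""
    deadline = time.monotonic() + budget_seconds
    best, best_score = None, float("-inf")
    for move in candidates:
        if time.monotonic() > deadline:
            break                 # time is up: stop thinking, start playing
        score = evaluate(move)
        if score > best_score:
            best, best_score = move, score
    return best

# toy example: the "best plan" is the move closest to 7
print(plan_with_deadline(range(100), lambda m: -abs(m - 7), budget_seconds=5.0))
```

The same wrapper bounds any number of meta-planning layers, since each layer simply inherits whatever budget remains.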
I think the idea of designing a program able to play different games and learn winning strategies is rather old [check the introduction]. When I was a student (1975-1982!) I programmed several strategy games and tried to design one such program. At that time, a popular technology was abstract data types, so I used abstract data types to represent the board and valid moves, such that generating the basic code for a given game was "simply" a matter of compiling abstract data types to a virtual machine (at that time, a popular target was P-code... but it was actually not very different from the currently fashionable Java VM). The core algorithm was a mini-max algorithm, but the key question was "how to design an abstract value function that will also be able to learn".
The agent should be able to: a) get some immediate active knowledge from the game rules (mobility...); b) learn from self-play and incorporate that into its evaluation function; or c) just use natural selection to fetch the best architecture!
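Option a) is the cheapest to illustrate: mobility is a feature any game exports for free through validMoves(). In the sketch below, the game class is a hypothetical stub and the weight is a placeholder that options b) and c) would then tune by self-play or by selection.

```python
class StubGame:
    """Hypothetical stand-in for a real game object with validMoves()."""
    def validMoves(self):
        return ["a", "b", "c"]

def evaluate(game, mobility_weight=1.0):
    # vague prior from the rules alone: more options for the mover is better
    return mobility_weight * len(game.validMoves())

print(evaluate(StubGame()))   # -> 3.0
```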
I made some attempts with polynomial decomposition with self-varying coefficients, but that did not prove very successful. Retrospectively, I think I was trying to find a concept like the self-learning neural network... but without the massive effect that makes it (relatively) successful!
One possible architecture is evolving neural nets. Check Fogel's work.
Restricting to square boards and one or two kinds of pieces (which I did not even think to do in my ambitious young days), one can use a product of matrices, with a matrix of varying coefficients. This approach has been used by several Othello/Reversi programs (including one I wrote at that time), but it was more to adapt the program's response to the opponent's strategy than to actually learn.
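The matrix-of-coefficients idea can be sketched as an element-wise product of the board (piece values +1/-1/0) with a weight matrix, summed. The weights below are an illustrative Othello-like pattern (corners valuable, squares next to corners dangerous), not tuned values from any of the programs mentioned.

```python
# illustrative 6x6 weight matrix (corners good, corner-adjacent squares bad)
WEIGHTS = [
    [ 4, -1,  1,  1, -1,  4],
    [-1, -2,  0,  0, -2, -1],
    [ 1,  0,  0,  0,  0,  1],
    [ 1,  0,  0,  0,  0,  1],
    [-1, -2,  0,  0, -2, -1],
    [ 4, -1,  1,  1, -1,  4],
]

def positional_eval(board):
    """Sum of weight * piece over all squares; board entries are +1/-1/0."""
    return sum(w * p
               for wrow, brow in zip(WEIGHTS, board)
               for w, p in zip(wrow, brow))

board = [[0] * 6 for _ in range(6)]
board[0][0] = 1            # my piece on a corner
print(positional_eval(board))   # -> 4
```

Making the coefficients themselves vary over time (or per opponent) is the adaptive part described above; the evaluation structure stays fixed while only `WEIGHTS` changes.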
Concerning the repository-of-knowledge question, I think that declarative rules have been used, at least for some small games, back when expert systems and other inference engines were popular. I thought of using Prolog for such a purpose but never did.