PyStage preview: a single-process robot simulator

Here’s a preview of something I’ve been working on that’s almost finished. It’s a version of the Stage robot simulator that can be loaded into Python as a module, rather than running as a server in a separate process. My motivation is to be able to run simulations in batch mode as fast as possible. The standard Stage server is designed as a backend to Player, the generic robot driver and interface system that uses a networked client/server communications model. Essentially, Stage runs as a set of Player drivers, so the fact that it’s a simulator is abstracted away, making the simulator virtually indistinguishable from a real robot. This is a great model for those who are using the simulator to test and debug code that will eventually be ported to a real robot. The downside falls on those of us who want to use a simulator to speed up time, so we can run experiments that would take prohibitively long on physical robots (days or weeks).

Stage does allow speedup using the -f and -u flags, but since it’s running as a server communicating through a socket, it’s difficult to maintain synchronization if it’s sped up more than 3 or 4x. In my new library, which I’m tentatively calling PyStage (like PyPlayer), the Python program that’s running the “client” also controls exactly when the world’s Update() method gets called, so it can take as long or as short a time as it wants between world updates.
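
To make the lockstep idea concrete, here is a minimal sketch of the control loop. It does not use the real PyStage API: StubWorld, run_lockstep, slow_controller and the 100 ms step size are all hypothetical stand-ins, and the only thing borrowed from the real library is the idea of a world object with an Update() method.

```python
import time


class StubWorld:
    """Hypothetical stand-in for the simulated world (not the real PyStage API)."""

    def __init__(self, step_ms=100):
        self.step_ms = step_ms   # simulated milliseconds per update (assumed value)
        self.sim_time_ms = 0     # simulated clock
        self.updates = 0         # number of Update() calls so far

    def Update(self):
        # Advance simulated time by exactly one step, no matter how much
        # wall-clock time the caller spent since the previous update.
        self.sim_time_ms += self.step_ms
        self.updates += 1


def run_lockstep(world, controller, n_cycles):
    """Run the controller, then step the world, n_cycles times in a row."""
    for _ in range(n_cycles):
        controller(world)   # may take microseconds or minutes of real time
        world.Update()      # the simulator only advances when we say so
    return world.sim_time_ms


def slow_controller(world):
    # Pretend to do some expensive "thinking" between updates.
    time.sleep(0.001)


world = StubWorld(step_ms=100)
elapsed = run_lockstep(world, slow_controller, n_cycles=5)
print(elapsed, "simulated ms after", world.updates, "updates")
```

The simulated clock depends only on how many times Update() was called, so the same loop covers both cases: a trivial controller runs faster than real time, and an expensive one effectively slows simulator time down without any padding.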

This single-process, synchronized execution model provides some added benefits besides just running fast. First, running in a single process without network communication allows simulation runs to be submitted to distributed batch queueing systems like Condor, making it easy to run many simulations in parallel. While Condor does allow some limited network communication in a job, keeping a TCP connection open to a simulator process for a whole job isn’t really feasible. Second, the synchronization should make it possible to use Stage with simulations of cognitive or perceptual processes that are too computationally intensive to run in real time on current computing hardware (e.g. large-scale neural simulations). In this case, instead of speeding up time in the simulator, you could slow it down, so that a cognitive process that might take many seconds to compute uses only a fraction of a second of simulator time. Again, this is technically feasible with client/server Stage, but for it to work you need to know in advance how long to make each cycle, and if computation time varies from one cycle to another, you need to add extra “padding” to the cycle time that will often go to waste.

I now have my initial prototype working and have managed to run a few simulations, getting about a 3-4x speedup in raw cycle time (100-120 Hz vs. 33 Hz) over my client/server runs. However, because of some other changes in the simulator, I’m now able to do computation on every cycle (as opposed to about every 3 cycles before), for an effective speedup of 9-12x. Of course, the top speed used to be communications-limited and is now processor-limited, so I should be able to get even more speedup on faster machines. Once the code has been cleaned up and I’ve learned to use the GNU build system, I should be able to release a 0.1 version (or should it be 1.0?). Or, if the Stage authors are willing, I’d be happy to fold the whole thing into Stage proper. Right now, I maintain my own slightly modified version of the Stage source for use with PyStage. Another possibility is to expand it to allow clients to be written in other languages like Java or Scheme. Since I used SWIG to generate the Python wrappers, this shouldn’t be too difficult, though someone else would have to do the work specific to those other languages.
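
As a rough check on those speedup figures, here is the arithmetic spelled out. It assumes the two effects simply multiply (raw cycle rate times how often a cycle does useful computation); the variable names are just for illustration.

```python
old_hz = 33.0                             # client/server cycle rate
new_hz_low, new_hz_high = 100.0, 120.0    # single-process cycle rate range
old_compute_every = 3                     # before: computation about every 3rd cycle
new_compute_every = 1                     # now: computation on every cycle

# Raw speedup: ratio of cycle rates.
raw_low = new_hz_low / old_hz
raw_high = new_hz_high / old_hz

# Effective speedup: ratio of "cycles with computation" per second.
eff_low = raw_low * old_compute_every / new_compute_every
eff_high = raw_high * old_compute_every / new_compute_every

print(f"raw speedup:       {raw_low:.1f}x to {raw_high:.1f}x")
print(f"effective speedup: {eff_low:.1f}x to {eff_high:.1f}x")
```

Computed exactly, the effective range comes out a little narrower than the rounded 9-12x, which is what you get from multiplying the rounded 3-4x by 3.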

Footnote: This whole experience has been a great reinforcement of the value of Free Software. Because the source code of the program was free, I was able to modify it to serve a need that the authors hadn’t anticipated, and probably would have considered low- or no-priority, and the whole process only took me a couple of weeks. I can’t imagine this would have been possible if I’d been using a closed, proprietary system.


Forum for AI: Michael Littman

FAI has started up again for the spring. This spring we’re folding the AI-Lab’s invited speaker series into FAI, so we’ll have lots of good speakers from other universities early in the semester. Hopefully Matt and I will get some local speakers to close out the series in April/May.

Anyway, I totally forgot to blog the semester’s first FAI talk, by Michael Littman of Rutgers University. The talk was in the ACES Auditorium. Here’s the title and abstract:

Reinforcement Learning for Autonomous Diagnosis and Repair

This talk describes ongoing work on an application of reinforcement learning in the context of autonomous diagnosis and repair. I will present a new formal model, cost-sensitive fault remediation (CSFR), which is a simplified partially observable environment model. CSFR is powerful enough to capture some real-world problems — we’ve looked at network, disk-system, and web-server maintenance. However, it also admits simplified algorithms for planning, learning, and exploration, which I’ll discuss.

It was a very cool talk. The agent he used as an example was a network connection repair agent, equipped with some corrective actions (e.g. renewing the DHCP lease) as well as some diagnostic actions. All the actions have associated costs, which may differ depending on whether the action succeeds. They then use a case-based reinforcement learning method to learn a policy for repairing a network failure at the least cost. (In this case, cost = time.) Having diagnostic actions, that is, actions that are taken only to give information, is one of my favorite ideas. It is by no means new, but it strikes me in many ways as the right way to do perception: the agent acquires information as it needs it and integrates it into its internal state representation. In this case, the state representation was simply the history of the current repair episode (i.e. the actions taken and their results). Episodes were compared for compatibility with cases stored in memory, producing a probabilistic belief distribution over compatible stored states. This distribution was used to choose the action that minimizes the expected cost for the current episode.
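
To get a feel for how this works, here is a toy sketch of the idea as I understood it. It is not the algorithm from the talk: the stored cases, the action names other than the DHCP renewal, the costs, and the uniform belief over compatible cases are all my own assumptions for illustration.

```python
from functools import lru_cache

# Hypothetical stored cases: the outcome (1 = success/positive, 0 = failure/negative)
# of every action under one past fault.
CASES = (
    {"ping_gateway": 0, "renew_dhcp": 1, "reset_adapter": 0},
    {"ping_gateway": 0, "renew_dhcp": 1, "reset_adapter": 1},
    {"ping_gateway": 1, "renew_dhcp": 0, "reset_adapter": 1},
)
ACTIONS = ("ping_gateway", "renew_dhcp", "reset_adapter")
REPAIRS = {"renew_dhcp", "reset_adapter"}   # ping_gateway is purely diagnostic
# Made-up costs in seconds, indexed by outcome: (cost if 0, cost if 1).
COST = {"ping_gateway": (1, 1), "renew_dhcp": (4, 2), "reset_adapter": (10, 8)}


def compatible(episode):
    """Stored cases that agree with every (action, outcome) seen so far."""
    return [c for c in CASES if all(c[a] == o for a, o in episode)]


@lru_cache(maxsize=None)
def expected_cost(episode):
    """Return (expected remaining cost, best next action) for an episode.

    episode is a frozenset of (action, outcome) pairs: the history so far.
    """
    if any(a in REPAIRS and o == 1 for a, o in episode):
        return 0.0, None                      # a repair already succeeded
    cases = compatible(episode)
    tried = {a for a, _ in episode}
    best_cost, best_action = float("inf"), None
    for action in ACTIONS:
        if action in tried:
            continue                          # repeating an action tells us nothing new
        total = 0.0
        for outcome in (0, 1):
            # Belief: uniform over compatible cases (an assumption of this sketch).
            p = sum(c[action] == outcome for c in cases) / len(cases)
            if p > 0:
                future, _ = expected_cost(episode | {(action, outcome)})
                total += p * (COST[action][outcome] + future)
        if total < best_cost:
            best_cost, best_action = total, action
    return best_cost, best_action


cost, first_action = expected_cost(frozenset())
print(first_action, round(cost, 2))
```

On this toy data the cheapest plan starts with the diagnostic ping rather than a repair, because its result tells the agent which repair is worth paying for. That is exactly the appeal of diagnostic actions.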
