Title

Spiking Neural Networks with Different Reinforcement Learning (RL) Schemes in a Multiagent Setting

DOI

10.4077/CJP.2010.AMM030

Authors

Chris Christodoulou; Aristodemos Cleanthous

Keywords

spiking neural networks; multiagent reinforcement learning; reward-modulated spike-timing-dependent plasticity

Journal

The Chinese Journal of Physiology

Volume and Issue / Publication Date

Vol. 53, No. 6 (2010/12/01)

Pages

447-453

Content Language

English

English Abstract

This paper investigates the effectiveness of spiking agents when trained with reinforcement learning (RL) in a challenging multiagent task. In particular, it explores learning through reward-modulated spike-timing-dependent plasticity (STDP) and compares it to reinforcement of stochastic synaptic transmission in the general-sum game of the Iterated Prisoner's Dilemma (IPD). More specifically, a computational model is developed in which we implement two spiking neural networks as two "selfish" agents learning simultaneously but independently, competing in the IPD game. The purpose of our system (or collective) is to maximise its accumulated reward in the presence of reward-driven competing agents within the collective. This can only be achieved when the agents engage in a behaviour of mutual cooperation during the IPD. Previously, we successfully applied reinforcement of stochastic synaptic transmission to the IPD game. The current study utilises reward-modulated STDP with an eligibility trace, and the results show that the system managed to exhibit the desired behaviour by establishing mutual cooperation between the agents. It is noted that the cooperative outcome was attained after a relatively short learning period, which enhanced the accumulation of reward by the system. As in our previous implementation, the successful application of the learning algorithm to the IPD became possible only after we extended it with additional global reinforcement signals in order to enhance competition at the neuronal level. Moreover, it is also shown that learning is enhanced (as indicated by an increased IPD cooperative outcome) through: (i) strong memory for each agent (regulated by a high eligibility trace time constant) and (ii) firing irregularity produced by equipping the agents' LIF neurons with a partial somatic reset mechanism.
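The tension the abstract describes, between each agent's own reward and the collective's accumulated reward, can be illustrated with a minimal sketch of a single Prisoner's Dilemma round. The payoff values and the helper names (`PAYOFF`, `collective_reward`, `best_joint_action`, `best_response`) are assumptions chosen for illustration, not the values or code used in the paper; the numbers only need to satisfy the usual conditions T > R > P > S and 2R > T + S.

```python
# Illustrative payoff values only (assumed, not taken from the paper).
# T = temptation, R = reward for mutual cooperation,
# P = punishment for mutual defection, S = sucker's payoff.
T, R, P, S = 5, 3, 1, 0
assert T > R > P > S and 2 * R > T + S  # standard Prisoner's Dilemma conditions

# PAYOFF[(a, b)] = (reward to agent A, reward to agent B)
# "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): (R, R),
    ("C", "D"): (S, T),
    ("D", "C"): (T, S),
    ("D", "D"): (P, P),
}


def collective_reward(action_a, action_b):
    """Summed reward of both agents for one round."""
    return sum(PAYOFF[(action_a, action_b)])


def best_joint_action():
    """Joint action that maximises the collective's single-round reward."""
    return max(PAYOFF, key=lambda pair: sum(PAYOFF[pair]))


def best_response(opponent_action):
    """Action that maximises agent A's own single-round reward."""
    return max("CD", key=lambda a: PAYOFF[(a, opponent_action)][0])


if __name__ == "__main__":
    print("best joint action:", best_joint_action())
    print("selfish reply to C:", best_response("C"))
    print("selfish reply to D:", best_response("D"))
```

Under these assumed values, defection is each agent's best single-round reply whatever the opponent does, while mutual cooperation gives the highest summed reward. This is why the collective can only maximise its accumulated reward if the learning agents settle on cooperating with each other.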

Subject Classification: Medicine and Health > Basic Medicine
References
  1. Pareto, V.(1906).Manuale di economia politica.Milan:Societa Editrice.
  2. Baxter, J.,Bartlett, P. L.,Weaver, L.(2001).Experiments with infinite-horizon, policy-gradient estimation.J. Artif. Intell. Res.,15,351-381.
  3. Bugmann, G.,Christodoulou, C.,Taylor, J. G.(1997).Role of temporal integration and fluctuation detection in the highly irregular firing of a leaky integrator neuron with partial reset.Neural Comput.,9,985-1000.
  4. Christodoulou, C.,Banfield, G.,Cleanthous, A.(2010).Self-control with spiking and non-spiking neural networks playing games.J. Physiol.-Paris,104,108-117.
  5. Christodoulou, C.,Bugmann, G.(2001).Coefficient of Variation (CV) vs. Mean Interspike Interval (ISI) curves: what do they tell us about the brain?.Neurocomputing,38-40,1141-1149.
  6. Cohen, W.(ed.),Hirsh, H.(ed.)(1994).Proc. of the 11th Int. Conf. on Machine Learning (ICML).San Francisco, CA:Morgan Kaufmann.
  7. Farries, M. A.,Fairhall, A. L.(2007).Reinforcement learning with modulated spike timing-dependent synaptic plasticity.J. Neurophysiol.,98,3648-3665.
  8. Florian, R. V.(2007).Reinforcement learning through modulation of spike-timing-dependent plasticity.Neural Comput.,19,1468-1502.
  9. Hu, J.,Wellman, M. P.(2003).Nash Q-learning for general-sum stochastic games.J. Machine Learning Res.,4,1039-1069.
  10. Izhikevich, E. M.(2007).Solving the distal reward problem through linkage of STDP and dopamine signalling.Cereb. Cortex,17,2443-2452.
  11. Lánský, P.,Musila, M.(1991).Variable initial depolarization in Stein's neuronal model with synaptic reversal potentials.Biol. Cybern.,64,285-291.
  12. Legenstein, R.,Pecevski, D.,Maass, W.(2008).A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback.PLoS Comput. Biol.,4,e1000180.
  13. Littman, M. L.,Brodley, C.(eds.),Danyluk, A.(eds.)(2001).Proc. of the 18th Int. Conf. on Machine Learning (ICML).San Francisco, CA:Morgan Kaufmann.
  14. Nash, J.(1950).Equilibrium points in N-person games.Proc. Natl. Acad. Sci. U.S.A.,36,48-49.
  15. Potjans, W.,Morrison, A.,Diesmann, M.(2009).A spiking neural network model of an actor-critic learning agent.Neural Comput.,21,301-339.
  16. Rapoport, A.,Chammah, A. M.(1965).Prisoner's dilemma: a study in conflict and cooperation.Ann Arbor, MI, USA:Univ. of Michigan Press.
  17. Seung, H. S.(2003).Learning in spiking neural networks by reinforcement of stochastic synaptic transmission.Neuron,40,1063-1073.
  18. Softky, W. R.,Koch, C.(1993).The highly irregular firing of cortical cells is inconsistent with temporal integration of random EPSPs.J. Neurosci.,13,334-350.
  19. Sutton, R. S.,Barto, A. G.(1998).Reinforcement Learning: An Introduction.Cambridge, MA, USA:MIT Press.
  20. Xie, X.,Seung, H. S.(2004).Learning in neural networks by reinforcement of irregular spiking.Phys. Rev. E,69,041909.