Modified Reinforcement Learning Infrastructure

The reinforcement learning (RL) model has been very successful in the behavioural sciences, artificial intelligence and neuroscience. Despite its fruitfulness in many simple situations, the RL model does not always cope well with real-life situations involving a large space of possible world states or a large set of possible actions. We propose a modified version of the RL model. The benefit of this model is that the temporal difference prediction error can be used directly to update not only the value of the learning agent's latest action, but also the values of many possible future actions. An example application of this modified reinforcement learning infrastructure (MRLI) is presented for customer behaviour in a complex shopping environment.
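
The idea of reusing a single temporal difference prediction error for several action values can be illustrated with a minimal Python sketch. It is not the authors' MRLI formulation, which the abstract does not specify. The function names, the state and action labels, and in particular the per-action `weights` table are assumptions introduced here for illustration only; how the model actually selects and weights future actions is not stated in the source.

```python
from collections import defaultdict


def td_error(reward, gamma, q_next, q_current):
    """Standard temporal difference prediction error."""
    return reward + gamma * q_next - q_current


def standard_update(q, state, action, delta, alpha):
    """Conventional RL: only the latest (state, action) value is updated."""
    q[(state, action)] += alpha * delta


def shared_error_update(q, delta, alpha, weights):
    """Hypothetical variant: the same prediction error updates several
    (state, action) values, each scaled by an assumed weight in [0, 1]."""
    for key, weight in weights.items():
        q[key] += alpha * weight * delta


# Illustrative shopping-style labels (assumed, not from the source).
latest = ("aisle", "pick_bread")
future = ("checkout", "pay")

delta = td_error(reward=1.0, gamma=0.9, q_next=0.0, q_current=0.0)

q_std = defaultdict(float)
standard_update(q_std, *latest, delta=delta, alpha=0.5)

q_mod = defaultdict(float)
shared_error_update(q_mod, delta, alpha=0.5, weights={latest: 1.0, future: 0.5})
```

In this sketch `q_std` changes only the value of the latest action, while `q_mod` also moves the value of a possible future action in the same step, which is the qualitative benefit the abstract describes.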
