Learning, Risk Attitude, and Hot Stoves in Partially Observable Markov Environments

This research examines decisions from experience in partially observable Markov decision processes (POMDPs). Two experiments revealed four main effects. (1) Risk neutrality: The typical participant did not learn to become risk averse, contrary to the hot stove effect. (2) Sensitivity to the transition probabilities that govern the Markov process. (3) Positive recency: The probability of repeating a risky choice was higher after a win than after a loss. (4) Inertia: The probability of repeating a risky choice after a loss was higher than the probability of making a risky choice after a safe choice. These results can be described by a simple contingent sampler model, which assumes that choices are based on small samples of past experiences contingent on the current state.
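To make the model's core assumption and the recency and inertia measures concrete, the following is a minimal Python sketch written under stated assumptions. It is not the authors' implementation. The class `ContingentSampler`, the sample size `k`, the helper `repetition_rates`, the toy `simulate` environment (two hidden states with a stay probability `p_stay`, a risky option paying +10 or -10, and a safe option paying 0), and the rule that a "win" is any payoff above a threshold of 0 are all hypothetical. For simplicity, the toy agent is shown a state cue directly, which the experimental POMDPs need not provide. For each option, the sampler draws `k` payoffs with replacement from its past experiences with that option in the current state and chooses the option with the higher sample mean.

```python
import random
from collections import defaultdict


class ContingentSampler:
    """Hypothetical sketch: compare means of small samples of past payoffs
    observed with each action in the current state."""

    def __init__(self, actions, k=5, seed=None):
        self.actions = list(actions)
        self.k = k  # sample size per action (assumed free parameter)
        self.rng = random.Random(seed)
        # memory[(state, action)] -> list of payoffs experienced there
        self.memory = defaultdict(list)

    def choose(self, state):
        # Try any action that has no experience in this state yet.
        untried = [a for a in self.actions if not self.memory[(state, a)]]
        if untried:
            return self.rng.choice(untried)
        estimates = {}
        for a in self.actions:
            past = self.memory[(state, a)]
            # Draw k past payoffs with replacement and average them.
            sample = [self.rng.choice(past) for _ in range(self.k)]
            estimates[a] = sum(sample) / self.k
        best = max(estimates.values())
        # Break ties at random among the best-looking actions.
        return self.rng.choice([a for a, v in estimates.items() if v == best])

    def update(self, state, action, payoff):
        self.memory[(state, action)].append(payoff)


def repetition_rates(choices, payoffs, risky="R", win_threshold=0.0):
    """P(risky at t+1) after a risky win, a risky loss, or a safe choice at t.
    Assumes a binary choice; a 'win' is payoff > win_threshold (assumption)."""
    counts = {"after_win": [0, 0], "after_loss": [0, 0], "after_safe": [0, 0]}
    for t in range(len(choices) - 1):
        if choices[t] == risky:
            key = "after_win" if payoffs[t] > win_threshold else "after_loss"
        else:
            key = "after_safe"
        counts[key][1] += 1
        counts[key][0] += int(choices[t + 1] == risky)
    return {k: (n / d if d else float("nan")) for k, (n, d) in counts.items()}


def simulate(n_trials=200, p_stay=0.9, seed=0):
    """Toy two-state Markov environment (hypothetical, not the paper's design)."""
    rng = random.Random(seed)
    agent = ContingentSampler(["R", "S"], k=5, seed=seed)
    state = "good"
    choices, payoffs = [], []
    for _ in range(n_trials):
        action = agent.choose(state)
        if action == "R":
            payoff = 10.0 if state == "good" else -10.0
        else:
            payoff = 0.0
        agent.update(state, action, payoff)
        choices.append(action)
        payoffs.append(payoff)
        # The hidden state switches with probability 1 - p_stay.
        if rng.random() > p_stay:
            state = "bad" if state == "good" else "good"
    return choices, payoffs


if __name__ == "__main__":
    choices, payoffs = simulate()
    print(repetition_rates(choices, payoffs))
```

Keying memory by state is what makes the sampler "contingent." Because each choice rests on only `k` draws, a recent loss can still be outvoted by earlier wins in the same state, which is one way such a model can produce repetition after losses.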