We consider the problem which informally can be formulated as follows. Initially a finite set of independent trials is available. If a Decision Maker (DM) chooses to test a particular trial she receives a reward depending on the trial tested. As a result of testing a random finite set (possibly empty) of new independent trials is added to the set of available trials, and so on, but the total number of potential trials is finite. A DM knows the rewards and transition probabilities of all trials. On each step she can either stop testing or continue. Her goal is to select a testing order and a stopping time to maximize the expected total reward. This problem has a long history and is related to the Multi-armed Bandit Problem with independent arms. We prove that an index can be assigned to each possible trial, and the optimal strategy is to use on each step a trial with maximal index among those available. We give a simple procedure for constructing this index.
This is joint work with Isaac Sonin.