Absent Minded Gambler

Dragon God

Introduction

The following is a decision problem I created today, and which I am unable to resolve. I would state the problem in the general form, and not assign specific payoffs to any of the possible actions, leaving them as variables. There are two classes of the decision problem, with the same two types under each class. The Absent Minded Gambler is greatly inspired by the Absent Minded Driver decision problem.

Scenario

You are a greedy gambler, the utility of money grows linearly for you (feel free to substitute money for utilons from this point on), and you aim to make as much money as possible. You find yourself in the Casino of Infinity and are about to start a gambling game with peculiar rules:

  • You would play this game an unknown number of times say $N$. Whilst the amount of times you play the game is unknown to you, (it is determined by you).
  • Between each round of the game, your memory of all previous rounds would be erased and overwritten as needed.
  • You start the game with a certain amount of money the Casino provides you with, let's call this your starting fund $F0$. After round $i \, (Ri)$, your cumulative winnings through all previous rounds will be your starting fund for round $i + 1$ $(Fi)$. However, at $R{i + 1}$ you would only recall being provided with $F{i}$, and be unable to tell if the amount you're provided with is your starting fund $F0$, or your total cumulative funds after $i$ rounds $i > 0 \, (F_i)$.
  • You play this game by betting among a set of outcomes $O, #O = m, \, m | m \ge 2$ each round. The outcomes are mutually exclusive, and you can only bet on one outcome. One of the outcomes will always occur. You would be told the outcomes and their payoff (as a multiplier of the amount you bet). At each round, it is possible to decide not to bet. If you bet $bi$ on $Ri$ for outcome $ok$, and the payoff of $ok \, (uk)$ is $5.0$, then if $ok$ occurs, you will lose $bi$, but gain $5bi$, for a total profit of $4bi$. Thus, your profit on each $Ok (Gk)$ is given by $Gk = bi(uk - 1)$.
  • The Casino has an unlimited supply of money.
  • $\forall ok \in O, uk > 1$
  • You would be told the empirical probabilities of each outcome $(Pr(ok) = Prk, \, \sum{k = 1}^m {Prk} = 1$. You are betting on the same set of outcomes each round, and the outcomes have the same corresponding payoffs across all rounds.
  • You would be told both the class and type of game you're playing, and you'll play the same type and same class for all the rounds you play.
  • You are immortal while in the Casino, and do not age, get fatigued, require sustenance on any form, or otherwise come to harm. You can continue playing in the Casino forever.
  • You can quit and leave the Casino before the start of each round.
  • At the start of each round, you would be provided with these rules.

Classes of Game

The classes of the game can best be explained if the game is viewed as drawing coloured balls from a barrel. There are $m$ different colours for the balls. The number of balls is $v$. $\forall ok \in O, Prk \in \mathbb{Q}$. Since all the probabilities are rational numbers, we can rewrite them as: $\forall ok \in O, Prk = \frac{ck}{dk}$. $$v = {\prod{k = 1}^m {dk}}$$ For each $ok$, the number of balls of the corresponding colour in the barrel $(nk)$ is given by: $$nk = {ck \times \frac{\prod{h = 1}^m {dh}}{dk}}$$ The above is gotten from the definition of empirical probability. $$Prk = \frac{ck \times \frac{\prod{h = 1}^m {dh}}{dk}}{\prod{w = 1}^m {dw}}$$ There are $v$ balls in the barrel. Each ball is randomly drawn from the barrel, and there is always a uniform distribution over the balls in the barrel.

Independence Class

If each of the rounds is thought of as an experiment, then the probability of each outcome is the same in all experiments. and the outcomes have the same probability across all rounds. Each round is independent of all other rounds.
This class can be viewed as drawing coloured balls from a barrel with replacement of the drawn ball before the next round.

Non-independence Class

The number of rounds you play is a multiple of $v$. The rounds are divided into sets, with $v$ rounds in each set. The probability of an outcome changes across each round. This class can best be viewed as drawing coloured balls from a barrel without replacement.
Let the $i^{\text{th}}$ round in a set be denoted $si$. The first round is $s1$.
A set starts as follows:

  1. You place a bet on the colour of the next ball, or choose to refrain.
  2. A coloured ball is randomly drawn from a barrel.
  3. If you get the colour right, you win.
  4. The ball is not returned to the barrel.
  5. Balls are continuously drawn from the barrel until the barrel is empty.
  6. At the start of each $si$ you are told the starting probabilities of each $ok$, and the value of $v$. You do not know what $s_i$ you are on, and what the number of balls in the barrel currently is.

The payoffs of each outcome are the same across all $si$.
At $s
i$ there are $v - i + 1$ balls in the barrel.


Types of Game

There are two types of the game, with the former merely being a special (and an interesting one at that) case of the latter.

Bernoulli Gamble

There are only two outcomes $(m = 2), \, O = {o1, o2}, \, o2 = \neg o1, \, o1 = \neg o2$. You either bet on $o1$, bet on $o2$ or don't bet at all. $Pr1 = 1 - Pr2 , \, Pr2 = 1 - Pr1$.

Non-Bernoulli Gamble

There are $m (m | m \ge 2)$ outcomes. $(O = {o1, o2, ..., ok, ..., om})$. You can bet on any $ok$, or refrain from betting. $$\sum{k = 1}^m {Pr_k} = 1$$


Thoughts

The Non-independence class seems to me like the clearly more interesting class.
It seems to me that regardless of what $N$ is (and especially for large $N$), the strategy to maximise total payoff is simply to maximise your expected utility. This is because the empirical probability of an event $A (Pr(A))$ is given by the formula:
$$Pr(A) = \lim{N \to \infty} \frac{N(A)}{N}$$ Where $N(A)$ is the number of times $A$ occurred in $N$ trials.
The expected utility of an act is the limit of the average utility of that act as $N$ tends to infinity. Let $a
i$ be any particular act.
Let $sj$ be any state of the world.
Let $m$ be the number of states of the world.
Let $o
{ij}$ be the outcome that occurs when you choose $ai$ and the state of the world is $sj$.
Let $u{ij}$ denote the utility of $sj$.
Let $u{ij}$ denote the utility of $o{ij}$.
Let $N(sj)$ denote the number of times $sj$ occurs in $N$ instances of the decision problem.
Let $E(ai)$ denote the expected utility of $ai$.
$$E(ai) = \lim{N \to \infty}{\frac{\sum{j = 1}^m{N(sj) * u{ij}}}{N}}$$ Total payoff $TN^i$ after $N$ rounds of choosing $ai$ is:
$$T
N^i = \sum{f = 1}^N{u{id}}$$ Where $sd$ is the event that occurs.
Given the frequency of the different $s
j$, the above becomes:
$$TN^i = \sum{f = 1}^N{\frac{\sum{j = 1}^m{N(sj) * u{ij}}}{N}}$$ $$TN^i = N \times {\frac{\sum{j = 1}^m{N(sj) * u{ij}}}{N}}$$ Thus, as $N \to \infty$, $TN^i \to N \times E(a_i)$.
Thus for large $N$ maximising expected utility maximises total payoff.

The above is clearly the case for the Independence Class of the above decision problem, however, I am no sure if it is the case for the Non-independence Class--mostly because I have no idea how to coherently work out an expected utility maximising strategy for the Non-independence class.
If we consider the Non-independence class as rounds of sets, then maximising expected utility of each set becomes the rational strategy.

Another issue worth noting is the fact that at some point, you'll have to know when to quit. A gambler who always bets each round and continues making money but never leaves the Casino is doomed to failure (as the money is only useful outside the Casino). Whatever strategy is chosen has to include the decision to quit at some point.

The Kelly Criterion seems relevant to deciding which outcome to gamble on.

I am not sure how it would apply to the Non-independence class, but for the independence class, some ideas I have are:

*Ignore all outcomes for which the Kelly criterion recommends not betting on.
* Pick outcome with highest Kelly bet and bet on it consistently (I am not sure if this is the best strategy as opposed to some mixed strategy involving outcomes with different Kelly bets).
* Assign $p < 1$ as the probability that you would continue the game for the next round. If $p = 1$, you would be trapped in the Casino for eternity. If $p < 1$, you would almost surely leave the Casino at some point. This satisfies the requirements of eventually leaving the Casino.

For the Non-independence class, you would need some way to quantify your uncertainty regarding the current $s_i$, and the expected probability distribution over the balls in the barrel.


I would be immensely grateful for a solution to the above problem.

Monday, 23 October 2017 14:41 GMT