- The paper demonstrates that an evolutionary algorithm can efficiently optimize Blackjack basic strategies by simulating thousands of rounds.
- It encodes strategies as an 800-bit chromosome and applies elitist selection with crossover and mutation over 1000 generations.
- Results show that evolved strategies, especially from Thorp's initialization, gain a statistically significant edge in longer simulations.
This paper explores the use of evolutionary programming to automatically discover and optimize the "basic strategy" for the game of Blackjack (Hatui et al., 2017). The basic strategy dictates whether a player should hit, stand, double down, or split, based solely on their hand and the dealer's visible card.
Problem and Approach:
- Finding the mathematically optimal basic strategy is complex, and a brute-force search of the parameter space (estimated at $2^{620}$ possibilities in this encoding) is computationally infeasible.
- The paper proposes using an evolutionary algorithm (EA) as a heuristic search method. Strategies are treated as individuals in a population that evolves over generations according to their performance (fitness) in simulated games.
Implementation Details:
- Strategy Encoding (Chromosome):
- A Blackjack basic strategy is represented as a binary vector (chromosome) of length d=800.
- This vector is formed by serializing (row-by-row) five decision tables:
- Split (Player Pair vs. Dealer Up-card): 10x10 table = 100 bits
- Soft Double Down (Player Ace + Other Card vs. Dealer Up-card): 10x10 table = 100 bits
- Hard Double Down (Player Hard Total vs. Dealer Up-card): 20x10 table = 200 bits
- Soft Stand (Player Soft Total vs. Dealer Up-card): 20x10 table = 200 bits
- Hard Stand (Player Hard Total vs. Dealer Up-card): 20x10 table = 200 bits
- A '1' in the table means "perform the action" (split, double, stand), and '0' means "do not perform the action" (don't split, don't double, hit).
- Note: Some genes are redundant or have no "phenotypic expression" (e.g., soft stand rules for totals below 12, or stand rules for hands where doubling down always takes priority).
- Fitness Evaluation:
- The fitness of a strategy is determined by simulating its performance.
- Each strategy plays $N=10^4$ rounds of Blackjack against a dealer, starting with a bankroll of $B_0=\$10^4$ and betting $b=\$2$ per round. (The initial bankroll for evolution is higher than for testing optimal strategies because early random strategies lose heavily.)</li>
<li> The fitness score is the relative return on the bankroll: $\phi = (B_T - B_0) / B_0$, where $B_T$ is the final bankroll.</li>
</ul></li>
<li><strong>Evolutionary Algorithm:</strong>
<ul>
<li> <strong>Population:</strong> $M=5000$ strategies (chromosomes).</li>
<li> <strong>Initialization:</strong> Two methods tested:
<ul>
<li> Random: Each gene initialized to 0 or 1 with 50% probability.</li>
<li> Thorp: All strategies initialized to Thorp's well-known basic strategy.</li>
</ul></li>
<li> <strong>Selection:</strong> Elitist selection. In each generation, the top $\alpha=5\%$ (250) of strategies with the highest fitness scores are selected to breed the next generation.</li>
<li> <strong>Breeding (Replication):</strong>
<ul>
<li> Parents are chosen from the elite pool with probability proportional to their fitness $\phi_i$.</li>
<li> An offspring chromosome $c$ is created from two parents $v_i, v_j$. For each gene position $a$:
<ul>
<li> If $v_i^a = v_j^a$: $c^a$ gets this value with high probability $\pi = 1 - 10^{-4}$, and mutates (flips) with low probability $1-\pi = 10^{-4}$.</li>
<li> If $v_i^a \neq v_j^a$: $c^a$ inherits the gene from one parent, chosen with probability proportional to that parent's fitness score.</li>
</ul></li>
</ul></li>
<li> <strong>Generations:</strong> The process repeats for $\tau=1000$ generations.</li>
</ul></li>
</ol>
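
<p>The encoding, fitness, selection, and breeding steps above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, table rows and columns are addressed by raw index because the summary does not say how hands map to rows, parents are drawn with replacement, the whole next generation consists of offspring, and elite fitness scores are shifted to be positive before being used as weights (relative returns can be negative, and the summary does not say how that case is handled).</p>

```python
import random

# Table layout from the encoding described above: (name, rows, cols).
TABLES = [
    ("split", 10, 10),
    ("soft_double", 10, 10),
    ("hard_double", 20, 10),
    ("soft_stand", 20, 10),
    ("hard_stand", 20, 10),
]
OFFSETS = {}
_pos = 0
for _name, _rows, _cols in TABLES:
    OFFSETS[_name] = (_pos, _cols)
    _pos += _rows * _cols
CHROMOSOME_LENGTH = _pos  # 100 + 100 + 200 + 200 + 200 = 800

MUTATION_RATE = 1e-4  # 1 - pi


def gene_index(table, row, col):
    """Position of a table cell in the row-by-row serialized chromosome."""
    start, cols = OFFSETS[table]
    return start + row * cols + col


def relative_return(final_bankroll, initial_bankroll):
    """Fitness phi = (B_T - B_0) / B_0."""
    return (final_bankroll - initial_bankroll) / initial_bankroll


def select_elite(population, fitness, alpha=0.05):
    """Keep the top alpha fraction of chromosomes, best first."""
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    keep = ranked[: max(2, int(alpha * len(population)))]
    return [population[i] for i in keep], [fitness[i] for i in keep]


def breeding_weights(fitness):
    """ASSUMPTION: shift scores so every elite member has a positive weight."""
    low = min(fitness)
    return [f - low + 1e-9 for f in fitness]


def breed(parent_i, parent_j, w_i, w_j, rng, mutation_rate=MUTATION_RATE):
    """Build one offspring gene by gene from two parents."""
    child = []
    for a, b in zip(parent_i, parent_j):
        if a == b:
            # Agreeing genes are copied, flipping with probability 1 - pi.
            gene = a ^ 1 if rng.random() < mutation_rate else a
        else:
            # Disagreeing genes come from one parent, weighted by fitness.
            gene = a if rng.random() < w_i / (w_i + w_j) else b
        child.append(gene)
    return child


def next_generation(population, fitness, rng, alpha=0.05):
    """Replace the population with offspring of the elite pool."""
    elite, elite_fit = select_elite(population, fitness, alpha)
    weights = breeding_weights(elite_fit)
    children = []
    for _ in range(len(population)):
        # ASSUMPTION: parents are drawn with replacement, so i may equal j.
        i, j = rng.choices(range(len(elite)), weights=weights, k=2)
        children.append(breed(elite[i], elite[j], weights[i], weights[j], rng))
    return children
```

<p>In a full run, the fitness list would come from simulating each chromosome for $N$ rounds and applying the relative-return formula; here it is simply passed in, so the sketch stays independent of any particular Blackjack simulator.</p>
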
<p><strong>Simulation Rules:</strong></p>
<p>The simulations used specific, player-favorable Blackjack rules:</p>
<ul>
<li>Single deck.</li>
<li>Dealer stands on soft 17.</li>
<li>Natural Blackjack pays 3:2.</li>
<li>Double down allowed on any first two cards.</li>
<li>Split allowed for any pair.</li>
<li>Cannot double down after splitting Aces.</li>
<li>Can hit split Aces only once.</li>
<li>No re-splitting allowed.</li>
<li>Deck reshuffled after 1/3 depleted.</li>
<li>Player never takes insurance.</li>
</ul>
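
<p>One way to keep such a rule set explicit in a simulator is a small immutable configuration object. The sketch below is illustrative only: the class, field, and function names are hypothetical, and reading "reshuffled after 1/3 depleted" as "reshuffle once at least one third of the cards have been dealt" is an assumption.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rules:
    """Rule set listed above; all field names are hypothetical."""
    decks: int = 1
    dealer_hits_soft_17: bool = False
    blackjack_payout: float = 1.5        # natural pays 3:2
    double_on_any_two_cards: bool = True
    split_any_pair: bool = True
    double_after_split_aces: bool = False
    max_hits_on_split_aces: int = 1
    resplit_allowed: bool = False
    reshuffle_fraction: float = 1 / 3    # ASSUMPTION: fraction of cards dealt
    take_insurance: bool = False


def needs_reshuffle(cards_dealt, rules):
    """True once the dealt share of the deck reaches the reshuffle fraction."""
    return cards_dealt >= rules.reshuffle_fraction * 52 * rules.decks


def dealer_must_hit(total, is_soft, rules):
    """Dealer draws below 17, and on soft 17 only if the rules say so."""
    if total < 17:
        return True
    return total == 17 and is_soft and rules.dealer_hits_soft_17
```

<p>Freezing the dataclass keeps the rules from changing during a run, which matters when fitness scores from different generations are compared.</p>
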
<p><strong>Results and Practical Implications:</strong></p>
<ol>
<li><strong>Evolution from Random:</strong>
<ul>
<li> The population rapidly evolved profitable strategies, with fitness saturating within about 100 generations.</li>
<li> The resulting best-evolved strategy was similar to Thorp's basic strategy, especially in stand decisions, but differed in some split and double-down rules.</li>
<li> Performance (measured over $M=1000$ trials of $N=10^4$ rounds each, starting with $B_0=\$1000$) was comparable to Thorp's strategy. The difference in average final bankroll was not statistically significant with $N=10^4$ rounds.</li>
</ul></li>
<li><strong>Evolution from Thorp:</strong>
<ul>
<li> Starting with Thorp's strategy, the EA found slightly modified strategies that performed better.</li>
<li> Evolution proceeded via rarer, more distinct improvements compared to the smoother climb from random initialization.</li>
<li> The "Evolved Thorp" strategy differed only slightly (e.g., not splitting 6s vs 6, doubling A7 vs 2).</li>
<li> Performance Improvement: This evolved strategy showed a statistically significantly higher edge (~0.33%) compared to the original Thorp strategy (~0.26%) <em>when tested over longer simulations</em> ($N=10^5$ rounds).</li>
</ul></li>
<li><strong>Sensitivity to Simulation Length:</strong>
<ul>
<li> A key finding is that the fitness evaluation's sensitivity depends heavily on the number of simulated rounds ($N$).</li>
<li> With $N=10^4$ rounds, the performance difference between Thorp's strategy, the strategy evolved from random, and the strategy evolved from Thorp was often statistically insignificant.</li>
<li> Increasing the simulation to $N=10^5$ rounds provided enough data to reliably distinguish the performance differences, revealing the superiority of the "Evolved Thorp" strategy.</li>
<li> Practical Implication: When using simulation-based fitness functions for optimization, sufficient simulation length is critical to accurately differentiate high-performing candidates, although this increases computational cost.</li>
</ul></li>
</ol>
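
<p>A back-of-the-envelope calculation suggests why the simulation length matters so much. The sketch below is a rough estimate, not a result from the paper. It assumes the reported edges (about 0.33% and 0.26%) are mean returns per unit bet, that one round has a standard deviation of about 1.1 bet units (an assumed figure, not from the source), that rounds are independent, and that $M=1000$ trials are averaged at both simulation lengths.</p>

```python
import math

SIGMA_PER_ROUND = 1.1  # ASSUMPTION: std of one round's result, in bet units
TRIALS = 1000          # ASSUMPTION: M trials averaged at both lengths


def z_score(edge_a, edge_b, rounds, trials=TRIALS, sigma=SIGMA_PER_ROUND):
    """z statistic for the difference of two independently estimated edges."""
    se_one = sigma / math.sqrt(rounds * trials)  # standard error of one edge
    se_diff = math.sqrt(2) * se_one              # standard error of the gap
    return (edge_a - edge_b) / se_diff


for rounds in (10**4, 10**5):
    z = z_score(0.0033, 0.0026, rounds)
    print(f"N={rounds}: z = {z:.2f}")
```

<p>Under these assumptions the gap is roughly 1.4 standard errors wide at $N=10^4$ and roughly 4.5 at $N=10^5$, which is consistent with a difference that only becomes distinguishable in the longer simulations.</p>
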
<p><strong>Conclusion:</strong></p>
<p>The paper successfully demonstrates that evolutionary programming is a viable and effective method for automatically generating and optimizing Blackjack basic strategies. It can discover high-performing strategies from scratch or refine existing ones. The study highlights the practical consideration of balancing simulation fidelity (longer simulations) against computational cost during the evolutionary process. The code used was reported to be efficient enough for practical use on standard hardware of the time.</p>