Optimizing Python
February 15, 2013
Pretty soon after I had my poker bot mostly up and running I realized that nearly the entire runtime was dominated by a single function: finding the hand value (e.g. trip 8’s with a KJ kicker) of a five card hand. The code was being called again and again by my monte carlo simulator for millions of hands, and my naive Python was just too slow.
My first try was to tweak and fiddle with the algorithm, which helped cut the runtime down to approximately 12 seconds for my benchmark set. Next, I used python slots to prevent Python from creating dictionaries for my simple objects. Since my Card object essentially just held an int for value and an int for suit I was able to greatly cut down on memory consumption by adding:
This worked great, and cut my runtime down to just over 5 seconds per benchmark iteration. I still wanted things to go faster, so I decided to try moving away from pure Python. After some poking around, Cython looked the most appealing. Using it I could keep Python almost everywhere except my few core classes, and those would only need minor type annotation updates. It was actually as easy to set up as the tutorial claimed! My Card class now became:
cards.pxd:
cards.pyx:
By splitting the class into two files, other Cython classes could use the C version of Card directly without Python overhead. One snafu I hit was that I had to replace Python’s @functools.total_ordering annotation with Cython’s richcmp method. Cython saves you time by wrapping the comparators automatically once you define one method that takes an operator as an argument. I didn’t see a canonical example of this during my googling, so I’ll post what I ended up with:
And a reusable helper function to return the comparison result:
After compiling the Cython in place to .so files my benchmark ran in 2.5 seconds, or just 20% of the original implementation’s time. There’s still definitely things I can do to make it faster (malloc arrays instead of using Python lists) but I’m satisfied for now.
Comments