No limit: AI poker bot is first to beat professionals at multiplayer game

524 points by Anon84 6 years ago

thomasfl 6 years ago

One of the researchers, Tuomas Sandholm, has a real badass CV. Former pilot in the Finnish Air Force. Finnish windsurfing champion. Snowboarder. Professor at Carnegie Mellon. Speaks four European languages, including Swedish. And now at the age of 51, he has created the best AI-powered poker bot.

https://www.cs.cmu.edu/~sandholm/cv.pdf

jacquesm 6 years ago

Not to belittle the man's other achievements but speaking four languages is pretty normal in Europe, except when you're from the UK.
- loup-vaillant 6 years ago
  
  > speaking four languages is pretty normal in Europe
  Northern Europe, maybe. French people for instance tend to suck at foreign languages. We rarely go beyond 3 languages (French, English, then German or Spanish. The last two are often forgotten after school.)
  I suspect Spain and Italy are similar.
  
  ryanlol 6 years ago
  
  For example Spanish, Catalan, English and French would hardly be an unusual combination.
  
  amval 6 years ago
  
  While Italy, France and Spain are pretty much tied in their English proficiency (Spain might be ahead, but not significantly like Portugal), there are 4 official languages in Spain, and several regions where pretty much everyone is bilingual.
  I recall something like a 2.2 average.
  
  bryanrasmussen 6 years ago
  
  Italy 1.8 seems accurate, in my experience most Italians know only Italian, although younger generations are likely to know a bit of English.
  The surprising thing for me is Germany having 2. Seems unlikely.
  
  loup-vaillant 6 years ago
  
  > The surprising thing for me is Germany having 2. Seems unlikely.
  Germany is big. I've heard that the proficiency in foreign languages tends to decrease as your country gets bigger. Because the bigger the country, the less likely you are to interact with foreign languages. Bigger countries also tend to have foreign works translated (or dubbed) into their own language more often.
  So, no, I'm not surprised.
  
  bryanrasmussen 6 years ago
  
  I was surprised it was not less.
  
  y4mi 6 years ago
  
  At least a quarter of my city barely speaks German.
  But they need to be able to get citizenship afaik... So basically everyone can speak two languages on paper, though their knowledge of the native one is extremely rudimentary
  You're also required to learn 2 foreign languages in school if you want to go to university
  
  jfengel 6 years ago
  
  As an American, I am now going to bang my head into a wall.
  
  xxs 6 years ago
  
  >into a wall.
  I thought it was somewhat delayed, not paid, yet.
  
  stronglikedan 6 years ago
  
  Nothing to do with being American, since you're afforded the luxury to learn other languages for free through public schooling. If anything, bang your head because you chose not to.
  
  jfengel 6 years ago
  
  The offer is made, but the reason for doing so isn't made clear. I didn't understand it at the time; I availed myself of it in a minimal way. Most don't do that.
  Some of that is the accident of geography: it simply wasn't necessary. Today, we are more connected to our Spanish-speaking neighbors, and the value of learning that language is becoming increasingly obvious. I don't know whether the schools are doing a better job of stressing that than they did when I was in school.
  I have indeed chosen to learn other languages, several of them. I wish I'd done it in school, at a time when my brain was more open to it. Unfortunately, that was also a time when I didn't know very much and put my priority on other things that ended up making less of a difference in my life.
  
  Kaiyou 6 years ago
  
  It's a myth that you learn languages easier earlier in life. Mastering a language takes about 10 years, it's just that when you start at age 6, you could be done by age 16.
  
  jchallis 6 years ago
  
  Speaking as an American who speaks a handful of languages, very few Americans achieve any proficiency with foreign languages from public school classes. Indeed, I'd put the number at zero for classes that don't have an active speaking component (most of them).
  
  JaimeThompson 6 years ago
  
  The quality of said language instruction is highly variable, which also has an impact. It simply isn't a priority to a lot of schools.
  
  objectivetruth 6 years ago
  
  Eh, not for many, many Americans. My school, and most of the schools in my county, offered only Spanish and my understanding is that four years of it still wouldn't qualify a person for AP credit.
  It's hard to find more data beyond my anecdata -- an EdWeek article I found reported that less than 50% of schools report world language enrollment data.
  Also, the Europeans who learn three or four languages in school also have the luxury to learn those languages for free* through public schooling, so I'm not sure I understand your point.
  I am sure that your implication that every American kid can get a quality free foreign language skill in school is false: just like almost every single other educational outcome in the US, it's generally great in the good (wealthy, suburban) schools and terrible in the bad (poor, rural or urban) schools.
  
  Kaiyou 6 years ago
  
  Public schooling is a waste of time and not where people learn foreign languages. I learned my second and third language purely through the Internet. One of them I also had in school, but like I said it was a waste of time. The method is just completely wrong, since in school they do the two things that are the most detrimental to learning a foreign language. Those two things are correcting mistakes (since the emphasis will be on the mistake, which will be remembered) and learning grammar. Grammar is useless overhead when learning. Once you know the language you can bother with grammar, if you care. I never did.
- diggan 6 years ago
  
  > speaking four languages is pretty normal in Europe
  Clearly we have different experiences (Swedish person currently living in Spain) but I haven't met that many people who speak four languages and are from a European country (but I have yet to be in Eastern Europe).
  That Finns speak Swedish is a special case though, as AFAIK they learn Swedish in school, and being Finland-Swedish is a thing too.
  
  bjoli 6 years ago
  
  I am a classical musician, and in my profession it is quite common. I speak a lowly 3 languages, but many colleagues speak 4+. It is a very international market, and if you leave your home country to study, it is not uncommon to work in yet another country before returning home.
  Our solo flute speaks a whopping 6 languages well, and I suspect our harp player knows even more.
  
  bjoli 6 years ago
  
  I should have mentioned: this is in Sweden.
  
  jvanderbot 6 years ago
  
  I'd love to know how you earned those downvotes.
  
  Beltiras 6 years ago
  
  In Iceland it's pretty normal. We know Icelandic (ofc.) and learn English, one Scandinavian language (Danish, Swedish, Norwegian or Finnish), then at 15 one of German, French, Italian or Spanish. We are on an island in the middle of the Atlantic. I'd expect more linguistic pluralities on the mainland.
  
  Geimfari 6 years ago
  
  I feel that's a bit of an overstatement, having studied them a bit is one thing, but most people here cannot comfortably communicate at all in Danish or a 4th language, and cannot read a book in these languages.
  
  nso 6 years ago
  
  Being Swedish I bet you at minimum can understand and communicate proficiently with speakers and writers of Swedish, Norwegian, Danish and English. Probably you learned either Spanish, French, or German in school as well?
  Nordic countries are a special case.
  Norden er et spesielt tilfelle.
  Norden är ett speciellt fall.
  Norden er et specielt tilfælde.
  
  bryanrasmussen 6 years ago
  
  Nobody can communicate proficiently with the Danes
  https://www.youtube.com/watch?v=s-mOy8VUEBk
  
  vietbold 6 years ago
  
  Some of us have Danish heritage you insensitive rod.
  Nøgen pige lærer sku å su pig
  Danish for the transporting win
- duchenne 6 years ago
  
  A quick google search seems to contradict your statement: https://jakubmarian.com/wp-content/uploads/2014/10/number-of...
  Average number of languages spoken: France: 1.8 , Germany: 2.0 , Spain: 1.7 , Portugal: 1.6 , Italy: 1.8 , Greece: 1.8 , Poland: 1.8 , Sweden: 2.5 , Finland: 2.6 , UK: 1.6
  
  jacquesm 6 years ago
  
  That's averages. And 'pretty normal' is not a mathematical thingy it just means: that this isn't rare or noteworthy at all.
- thomasfl 6 years ago
  
  Most Norwegians only speak 2 languages. Swedish and Danish are very similar to Norwegian, more like dialects, so they don't count.
  
  kyleblarson 6 years ago
  
  I am on holiday in Norway right now and have been super impressed by the English fluency of most people I have spoken with. It goes far beyond basic conversational fluency.
  
  vietbold 6 years ago
  
  Go to the fucking outskirts and see how well it goes.
- fasicle 6 years ago
  
  I've met a lot of people while working and travelling around Europe the past couple years, I would say 2-3 is more common.
  I rarely met someone who could speak four languages fluently.
- quakenul 6 years ago
  
  Getting in touch with two foreign languages in school is not uncommon, but speaking up to four (including your mother tongue) with any sort of sophistication definitely is not normal, at least in western Europe.
  
  Madzen1 6 years ago
  
  Not uncommon in Scandinavia: if you know one of the languages, you can learn the others easily. Some people from Finland have both Swedish and Finnish as their mother tongue; the German most likely came from upper secondary school, together with English.
  
  Shaanie 6 years ago
  
  As a Swede I have little issue understanding Norwegian, but I would absolutely not claim I speak it. Yes, the languages are similar enough that we can understand each other, but no Scandinavian will be able to speak another Scandinavian language without practice as there are many differences.
- kami8845 6 years ago
  
  No it's not, what are you talking about? I've met thousands of young Europeans and ones that speak 4 languages are extremely rare. Unless they're from countries where they get 2 languages "for free" like Holland/Belgium/Switzerland. Definitely not "pretty normal".
  
  thecatspaw 6 years ago
  
  I am Swiss and the only languages I speak are German and English. I should have learned French as well (and had it for a few years in school), but things tend to not stick if you're being forced to learn them against your will.
- GuB-42 6 years ago
  
  French guy here: no.
  French people can usually speak basic English, and a third language is common if that person has ties with another country but that's it. At school, we are normally taught two foreign languages. The first one is usually English, few people actually practice their second one.
  The situation is completely different in Scandinavian countries. And it is indeed quite normal to speak 4 languages in Finland (usually Finnish, Swedish, English and a 4th one, often German). Because their native language is only spoken by a few, foreign languages are a necessity for international relationships. And as a Finnish friend told me, learning new languages is a popular way to pass time during long winter nights.
- dorgo 6 years ago
  
  If you want to keep your conversation private it is not enough to choose a rare language in Berlin. There is always somebody who understands what you are saying.
- world32 6 years ago
  
  No, it's not normal to speak four languages in Europe.
- reitoei 6 years ago
  
  Irish person here.
  It's not normal.
- blancheneige 6 years ago
  
  it's pretty normal in backwater countries that can't thrive on their own. otherwise not so much.
sakarisson 6 years ago

> Speaks four european languages, including swedish.
Judging by his name, I'd assume Swedish is his first language, so that particular aspect isn't that surprising to me
- krageon 6 years ago
  
  Just from the fact that he exists in Finland and is older than 20 it's basically a given that he speaks Swedish because they learn it in school.
  
  Dragory 6 years ago
  
  Learning it in school and actually speaking it are very different things though. Source: learned Swedish in school in Finland, rarely used it since, have practically forgotten all about it now.
  
  estomagordo 6 years ago
  
  That's stretching it a bit. Many Finns only speak very basic Swedish.
xxs 6 years ago

Finland has two official languages - Swedish and Finnish, his name suggests he is Swedish to begin with.
It's not uncommon to speak four languages (often at C2 level in a couple of them) in Northern Europe, esp. the Baltic region.
Like mentioned by sibling (sakarisson), that particular part is not impressive, the rest - sure
sails 6 years ago

CV is also 100+ pages, not bad!
sytelus 6 years ago

You gotta love resumes that say “founding of companies listed later” and have a dedicated chapter on “EVIDENCE OF EXTERNAL REPUTATION”.
stevespang 6 years ago

392 published papers

pesenti 6 years ago

Blog post: https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-...

Science article: https://science.sciencemag.org/content/early/2019/07/10/scie...

YeGoblynQueenne 6 years ago

>> Pluribus is also unusual because it costs far less to train and run than other recent AI systems for benchmark games. Some experts in the field have worried that future AI research will be dominated by large teams with access to millions of dollars in computing resources. We believe Pluribus is powerful evidence that novel approaches that require only modest resources can drive cutting-edge AI research.
That's the best part in all of this. I'm not convinced by the claim the authors repeatedly make, that this technique will translate well to real-world problems. But I'm hoping that there is going to be more of this kind of result, signalling a shift away from Big Data and huge compute and towards well-designed and efficient algorithms.
In fact, I kind of expect it. The harder it gets to do the kind of machine learning that only large groups like DeepMind and OpenAI can do, the more smaller teams will push the other way and find ways to keep making progress cheaply and efficiently.
- kqr 6 years ago
  
  Yes! I work for a company that does just this: pull big gears on limited data and try to generalise across groups of things to get intelligent results even on small data. In many ways, it absolutely feels like the future.
  
  mooneater 6 years ago
  
  Interesting, are you using bayesian methods?
  
  kqr 6 years ago
  
  Does "Bayesian methods" mean anything specific? Parts of the core algorithms were written before I joined, and they are very improvised in the dog-in-a-lab-coat way. I haven't analysed them to see how closely they follow Bayes' theorem and how strictly they define conjugate priors etc. (we are also heavily using simple empirical distributions), but the general idea of updating priors with new evidence is what it builds on, yes. I have a hard time imagining doing things any other way and still getting quality results, but that is probably a reflection on my shortcomings rather than a technical fact.
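For readers unfamiliar with the term, "updating priors with new evidence" can be sketched with a conjugate Beta-Binomial update. This is only a textbook illustration, not the system described in the comment above; the helper names `update_beta` and `beta_mean` and all the numbers are hypothetical.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial evidence
    gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical example: a weak prior centred on 0.5, then 3 successes in 4 trials.
prior = (2.0, 2.0)
posterior = update_beta(*prior, successes=3, failures=1)

print("prior mean:", beta_mean(*prior))
print("posterior mean:", beta_mean(*posterior))
```

With little data the posterior stays close to the prior, which is one reason this style of estimate can behave sensibly on small samples.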
samfriedman 6 years ago

The FB post is much more detailed and I think the link on this post should be updated to point there.

gexla 6 years ago

It's easy to "take away" too much information from this. The focus should be that an AI poker bot "did this", without getting too far into adjacent subjects.

But what's the fun in that?

10,000 hands is an interesting number. If you search the poker forums, you'll see this is the number people throw out there for how many hands you need to see before you can analyze your play. You then make adjustments and see another 10,000 hands before you can assess those changes.

In 2019, it's impractical to adapt as a competitive player in live poker. A grinder can see 10,000 hands within a day. The live poker room took 12 days. Another characteristic of online poker is that players can also use data to their advantage.

So, I wouldn't consider 10K hands as long term, even if this was a period of 12 days. Once players get a chance to adapt, then they'll increase their rate of wins against a bot. Once you have a history of hand histories being shared, then it's all over. And again, give these players their own software tools.

Remember that one of the most exciting events in online poker was the run of isildur1. That run was put to rest when he went bust against players who had studied thousands of his hand histories.

This doesn't take away from the development of the bot. If we learn something from it, then all good.

Tenoke 6 years ago

>10,000 hands is an interesting number. If you search the poker forums, you'll see this is the number people throw out there for how many hands you need to see before you can analyze your play. You then make adjustments and see another 10,000 hands before you can assess those changes.
If you read the paper/facebook post[0] (no idea why this worse article is the link here) - you'll see they address this.
>Although poker is a game of skill, there is an extremely large luck component as well. It is common for top professionals to lose money even over the course of 10,000 hands of poker simply because of bad luck. To reduce the role of luck, we used a version of the AIVAT variance reduction algorithm, which applies a baseline estimate of the value of each situation to reduce variance while still keeping the samples unbiased. For example, if the bot is dealt a really strong hand, AIVAT will subtract a baseline value from its winnings to counter the good luck. This adjustment allowed us to achieve statistically significant results with roughly 10x fewer hands than would normally be needed.
0. https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-...
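The baseline-subtraction idea in the quoted passage can be illustrated with a plain control-variate simulation. To be clear, this is not AIVAT itself, only a simplified sketch of the same principle: subtracting a zero-mean "luck" term leaves the estimate unbiased but much less noisy. The function `simulate`, the edge of 0.05 per hand, and the noise levels are all made-up assumptions.

```python
import random
import statistics


def simulate(num_hands, edge=0.05, seed=0):
    """Return (raw, adjusted) per-hand winnings for a player with a small true edge."""
    rng = random.Random(seed)
    raw, adjusted = [], []
    for _ in range(num_hands):
        card_luck = rng.gauss(0.0, 10.0)   # luck of the deal; zero mean by construction
        other_noise = rng.gauss(0.0, 2.0)  # randomness the baseline cannot account for
        winnings = edge + card_luck + other_noise
        raw.append(winnings)
        # Subtracting a zero-mean baseline keeps the estimate unbiased
        # while removing most of the variance.
        adjusted.append(winnings - card_luck)
    return raw, adjusted


raw, adjusted = simulate(10_000)
print("raw:      mean", statistics.mean(raw), "stdev", statistics.stdev(raw))
print("adjusted: mean", statistics.mean(adjusted), "stdev", statistics.stdev(adjusted))
```

Because the number of hands needed for a given confidence level scales with the variance, a lower-variance estimator reaches significance with fewer hands, which is the effect the blog post describes.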
crazypyro 6 years ago

>Remember that one of the most exciting events in online poker was the run of isildur1. That run was put to rest when he went bust against players who had studied thousands of his hand histories.
Perhaps more famously, Jungleman compiled hand histories from many different people while he was playing Tom Dwan in the 'durrrr' challenge (which I guess technically isn't over....)
csa 6 years ago

You clearly didn’t read the additional links they posted. They mentioned why they chose 10k (AIVAT), and it goes far beyond any of the variables you mentioned.
For any number of hands, my money is on the bot.
- Traster 6 years ago
  
  That really doesn't address the point that was raised. It's not that the bot wins through luck and that 10k is too small a sample, it's that a good professional poker player isn't good over 10k hands, they're good over 5 years.
  Any good player will have their play analyzed and exploited, and will have to re-adjust their strategy in response, so there's a feedback loop. The question is: how does the AI strategy adapt over time to players who know its hand history? That's an extremely important part of being a top level player. To give you an example - if you watch Daniel Negreanu's vlog about his time at the WSOP he actively talks about changing his strategy in response to his analysis of different players' profiles. This is especially important in Sit & Go where at high stakes you'll have regular grinders who build up reputations - less so in tournaments where you're less likely to meet any given player.
  
  hdkrgr 6 years ago
  
  This will be interesting to see.
  Brown and Sandholm's algorithm aims to play a Nash Equilibrium which by definition _cannot_ be exploited by a single opponent player as long as all players are playing the equilibrium strategy. As they note in the paper this gives you a strong optimality guarantee in the 2-player setting. It was unclear whether this would transfer to real-world winnings in the multi-player case, and while it looks like it does for now (for current strategy-profiles of human players) humans might be able to adapt to the strategy played by the bot. Given the fact that the bot wins against current human strategy-profiles in the n-player setting, it's likely (but not a sure thing) that human players will have to team up against the bot to exploit it. That seems rather unlikely to me.
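The two-player guarantee mentioned above can be seen in a toy zero-sum game. The sketch below uses rock-paper-scissors, not poker, and has nothing to do with how Pluribus computes its strategy; it only shows that the equilibrium mix has a worst case of zero while a biased mix can be exploited. The names `expected_payoff` and `worst_case_value` are hypothetical.

```python
# Payoff to the row player in rock-paper-scissors
# (rows and columns are ordered rock, paper, scissors).
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

PURE_STRATEGIES = ([1, 0, 0], [0, 1, 0], [0, 0, 1])


def expected_payoff(row_strategy, col_strategy):
    """Expected payoff to the row player when both sides play mixed strategies."""
    return sum(
        p * q * PAYOFF[i][j]
        for i, p in enumerate(row_strategy)
        for j, q in enumerate(col_strategy)
    )


def worst_case_value(row_strategy):
    """Row player's payoff against the most damaging pure counter-strategy."""
    return min(expected_payoff(row_strategy, pure) for pure in PURE_STRATEGIES)


equilibrium = [1 / 3, 1 / 3, 1 / 3]
biased = [0.5, 0.25, 0.25]  # plays rock too often

print("equilibrium worst case:", worst_case_value(equilibrium))
print("biased worst case:", worst_case_value(biased))
```

In games with three or more players no such worst-case guarantee holds in general, which is why the multi-player result had to be established empirically.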

noambrown 6 years ago

I'm one of the authors of the bot, AMA

n3k5 6 years ago

What took you so long? I mean not the Pluribus team specifically, but Poker AI researchers in general.
The desire to master this sort of game has inspired the development of entire branches of mathematics. Computers are better at maths than humans. They're less prone to hazardous cognitive biases (gambler's fallacy etc.) and can put on an excellent poker face.
As a layperson who's rather ignorant about both no-limit Texas hold 'em and applicable AI techniques, my intuition would tell me that super-human hold 'em should have been achieved before super-human Go. Apparently your software requires way less CPU power than AlphaGo/AlphaZero, which seems to support my hypothesis. What am I missing?
Bonus questions in case you have the time and inclination to oblige:
What does this mean for people who like to play on-line Poker for real money?
Could you recommend some literature (white papers/books/lecture series/whatever) to someone interested in writing an AI (running on potato-grade hardware) for a niche "draft and pass" card game (e.g. Sushi Go!) as a recreational programming exercise?
- noambrown 6 years ago
  
  I think it took the community a while to come up with the right algorithms. So much of early AI research was focused on beating humans at chess and later Go. But those techniques don't directly carry over to an imperfect-information game like poker. The challenge of hidden information was kind of neglected by the AI community. This line of research really has its origins in the game theory community actually (which is why the notation is completely different from reinforcement learning).
  Fortunately, these techniques now work really really well for poker. It's now quite inexpensive to make a superhuman poker bot.
  
  amelius 6 years ago
  
  So will this be the end of online poker?
  
  tempestn 6 years ago
  
  It's pretty easy for good players to recognize other good players. And since the house takes such a large cut, the only way for pro players to have positive expected value online is to seek out games with poor players. So even if they couldn't recognize the bots as such, they would see them as tough players and avoid them.
  That said, I suppose it would be possible for the bots to become so prevalent that all this sort of opportunity is effectively used up, so the return vs time and risk for a human player is no longer worthwhile. (That already happened long ago for most players, as the initial online poker boom faded and most casual players left.)
  On the other hand, all the major platforms have terms prohibiting using bots, so their numbers might be sufficiently limited to prevent that scenario.
  
  oppiz 6 years ago
  
  It's my understanding the big sites have some pretty sophisticated bot detection systems, so in theory a bot that would be successful at beating online poker couldn't be a huge winner, it'd presumably raise too many red flags. However, if it were a near break-even player, with dozens, if not hundreds, of instances running at any given time, it's going to slowly grind out a substantial figure. You'd also have to take into account that the sites are monitoring things like reaction times to bets and raises, hand range consistency, etc. I'm not a coder, but it seems like it'd be a tremendous undertaking to code a bot that would be a substantial threat to players. Then again, maybe I'm naive about the level of scrutiny the poker sites are employing.
  
  waste_monk 6 years ago
  
  One of the professors I used to work with some years ago was involved in stylometry research on human-computer interactions such as keystrokes and mouse input (for example, to determine if a user who had authenticated successfully earlier is not the same person currently typing based on keystroke cadence and pattern analysis - e.g. if someone sat down at an unlocked workstation and started typing, you could detect it and force them to reauthenticate).
  It would probably be possible to figure out the types of detection being performed by the poker sites and use adversarial training methods to train a machine learning solution to mimic human input patterns. Or, more pragmatically, have the bot analyse the state of the game and give orders for a human to perform at their own natural pace.
  
  everdev 6 years ago
  
  Poker sites mainly detect bots based on their login times, number of tables, time per action, etc.
  A successful bot shouldn't get caught for "playing like a bot" because the moment its actions are that predictable it would presumably no longer be effective.
  But it will get caught for operating like a bot. So, don't run it 24hrs a day. Sites also randomize things to keep bots at bay, even card imagery.
  If your performance and success drops whenever they randomize something that gives the bot false inputs, then you might get caught.
  Inputting all of the poker events manually would be really tedious I'd imagine.
  Of course, if you're winning millions, they can interview you about your poker history and how you got so good.
  It sounds like easy money, but probably not.
  
  Angostura 6 years ago
  
  Just play as you normally would, with the bot advising moves from the laptop next to you.
  
  everdev 6 years ago
  
  Right, but the bot needs to know who is in what position what the bets are, who folded, etc. Try inputting all of that information manually to the laptop next to you and you'll quickly get frustrated. Online poker is a fast game with lots of data points.
  
  NetBeck 6 years ago
  
  TensorFlow, PyTorch, Caffe, Keras, MXNet, and OpenCV could copy the game if you split the video input for the player and the bot.
  
  everdev 6 years ago
  
  Yes, but see my previous comment.
  People have tried it and online poker sites know they've tried it, so they'll randomize images and other data. If you take a dive when the randomizations are triggered and outperform otherwise, good luck trying to collect your winnings.
  
  alienallys 6 years ago
  
  An external camera with an image processor does that.
  
  vietbold 6 years ago
  
  Or fucking screen grabbing the screen of the active computer. Less artifacts
  
  tempestn 6 years ago
  
  Not to mention, if you get caught, there could be worse consequences than just having your account locked. The site could (and likely would if the scale was significant) sue you for not only all your winnings, but damage to their business. They would likely win (since you're flagrantly breaking their terms of use contract), and bankrupt you.
  Edit: In fact, if we're talking worst case, circumventing their anti-bot restrictions would presumably be illegal under the CFAA. So if you're in the US you could even be charged criminally, although I expect in reality that would be less likely.
  
  tuesdayrain 6 years ago
  
  >You'd also have to take into account that the sites are monitoring things like reaction times to bets and raises, hand range consistency, etc.
  You might be surprised by the lengths people go to in order to bypass bot-detection just for ordinary games. All of the things you mentioned are pretty standard. Considering there is serious money on the line here, I am positive that plenty of poker bots will be virtually indistinguishable from professional players, if they aren't already.
  
  Phillipharryt 6 years ago
  
  The same argument of money being on the line applies to the detection. Poker software is already pretty damn impressive with its tracking. The online casinos actually stand to lose more money than the bot creators could make, so the detection has a greater incentive, and is likely to triumph.
  
  pbhjpbhj 6 years ago
  
  They only lose if there are less plays, surely? I assume they take a cut of all winnings, they're not putting up stakes.
  
  Phillipharryt 6 years ago
  
  Yes, I'm assuming that if bots work their way into everyday online poker that people will stop using it, so there would be less players.
  
  oppiz 6 years ago
  
  I guess the real threat isn't a "bot" but something in the way of a program that interprets the data on the screen real-time and whose output instructs the player of the "optimal" play, given the circumstances. How the hell would you deter that as a site operator?
  
  chongli 6 years ago
  
  No, I think your earlier example of a swarm of just-above-break-even bots would be much more difficult to combat. Even if they can be detected, the anti-detection countermeasures can evolve, turning it into an arms race. Anything you can model in your bot detection algorithm, the bot-maker can model too.
  Reaction times ought to be one of the easiest things to fake. All it would take is a bunch of monitoring of large numbers of games to create a nice model of real player reaction times, which in all likelihood are normally distributed anyway.
  
  disgruntledphd2 6 years ago
  
  Not normally distributed, as negative reaction times are unlikely. You could use log-normal, but I believe that a mixture of exponential and gamma tends to be used by reaction time researchers (search ExpGamma).
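A quick way to see the point about negative values is to sample both distributions. This sketch uses the log-normal suggested above rather than the exponential/gamma mixture, and the median of 2 seconds and the spread parameters are invented for illustration, not fitted to real player data.

```python
import math
import random


def sample_lognormal_times(n, median_seconds=2.0, sigma=0.5, seed=0):
    """Draw n reaction times from a log-normal; every sample is strictly positive."""
    rng = random.Random(seed)
    mu = math.log(median_seconds)  # the median of a log-normal is exp(mu)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def sample_normal_times(n, mean_seconds=2.0, stdev_seconds=1.5, seed=0):
    """Draw n reaction times from a normal; some samples come out negative."""
    rng = random.Random(seed)
    return [rng.gauss(mean_seconds, stdev_seconds) for _ in range(n)]


lognormal_times = sample_lognormal_times(10_000)
normal_times = sample_normal_times(10_000)

print("log-normal minimum:", min(lognormal_times))
print("normal samples below zero:", sum(t < 0 for t in normal_times))
```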
  
  chongli 6 years ago
  
  > negative reaction times are unlikely
  Oh, right. I was thinking along the lines of 100m dash, where people often do have negative reaction times (which we penalize as false starts).
  In poker we don't have much of an incentive to react instantly to any play.
  
  BigJono 6 years ago
  
  Pretty sure I've read a long time back on 2p2 that large consistent winners on certain sites have been asked to submit camera footage of their play with a clear view of screens and inputs. So this is probably something that companies like Pokerstars have been dealing with for years already.
  
  colordrops 6 years ago
  
  It would be pretty easy to hide something signalling you on what to play from cameras.
  
  tempestn 6 years ago
  
  True, but ultimately if they're unsure they'll just ban you from the platform anyway. Consistent, winning players aren't really where they make their money, and they're free to ban anyone they like. (I realize technically they take a cut from all players, but more money gets sloshed around for them to skim off of if winning players aren't removing it from the system.)
  
  wastedhours 6 years ago
  
  That was what I was thinking, the bot augmenting a human's playing ability rather than playing itself.
  
  RomanBob 6 years ago
  
  > So even if they couldn't recognize the bots as such, they would see them as tough players and avoid them.
  The problem would be that if I were a pro I would rather run 1000 bots than play myself. Which means the only players left are AI and fish. Once the fishies learn of this fact, they will abandon in droves.
  It's all gonna go back to live poker soon.
  
  ggggtez 6 years ago
  
  No, having losing odds never stopped anyone from gambling.
  
  olalonde 6 years ago
  
  That's simply not true... I don't play casino games because of the losing odds. I play poker because of the winning odds. I guess you meant "having losing odds doesn't stop everyone from gambling".
  
  traderjane 6 years ago
  
  Even with a magical human test, you couldn't know whether it was human + robot performance.
  
  salty_biscuits 6 years ago
  
  Just bet on bots playing each other.
  
  drjesusphd 6 years ago
  
  So... like Wall Street!
  
  MRD85 6 years ago
  
  I'm sitting here considering the possibility of making my own bot to play low stakes online poker ($1.50 sit n go). Run it on 6 tables at once and I imagine it would be facing really poor opponents and would have a steady flow of cash.
  
  moate 6 years ago
  
  until your bot gets caught (possibly quickly) and then you're banned from the sites.
  
  kyleblarson 6 years ago
  
  Even if it is, it means a new live poker boom which is a very good thing
  
  joaomacp 6 years ago
  
  It must be. It is way too hard to prevent humans from using an AI. Some chess services try to check if you're playing "too perfect", but in poker that's harder to do, and there's way more money on the line.
  
  icelancer 6 years ago
  
  >> Some chess services try to check if you're playing "too perfect", but in poker that's harder to do, and there's way more money on the line.
  Not really. With perfect information you know the correct strict equity plays assuming normal opponents. This doesn't give you the ultimate answer, because a player's reads and inference about another player is definitely an input - especially at the highest level - but it is more than enough to give you a winning/losing player at the small/midstakes.
  source: worked for an online poker company that had these tools... and far more available to us
  
  badfrog 6 years ago
  
  > a player's reads and inference about another player is definitely an input - especially at the highest level
  I think player-dependent strategy is more important at lower levels because the players are much further away from what you call "normal opponents", so there's far more opportunity to exploit their mistakes.
  
  chance_state 6 years ago
  
  >Some chess services try to check if you're playing "too perfect" [...]
  That's interesting, could you share an example? Most of my search results are anecdotal Reddit threads about how many people cheat in online chess.
  
  MikeHolman 6 years ago
  
  All the major online chess websites have anti-cheat mechanisms. They don't publish details of how they detect cheaters though, and I don't know how good they are.
  From what I've read, they work by comparing the player's moves against chess engines, and if the player is picking engine's choice too often in positions where there are multiple roughly equal moves, they get flagged.
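  A rough sketch of the heuristic described above, with entirely made-up data structures: each position carries engine evaluations for the candidate moves, and we count how often the player picked the engine's top choice in positions where several moves are roughly equal. `Position`, `match_rate`, and the 0.2-pawn tolerance are hypothetical names and values; the sites do not publish their actual methods.

```python
from dataclasses import dataclass


@dataclass
class Position:
    engine_evals: dict[str, float]  # move -> engine evaluation in pawns (hypothetical input)
    played: str                     # move the player actually chose


def is_ambiguous(pos: Position, tolerance: float = 0.2) -> bool:
    """True if at least two moves are within `tolerance` pawns of the best one."""
    best = max(pos.engine_evals.values())
    near_best = [m for m, e in pos.engine_evals.items() if best - e <= tolerance]
    return len(near_best) >= 2


def match_rate(positions: list[Position]) -> float:
    """Fraction of ambiguous positions where the player chose the engine's top move."""
    ambiguous = [p for p in positions if is_ambiguous(p)]
    if not ambiguous:
        return 0.0
    hits = sum(
        1 for p in ambiguous
        if p.played == max(p.engine_evals, key=p.engine_evals.get)
    )
    return hits / len(ambiguous)
```

  A site would presumably compare this rate against what strong human players achieve and flag outliers; the threshold is the hard part and is not sketched here.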
  
  meruru 6 years ago
  
  I always found it weird that someone would want to cheat in a game like online chess. I mean, what's the point? Does anyone have insight into what's going on in the heads of cheaters?
  
  Jach 6 years ago
  
  A few reasons come to mind. One is simply that if you have any metrics (ranking, win/loss ratio, greater site access..) it's going to feel nice to see them improve. Another is that losing at anything can be ego-hurting (similar reason good players sometimes sandbag with new accounts / lower ranks they can't possibly lose to, they need to 'win' more). Or reverse sandbagging/trolling with a bot might be amusing. Another is the cheater may justify it as a self-teaching game, and might not always play the strongest move but see if their move is even in consideration or try to improve ability to see the better moves by having them always pointed out -- but why not just play the bot, or save that for post-game analysis? I like to run my go games through gnugo's annotated analysis at the end (as I'm very weak I assume even the weak gnugo can teach me things), it'd be too troublesome to use it in a live game.
  
  sakarisson 6 years ago
  
  Other players justify cheating by convincing themselves that everyone else is cheating.
  
  baq 6 years ago
  
  It's where the enjoyment comes from. Cheaters don't enjoy the game as much as they enjoy seeing their ELO/MMR go up, or in the worst case they're psychopaths who just want to mess with other people's heads.
  
  root_axis 6 years ago
  
  People enjoy the feeling of having power over others.
  
  soup10 6 years ago
  
  Even so, most people don't bother with standard games online since it's way too easy to cheat by mirroring the game, and it's basically undetectable if they are good enough to not play lines that look like "computerish" moves.
- icelancer 6 years ago
  
  >> Computers are better at maths than humans.
  OP discussed it but while this is true, it is not necessarily true or straightforward when it comes to games with hidden information like poker. This is more of a game theoretical problem (Economics) than it is a purely mathematical one, which had less support in the AI/ML community, hence the delay.
  The lower CPU/GPU/resource use supports that fact as does your intuition. Breaking poker required a lot of manual work and model design over brute force algorithms and reinforcement learning.
b_tterc_p 6 years ago

The bot does not seem to consider previous hands in its decisions. That is to say, it does not consider who it is playing against. Should this affect how we perceive the bot as “strategic” or not? Bots that play purely mathematically optimally on expected value aren’t effective or interesting. But it feels like this is playing on just a much higher order expected value.
It feels like a more down to earth version of the sci fi super human running impossible differential equations to predict exactly what you will do given knowledge that he knows what you know what he knows... etc. ad infinitum. But since it doesn’t actually consider the person it’s predicting, it may simply be a really really good approximation of the game theoretic dominant strategy.
At what complexity of game and hidden information should we feel like the bot can’t win by running a lookup table?
- noambrown 6 years ago
  
  The bot bluffs, and understands that when its opponent bets it might be a bluff. I would consider that to be strategic behavior. The fact that its strategy is determined by a mathematical process doesn't change that in my opinion.
  
  b_tterc_p 6 years ago
  
  It does bluff, but that’s not my point. My issue is that it bluffs without consideration of its opponent. High level strategic play of most games is about adapting to your opponents play. This bot does not do that. It is secretly a giant lookup table of game state to response.
  In the case of poker, it appears that adaptability is not as good as pure mathematical optimization. Humans can adapt their strategy, but it’s basically just worse regardless because this thing has cracked the code.
  I’m surprised that you managed to beat pros without adaptability. It’s pretty impressive and says a lot about how we define strategy. If human adaptability is just not as good as machine optimality across all games, we could imagine discovering that an adaptable poker AI can’t outperform this one. It raises a whole lot of interesting questions because lots of criticism towards something like Starcraft AI is that it is strategically stupid and doesn’t adapt. The Starcraft AI is admittedly kind of stupid now, but we may hit a wall on its creativity simply because creativity is, despite human intuition, a dumb idea.
  
  Cybiote 6 years ago
  
  If you think about it, any AI that's stopped learning and is now efficiently doing pattern matching or pattern completion (assuming memory and attractor states), instead of running a complex search, is arguably a fancy lookup table hashed by similarity. This includes humans. In other words, lookup table isn't the slight most think it is. But the bot does do real time search so it's not "merely doing" a look-up.
  Because of how Poker is not sub-game solveable (it is not possible to self-locate within the tree), this bot's play has to get into its opponent's mindspace in a sense. To not be exploitable, it essentially has to infer the other player(s) hidden state and paths from observed actions. This isn't something I've seen in Dota, Starcraft, Chess, Go bots.
  It's true that it doesn't learn online to find exploitable patterns of other players, but doing this without also making yourself exploitable in turn is a very difficult other problem. Low exploitable near optimal play according to game theoretic notions is considered strategy.
  While you're correct that online learning is powerful and something machines are not currently good at (in complex spaces), you can avoid being exploited without learning if your experience is rich enough and you know how to infer what your opponent is trying to do and anticipate them. I'd argue this lineage of poker bots is the closest to playing that way of the major game playing bots.
  
  b_tterc_p 6 years ago
  
  I don’t mean look up table as a bad thing. I mean it’s a lookup table on game state, without incorporating any information about the players. But good points
  
  tialaramex 6 years ago
  
  > High level strategic play of most games is about adapting to your opponents play.
  Is this true in any meaningful sense?
  For heavily studied games there's usually a theoretically optimal play independent of the opponent's interior state, this is obviously true for all the "Solved" games, which includes the simpler Heads Up Limit Hold 'Em poker (solved by Alberta's Cepheus project) but it seems pretty clearly true for as-yet unsolved games like Go and Chess too.
  I'm very impressed by this achievement because I had expected good multi-player poker AI (as opposed to simple colluding bots found online making money today) to be some years away. But I would not expect "adaptability" to ever be a sensible way forward for winning a single strategy game.
  
  Cybiote 6 years ago
  
  Adaptability is certainly not necessary (almost by definition) if you're playing a near to equilibrium strategy but adaptability is a useful skill to have in a general non-stationary world.
  That said, for this bot, I wouldn't say it's playing completely independent of the other players' interior state. Pluribus must infer its opponents' strategy profile and, according to the paper, maintains a distribution over possible hole cards and updates its belief according to observed actions. This is part of playing in a minimally exploitable way in such a large space for an imperfect information game.
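  To make "maintains a distribution over possible hole cards and updates its belief according to observed actions" concrete, here is a toy Bayes update. This is not the paper's procedure, just the textbook rule it alludes to; the three-hand range and the raise likelihoods are invented for illustration.

```python
def update_belief(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Bayes' rule: posterior(hand) is proportional to prior(hand) * P(action | hand)."""
    unnormalized = {hand: prior[hand] * likelihood[hand] for hand in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observed action has zero probability under every hand")
    return {hand: p / total for hand, p in unnormalized.items()}


# Hypothetical three-hand range and made-up probabilities of raising with each hand.
prior = {"AA": 0.2, "KQ": 0.3, "72": 0.5}
raise_likelihood = {"AA": 0.9, "KQ": 0.5, "72": 0.1}

# After seeing a raise, belief shifts toward the hands that raise more often.
posterior = update_belief(prior, raise_likelihood)
```

  The likelihoods have to come from somewhere; per the discussion here, the bot gets them from its own strategy rather than from a model of the specific opponent.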
  
  b_tterc_p 6 years ago
  
  > Pluribus must infer its opponents' strategy profile
  This is what interests me. It doesn’t do this. In fact, because it played only against itself, it should be assumed that the only strategy profile it considers is its own.
  
  Cybiote 6 years ago
  
  You're right that it uses itself as a prototype for decisions but the fact that it also maintains a probability distribution over possible hole cards and that it updates according to observed actions is already richer than the local decision only approach taken by almost all other bots. This is sort of forced by the simplicity of poker's action space combined with the large search space and imperfect information. Here, the simplicity ends up making things more difficult! They also use multiple play styles as "continuation strategies" so it's a bit more robust. And to be fair, I suspect much of human play does use themselves and experience as a substitute too.
  
  lmm 6 years ago
  
  > For heavily studied games there's usually a theoretically optimal play independent of the opponent's interior state, this is obviously true for all the "Solved" games, which includes the simpler Heads Up Limit Hold 'Em poker (solved by Alberta's Cepheus project) but it seems pretty clearly true for as-yet unsolved games like Go and Chess too.
  In an n-player game, a table can be in a (perhaps unstable) equilibrium which the "optimal" strategy will lose at. This has been demonstrated for something as simple as iterated prisoners' dilemma (tit-for-tat is "best" for most populations, but there are populations that a tit-for-tat player will lose to). I don't play poker but I've definitely experienced that in (riichi) mahjong - if you play for high-value hands the way you would in a pro league, on a table where the other three players are going for the fastest hands possible, you will likely lose.
  
  Phillipharryt 6 years ago
  
  Well in online poker high level players make great use of player tagging, taking notes about players they have played before and what they've done in important hands or their patterns. Software exists to track how opponents behave in any given situation, and if it pops up again you use that.
  I would think if professional players are utilising this information, a bot could benefit from it. I don't see how they would ever lose out from this information, even if it only uses situations where the opponent has a history of 100% of the time responding a certain way.
  I am impressed by the bot but I have to laugh a bit because years ago I joked with a friend about making an "amnesiac bot" that had no recollection of previous hands, it seemed so useless we obviously didn't make it, we've evidently been proven wrong. (pointless tangent there)
  
  tialaramex 6 years ago
  
  Player tagging just makes you exploitable. I play one way now, you tag me "Haha, fool bet-folds way too much" and then I change it up to exploit you, "Huh, I keep trying to fold him out with worse and he doesn't bite even though my notes say he will".
  The theoretically optimal play just skips that meta and meta-meta play and performs optimally anyway. Because poker involves chance the optimal play will be stochastic and so you can stare at the noise and think you see a pattern, that just means you'll play worse against it, because you're trying to beat a ghost.
  For example, suppose in a certain situation optimally I should raise $50 10% of the time. It so happens, by chance, that I do so twice in a row, and you, the note-taker, record that I "always" raise $50 here. Bzzt, 90% of the time your note will be wrong next time.
  
  madog 6 years ago
  
  You would be a fool to act based off only 2 instances of seeing a particular behaviour. That's why you have to weigh up how many instances you've seen. Sometimes if it's less than X instances it's not worth considering that particular statistic as valid.
  Now say I have thousands of hands viewed against you, and you raise pre-flop 50% of the time. That is pretty significant information about the types of hands you play. If I have only 10 hands I've observed, that same stat means nothing.
  The theoretical optimal play depends on who you're playing, as more value could be extracted in certain situations vs certain players.
  For example, if I've seen you face a pre-flop 3-bet 1000 times and you've folded 99% of the time. That would be a good opportunity to recognise that 3-bet bluffing this player more often would have value, and be a more optimal play than some default. Contrast playing someone who called pre-flop 3-bets 75% of the time it wouldn't be optimal to 3 bet bluff here. Different opponents, different optimal plays.
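  The sample-size point can be put in numbers with a confidence interval on an observed frequency. A minimal sketch using the Wilson score interval (my choice of interval, not something tracking software is known to use); `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n == 0:
        return (0.0, 1.0)  # no observations: the stat tells us nothing
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


small_sample = wilson_interval(5, 10)        # raised pre-flop 5 times in 10 hands
large_sample = wilson_interval(1000, 2000)   # raised 1000 times in 2000 hands
```

  Both samples show a 50% raise rate, but with 10 hands the interval runs from roughly 24% to 76%, while with 2,000 hands it narrows to about 48% to 52%.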
  
  agent008t 6 years ago
  
  I think we need to make a distinction between two kinds/styles of play:
  1. Coming up with an unexploitable strategy, then scaling it up by playing as many hands as you can, earning the slim expected value each time.
  2. Picking a good table / card room / 'scene', and then trying to extract as much value from it as possible.
  You most often see 1 online, and 2 live, for obvious reasons.
  A skilled human would be a lot more successful, I believe, than a bot in case 2. For 2, important skills are:
  1. Be entertaining. You have to play in a way that is entertaining to those playing with you, such that they want to continue playing with you (and losing money to you). Good opponents (i.e. that are bad at poker but want to play high stakes) are hard to find, it is vital that you retain them.
  2. Cultivate a table image, then exploit it. Especially important for tournament play, where you have the concept of "key hands" that you really need to win to potentially win the tournament. With the right table image, you may be able to win hands you otherwise wouldn't have won.
  3. Exploit the specifics of the players you are playing against. Yes, that also makes you exploitable, but the idea is to stay one step ahead of your opponents.
  
  Cybiote 6 years ago
  
  Note that 1) is only true if your opponent is also not making many mistakes. Which fails to be true for most humans, where the combination of randomization and calculating state appropriate ranges is very difficult. This means that weak players can still lose heavily from mistakes/poor play within a reasonable number of hands, it need not be slim.
  Furthermore, you can kind of account for such players by including more random or aggressive profiles in the inference/search stage.
  
  Phillipharryt 6 years ago
  
  Player tagging is more complicated than a single game, and goes far deeper than playing a few hands one way and then switching it up. You can have player stats based on thousands of hands, you can know things about your opponent that even they don't know.
  I don't think you play very much, which is fine, but makes this discussion a bit pointless.
  
  barry-cotter 6 years ago
  
  > In the case of poker, it appears that adaptability is not as good as pure mathematical optimization. Humans can adapt their strategy, but it’s basically just worse regardless because this thing has cracked the code.
  Adaptability is beaten by perfect strategic play in games with clear victory conditions.
  My familiarity with optimal control theory is nil but Kydland and Prescott (1977) applied it to monetary policy to show that the right rules dominate discretion. What the right rules are for monetary policy is still an open question though, because while the victory conditions in economic policy are clearly defined the surrounding environment is very far from static so you deal with out of training set data regularly. Once AI can deal with these kinds of out of context problems it seems plausible GAI is a matter of time.
  http://www.finnkydland.com/papers/Rules%20Rather%20than%20Di...
  > Rules Rather than Discretion: The Inconsistency of Optimal Plans
  > Even if there is an agreed-upon, fixed social objective function and policymakers know the timing and magnitude of the effects of their actions, discretionary policy, namely, the selection of that decision which is best, given the current situation and a correct evaluation of the end-of-period position, does not result in the social objective function being maximized. The reason for this apparent paradox is that economic planning is not a game against nature but, rather, a game against rational economic agents. We conclude that there is no way control theory can be made applicable to economic planning when expectations are rational.
  
  slg 6 years ago
  
  "Strategic" is probably the wrong word, but I think there is a valid question here regarding the approach the AI is taking. One of the key things for a good poker player is having the ability to adapt and adjust their strategy depending on how others at the table are playing. Sometimes you can have the exact same cards in the exact same position and in one game it is smart to fold and in another game it is smart to raise. From the description in the article, it doesn't appear that this AI takes those ebbs and flows into consideration. Instead it seems to play "purely mathematically optimally on expected value" that was honed through trillions of simulations.
  There is a cliche about how poker is about playing your opponents and not the cards. Is this AI only focusing on its cards and ignoring its opponents?
  
  noambrown 6 years ago
  
  The AI doesn't adapt to the opponents, and that's still an interesting challenge for AI research. That said, at the end of the day, it was making quite a bit of money playing against elite human pros. I think that suggests the cliche is, at least in part, wrong.
  
  slg 6 years ago
  
  Making "quite a bit of money" still leaves open the possibility that the AI is leaving a lot of money on the table by not taking opponents into consideration.
  Also I would be curious to see how it performs against people that aren't "elite human pros". Would this AI win at a higher rate in a game against average recreational players compared to the rate a pro would win?
  Lastly it is also possible that the pros simply didn't have enough time to adapt to the AI which would be extra important considering the AI plays unlike humans and therefore is harder to predict.
  
  noambrown 6 years ago
  
  I think the bot would make a lot of money playing against average recreational players, but it's absolutely true that if you can exploit bad players' weaknesses, then you can make more money than what the bot would earn.
  We played 10,000 hands over 12 days in the 5 humans + 1 AI experiment. That's quite a long time, and there's no indication that they even began to uncover any weaknesses in that time period. So I'm fairly confident the AI is robust to exploitation, and I think that's a very important quality to have in any AI system.
  
  slg 6 years ago
  
  That 10,000 total hands number isn't particularly meaningful on the point of adaptability because the humans aren't sharing information with each other. The important number is how many hands each individual human played against the AI. Another question would be whether the pros knew which player was the AI? Because if they didn't, you are basically throwing a modified Turing Test against the pros before they can even begin to try to find tendencies in the AI. Predicting opponents is a huge part of how people play poker. If the AI plays unlike any human, pros are at a huge disadvantage against an AI compared to how they would fare against a similarly skilled but more traditional human player.
  None of this is meant to diminish what you all accomplished, I'm just highlighting areas of poker in which this AI would be less successful than humans even if it is more successful overall.
  
  noambrown 6 years ago
  
  The humans knew the whole time which player was the bot.
  
  hajile 6 years ago
  
  There was an interesting IRL poker game a few years ago. The player who was running behind started going all in on every hand without even looking at their hand (with a huge amount of success).
  Out of curiosity, how does a bot deal with oddities like this?
  
  bostik 6 years ago
  
  This is a solved problem. Open-shoving is a feature of sit-n-gos, so of course people have simulated these and compiled so called "pushbot tables". The parameters are basically pot size and winning probabilities against a random hand.
  While this particular bot may not have those programmed in, a more powerful variant eventually will.
  
  dodobirdlord 6 years ago
  
  The mathematical theory explored in the paper is that if multiplayer poker isn't one of the multiplayer finite state games that pathologically fails to converge to a Nash equilibrium, then it has one, and this strategy should approximate it. Intuitions about adaptability and the advantages thereof aren't applicable in the scenario where the opponent is playing to a Nash equilibrium. You can perform equally well by participating in the other side of the Nash equilibrium, but anything else is a losing strategy. The fact that this approximation converges to a strategy that's actually really good suggests that there is a Nash equilibrium, and that the converged-upon strategy is converging on it.
  You can't out-think or adapt to a rock-paper-scissors opponent who selects at random. All you can do is also select at random and accept that the two of you have even odds.
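  The rock-paper-scissors claim is easy to check exactly. A small sketch using exact fractions (all names here are my own): whatever mix you play against a uniformly random opponent, your expected payoff is zero.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(a: str, b: str) -> int:
    """+1 if a beats b, -1 if b beats a, 0 on a tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1


def expected_payoff(mine: dict[str, Fraction], theirs: dict[str, Fraction]) -> Fraction:
    """Expected payoff of mixed strategy `mine` against mixed strategy `theirs`."""
    return sum(mine[a] * theirs[b] * payoff(a, b) for a in MOVES for b in MOVES)


uniform = {m: Fraction(1, 3) for m in MOVES}
always_rock = {"rock": Fraction(1), "paper": Fraction(0), "scissors": Fraction(0)}
skewed = {"rock": Fraction(1, 2), "paper": Fraction(3, 10), "scissors": Fraction(1, 5)}

# Neither a pure nor a skewed strategy gains anything against the uniform player.
ev_rock_vs_uniform = expected_payoff(always_rock, uniform)
ev_skewed_vs_uniform = expected_payoff(skewed, uniform)
```

  The flip side is that the uniform player also wins nothing from an opponent who always throws rock, which is the "leaving money on the table" objection raised elsewhere in the thread.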
- icelancer 6 years ago
  
  >> Bots that play purely mathematically optimally on expected value aren’t effective or interesting.
  Interesting is up to you, but effective is definitely wrong.
  ICM-perfect bots crush small tournaments, which do not take into account opponent behavior - merely modeling the gamestate. The faster the blinds and the smaller the stacks, the better, but even normal structures get killed by these so-called "expected value" only bots.
  Game Theory Optimal (GTO) attacks are incredibly effective at all levels of the game. The AI need not incorporate opponent feedback to be a winner. It can make it better, but it is not at all required.
bluetwo 6 years ago

First of all, I laughed at the 20-second average per game in self-play, since I ran into the same thing and have been trying to speed up the algorithm but haven't been able to get it faster (without throwing more hardware at it).
Second, I haven't read everything, but I believe you are playing a cash-game and not tournament-style. Is that correct? If that is the case, any chance you will be doing a tourney-style version?
[For those who don't play, in cash, a dollar is a dollar. In Tourney play, the top 2 or 3 players get paid out, so all dollars are not equal, as your strategy changes when you have only a few chips left (avoid risky bets that would knock you out) or when you are chip leader (take risky bets when they are cheap to push around your opponents).]
Also, curious how much poker you folks play in the lab for "research".
- noambrown 6 years ago
  
  We're doing cash games in this experiment. At the end of the day, this is about advancing AI, not about making a poker bot. Going from two-player to multi-player has important implications for AI beyond just poker. I don't think the same is true for cash game vs tournament.
  There's a cash game almost every night at the FBNY office! I don't usually play though -- I'm not nearly as good as the bot.
- wallawe 6 years ago
  
  > In Tourney play, the top 2 or 3 players get paid out
  Or top 2 or 3 thousand... depends on the tournament but it's usually the top 15% ish.
  
  bluetwo 6 years ago
  
  True, I am thinking "sit and go" tournament where you would have 6 players like in this research.
  
  icelancer 6 years ago
  
  Is there much to do here? ICM bots have this space covered pretty effectively.
  
  Phillipharryt 6 years ago
  
  But ICM is only a model that helps you evaluate information in the tournament, players will use it often to cap their bets or as a tipping point on a call, but I've never seen it used as a complete basis of play.
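  For readers who haven't met ICM (the Independent Chip Model): it turns chip stacks into prize-money equity, which is why "all dollars are not equal" in tournaments. A minimal sketch of the usual Malmuth-Harville formulation, assuming the chance of finishing next is proportional to stack size among the players still unplaced; `icm_equity` is a hypothetical name and the brute-force recursion is only practical for small tables.

```python
def icm_equity(stacks: list[float], payouts: list[float]) -> list[float]:
    """Expected prize per player under the Malmuth-Harville ICM assumption."""
    equities = [0.0] * len(stacks)

    def recurse(remaining: list[int], place: int, prob: float) -> None:
        # Stop once every paid place has been assigned or nobody is left.
        if place >= len(payouts) or not remaining:
            return
        total = sum(stacks[i] for i in remaining)
        for i in remaining:
            # Probability that player i takes this place, given the path so far.
            p = prob * stacks[i] / total
            equities[i] += p * payouts[place]
            recurse([j for j in remaining if j != i], place + 1, p)

    recurse([i for i, s in enumerate(stacks) if s > 0], 0, 1.0)
    return equities


# Three players left, top two paid 70 and 30 (made-up numbers).
equities = icm_equity([5000, 3000, 2000], [70, 30])
```

  In this example the chip leader holds half the chips but less than half the prize pool in equity, and the short stack holds more than a fifth, which is the effect that changes correct play near the bubble.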
snarf21 6 years ago

How do you think these same pros would do in a follow-up match? As described in the article, the bot put players off their game with much more varied betting and with donks. Do you think the margin would decrease as players are exposed to these strategies?
Players face mental fatigue and have so over-learned their existing strategies that it takes time to adapt new strategies and even more time for those new strategies to become second-nature.
It reminds me of sports in a way. Teams start running a new wrinkle of offense in the NFL like the wildcat and it takes a few seasons for teams to instinctively know how to play defense correctly against that option.
- noambrown 6 years ago
  
  In the paper we include a graph of performance over the course of the 10,000-hand 5 humans + 1 AI experiment that was played over 12 days. There's no indication that the bot's performance decreased over time (there is a temporary downward blip in the middle, but that's likely just variance). Based on discussions with pros, it sounds like they didn't find any weaknesses and they didn't seem to think they'd find any given more time.
  
  TheChosenZygote 6 years ago
  
  I think it would be hard for the pros to find exploits against the bot, but they could definitely lose less. When using solvers, pros generally only input a couple of sizings for bets, and avoid 2x+ pot sizings, which from the video it seemed like the bot used at much higher frequencies than other pros.
  
  asdfman123 6 years ago
  
  I'm not great at poker, but I did play a decent amount and I know a lot of my strategy involves probing for other people's weaknesses and shifting my strategy mid game to throw people off.
  I feel like a lot of trained ML models have a lot of laughable weaknesses, but perhaps they've been trained on every game, so they're well prepared for any tomfoolery.
  
  TheChosenZygote 6 years ago
  
  The bot is trained to play Game Theory Optimal, aka it's playing to be breakeven at worst, which is why I believe it would be hard for a human to beat it. It's not playing perfectly, but the edges it's giving up are so marginal relative to perfect play that a human is going to lose simply by making a mistake at some point, even if a human were to use a solver to completely optimize their strategy.
- MFLoon 6 years ago
  
  I also suspect it would not be able to maintain a ~40bb/100 hand win rate. The thing about human players is, while the best are capable of learning and employing truly balanced GTO strategies, in practice they rarely adhere to these because other humans (even good pros) will still have exploitable flaws in their strategies, and attempting to exploit these will be more profitable than sticking to the unexploitable strategy; of course it also opens the exploiter to counter-exploitation, creating a fluctuating cycle of players trying to exploit, getting exploited, then moving back towards playing unexploitably. That's the normal state of a pro's strategy in a given game - so to switch to a steady state of always playing unexploitably would be a fairly big adjustment even to top tier pros who are capable of it.
  
  snarf21 6 years ago
  
  Yeah, that is kinda what I was trying to tease out. These 10K hands are nothing compared to the XM of hands these pros have already played. It would be interesting to see how well they did after 1M hands. I'm sure the bot would likely still have an edge but I'd assume the players would adjust their strategy and be less confused by the random sized bets.
  I was also confused by the sample videos where everyone had $10K at the start of each of the demo hands. It was unclear to me if that was just a simulation of the hands or actual game play. If everyone starts every hand with $10K, then the feat seems less strong as going all-in has less risk.
  
  splonk 6 years ago
  
  Stacks are reset to 10k at the beginning of each hand, so they can use every hand to train a single model with the same starting state.
  
  MFLoon 6 years ago
  
  The fixed stack size doesn't really discount anything to me - it makes sense as an experimental control; and it's a cash game so there's no additional risk to going all in regardless of stack size.
  But yea the sample size is definitely too small imo; when they tested the heads up version of the bot some years ago they had it play a bigger sample (50 or 100k iirc?).
  
  bostik 6 years ago
  
  In online poker (at least with 100BB stacks) it's customary to top up between hands if you're below full stack.
  The reason is simple: with table stakes, your maximum win for a hand is constrained by your own stack size.
- asdfman123 6 years ago
  
  I remember reading in the mid-to-late aughts that a lot of old-school poker players that used more swagger and intuition were starting to be run out of the game by kids who applied statistical methods.
tc 6 years ago

Could you perhaps speak to some of the engineering details that the paper glosses over. E.g.:
- Are the action and information abstraction procedures hand-engineered or learned in some manner?
- How does it decide how many bets to consider in a particular situation?
- Is there anything interesting going on with how the strategy is compressed in memory?
- How do you decide in the first betting round if a bet is far enough off-tree that online search is needed?
- When searching beyond leaf nodes, how did you choose how far to bias the strategies toward calling, raising, and folding?
- After it calculates how it would act with every possible hand, how does it use that to balance its strategy while taking into account the hand it is actually holding?
- In general, how much do these kind of engineering details and hyperparameters matter to your results and to the efficiency of training? How much time did you spend on this? Roughly how many lines of code are important for making this work?
- Why does this training method work so well on CPUs vs GPUs? Do you think there are any lessons here that might improve training efficiency for 2-player perfect-information systems such as AlphaZero?
- noambrown 6 years ago
  
  We tried to make the paper as accessible as possible. A lot of these questions are covered in the supplementary material (along with pseudocode).
  - Are the action and information abstraction procedures hand-engineered or learned in some manner?
  - How does it decide how many bets to consider in a particular situation?
  The information abstraction is determined by k-means clustering on certain features. There wasn't much thought put into the action abstraction because it turns out the exact sizes you use don't matter that much as long as the bot has enough options to choose from. We basically just did 0.25x pot, 0.5x pot, 1x pot, etc. The number of sizes varied depending on the situation.
  - Is there anything interesting going on with how the strategy is compressed in memory?
  Nope.
  - How do you decide in the first betting round if a bet is far enough off-tree that online search is needed?
  We set a threshold at $100.
  - When searching beyond leaf nodes, how did you choose how far to bias the strategies toward calling, raising, and folding?
  In each case, we multiplied the biased action's probability by a factor of 5 and renormalized. In theory it doesn't really matter what the factor is.
  - After it calculates how it would act with every possible hand, how does it use that to balance its strategy while taking into account the hand it is actually holding?
  This comes out naturally from our use of Linear Counterfactual Regret Minimization in the search space. It's covered in more detail in the supplementary material.
  - In general, how much do these kinds of engineering details and hyperparameters matter to your results and to the efficiency of training? How much time did you spend on this? Roughly how many lines of code are important for making this work?
  I think it's all pretty robust to the choice of parameters, but we didn't do extensive testing to see. While these bots are quite easy to train, the variance is so high in poker that getting meaningful experimental results is relatively computationally expensive.
  - Why does this training method work so well on CPUs vs GPUs? Do you think there are any lessons here that might improve training efficiency for 2-player perfect-information systems such as AlphaZero?
  I think the key is that the search algorithm is picking up so much of the slack that we don't really need to train an amazing precomputed strategy. If we weren't using search, it would probably be infeasible to generate a strong 6-player poker AI. Search was also critical for previous AI benchmark victories like chess and Go.
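
  Two of the answers above (pot-fraction bet sizes, and biasing a strategy by a factor of 5 before renormalizing) can be sketched in a few lines of Python. This is only an illustration and not the Pluribus code: the function names, the three example fractions, and the `blueprint` probabilities are assumptions made up for the example.

```python
def bet_sizes(pot, fractions=(0.25, 0.5, 1.0)):
    """Candidate bet sizes as fractions of the pot (fractions are illustrative)."""
    return [f * pot for f in fractions]


def bias_strategy(probs, action, factor=5.0):
    """Multiply one action's probability by `factor`, then renormalize to sum to 1."""
    scaled = {a: p * (factor if a == action else 1.0) for a, p in probs.items()}
    total = sum(scaled.values())
    return {a: p / total for a, p in scaled.items()}


# Hypothetical strategy at some decision point (made-up numbers).
blueprint = {"fold": 0.2, "call": 0.5, "raise": 0.3}

# One biased variant per action type, as described in the answer above.
fold_biased = bias_strategy(blueprint, "fold")
call_biased = bias_strategy(blueprint, "call")
raise_biased = bias_strategy(blueprint, "raise")
```

  Because the result is renormalized, the biased action gains probability while the other actions keep their relative proportions.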
andr3w321 6 years ago

Any chance of the code being released or a Cepheus-style answer key being provided? http://poker.srv.ualberta.ca/strategy
- noambrown 6 years ago
  
  I don't think the poker world would be happy with us if we did that. Heads-up limit hold'em isn't really played professionally anymore, but six-player no-limit hold'em is very popular.
  
  andr3w321 6 years ago
  
  It depends who you ask. I think it's inevitable that it's released one day. By not releasing you're just delaying it.
  All the top high stakes players already have solvers that they've spent a lot of money developing and studying privately. They would definitely be upset with you, but by releasing the code you are democratizing the information to all the midstakes pros who want to study but don't have the resources to pay developers and solve the game privately.
  
  floodyberry- 6 years ago
  
  If you're already using programs to help you, I don't see how you can be upset if someone else is cheating better than you are.
  
  asdfman123 6 years ago
  
  Someone watch this guy and see if he buys any fancy watches or nice cars in the next few years. ;)
  
  CamperBob2 6 years ago
  
  Doesn't that make it a rather poor candidate for a scientific paper? Chest-thumping without data and code is, well, chest-thumping without data and code.
  
  ewhauser421 6 years ago
  
  Have you thought about open sourcing the non-AI pieces? It would be great for other researchers so they wouldn’t have to build the poker pieces from scratch
  
  noambrown 6 years ago
  
  There is some open-source code in this area, and hopefully there will be more going forward. Here's one example: https://github.com/EricSteinberger/Deep-CFR
  
  home_project123 6 years ago
  
  a. Is CFR applicable in single-player hidden-information games? (e.g. state is initially hidden, gradually revealed to the agent, but there is no adversary)
  b. How much more efficient is the improved search algorithm? The $150 number sounds like a couple of orders of magnitude.
  
  noambrown 6 years ago
  
  a. There was this paper a couple years ago applying CFR to single-agent settings: https://arxiv.org/abs/1710.11424
  b. It really depends on the game and the situation. It can be several orders of magnitude in six-player poker. In other games, it can be even more.
  
  nradov 6 years ago
  
  Why are you concerned about the happiness of the poker world?
  
  anbop 6 years ago
  
  Well if they upset the poker world do you think they would have top pros willing to go on record endorsing them?
  
  nradov 6 years ago
  
  Top pros will endorse whatever they're paid to endorse.
  
  icelancer 6 years ago
  
  This is falsifiable by any number of cases, but Isaac Haxton spurning PokerStars is probably one of the best examples so others see your comment is not universally applicable.
  https://upswingpoker.com/isaac-haxton-pokerstars-partypoker/
  
  pbhjpbhj 6 years ago
  
  >However, Haxton isn’t accepting PokerStars’ olive branch as he was among the victims defrauded by the online giant for millions of dollars.
  I'm not sure that really provides strong opposition to the GP's claim.
  
  icelancer 6 years ago
  
  PokerStars offered to make him - and him alone - whole through sponsorship dollars. Haxton used to be their lead pro and is widely considered one of the very best players in the world.
  
  nickpsecurity 6 years ago
  
  It could just be for ethical reasons. I think anbop has a good reason even for unethical folks: hitting the best players hard in their wallets will definitely make it harder to recruit them for comparisons that validate these experiments. My prediction is that releasing this software will lead to profitable cheating like what people do with Blackjack at casinos.
  
  unityByFreedom 6 years ago
  
  Why not run the bot, post its proceeds transparently online, and donate everything to charity?
  By not releasing it, you're ensuring a higher concentration of money in the hands of a few, IMO.
  Anyone with access to this source code could run a bot themselves, or employ someone to do so.
  Plus, if you've accomplished this, no doubt someone can replicate it.
  
  wolco 6 years ago
  
  By not releasing it, it doesn't validate the experiment. How can we be sure there wasn't human support?
  
  Avamander 6 years ago
  
  As other commenters have said, I too think you're delaying the inevitable, but releasing now would mean you get credited with the first free solution.
isaacg 6 years ago

In your Science paper, you mention playing 1H-5AI against 2 human players: Chris Ferguson and Darren Elias. In your blog post you also mention playing 1H-5AI against Linus Loeliger, who was within standard error of even money. Why did Linus not make it into the Science paper?
- noambrown 6 years ago
  
  That took place after the final version of the Science paper was submitted. It would have been nice to include but it takes a while to do those experiments and we didn't feel it was worth delaying the publication process for it.
spenczar5 6 years ago

The article makes it sound like the AI is trained by evaluating results of decisions it makes on a per-hand basis. Is there any sense in which the AI learns about strategies that depend upon multiple hands? I’m thinking of bluffing/detecting bluffs and identifying recent patterns, which is something human poker players talk about.
- noambrown 6 years ago
  
  The bot handles each hand independently. How the players play in one hand does not affect how the bot plays in future hands at all.
  That said, it did train by playing against itself (before the experiment against the humans began).
  
  kyberias 6 years ago
  
  Interesting. Does this mean that it cannot adjust to human players "switching gears"? Isn't this a huge leak?
  
  rjldev 6 years ago
  
  It’s not a leak, it just means it can't beat the opponent for the maximum it could by playing the exploitative counter strategy vs their tendencies. Instead it just plays gto which will win against any given non gto strategy, though not for as much as the exploitative counter strategy. Playing an exploitative strategy however leaves you open for exploitation and this goes back and forth until the players converge onto gto, assuming the players are (very) good.
  t. former poker pro
Jach 6 years ago

Was Judea Pearl's work relevant for the counterfactual regret minimization, or is there some other basis? I've added CR to the list of things to look into later but skimming the paper it was exciting to think advances are being made using causal theory...
- noambrown 6 years ago
  
  The CFR algorithm is actually somewhat similar to Q-learning, but the connection is difficult to see because the algorithms came out of different communities, so the notation is all different.
throwamay1241 6 years ago

Who were the pros? Are they credible endbosses? Seth Davies works at RIO, which deserves respect, but I've never heard of the others except Chris Ferguson, who I doubt is a very good player by today's standards (or human being, for that matter), whereas I do know the likes of LLinusLove (iirc, the king of 6max), Polk and Phil Galfond.
Is 10,000 hands really considered a good enough sample? Most people consider 100k hands w/ a 4bb winrate to be an acceptable sample, other math aside. However, as your opponent and yourself play with equal skill, variance increases to the point where regs refuse to sit each other.
- noambrown 6 years ago
  
  LLinusLove was one of the players. Chris Ferguson was in one of the 5 AI's + 1 Human experiment but not the 5 Humans + 1 AI experiment.
  We used AIVAT to reduce variance, which reduces the number of samples we need by roughly a factor of 10: https://poker.cs.ualberta.ca/publications/aaai18-burch-aivat...
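
  As a rough illustration of what "a factor of 10 fewer samples" means: the standard error of a win-rate estimate scales as the per-hand standard deviation divided by the square root of the number of hands, so cutting the variance by 10 is equivalent to playing 10 times as many hands. The sketch below only shows that arithmetic. It is not AIVAT itself, and the per-hand standard deviation of 10 is a made-up number for the example.

```python
import math


def standard_error(std_per_hand, n_hands):
    """Standard error of the mean result per hand over n_hands independent hands."""
    return std_per_hand / math.sqrt(n_hands)


n_hands = 10_000
raw_std = 10.0            # hypothetical per-hand standard deviation (assumption)
variance_reduction = 10   # "roughly a factor of 10", per the comment above

# Dividing the variance by 10 divides the standard deviation by sqrt(10).
reduced_std = raw_std / math.sqrt(variance_reduction)

se_raw = standard_error(raw_std, n_hands)
se_reduced = standard_error(reduced_std, n_hands)

# The same precision without variance reduction would need 10x the hands.
se_equivalent = standard_error(raw_std, n_hands * variance_reduction)
```

  Under these assumptions, 10,000 variance-reduced hands give the same standard error as 100,000 raw hands.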
- icelancer 6 years ago
  
  What? The pros chosen were definitely highly skilled players. They're fairly well known in the online poker community.
  Furthermore, Chris Ferguson, scumbag aside, is absolutely still a very good player by today's standards, and one way higher than the mean participant in a research experiment.
  10,000 hands is an effective enough sample at a certain win rate and analysis of variance of play; the n-value alone is not enough to tell you if it was enough hands.
- splonk 6 years ago
  
  They're credible enough. I'd like the sample sizes to be bigger as well but they're enough to verify that even if the bot got lucky over the sample size, it's close enough that it doesn't really matter. Add a bit more compute, optimize some algorithms a little, and you'd make up the difference. The real point is that they have a technique that scales to 6-max, and whether it's 97% or 99% is kind of immaterial in the grand scheme of things.
  FWIW, they did some variance reduction techniques that dramatically reduce the number of hands needed to be confident in your results, so the number of hands may be bigger than you think. e.g. the results of 10k HU hands have much higher variance than the results of 10k HU hands where everyone just collects their EV once they're all in.
- ayemeng 6 years ago
  
  Jimmy Chou, Jason Les, Dong Kim are affiliated with Doug Polk.
  It is an interesting point that these are pros but their specialities are either tournament or heads up. The current 6 max pros are LLinusLove, Otb_RedBaron, TrueTeller.
kapurs151 6 years ago

I'm very late to this post, so not sure if you're still around.
What are your thoughts on a poker tournament for bots? Do you think it could turn into a successful product? I've always wanted to build an online poker/chess game that was designed from the ground up for bots (everything being accessible through an API), but have always worried that someone with more computational resources or the best bot would win consistently. Is it an idea you've thought about?
tasubotadas 6 years ago

Congrats on the bot!
I have a few basic questions. I would like to implement my own generic game bot (discrete states). Are there any universal approaches? Is MCMC sampling good enough to start? My initial idea was to do importance sampling on some utility/score function.
Also, I am looking into poker game solvers - what would be a good place to start? What's the simplest algorithm?
Thanks
haburka 6 years ago

Why did you optimize for using fewer CPUs? Was it a happy accident or a goal?
- noambrown 6 years ago
  
  A little bit of both. We didn't think we needed the extra computing power. And we really wanted to convey how cheap it is to make a superstrong poker AI with these latest algorithms.
waynecochran 6 years ago

Knowing when to bluff often depends on the psychology of the opponent, but since it trained playing itself it doesn't seem that knowing when to bluff would be learned. Did it bluff very often?
- noambrown 6 years ago
  
  The bot does bluff, and in fact it learns from self-play that bluffing is (sometimes) the optimal thing to do. At the end of the day, bluffing is simply betting when you have a weak hand. The bot learns from experience that when it bets with a weak hand, the opponent (another copy of itself) sometimes folds and it makes more money than if it hadn't bet. The bot doesn't view it as deceptive or dishonest. It just views it as the action that makes it the most money.
  Of course, a key part of bluffing is getting the probabilities right. You can't always bluff and you can't never bluff, because that would make you too predictable. But our self-play and search algorithms are designed to get those probabilities right.
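
  The "bet with a weak hand and the opponent sometimes folds" reasoning can be written as a one-line expected-value calculation. This is a simplified sketch, not how the bot computes anything: it assumes a pure bluff that always loses when called, no further betting, and made-up pot and bet sizes.

```python
def bluff_ev(pot, bet, fold_prob):
    """EV of a pure bluff: win the pot when the opponent folds, lose the bet when called."""
    return fold_prob * pot - (1.0 - fold_prob) * bet


def breakeven_fold_prob(pot, bet):
    """Fold frequency at which the bluff has zero EV (same as not betting a dead hand)."""
    return bet / (pot + bet)


pot, bet = 100.0, 50.0                       # hypothetical sizes
threshold = breakeven_fold_prob(pot, bet)    # 1/3 for a half-pot bet
ev_if_folds_half = bluff_ev(pot, bet, 0.5)   # positive: bluffing makes money
ev_if_folds_rarely = bluff_ev(pot, bet, 0.2) # negative: bluffing loses money
```

  If the opponent folds more often than the threshold, the bluff is profitable; if less often, it loses money, which is why the frequencies have to be right.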
  
  albedoa 6 years ago
  
  > when it bets with a weak hand, the opponent (another copy of itself) sometimes folds and it makes more money than if it hadn't bet.
  This makes no sense. If I am betting for thin value with a weak hand, then I make less money when my opponent folds. Does the bot not know whether it is bluffing or value betting?
  
  Denzel 6 years ago
  
  It makes complete sense. There’s a component of value and a component of bluff for a given hand in front of you. They’re related.
  Value betting and bluffing aren’t defined by the outcome of a hand — action yet to be completed. Poker is a game of hidden information so betting with “thin value” implies that your component of bluffing is larger. You want your opponent to fold more often than not when you have thin value because more often than not you’re actually beat.
  QQ can get KK to fold based upon board texture, street, and prior action. But you don’t know the other person is holding KK when you’re betting for “thin value” on the river.
  
  albedoa 6 years ago
  
  > You want your opponent to fold more often than not when you have thin value because more often than not you’re actually beat.
  No, that is simply not true. If I am betting for value, then I want my opponent to call no matter how weak I am or how thin it is.
  > But you don’t know the other person is holding KK when you’re betting for “thin value” on the river.
  Then it's a value bet. As you said, it's not defined by the outcome.
  
  maehwasu 6 years ago
  
  “Value betting” and “bluffing” are human heuristics to simplify complicated situations.
  The bot doesn’t “know” whether it’s value betting or bluffing—it’s not a relevant question. The relevant question is whether to bet, and what amount, in order to maximize value of the particular hand it has, with reference to the board and opponent actions taken.
  
  albedoa 6 years ago
  
  Right, we agree on that, but the above comment lumps all of what you describe (“betting with a weak hand”) under “bluffing” and says the bot learns that it makes more money when its opponent folds.
  
  kevinwang 6 years ago
  
  Where does your quote say that the bet is a value-bet? I read it as saying that the bot learned to bluff (not value bet) by betting when it has a weak hand (i.e. the bot has a weak hand, so it's getting better hands to fold by betting). The phrase "value bet" was not used.
  (This, in addition to what the other comments have said about there being spots where a bet can get better hands to fold with some probability AND get worse hands to call with some probability - see the chapter "The grey area between value betting and bluffing" in Applications of No Limit Hold Em)
  
  albedoa 6 years ago
  
  "At the end of the day, bluffing is simply betting when you have a weak hand."
  I was the one who introduced the term "value betting" to the conversation, applied specifically to weak hands.
  
  albedoa 6 years ago
  
  I mean, unless only those who interpret it wrong would respond, then I must be the one reading it wrong. Because these responses aren't lining up with how I read it or what I meant.
- femto113 6 years ago
  
  At the highest levels of play psychological factors are pretty minimal. Before a showdown which cards you actually hold aren't particularly material, as the only information you convey is through your bids. This means if you predict that you're more likely to win a hand by bidding (and inducing a fold) than by calling and going to a showdown it makes mathematical sense to "bluff". I'm sure AIs have no trouble learning that fact.
  
  elcomet 6 years ago
  
  The issue is that you don't know exactly the probability of your opponent folding.
  This is psychology.
  
  NhanH 6 years ago
  
  The probability of the opponent folding doesn't matter. The goal of bluffing in modern games is so that optimal players are indifferent in their decision (no matter how they play, you can't lose money). And because this is a zero sum game, if you can't lose money then you win if the opponent makes a mistake.
  You only need to know the probability of the opponent folding so that you can deviate from the theoretical optimal strategy to win even more money if they are a biased player.
  
  waynecochran 6 years ago
  
  I'll have to go back and watch Data playing poker on Star Trek NG -- what do sci fi writers think of this.
samfriedman 6 years ago

Are there any ethical considerations relating to the prospect of use of this bot for cheating in real-money games? Either from your internal team or after public replication?
- noambrown 6 years ago
  
  We're really focused on advancing the fundamental AI aspect. We're not here to kill poker. The popular poker sites have quite sophisticated anti-bot measures, but it's true that this is an arms race.
- mensetmanusman 6 years ago
  
  There are no ethical reasons why a game like poker must exist. In fact, poker gives a false sense of hope to the thousands of gambling addicts that enter casinos. It is a fun game, but there are an unlimited potential number of fun games..
  
  kzzzznot 6 years ago
  
  1 ethical reason it ‘must exist’ is that it is a man-made game that some people enjoy without causing harm to themselves or others. Not quite sure what you’re suggesting, but “banning poker” is not going to solve the problem of gambling addiction.
  
  srkigo 6 years ago
  
  I saw people who were going occasionally to casino without problems because nothing makes you lose and tilt so much as poker. I witnessed poker destroying families and people more than other games. There are people who don't like other casino games but lose heaps on poker and before they started poker their lives had more quality and meaning. I don't play other casino games but poker had a really bad influence on my life and the lives of people around me. Also, majority of money from poker comes from the players, not from the viewers and sponsors like in other sports, like football, baseball etc.
pogopop77 6 years ago

Very impressive. If my understanding of how the AI works is correct, it is using a pre-computed strategy developed by playing trillions of hands, but it is not dynamically updating that during game play, nor building any kind of profiles of opponents. I wonder if by playing against it many times, human opponents could discern any tendencies they could exploit. Especially if the pre-computed strategy remains static.
- noambrown 6 years ago
  
  We played 10,000 hands of poker over the course of 12 days in the 5 humans + 1 AI experiment, and 5,000 hands per player in the 1 human + 5 AI's experiment. That's a good amount of time for a player to find a weakness in the system. There's no indication that any of the players found any weaknesses.
  In fact, the methods we use are designed from the ground up to minimize exploitability. That's a really important property to have for an AI system that is actually deployed in the real world.
darse 6 years ago

A hearty congratulations, Noam, on finishing another chapter of the story i opened in the early 1990s...
Another person asked "What took you so long?", and i had the same question. :) I really thought this milestone would be achieved fairly soon after i left the field in 2007. However, breakthroughs require a researcher with the right amount of reflectiveness, insight, and determination.
Well done.
hoerzu 6 years ago

The progress you have made in this research field is amazing. What do you think will be the next step, or where do you see the future of your research?
- noambrown 6 years ago
  
  Thanks! I think going beyond two-player/team zero-sum games is really important. This was a first step, but it's definitely not the last. I'm hoping to continue in this direction, and maybe start looking at interactions involving the potential for cooperation in addition to competition.
splonk 6 years ago

I haven't finished digging through the paper and the supplement yet, but I'm curious about how many hands were multiway to the flop (and whether the percentages differ significantly between 1H/5AI and 5H/1AI). I'd guess that it's a pretty small fraction of the total hands, and I'm wondering what the performance is like in those particular cases.
- noambrown 6 years ago
  
  I don't have the exact percentages but I think it's less than 10%. It's not really possible to measure the bot's performance just in specific situations, but my feeling is the bot performs relatively well in these situations. Multi-way flops were basically impossible to do in a reasonable amount of time for past AI's. Our new search techniques make these situations feasible to figure out in seconds.
  
  splonk 6 years ago
  
  Cheers, thanks. One of the reasons I asked about 1H/5AI vs 5H/1AI is that historically the new bots for a given form of poker have played a bit wider than the accepted wisdom of the time, so I was curious if there were relatively more multiway pots with 5AI than with 5H.
  
  noambrown 6 years ago
  
  The pros described the bot's preflop strategy as very sensible, so I think it's unlikely there were more multiway pots with 5 AI's.
clavalle 6 years ago

What table information does the bot take into account? Position? Other player's stack size?
>Regardless of which hand Pluribus is actually holding, it will first calculate how it would act with every possible hand.
Is this information used to form an idea of what other players might be holding based on how the other player acts and how closely that action matches Pluribus's 'what if' action?
- throwamay1241 6 years ago
  
  No, it's to mask actions. If you bet big with monsters and check with air 100% of the time, your opponent knows when to fold and bet.
  iirc, the frequency of bets in that spot is roughly equivalent to the frequency of times you're definitely in front of your opponent in that particular spot, but not always with the hands that are beating your opponent.
  The concept is called Game Theory Optimal (GTO) and it's pretty popular in higher stakes games.
eries 6 years ago

Can you share some about what strategies the bot prefers and how these compare with common professional human strategies?
- noambrown 6 years ago
  
  We talk about this a bit in the paper. Based on the feedback from the pros, the bot seems to "donk bet" (call and then bet on the next round) much more than human pros do. It also randomizes between multiple bet sizes, including very large bet sizes, while humans stick to just one or two sizes depending on the situation.
  
  TheChosenZygote 6 years ago
  
  Is there a way to see the EV the bot is calculating when it's deciding between checking and donk betting? When you place these spots in solvers, they actually advocate for a significant amount of donk betting on certain boards, but pros don't do it because the EV is marginal and it's better for pros to simplify their strategy so they make fewer mistakes. If you have a flop donk bet strategy, you also have to develop a corresponding turn and river strategy, which makes it extremely difficult.
  
  MFLoon 6 years ago
  
  When human players donk bet it's almost always a weak player employing an extremely exploitable strategy, whereas pros almost never do it because the metagame has evolved around the presumption that nobody ever donk bets. I'd love to see what the bot's balanced GTO donking strategy looks like.
  
  splonk 6 years ago
  
  It's basically been true along every step of the poker bot evolution (HU limit, HU NL, and 6-max NL) that the bots donk a lot more than the humans. 10 years ago you could find pros arguing that donking in any situation is always wrong. That's been shifting for years, but still not to the level that the bots do it.
  My personal belief is that the "no-donk" strategy is an adaptation by fallible human minds to reduce the branching on the decision tree to something tractable.
  
  patrickfreed 6 years ago
  
  Your personal belief is likely correct. Balancing a donking range is incredibly difficult for humans and doing so perfectly likely yields only a very small EV bonus over just always checking. For humans it makes a whole lot of sense to reduce the branching in a case like that whereas for computers it doesn't really matter.
  Another good example is varying continuation betting sizes. A true GTO strategy would mix in a number of different sizings (and I'm sure the bots adapted to do this), but you only sacrifice a very tiny amount of EV by basically betting the same size every time. Doing the latter limits a human's risk of making errors, which is far more valuable than squeezing out .05bb/100 more by varying the sizes.
  
  icelancer 6 years ago
  
  If true in cash games, it is funny since it is a not uncommon strategy in high-level tournament play to control pot size.
  
  newfangle 6 years ago
  
  Donk bets exist in the meta, i.e. when the turn is extremely good for your range but horrible for your opponent's. (If you have a fd on the flop and it hits on the turn, you can overbet the pot on the turn with your bluffs and flushes, then just go all in on the river.) If they have top pair, it's pretty hard to play against that.
  
  MFLoon 6 years ago
  
  Oh, sure. I more meant flop donk bets; I guess it doesn't specify which street the donking was happening.
  
  russelldavis 6 years ago
  
  The same logic can apply to flop donk bets. Some flops favor the donking player's range more than their opponent.
  
  MFLoon 6 years ago
  
  Yea I'm not saying it's impossible to devise an unexploitable flop donking strategy. I think the reason thinking players generally don't is because of the complexity of adding significantly more branches early in the game tree - basically going from 3 (check-{fold,call,raise}) to 6 (those 3 plus donk-{fold,call,raise}).
  
  splonk 6 years ago
  
  The other issue is that increasing the number of branches also decreases the number of hands that go into each bucket, to the point where it might not be effective any more without being able to randomize the branch choice for specific threshold hands. Most pros I know just have a hard cutoff for each branch and don't worry too much if they're slightly out of balance, but smaller bucket sizes could magnify errors. If you have 31 combos for one action when you're supposed to have 30.5, then whatever, but if you have 6 when it should be 5.5, that could become a problem faster.
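
  The arithmetic behind that last point: the same half-combo rounding error is a much larger relative imbalance in a small bucket than in a large one. A quick sketch using the numbers from the comment (the function name is just for illustration):

```python
def relative_imbalance(actual_combos, target_combos):
    """How far off a bucket is, as a fraction of its target size."""
    return abs(actual_combos - target_combos) / target_combos


large_bucket = relative_imbalance(31, 30.5)  # about 1.6% off target
small_bucket = relative_imbalance(6, 5.5)    # about 9.1% off target
ratio = small_bucket / large_bucket          # the small bucket is ~5.5x further off
```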
ewhauser421 6 years ago

Noam - super interesting stuff. Couple of questions:
1) What were the reasons for choosing 6-handed play (assuming logistical and costs)? It would be interesting to see how the bot’s strategy would differ in a full ring game.
2) Are there any plans to commercialize the bot as a tool for training human players?
- noambrown 6 years ago
  
  1) The goal was to show convincingly that we could handle multi-player poker. The exact number of players was kind of arbitrary. We chose six-player because that's the most common/popular format. Considering training the 6-player bot would cost less than $150 on a cloud computing service, I think it's safe to say these techniques would all work fine in other formats.
  2) I'm quite happy working on fundamental AI research and plan to continue in that direction.
- zone411 6 years ago
  
  6-handed is a very common format online.
DennisP 6 years ago

Are any papers available yet?
Is the bot going for game-theory-optimal play, or trying to exploit weaknesses in other players?
- noambrown 6 years ago
  
  The paper is here: https://science.sciencemag.org/content/early/2019/07/10/scie...
  It's going for game-theory-optimal play. It doesn't adapt to its opponents' observed weaknesses. But I think it's cool to show that you don't need to adapt to opponent weaknesses to win at poker at the highest levels. You just need to not have any weaknesses yourself.
  
  pezo1919 6 years ago
  
  I thought the same myself. However, if players expose each other's weaknesses fast enough, it could lead to a chip gain which might be hard to overcome, right? Just in theory ofc. :)
  
  Bartweiss 6 years ago
  
  This is a great question. I wonder how this bot would do in a game with a couple of pros and a couple of reasonably skilled amateurs?
  Still well, I suspect, since straightforward theoretically-correct poker will take money off the amateurs efficiently. But it seems possible that playing to wipe out weaker or less consistent players could provide enough margin to bully the more stable AI player.
  
  bcassedy 6 years ago
  
  This is true in tournament play. In a cash game it doesn't matter since there is no elimination, you can always rebuy.
  And it's not like in the movies where if you don't have the money to call a bet, you lose. You simply are considered all in for the main pot and then sidepots that you aren't eligible to win will be created for any bets you can't cover.
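
  The main-pot/side-pot split described above can be sketched as follows. This is a simplified illustration and not any poker site's implementation: it assumes every listed player is still in the hand (no folds) and has put in a positive amount, and the function name and example amounts are made up.

```python
def side_pots(contributions):
    """Split chips into a main pot and side pots.

    contributions: dict of player -> total chips put in this hand.
    Returns a list of (pot_amount, eligible_players), main pot first.
    """
    pots = []
    remaining = dict(contributions)
    while remaining:
        # Everyone still in `remaining` can match the smallest outstanding amount.
        level = min(remaining.values())
        eligible = sorted(remaining)
        pots.append((level * len(remaining), eligible))
        # Players who were covered at this level drop out of later (side) pots.
        remaining = {p: c - level for p, c in remaining.items() if c > level}
    return pots


# A is all in for 50; B and C each put in 100.
pots = side_pots({"A": 50, "B": 100, "C": 100})
# Main pot of 150 that A, B and C can win; side pot of 100 only B and C can win.
```

  A final pot with a single eligible player is just that player's uncalled excess, which goes back to them.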
  
  timClicks 6 years ago
  
  Yeah that is a really interesting insight. I presume that also makes optimization much simpler. The rules are fixed. Opponents are not.
  
  gfody 6 years ago
  
  > you don't need to adapt to opponent weaknesses to win at poker at the highest levels
  that may be true for limit poker, but in a no-limit tournament the best this bot could do is not lose. as the pressure increases with the blinds and the players are forced to bluff and call bluffs how does this bot avoid folding itself to death from a run of bad cards?
  I could see this bot doing well at cashing but I don't see how it could consistently place 1st the way the top human players do.
  
  DennisP 6 years ago
  
  Optimal play includes bluffing. It's "optimal" according to game theory.
  For example, game theory may tell you that in a particular situation, you can't be exploited if you bluff 10% of the time. If the opponent bluffs less than that, you can come out ahead by more often folding when he bets. If the opponent bluffs more than 10%, you can call or reraise when he bets. But if he bluffs the optimal amount, it doesn't matter either way, you can't take advantage of him.
  So this bot would bluff at 10% to avoid getting exploited, but wouldn't try to detect whether the opponent is exploitable. (The latter is risky since a crafty opponent can switch up strategies, manipulating you into playing an exploitable strategy.)
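  A rough sketch of that indifference point in Python. The numbers are my own assumptions, not from the paper: I read "bluffs 10%" as "10% of his bets are bluffs", I assume the caller holds a hand that beats every bluff and loses to every value bet, and I picked a bet of 1 into a pot of 8 because that is the sizing where the break-even bluff share works out to 10%. `ev_call` and `break_even_bluff_share` are hypothetical helper names.

```python
def ev_call(bluff_share, pot, bet):
    """Expected chips won by calling a bet that is a bluff `bluff_share` of the time.

    Calling wins pot + bet when the bet is a bluff and loses the call otherwise.
    Folding is worth exactly 0, so a positive result means calling beats folding.
    """
    return bluff_share * (pot + bet) - (1.0 - bluff_share) * bet


def break_even_bluff_share(pot, bet):
    """Bluff share at which calling and folding have the same EV (zero)."""
    return bet / (pot + 2.0 * bet)


# Assumed sizes: a bet of 1 into a pot of 8 gives a 10% break-even point.
POT, BET = 8.0, 1.0
for share in (0.05, 0.10, 0.20):
    print(f"bluff share {share:.2f}: EV of calling = {ev_call(share, POT, BET):+.2f}")
```

  Below the break-even share the EV of calling is negative (fold more), above it the EV is positive (call more), and right at it the caller is indifferent.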
  
  wallawe 6 years ago
  
  To add onto this, some players that truly abide by the GTO strategy will use a prop, for example a watch, to determine what play to make.
  If you want your perceived range to be balanced and make x play 50% of the time and y play the other 50%, you look at the watch and if the second hand is in the first 30 seconds, you make x play, 30-60 seconds, y play.
  That's just an example but your point is 100% accurate.
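  The watch trick is just an external random number source. A minimal sketch, assuming the second hand is read as an integer from 0 to 59; `pick_play` and `x_share` are made-up names for illustration.

```python
import time


def pick_play(second, x_share=0.5):
    """Map the watch's second hand (0-59) to play 'x' or 'y'.

    'x' is chosen for the first `x_share` of the minute, 'y' for the rest.
    """
    if not 0 <= second <= 59:
        raise ValueError("second must be between 0 and 59")
    return "x" if second < 60 * x_share else "y"


# Glance at the system clock instead of a wristwatch.
# The modulo guards against the rare leap-second value of 60.
print(pick_play(time.localtime().tm_sec % 60))
```

  For a 25/75 split you would call `pick_play(second, x_share=0.25)`, which makes play x only during the first 15 seconds of the minute.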
  
  Bartweiss 6 years ago
  
  I think this comes down to ambiguity over what "optimal play" means.
  There's a poker strategy we might call 'deterministically' optimal play, which consists of precisely assessing each hand's expected value with little to no bluffing. This is already common in online cash games with both bots and players running multiple games at once. And you're right - it's excellent at running net-positive and not losing, but unlikely to win significant tournaments.
  Pluribus, though, is playing something close to game-theoretically optimal poker. In playing against itself, it's attempting to develop a takes-all-comers strategy with no exploitable weaknesses. That includes bluffing and calling bluffs - the goal is simply to find a mixed-strategy equilibrium where those moves are made some percentage of the time, in proportion to their expected payoffs. This can involve doing all of the same basic operations as pro players, like valuing button raises differently than donks or attempting to bluff based on how many players remain in the hand. The distinctive limitation is simply that Pluribus plays 'locally' optimal poker with no conception of opponents' identities or behavior in prior hands.
  
  gfody 6 years ago
  
  that's a helpful explanation, thank you! I was misunderstanding the statement about Pluribus not modeling its opponents between hands as between rounds - it's definitely modeling its opponents and detecting bluffs by understanding when a bluff is likely strategically based on each opponent's actions so far in the hand, it's just not taking anything it learned into the next hand.
  I could see this being an effective strategy in a WSOP, that ability to perfectly forget the previous hand is probably more valuable than anything the way WSOP champions play. I could see it coming down to whether or not the ability to exploit a reliable tell during a pivotal hand matters more than 10% of the time.
  
  MFLoon 6 years ago
  
  I couldn't find it confirmed in the primary or secondary article, but I would bet the bot is just playing cash at a fixed stack depth rather than a tournament; just like in the wild, bots are much more of a problem in online cash than online tournaments. Dynamically adjusting strategies by stack depth, number of players, and pay jumps, would probably be several orders of magnitude more complex.
  
  bcassedy 6 years ago
  
  Smaller stack sizes reduce possibilities and thus reduce complexity. Pay jumps result in chips having different utility to each player which forces some situational playstyles to be more optimal. I would guess that this also reduces the complexity of the game.
  Since tournaments don't often spend much time with stacks much deeper than 100bb, I would guess that tournaments would be more easily solved. Though tournaments are much more frequently run with 9-10 players rather than 6 at a table.
  https://www.cardplayer.com/poker-news/18226-explain-poker-li...
  
  MFLoon 6 years ago
  
  You're right that a single short stack hand in a vacuum has fewer game tree branches, and that factoring in chip utility is also fairly straightforward. But I strongly disagree that it reduces the overall complexity of the game. The model in the article played every single hand with 100bb; to be an effective tournament player it would have to be able to fluidly adjust strategies between big, medium and short stack play, as well as reasoning about the stack sizes of other players at the table. It's basically 4 different games at >100bb, 50-100bb, 25-50bb, and <25bb, so it would have to develop optimal strategies for each. And even if the shallower stacked games are generally simpler in isolation, there's a meta strategy of knowing which one to apply in a given hand with heterogeneous stack sizes. To paraphrase Doug Polk, "If cash game play is a science, tournaments are more of an art."
  
  maehwasu 6 years ago
  
  The bot could likely just be trained on the 4 or so different games. You’re likely increasing the complexity by a constant factor, nothing exponential here.
  
  dmoy 6 years ago
  
  > There were two formats for the experiment: five humans playing with one AI at the table, and one human playing with five copies of the AI at the table. In each case, there were six players at the table with 10,000 chips at the start of each hand. The small blind was 50 chips, and the big blind was 100 chips.
  In the fb article linked above.
  
  MFLoon 6 years ago
  
  Ah thanks. As I suspected, cash game with fixed 100bb stacks.
  
  WA 6 years ago
  
  Isn’t this survivorship bias, or do you know which player will repeatedly place 1st beforehand? Given that poker is pretty popular, there must be quite a few people who always place 1st.
  Or to turn this around: given enough bots, some bots will place 1st a lot more than others. It’s just unclear which one.
  
  newfangle 6 years ago
  
  The game actually becomes simpler when you have fewer blinds, to the point where if you have 15 blinds or fewer you actually just follow a chart and go all in or fold preflop.
  
  kzzzznot 6 years ago
  
  The blinds don’t increase; it’s a cash game, not a tournament.
confidantlake 6 years ago

Why would you choose Chris Ferguson to participate? Don't you know his terrible history?
- XalvinX 6 years ago
  
  maybe the fact that he holds a computer science degree figures into his involvement on some level? regarding his terrible history, was that ever really proven or admitted to? and if it was, why is he not in jail (or dead, for that matter...)?? my not-very-informed take on that situation was more like he was in the wrong place at the wrong time, black friday came along, and the money dried up really fast making for some unpaid debts for the company... perhaps i don't know the full story but still, he didn't seem to ever be charged with any wrongdoing, which seems odd to say the least
umanwizard 6 years ago

Congrats! As soon as I saw the title I thought “I wonder if this is the project Noam works on...”
- noambrown 6 years ago
  
  Thanks!
meuk 6 years ago

Congratulations on the win! Can you recommend any papers, blog(post)s, or books for the interested layman? (I am currently scanning through the Facebook post, which is great, but personally I am looking for something more technical).
doctorpangloss 6 years ago

Do you want to do a Hearthstone / CCG bot? I have an engine and testers for you.
vagab0nd 6 years ago

Very interesting results. From the paper it sounds like the algorithms you used are very similar to Libratus (pre-solved blueprint + subgame solving). What change made it so that the computation requirement is much lower now?
- noambrown 6 years ago
  
  There were several improvements but the most important was the depth-limited search. Libratus would always search to the end of the game. But that's not necessarily feasible in a game as complex as six-player poker. With these new algorithms, we don't need to go to the end of the game. Instead, we can stop at some arbitrary depth limit (as is done in chess and Go AIs). That drastically reduces the amount of compute needed.
RivieraKid 6 years ago

Can you share more details about the abstraction? The paper is kind of vague on it. How does it decide if it should use 1 or 14 bet values? Is it a perfect recall abstraction? How many information sets are there?
- noambrown 6 years ago
  
  We give more details on this in the supplementary material.
baq 6 years ago

When do you solve bridge? :)
- canistel 6 years ago
  
  It is in a way disappointing that this question gets so little attention, and yet, it might be the most significant. If a bot can false-card - if it can discern the strategy that the opponents have in mind, and deliberately mislead them to its own advantage - we have a real world AI. However, skills of computer bridge programs remain at club level standards.
ryandrake 6 years ago

Interesting that the conventional wisdom of never open limping emerged as confirmed through self-play. What other general poker “best practices” were either confirmed or upended through this research?
yalogin 6 years ago

For someone not in the AI field, can you explain why AI is needed and an elaborate code with conditional blocks is not enough? Where does AI fit in with a poker game?
- b_tterc_p 6 years ago
  
  Conditional blocks would work, but it would be an impossibly detailed and granular tree to set up. The AI component simply helps you arrive at the decisions to create the complex tree.
smallgovt 6 years ago

This is super interesting! What steps would you recommend a professional poker player take in order to use AI to improve his/her personal poker skills?
o_p 6 years ago

Does it beat poker by reaching a Nash equilibrium (where you can't make a profit and no one can profit from you), or by exploiting opponents' weaknesses to seek profit?
- noambrown 6 years ago
  
  It doesn't exploit its opponents' weaknesses. Its focus was on not having any weaknesses that its opponents could exploit. However, the algorithms are not guaranteed to converge to a Nash equilibrium in this setting because it's not a two-player zero-sum game (and in either case, it's not clear that playing a Nash equilibrium would provide much benefit in this setting).
mfwebser 6 years ago

What sort of defense applications could this sort of technology be used for? The last line of the Facebook blog post sparked curiosity.
estomagordo 6 years ago

Do you expect the human players to play at the best of their ability when they're not playing for actual real money?
- noambrown 6 years ago
  
  There was real money at stake in this experiment. The pros were guaranteed $0.40 per hand just for participating, but that could increase to $1.60 per hand depending on how well they did.
  To answer your question, no, I don't think human players would play at their best when not playing for actual money.
  
  estomagordo 6 years ago
  
  Sorry, I meant for way less than they typically play.
grizzles 6 years ago

Any chance you could put Libratus / Pluribus online for people like me to try to beat it?
- noambrown 6 years ago
  
  Unfortunately we don't have any plans to do that currently.
codefiddler 6 years ago

Are all the hands posted online somewhere for analysis? I would be very interested!
abstract7 6 years ago

How many games did the bot beat the same 5 players? And how many games were played?
- noambrown 6 years ago
  
  We played 10,000 hands of poker in the 5 humans + 1 AI experiment. The number of hands won isn't a useful metric in poker. If you win only 10% of your hands and make $1,000 on those hands, while losing only $1 on the other 90% of hands, then you're a winning player. The bot won at a rate of 4.8 bb/100 ($4.8 per hand if the blinds are $50/$100). This is considered a large win rate by professionals.
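  The arithmetic above, spelled out as a small Python sketch. The helper names are hypothetical; the inputs are the figures quoted in this comment.

```python
def ev_per_hand(win_prob, avg_win, avg_loss):
    """Average profit per hand given how often you win and the typical sizes."""
    return win_prob * avg_win - (1.0 - win_prob) * avg_loss


def dollars_per_hand(bb_per_100, big_blind):
    """Convert a win rate in big blinds per 100 hands to dollars per hand."""
    return bb_per_100 / 100.0 * big_blind


# Win 10% of hands for $1,000 each, lose $1 on the other 90%.
print(round(ev_per_hand(0.10, 1000.0, 1.0), 2))

# 4.8 bb/100 with a $100 big blind.
print(round(dollars_per_hand(4.8, 100.0), 2))
```

  The first number shows why hands won is the wrong metric: a player who loses nine hands out of ten can still be far ahead on average.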
  
  badfrog 6 years ago
  
  > This is considered a large win rate by professionals.
  It depends on context. 4.8bb/100 is quite good for high-level online play, but wouldn't be enough to make a living at live poker. The biggest game that runs on a regular basis in most areas is $5/10. At ~33 hands per hour, that's 1.6bb or $16 an hour.
  And I'd assume there was no rake in your game? That would take a big chunk out of the rate.
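  For anyone checking the hourly figure, a quick sketch. `hourly_rate` is a made-up helper, and the `tables` multiplier is my own addition; it assumes the win rate per table stays the same when you play more of them, which is optimistic. Rake is ignored.

```python
def hourly_rate(bb_per_100, hands_per_hour, big_blind, tables=1):
    """Return (big blinds per hour, dollars per hour) for a given win rate."""
    bb_per_hour = bb_per_100 / 100.0 * hands_per_hour * tables
    return bb_per_hour, bb_per_hour * big_blind


# Live $5/10 at roughly 33 hands per hour, one table.
bb, dollars = hourly_rate(4.8, 33, 10.0)
print(f"{bb:.2f} bb/hour, ${dollars:.2f}/hour")
```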
  
  GregoryPerry 6 years ago
  
  For one player...
  $16/hr X (VM|microservice thread) could become astronomical profit.
  
  badfrog 6 years ago
  
  That's why it's a good rate online where you can play multiple tables. Unlimited VMs don't help you in a live casino.
logical42 6 years ago

Any chance you’ll consider releasing the hand history of the session?
- skater 6 years ago
  
  they're in the extra data section of the science mag article. formatting is terrible for importing into hand history viewers, so i'm trying to get a friend to re-format
hmate9 6 years ago

What was the most challenging part about implementing this?
- noambrown 6 years ago
  
  Honestly, probably debugging. Training this thing is very cheap, but the variance in poker is huge (even with the best variance-reduction techniques) so it takes a very long time to tell whether one version is better than another version (or better than a human).
dillonmckay 6 years ago

When will you test it with 10 total players in a game?
- noambrown 6 years ago
  
  The number of players is kind of arbitrary given the techniques we're using. We chose 6 because that's the most popular/common format for poker. I don't think there's any scientific value in also doing 10.
  
  dillonmckay 6 years ago
  
  I am obviously a human, not a bot, but in my experience playing poker, it seems much more likely for me to be successful, personally in a 6 player game, whereas a 10 player game, I never seem to do well.
ikeboy 6 years ago

Any plans to make money using this in online games?
- noambrown 6 years ago
  
  No, I don't have any plans to do that. This is really about advancing fundamental AI research.
cambaceres 6 years ago

What are the names of the poker pros the AI beat?
cambaceres 6 years ago

Are all the hands available to the public?
- noambrown 6 years ago
  
  The hand logs from the 5 humans + 1 AI experiment are included in the supplementary material of the Science paper.
  
  cambaceres 6 years ago
  
  They are missing the stack sizes of the players. Would love to have logs that include that info!
w_s_l 6 years ago

will you release the source code?
- noambrown 6 years ago
  
  Our goal is to make the research as accessible as possible to the AI community, so we include descriptions of the algorithms and pseudocode in the supplementary material. However, in part due to the potential negative impact this code could have on online poker, we're not releasing the code itself.
  
  downandout 6 years ago
  
  While you are not releasing the code to the general public, some people who worked on it obviously have access to it and someone will likely use it in the wild. The potential profits are astronomical - Rob Reitzen solved limit hold 'em and made what is rumored to be over $100 million hiring women to play online poker using his system from his house in Beverly Hills [1].
  Did you guys set any rules as to whether or not members of the team that worked on this are allowed to use it?
  [1] https://www.cigaraficionado.com/index.php/article/robotic-po...
anbop 6 years ago

Is this publicly available? How can I use it?
stevespang 6 years ago

Noam, I know you can't discuss all the specific proprietary details of the algorithm, I just graduated Comp. Sci UT Austin and I'm totally captivated by any and all of the more technical details of the algorithm . . .
ProAm 6 years ago

What's the name of the bot? Please say it's Poker McPokerface
- rotred 6 years ago
  
  This is literally in the second sentence of the article
  >A superhuman poker-playing bot called Pluribus has beaten top human professionals at six-player no-limit Texas hold’em poker...
  
  ProAm 6 years ago
  
  It was a joke.

auggierose 6 years ago

This is fascinating stuff. So do I understand this right, Libratus worked by computing the Nash equilibrium, while the new multiplayer version works using self-play like AlphaGo Zero? Did you run the multiplayer version against the two-player version? If yes, how did it go? Could you recommend a series of books / papers that can take me from zero to being able to reprogram this (I know programming and mathematics, but not much statistics)? And how much computing resources / time did it take to train your bot?

noambrown 6 years ago

Training was super cheap. It would cost under $150 on cloud computing services.
The training aspect has some improvements but is at its core similar to Libratus. The search algorithm is the biggest difference.
There aren't that many great resources out there for helping new people get caught up to speed on this area. That's something we hope to fix in the future. Maybe this would be a good place to start? http://modelai.gettysburg.edu/2013/cfr/cfr.pdf
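For readers who want a feel for the starting point: regret matching is the basic building block that counterfactual regret minimization (CFR) is built on. Below is a minimal Python sketch on rock-paper-scissors against a fixed opponent. To be clear, this is a toy illustration and not Pluribus's training code; the opponent mix `[0.4, 0.3, 0.3]` and all function names are assumptions chosen for the example, and it uses exact expected payoffs rather than sampled play so the result is deterministic.

```python
# Payoff to us for (our action, opponent action); 0=rock, 1=paper, 2=scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0.0:
        # No positive regret yet: fall back to a uniform mix.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def train(opponent, iterations=1000):
    """Run regret matching against a fixed opponent; return the average strategy."""
    regrets = [0.0, 0.0, 0.0]
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        # Expected payoff of each pure action against the opponent's mix.
        action_values = [
            sum(PAYOFF[a][o] * opponent[o] for o in range(3)) for a in range(3)
        ]
        expected = sum(strategy[a] * action_values[a] for a in range(3))
        for a in range(3):
            # Regret: how much better action a would have done than our mix.
            regrets[a] += action_values[a] - expected
            strategy_sum[a] += strategy[a]
    return [s / iterations for s in strategy_sum]


average = train(opponent=[0.4, 0.3, 0.3])
print([round(p, 3) for p in average])
```

Against an opponent who overplays rock, the average strategy shifts almost entirely to paper. In self-play, where both sides update this way, the average strategies are what approach an equilibrium in two-player zero-sum games.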
- dharma1 6 years ago
  
  Is Oskari Tammelin still working on this stuff? I remember he wrote some very fast CFR optimisations a few years ago

JaRail 6 years ago

So let me see if I understand this. I don't believe it's hard to write a probabilistic program to play poker. That's enough to win against humans in 2-player.

With one AI and multiple professional human players sitting at a physical table, the humans outperform the probabilistic model because they take advantage of each other's mistakes/styles. Some players crash out faster but the winner gets ahead of the safe probabilistic style of play.

So this bot is better at the current professional player meta than the current players. In a 1v1 against a probabilistic model, it would probably also lose?

Am I understanding this properly? Or is playing the probabilistic model directly enough of a tell that it's also losing strategy? Meaning you need some variation of strategies, strategy detection, or knowledge of the meta to win?

rightbyte 6 years ago

Interesting article. Too bad I don't have a subscription to read the paper.
The bot played like 10 000 hands. There is no way that is enough to prove it's better or worse than the opponents.
More so in no-limit where some key all-ins can turn the game upside down. The variance is higher than limit or fixed, right?
I did a heads up Texas holdem fixed bot with "counterfactual regret minimization" like 8 years ago from a paper I read. It had to play like 100 000 hands vs a crappy reference bot to prove it was better.
Strategy detection in such short games is probably worthless.
The edge is probably in seeing who is tired or drunk in paper poker.
- junar 6 years ago
  
  They mention that they use AIVAT to reduce variance.
  > Although poker is a game of skill, there is an extremely large luck component as well. It is common for top professionals to lose money even over the course of 10,000 hands of poker simply because of bad luck. To reduce the role of luck, we used a version of the AIVAT[1] variance reduction algorithm, which applies a baseline estimate of the value of each situation to reduce variance while still keeping the samples unbiased. For example, if the bot is dealt a really strong hand, AIVAT will subtract a baseline value from its winnings to counter the good luck. This adjustment allowed us to achieve statistically significant results with roughly 10x fewer hands than would normally be needed.
  [1] https://arxiv.org/abs/1612.06915
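  To show the general idea of subtracting a baseline, here is a toy Python sketch of a plain control variate. This is not AIVAT itself: the simulated "card luck" term, the noise sizes, and the `simulate` helper are all assumptions made up for illustration. The point is only that subtracting a zero-mean luck estimate leaves the average unchanged while shrinking the variance.

```python
import random
import statistics


def simulate(num_hands, true_edge=0.05, seed=0):
    """Return (raw, adjusted) per-hand winnings for a simulated player.

    Each hand's result is the player's true edge plus zero-mean card luck
    plus other zero-mean noise. The baseline is assumed to know the card
    luck exactly, so the adjusted sample subtracts it.
    """
    rng = random.Random(seed)
    raw, adjusted = [], []
    for _ in range(num_hands):
        card_luck = rng.gauss(0.0, 10.0)   # zero mean, so removing it adds no bias
        other_noise = rng.gauss(0.0, 3.0)  # luck the baseline cannot see
        winnings = true_edge + card_luck + other_noise
        raw.append(winnings)
        adjusted.append(winnings - card_luck)
    return raw, adjusted


raw, adjusted = simulate(10_000)
print("raw:      mean %.3f, variance %.1f" % (statistics.mean(raw), statistics.variance(raw)))
print("adjusted: mean %.3f, variance %.1f" % (statistics.mean(adjusted), statistics.variance(adjusted)))
```

  With lower variance per hand, the same number of hands gives a much tighter estimate of the true win rate, which is how fewer hands can still reach statistical significance.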

GCA10 6 years ago

Hi Noam: I'm intrigued that you trained/tested the bot against strategies that were skewed to raise a lot, fold a lot and check a lot, as well as something resembling GTO. Were there any kinds of table situations where the bot had a harder time making money? Or where the AI crushed it?

I'm thinking in particular of unbalanced tables with an ever-changing mixture of TAG and LAG play. I've changed my mind three times about whether that's humans' best refuge -- or a situation that's a bot's dream.

You've done the work. Insights welcome.

cyberferret 6 years ago

With the advent of AI bots in Poker, Chess etc., what happens to the old adage of "Play the player, not the game". How do modern human players manage when you don't have the psychological aspects of the game to work with?

I see on chess channels that grand masters have to rethink their whole game preparation methodology to cope with the "Alpha Zero" oddities that have now been introduced into this ancient game. They literally have to "throw out the book" of standard openings and middle games and start afresh.

pk2200 6 years ago

The chess channels you're visiting are grossly overstating Alpha Zero's impact. AFAICT, it hasn't made any impact on opening theory at all. AZ's strength is in the middlegame, where it appears to be slightly better than traditional engines (like Stockfish) at finding material sacrifices for long term piece activity and/or mating attacks.
friedman23 6 years ago

> what happens to the old adage of "Play the player, not the game". How do modern human players manage when you don't have the psychological aspects of the game to work with?
I would say that it's thoroughly rebounded to play the game not the player in poker and this isn't because of super bots like the one used in this paper.
Ever since game theory invaded poker, players who play in highly visible events such as TV tournaments try as hard as possible to make their game unexploitable.
yelloworld 6 years ago

Like already stated, saying that Alpha Zero has forced the chess world to seriously reconsider the basic principles of chess openings etc. is a bit of a stretch. But interestingly enough, the current world champion (Magnus Carlsen) is having the chess streak of his life as we speak. On the side, he's been openly joking about Alpha Zero being one of his biggest chess idols. It's safe to say the streak is probably mostly related to his preparation from the last world championship match half a year ago carrying over to all the tournaments after.
However, even according to the former world champion (Viswanathan Anand) the run he's been on is something quite shocking: “His results this year is simply [great].... difficult to find words. [It’s been] completely off the charts. I think the chess world is still in a bit of a shock. The rest of the players are struggling to deal with a phenomenon [like him]. Even in 2012-13, his domination was less than it is this year. Everyone is still processing this information.” [1]
Carlsen is basically en route to breaking 2900 Elo - at 2882 Elo with a clear upwards trend - while there are only two other active players even above 2800 Elo and struggling to keep it above that threshold. (Elo is the rating system used in chess. Above 1500 Elo is an average player, 2000 Elo is a good player, 2500 Elo is a grandmaster. Anything above 2700 Elo is basically godlike.)
Oddly enough, instead of playing more like a machine, it seems like Carlsen has been playing chess that is much more about the human aspect of the game rather than trying to find the top ranked engine move on every turn. (The current traditional top engine - Stockfish - makes an assumption of each move's validity using a point system, which the chess world has been more or less obsessing over for the past decade. Alpha Zero doesn't have such a point system whatsoever.) He's been playing a drastically more aggressive and dynamic variety of chess compared to what has been seen in a long time at the top tournaments.
He's been playing to create dizzying positions on the board, making a few moves that aren't necessarily liked by the traditional top engines, but still finding himself in a winning position several moves after. It definitely looks like some sort of black magic, but it seems like the big thing Alpha Zero has brought to the general philosophy on how to approach chess at the top level is that it's possible to play aggressive chess, take risks and win in 2019. Magnus Carlsen is the first player to successfully reinvent that style of play, more than likely partly inspired by Alpha Zero. So, I'd say the big thing about Alpha Zero isn't necessarily that it could beat the other top engines, but more importantly that the 'artistic' aspect of its play is something that has never been seen from another chess engine. The fact that it proved that sort of style superior to the play ever before played by another chess engine is just the icing on the cake.
Garry Kasparov on Alpha Zero's chess persona: "I admit that I was pleased to see that AlphaZero had a dynamic, open style like my own. The conventional wisdom was that machines would approach perfection with endless dry maneuvering, usually leading to drawn games. But in my observation, AlphaZero prioritizes piece activity over material, preferring positions that to my eye looked risky and aggressive." [2]
[1] https://sportstar.thehindu.com/chess/viswanathan-anand-on-ma...
[2] https://science.sciencemag.org/content/362/6419/1087

neural_thing 6 years ago

How long until a slightly worse version of this model is reverse engineered and appears at every table in online poker?

DevX101 6 years ago

Slightly worse versions are already out in the wild. Bots using the published technique will be live in a couple of months tops.
- rightbyte 6 years ago
  
  Colluding bots are the main worry if you play online though.
jessriedel 6 years ago

Plenty of systems already exist that can win against weaker players and/or at limit (especially limit heads-up).
pennaMan 6 years ago

I'm wondering how long until poker games will require a captcha on every round
- baq 6 years ago
  
  Captchas are easier than the game
- stri8ed 6 years ago
  
  Will not help against human/AI hybrids. Use machine vision to decipher state of the game, and covertly suggest moves via audio or perhaps vibrations.
  
  throwamay1241 6 years ago
  
  When PokerStars suspects an account is a bot, it actually requires the player to film 360 degrees around themselves, then play for an hour with a camera focusing on input devices and the screen. They check for differences between current behavior and historic behavior.
ehsankia 6 years ago

This is something I've always wondered, how come bots haven't taken over online poker considering how much money there is to be made, and all you need is to be slightly better than average right? Is high level poker really that hard to achieve?
- friendlybus 6 years ago
  
  Post Bilzerian's time in poker, the human players in online games use statistics to make insane bets on odds. They play many many games at once and just look for the opening and make the insane bets when the openings come up. They've done the math that it's worth it to do those kinds of bets.
  
  csa 6 years ago
  
  I did this with great success.
  For reference, I once folded top boat to quads (he showed) to a river all in raise in PLO to a dude who had a 100% win showdown when raise river stat over several thousand hands. Other stats confirmed he was a nit, so it was an easy fold. Iirc, this was PLO 200 or PLO 400 — I never saw anyone that nitty at the PLO 1000 or PLO 2000 tables.
  FWIW, I did a lot more than “look for an opening”, although I did a lot of that. I tried to play GTO as much as possible, but I would adjust to people who were exploitable when they called too much, folded too much, or were too aggressive into weakness.
  I spent a lot of time away from the table analyzing stats of the regulars to find leaks to exploit. It was worth the time, and it made it much easier to play 8-12 tables of PLO.
  
  dbancajas 6 years ago
  
  do you still play online? I used to railbird FT back then, watching patrick antonius/sahamies/durr/jungleman etc. exciting times. How many hours did you put in for you to be good?
  
  csa 6 years ago
  
  > do you still play online?
  No longer online. I quit with UIGEA. I hate Bill Frist.
  > I used to railbird FT back then, watching patrick antonius/sahamies/durr/jungleman etc. exciting times.
  I never played with those folks. For reference (and maybe I was vague, and maybe I show my age), plo 200 is 1/2 blinds plo, and 2000 plo is 10/20 blinds plo. My heyday was when 10/20 blinds were the max. When they created the 50/100 and higher limits, they killed the 10/20 blind (former max) games. I never played over 25/50, because I didn’t consider myself properly rolled for it. That said, in retrospect, I should have gone modified Kelly criterion and take shots at the higher stakes — some of those dudes were total donks.
  I did play against huck seed on FT (nit), and I played against Doyle and Todd on Players Only (I think their room was a skin). They also played tight, but they may have been doing required hours. People donated to them religiously with light calls.
  > How many hours did you put in for you to be good?
  I would argue that I am still not “good”. There’s a hierarchy in the poker world, and you don’t feel comfortable until you’re at the top — and even that is fleeting.
  To answer your question, though, I was profitable at 5/10 and 10/20 after maybe 1000 table hours (usually 4-8 tables) and 500-700 study hours. Note that this was when poker was super soft, and note that I am a specialist in learning (degrees, experience, and whatnot), so I learn things like new games more quickly than most people. My job lent itself to a lot of study away from the table, so I availed myself of that time.
  I remember several breakthroughs for my game:
  1. I had a dream one night in which I finally understood the bidirectionality of plo8. This was the game that I built my bankroll on (after cashing in a few free rolls). That took me to plo8 100 ($0.50/$1 blinds) in short order. After that, I just grinded to 200 and 400 plo and plo8.
  2. I remember getting crushed by a LAG player in plo 400 one night. I went to four plo 50 tables and played 12 hours straight playing with a 55% or so VPIP. I broke even in that session, but it helped me understand LAG players a lot better. In retrospect, that session helped me understand how to exploit LAGs really well, and that paid off a lot at higher stakes. It also helped my SLAG game a lot.
  3. The next big leap was realizing that there were three lines to exploit in poker — players who are too weak (fold too much), too passive (call too much), and too aggressive (bet/raise too much). Being able to exploit these tendencies is optimal. Being able to induce these tendencies is insanely profitable. The above is easy to say, but not always so easy to do.
  4. The last phase of my development was understanding “gears”. Changing gears is the ability to switch between being passive/aggressive and tight/loose depending on the context. Most people change gears predictably — for example, if they lose a big hand, they tighten up (or some players loosen up). I played my best when I was able to adjust to the texture of the game and play the way that my opponents least expected me to play and/or wanted me to play. It’s a lot of psychology, but when I mastered this, I felt like I owned the table. No one could read me, and I read them like an open book. This is the high that skilled poker players live for, imho.
  To close, I twice considered becoming a pro poker player. Once before UIGEA, and once after.
  Before UIGEA, I didn’t because I realized that I was only good for about 20 top notch hours per week, and I could play those hours after work. Furthermore, the tables were only juicy for maybe 30 non-consecutive hours, so I didn’t feel like I was missing much. I was also worried about the non-legalization of poker in the US, so I wanted to keep my day job.
  After UIGEA, I thought about moving to Thailand or Canada, but I (rightly) thought that games would get much worse without the US market. My earn would have been a solid $100-200k based on some of my former peers, but that’s not terribly exciting money for me. Anyone who can make $100k or more in online poker can make way more than that by being a programmer or by doing some sort of tech business (SaaS, e-commerce, consulting, etc.) or financier.
  Ok, that’s a wall of text. Feel free to ask follow ups.
  
  csa 6 years ago
  
  Aw fuck, how could I forget...
  I also played against Mike the Mouth (either party or FT). I think that this was when they limited the 5/10 and 10/20 games to two tables each.
  I played Mike in both plo8 and plo, and he was supposed to be a specialist. He was an absolute donator in the games I played in. He took really bad lines, and he was a net loser over a statistically insignificant number of hands. That said, if he was at the table, I wanted to play, and I wasn’t leaving until he got up. He was very exploitable.
  To be fair, I don’t know what his life situation was like at that time (it was up and down from what I heard). That said, I wanted him at my table 100% of the time.
  
  dbancajas 6 years ago
  
  What does "post-Bilzerian time" mean? I heard Bilzerian's game was not good, and rumor has it he's using poker as a front for how he earned his money?
asdfman123 6 years ago

Anywhere from 3 years ago to 5 years from now.

merlincorey 6 years ago

Pretty incredible that this has scaled down from 100 CPUs (and a couple terabytes of RAM) for their two player limit hold'em bot to just two CPUs for the no limit bot.

donk2019 6 years ago

Congrats Noam for the great breakthrough work!

I have a question about conspiracy. In the 5 humans + 1 AI setting, since the human pros know which player is the AI (as I read in your previous response), is it possible for the human players to conspire to beat the AI? And in theory, for a multiplayer game, even if the AI plays the best strategy, is it still possible for it to be beaten by a conspiracy of the other players?

Thanks.

asdfman123 6 years ago

So, is this the end of online poker?

Will it just become increasingly sophisticated bots playing each other online?

trishume 6 years ago

I'm really confused about why stock for the company that makes PokerStars hasn't moved at all today: https://www.google.com/search?tbm=fin&q=TSE:+TSGI#scso=_wqsn...
The fact that there's a published recipe for a superhuman bot that can be trained for $150 and run on any desktop computer sounds like an existential threat to their business.
The main mitigating factor I can think of is that you'd need to also adversarially train it so it isn't distinguishable from a skilled human. But that doesn't seem like it would be too difficult.
- auggierose 6 years ago
  
  Most people on sites like PokerStars are dumb money who just like to gamble. You are not going to win much money against the Pros anyway. If you create a bot like that yourself, then you are just one more of these Pros, basically. If you don't, then you use some published / commercial bot, and PokerStars will be able to detect it.
- asdfman123 6 years ago
  
  You know, now that we're talking about it I'm wondering if someone hasn't already come up with a better bot and has just been silently using it to win money online.
  I'm sure the sites have been crawling with bots as long as they've been around, some better than others. As long as it doesn't drive away too many customers I doubt the sites care. They still take a rake on bot games. However better AI could change that as the "dumb money" slowly dries up.
  
  bcassedy 6 years ago
  
  Dumb money has been drying up for years. There have been bots taking millions of dollars out of games for more than a decade. Even bots from 10 years ago were sophisticated enough to win money at mid-stakes poker (up to $2000 buy in 6max no limit games)
  
  dbancajas 6 years ago
  
  Proof? I don't believe this. 2K buy-in has a lot of regs that are pretty good overall in cash games. Plus Pokerstars/FT has a pretty good anti-bot policy. If you get caught, bye bye to the $.
  
  bcassedy 6 years ago
  
  https://forumserver.twoplustwo.com/153/high-stakes-pl-omaha/...
  There are a bunch of such threads over the years where through statistical analysis, users have identified groups of dozens of bots.
  While years ago many of the pros could theoretically beat these bots, it may not have been by enough of a factor to overcome the rake. Of course if the bots are practicing any game selection they can take money out of the economy even if they can't beat pros.
  Anti-bot measures are an arms race and the sites aren't always ahead of the game.
  
  vannevar 6 years ago
  
  What would stop the sites themselves from operating those bots? They wouldn't even need AI, just deal favorable "cards" to their own player.
- 6nf 6 years ago
  
  There's already loads of bots online, this is just another incremental improvement in a very long line. This isn't some unexpected sudden death-knell for online poker.
vannevar 6 years ago

How do we know that online poker has ever been a fair game? Has anyone ever done a statistical study of verified real players to determine whether their collective historical winnings match what would be expected in a fair game? It seems like it would be much too easy for the operators to skim money in any one of a thousand ways. I've never understood the trust people place in online gambling in general.
- asdfman123 6 years ago
  
  What does trust have to do with it, though? When I played online poker for money, I was just content with the fact that I could win slightly more than I lost, and I was mainly only doing it for entertainment anyway. I mean, the whole game revolves around risk management. If you suddenly go against someone much better than you, you can always exit the game.
  Thinking about online poker again gives me ideas now that I actually know how to program. I actually thought up and wrote out a good way to subtly steal money from people, but I'm deleting it because I don't want someone else to do it. (And I wouldn't do it myself because I have ethics.)
- stri8ed 6 years ago
  
  In theory, you can implement a provably fair poker game on the blockchain.

solidasparagus 6 years ago

So Dota 2 doesn't count as a multiplayer game?

OpenAI Five beat the world champions in back-to-back games...

taejavu 6 years ago

Yes, Dota 2 is not a multiplayer poker game. I agree that the title is ambiguous, but it's not a stretch to imagine that "poker" is implied here.
- solidasparagus 6 years ago
  
  I don't think it's implied, considering the article compares the poker bot to go and chess bots (which are the non-multiplayer games the title is referring to).
filoleg 6 years ago

My guess is that by "multiplayer", they meant "free for all", as opposed to "N vs. N". In other words, multiple opposing factions.
noambrown 6 years ago

From an AI and game theory standpoint, there isn't much difference between two-team zero-sum and two-player zero-sum if the teammates are trained together. That said, the Dota 2 work is extremely impressive for a variety of other reasons.
- solidasparagus 6 years ago
  
  There is far more ambiguity when you are competing against five mostly-aligned strategies vs a single shared strategy.
elefanten 6 years ago

Agreed, the "first multiplayer" claim needs some walking back or caveats.
Cool achievement, but hollow marketing doesn't make it better.

r00fus 6 years ago

I was really hoping the article would go into more detail on how the AI engaged with the human players.

Was it online? The picture on the article seems to imply IRL.

If IRL, what inputs did it have, simply cards shown or could it read tells? Did those players know they were playing an AI?

noambrown 6 years ago

It was online. The players were playing from home on their own schedules. The bot did not look at any tells (timing tells or otherwise). The players knew they were playing a bot and knew which player the bot was.
cortesoft 6 years ago

Tells aren't really a thing for top level poker players.

grandtour001 6 years ago

Were the games played with real money? Nobody is going to take fake money games seriously.

slashcom 6 years ago

From the paper:
"$50,000 was divided among the human participants based on their performance to incentivize them to play their best. Each player was guaranteed a minimum of $0.40 per hand for participating, but this could increase to as much as $1.60 per hand based on performance."
So the humans weren't betting their own money, but they still made more money if they won.
zwendkos 6 years ago

This is the most important question.

rofo1 6 years ago

I'd love to see high-stakes heads-up bot vs Tom Dwan or Negreanu.

Maybe a bot technically qualifies as an opponent in durrr's challenge [0]? :)

How would bluffing influence the outcome? Both of these players, who are considered very strong, are known to play all kinds of hands.

[0] - https://en.wikipedia.org/wiki/Tom_Dwan#Full_Tilt_Poker_Durrr...

nishantvyas 6 years ago

I don't get this... Poker isn't purely mathematical... it has emotions involved (greed, fear, belief, reading others, manipulation (to fool the opponent)... and maybe more... and all of these emotions arise differently for different people based on their time, place, their world view, their background and history...)

Are we now saying that a computer can do all of this in simulation? If so, it's a great breakthrough in human history.

throwamay1241 6 years ago

At the nosebleeds, poker hasn't been about those things in a long time.
Poker is about exploitative play against people who base their play off emotions, and imperfect game theory optimal (GTO) play against players who don't base their play off emotions. The more perfect the GTO play is, the higher the winrate against the latter group, but higher stakes games are built around one or more bad players; pros will literally stop playing as soon as the fish busts.

luckyalog 6 years ago

Isn't it just possible that the bot got lucky? It plays well. Maybe really well, but does it play as well as a pro? Would it win 9 WSOP bracelets? Would it make it to day 3 of the World Series of Poker?

Chris Moneymaker got some damn good hands. It's part of the game. It's why this feat is unremarkable and why poker is a crap game for AI. The outcomes are very loose, especially when the reason these guys are pros is partially because of their ability to read.

You are taking away a tool that made these poker players great and then expecting them to be a metric to test the AI. A better test would be to have pro players play a set of 1, 2, 4, 7 basic rule bots and have the AI do the same. Then you compare differences in play. With enough data points you can compare situations that are similar but where the AI did better or worse. This is a fair comparison of skill.

Also, if there are professional players at a multiplayer game, the AI is getting help from other players. Just like in Civ V, where I get help from the AI attacking itself. I'm sure this AI got help from the players attacking each other (especially if they were doing so and making the pot bigger for the AI to grab up; think of a player reraising another player after the bot does a check all in).

awal2 6 years ago

Despite the luck/noise in Poker, there are reasonable measures of performance, and while I'm not an expert in this area, the bot seems to be doing very well (see paper for details). Poker is not a "crap game for AI"; it's actually quite a good game. It's a very simple example of a game with a lot of randomness (a feature not a bug) and hidden information that still admits a wide variety of skill levels (expert play is much better than intermediate play, which is much better than novice play). This is a great accomplishment.
More links for reference: https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-... https://science.sciencemag.org/content/early/2019/07/10/scie...
vecter 6 years ago

"In a 12-day session with more than 10,000 hands, it beat 15 top human players."
That's not luck. See also: https://news.ycombinator.com/item?id=20416099
Also, Chris Moneymaker is a good poker player. He's no Phil Ivey or Tom Dwan, but he's still very good and has had decent results after his WSOP win.

w_s_l 6 years ago

I would love to get my hands on the source. Hook it up to an API like https://pokr.live and then basically build a computer vision poker bot.

The trick is how to create natural mouse click movements or keyboard inputs. This is the part that I'm most shaky on, but the pokr.live API works by sending screenshots, which it will translate into player actions at the table.

disclaimer: pokr.live API is a WIP

auggierose 6 years ago

You do that by letting a human play, informed by the bot.

DeathArrow 6 years ago

I was thinking a year ago about using Deep Reinforcement Learning in a poker bot. What stopped me was the impossible amount of data and computation due to the imperfect information nature of poker games. If I have the time, I'll try to implement something akin to the search technique described in the paper.

It might pay better than a full time job.

axilmar 6 years ago

"At each decision point, it compares the state of the game with its blueprint and searches a few moves ahead to see how the action played out. It then decides whether it can improve on it."

That's exactly how the brain operates.

gringoDan 6 years ago

Curious if we'll see human poker pros get much better in the coming years as they incorporate training regimens that involve bots (analogous to chess today vs. 50 years ago). Seems like this will be the trend in almost every game.

TylerE 6 years ago

As someone who plays both games....I doubt it.
Poker is an incomplete information game with crushingly high variance. The bot's strategy is likely not quantifiable.
- gringoDan 6 years ago
  
  Can you expand on this? I'm a novice at both games, but the Facebook blog post mentioned that the bot exhibited some unconventional strategies:
  > Pluribus disagrees with the folk wisdom that donk betting (starting a round by betting when one ended the previous betting round with a call) is a mistake; Pluribus does this far more often than professional humans do.
  Is it overly simplistic to think that humans could improve their game by incorporating some strategies like this more/less often than they were previously?
- TheChosenZygote 6 years ago
  
  Bots have already influenced the poker meta. Libratus showed us how it was optimal to sometimes overbet bluff when you have nut blocker(s). I'm sure when these poker pros do hand reviews, they're not looking at how they can exploit Pluribus, but more so at how they can incorporate some of the lines/strategies it used to beat everyone else.

DeathArrow 6 years ago

I wonder what would be the impact of using Counterfactual Regret Minimization instead of training a neural network based on hands played by real players?

Why is using CFR better than training based on real data?

Tenoke 6 years ago

It's not necessarily better. With CFR you can learn beyond what humans have learned, but on the other hand you don't learn their usual mistakes, which would let you exploit them more easily. Also, in this approach you need CFR since at every point you are checking what would've happened if you had picked something else, which is just impossible with a fixed dataset.
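
To make the "what would've happened if you picked something else" part concrete, here is a minimal sketch of regret matching, the update rule that CFR is built on. It is a toy, not the bot from the paper: the game is rock-paper-scissors rather than poker, and the names `PAYOFF`, `strategy_from_regrets` and `train` are made up for illustration.

```python
# Toy regret matching on rock-paper-scissors (NOT the paper's implementation).
# PAYOFF[a][b] is the utility of playing a against b
# (0 = rock, 1 = paper, 2 = scissors). The game is symmetric and zero-sum.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0:
        # No action has positive regret: fall back to uniform play.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def train(iterations, initial_regrets):
    """Self-play for both players; returns each player's average strategy."""
    regrets = [list(initial_regrets[0]), list(initial_regrets[1])]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            opponent = strategies[1 - player]
            # Counterfactual question: what would each action have earned
            # against the opponent's current mix?
            action_values = [
                sum(PAYOFF[a][b] * opponent[b] for b in range(3))
                for a in range(3)
            ]
            expected = sum(
                strategies[player][a] * action_values[a] for a in range(3)
            )
            for a in range(3):
                # Regret = value of the alternative minus value actually earned.
                regrets[player][a] += action_values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    # Start player 0 with a bias toward rock so there is something to unlearn.
    averages = train(100_000, [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
    print(averages)  # both average strategies should end up near uniform
```

The regret update needs the value of actions that were not taken, which only a simulator (or the game rules) can supply. A fixed log of human hands records just the action that was actually played.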

auggierose 6 years ago

Would you say it would be hard to expand this to tables with 9 players?

noambrown 6 years ago

No, it wouldn't be hard. We chose six players because it's the most common/popular form of poker.
Also, as you add more players it becomes harder and harder to evaluate because the bot's involved in fewer hands, we need to have more pros at the table, and we need to coordinate more schedules. Six was logistically pretty tough already.

User23 6 years ago

The most interesting thing about this to me is the lesson it teaches human players about bluffing.

indigodaddy 6 years ago

Was this cash or tourney format? How many blinds deep was the bot and the rest of the players at the start?

GCA10 6 years ago

From the sample hands, it looks as if it's a cash game with stacks equal to 200BB. Plenty of room to play real poker.

ayemeng 6 years ago

Curious, why was 100BB used for six max? If I recall right, the heads-up experiment was 200BB?

noambrown 6 years ago

We considered both options but decided to go with 100BB because that is the standard in the poker world. It doesn't make a big difference for these techniques though.
- srkigo 6 years ago
  
  Could you try to run a training with ante included in the pot? I wonder if open-limping would be a viable strategy with some hands. No one knows that and it would be really interesting to find out. Ante should be equal to BB, like it was in WSOP Main Event.
- ayemeng 6 years ago
  
  Does Pluribus care about opponent's stack depth? From the examples, it appears stacks were reset after each hand.
bokonon12 6 years ago

Guessing it's because it's most similar to a regular 6max game. Also it should lower the number of possible ways to play a hand. Fewer chips means the correct choices are easier, so maybe it's computationally easier.
- ayemeng 6 years ago
  
  Yeah, my question for the author is if stack depth was relevant for this experiment. In heads-up, they exhausted the entire tree; in six max, they went to a fixed depth.

cklaus 6 years ago

Are the source code and data available so that others can play against this bot?

anbop 6 years ago

Would love to wire this into some kind of device I could play with at a casino.

zzo38computer 6 years ago

I have seen fixed limit AI, and here is now no limit AI. Is there a pot limit AI?

IloveHN84 6 years ago

Basically, all the online poker rooms are now rigged and lead to fraud.

donk2019 6 years ago

Congrats Noam for the great breakthrough work!

david-gpu 6 years ago

Time to cross poker off the list?

[0] https://xkcd.com/1002/

Anecdotal 6 years ago

Poker is a game about adapting, therefore a poker pro could study the bot's gameplay and adapt enough in order to beat it, then in turn the bot could do the same thing.

alexashka 6 years ago

The title is misleading - bots have been beating no limit pros in 1v1 matches for quite some time.

This is for 6-man games. The article mentions 10,000 hands - this is a very small sample size to draw any real conclusions, as anyone who has dabbled in online poker for more than a few thousand dollars can attest to. Regardless - it's trivial to write a bot that'll beat 90% of the players, as site runners can all attest to (bots are a serious problem that is not new). What does it matter that a bot can beat 'the best' or 'professionals'? It's enough that it can do better than the vast majority, outside of dystopian woes about robots taking over or being 'superior' to human beings.

Glossing over all that - I am curious if this can be used for something other than ruining online poker, which has largely already been ruined by allowing multi-tabling professionals with custom software that gathers statistics on players (data mining), existing bots, the US government and irresponsible (criminal) site runners (looking at you, Ultimate Bet).
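
On the sample-size point, a rough back-of-the-envelope check is the standard error of the average win rate. The sketch below is only that: the per-hand standard deviation (10 big blinds) and the true win rate (0.05 big blinds per hand) are assumed numbers for illustration, not figures from the article or the paper, and any variance-reduction technique used in the actual evaluation is not modeled.

```python
import math


def standard_error(std_per_hand, hands):
    """Standard error of the mean win rate, in big blinds per hand."""
    return std_per_hand / math.sqrt(hands)


def hands_needed(std_per_hand, win_rate, z=1.96):
    """Hands needed for win_rate to sit z standard errors above zero."""
    return math.ceil((z * std_per_hand / win_rate) ** 2)


if __name__ == "__main__":
    ASSUMED_STD = 10.0       # hypothetical: big blinds per hand
    ASSUMED_WIN_RATE = 0.05  # hypothetical: big blinds per hand

    se = standard_error(ASSUMED_STD, 10_000)
    print(f"standard error over 10,000 hands: {se:.3f} bb/hand")
    print(f"hands needed: {hands_needed(ASSUMED_STD, ASSUMED_WIN_RATE):,}")
```

Under those assumed numbers the uncertainty after 10,000 raw hands is larger than the win rate itself, and on the order of 150,000 hands would be needed to separate it from zero. Different assumptions, or a lower-variance way of scoring hands, change the answer a lot.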