SCHNAIL dev diary: Moving the AIs to the cloud

Introduction

So far, the SCHNAIL client has run everything locally – when you launch the client, it always downloads the latest versions of the AIs you can play against. This doesn’t take long, but it poses multiple problems. During the beta, several issues have cropped up again and again.

  • “It works on my machine”: Some bots work on certain machines, and some don’t. This is the most annoying and the hardest problem to solve. There is no guarantee which OS/dependencies are present. This can be mitigated to a degree, but it will always be a problem – launching bots with bwheadless.exe has been the cause of so much headache so far. It is sometimes impossible to debug when all the feedback is “it not work”.
  • Handling of learning files: Many AIs use learning and opponent modeling. Currently these files are only stored locally, so if someone downloads the client on a different machine, the experience will be different. Originally there was a plan to just upload the learning files, but that takes agency away from authors – we can’t know how they want to handle versioning/conflicts on the SCHNAIL servers. Sure, we could configure preset strategies, but there will always be personal preferences here. Also, the upload can simply be blocked from the user’s side – not sure why you would do that, but the ability is there.
  • Cheating: The bot files are on the user’s computer, so messing around with them can affect the score of some matches – of course there are anti-cheating measures in effect, but it is a risk nonetheless. The less stuff we have on the user’s machine, the better the chance of preventing malicious actions.
  • Performance: Different machines have different resources available. For example, having a GPU could allow developers to lean more into machine learning. Lower resources might account for some weaker play – not really a problem at present. Also, a standardized amount of resources for every match would be easier for bot authors to plan and optimize for – exactly what resources to give is a matter of debate, and of cost/benefit.

All of these are solvable with the current infrastructure, but usually that would just be treating the symptoms. The original proposal for the project was to run the bots remotely anyway, and I decided it is time to return to that version.

Proposal

The first idea was to run the games in the cloud somehow – just on the remote server. But that quickly becomes a resource problem, and scaling a VPS is just not that great. Also, this adds unnecessary coupling – the server that makes and handles the games shouldn’t have to fight the actual games for resources.

Spawning a virtual machine is a better approach, but not even that is needed – just a container image, basically a dockerized environment for a bot to run in. This article is more of a brainstorm and a vague specification than implementation details – those will surely come too.
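
To make the idea a bit more concrete, here is a minimal sketch of how a match server could assemble the command that starts one bot container, assuming Docker and Python. Nothing here is decided: the image name, the `/bot/learning` mount point and the CPU/memory numbers are placeholders I made up for illustration. The function only builds the argument list, it does not run anything.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MatchLimits:
    """Hypothetical per-match resource budget; the numbers are placeholders."""
    cpus: float = 2.0
    memory: str = "2g"


def docker_run_args(image: str, learning_dir: str, limits: MatchLimits) -> List[str]:
    """Build the argv for one bot container (nothing is executed here)."""
    return [
        "docker", "run",
        "--rm",                                  # discard the container after the match
        f"--cpus={limits.cpus}",                 # same CPU budget for every match
        f"--memory={limits.memory}",             # same memory cap for every match
        "-v", f"{learning_dir}:/bot/learning",   # assumed mount point for learning files
        image,
    ]


if __name__ == "__main__":
    # Example values only; "schnail/examplebot:v3" is not a real image.
    print(" ".join(docker_run_args("schnail/examplebot:v3",
                                   "/mnt/learning/examplebot",
                                   MatchLimits())))
```

Keeping the limits in one small object means every match gets the same budget, which is the point of standardizing resources in the first place.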

Considerations for the container image

Handling of persistence: There are many things that we want to store after a game.

  • Replays: Maybe the easiest, and handled pretty ok currently. Still, there are cases when the replay is not uploaded for one reason or another, and handling this from the AI’s side would eliminate one liability. It would make the AI side a single point of failure, but the client side already is one, so this would still be an improvement. I think uploading from both the player and the AI side would be overkill here. I might still do it, as it could have other benefits.
  • Tournament manager data: Also handled pretty ok at the moment, but more control over it would be tremendously helpful. Also, a potential cheater not being able to alter those files is a nice bonus.
  • Learning data: The big one. I imagine each bot having a network folder of its own with learning files. This would be mounted before each match, so the AI can read from and write to it. Network latency might be an issue here, as the learning network folder and the AI image are definitely not on the same server. What I’m hoping is that cloud providers are… kinda good at this. There is also the issue of concurrency: a bot can play multiple games at the same time, so this needs to be handled by the authors. Also, the network folder where the learning files are kept is accessible to the authors, so the files can be modified manually if desired.
  • (Crash) logs: Writing a bot is a complicated and error-prone process, so crashes are inevitable. Having more detailed info about them would certainly help make the bots better overall. Also, StarCraft itself can crash too. This is one of the things I really want to narrow down before moving on to the next ranked season.
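
Since concurrency on the learning folder is left to the authors, here is one possible way to deal with it, as a sketch in Python: every game writes its own file, so two games running at the same time never touch the same file, and the bot merges all records when it starts. The file layout and the function names are made up for this example. Also note the assumption that renaming a file inside the mounted folder is atomic; that holds on a local disk, but it would need to be checked for whatever network storage ends up being used.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def save_game_result(learning_dir: Path, game_id: str, record: Dict) -> Path:
    """Write one game's record to its own file, so concurrent games never share a file."""
    learning_dir.mkdir(parents=True, exist_ok=True)
    target = learning_dir / f"{game_id}.json"
    # Write to a temporary file in the same folder first, then rename it into place,
    # so a reader never sees a half-written record.
    fd, tmp_name = tempfile.mkstemp(dir=learning_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(record, tmp)
    os.replace(tmp_name, target)
    return target


def load_all_results(learning_dir: Path) -> List[Dict]:
    """Read every finished record in file-name order; leftover .tmp files are ignored."""
    records = []
    for path in sorted(learning_dir.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            records.append(json.load(handle))
    return records
```

A bot would call `load_all_results` once at the start of a match and `save_game_result` once at the end, with a game id that is unique per match.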

Bot versioning: The plan is that every time a bot is uploaded/updated, an image of that bot with an incremented version is created. This does pose the problem of storing a lot of versions, but that’s just a matter of storage, which is not our bottleneck currently – also, you can just revert your bot to an earlier, (better) working one. Creating an image could be the final step of some bot testing/CD pipeline.
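
As a rough illustration of the "incremented version" part, a sketch in Python could look like the following. The `v<number>` tag format is an assumption for this example, not a decided naming scheme; tags that do not match it (such as `latest`) are simply ignored.

```python
import re
from typing import Iterable

# Assumed tag format: "v1", "v2", ... (hypothetical, nothing is decided yet).
_VERSION_TAG = re.compile(r"^v(\d+)$")


def next_version_tag(existing_tags: Iterable[str]) -> str:
    """Return the tag for the next upload, one above the highest existing version."""
    versions = []
    for tag in existing_tags:
        match = _VERSION_TAG.match(tag)
        if match:
            versions.append(int(match.group(1)))
    # With no earlier versions, the first upload becomes v1.
    return f"v{max(versions, default=0) + 1}"
```

Comparing the numbers as integers (not as strings) matters once a bot passes version 9, otherwise `v10` would sort before `v9`.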

Deployment, and latency: By running the bots remotely, we introduce network latency into the equation, and like anything to do with networking ever, it’s not an easy problem to solve/manage, but yet again, it has been done before. I plan to use some cloud service to deploy containers, such as Amazon EC – these already have some processes to choose a datacenter near you, but it is something we need to keep an eye on.
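
If the provider’s own datacenter selection turns out not to be enough, the fallback could be as simple as measuring a few round trips per region from the client and picking the best one. A sketch in Python, where the region names and the way the samples are collected are assumptions left out of the example:

```python
from statistics import median
from typing import Dict, List


def pick_region(latency_samples_ms: Dict[str, List[float]]) -> str:
    """Pick the region with the lowest median latency; regions without samples are skipped."""
    medians = {
        region: median(samples)
        for region, samples in latency_samples_ms.items()
        if samples
    }
    if not medians:
        raise ValueError("no latency samples available")
    return min(medians, key=medians.get)
```

The median is used here instead of the average so that a single slow round trip does not rule out an otherwise close datacenter.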

Version of StarCraft: So far we’ve been using 1.16 – but here on the server side, OpenBW is an option. Eventually, SC:R support is the end goal.

Cost: Yes, it will cost money. I’ll cross that bridge when I come to it.

Thanks for reading! SCHNAIL is a lot of work, and most of it is boring background stuff that no one will see. I’m eternally grateful to my Patreon supporters, who helped tremendously in keeping it running. As for my other channels, feel free to follow me on Facebook, Twitter, or Twitch!
