(This is an ongoing series. I’m referring to things I did in the previous installment. If you’re new to this, I recommend starting at the first episode.)
First part of this series | Previous | Next |Index
I ended the last episode on a cliffhanger: Will a very smart array-based approach for processed tiles work faster? It turns out that yes, but also no. At first glance, my arbitrary benchmark (10 runs per frame at 50 FPS) utterly failed, but then I realized that it is consistent. As in, seemingly no matter what, it runs at a set speed. Consistent but slow methods are actually better for me, because I’d rather not have any too long frames at all, but know what I can get away with.
The culprit was spotted immediately. 8 FPS, baby!
processedMap = new int[Main.bw.getBWMap().mapWidth()*4][Main.bw.getBWMap().mapHeight()*4];
That’s a pretty big array, and re-initializing it every time – turns out – is very costly. Before you scream “just fill it with zeroes, don’t reinitialize”, then first of all, congratulations, that is a very hard thing to scream, second, it might be not the fastest method here – with the scouting map, it was faster to re-initialize. (That reminds me, that needs to be converted to an array at some point. Meh).
Second approach: (Initialized once in onStart())
for (int x=0; x<processedMap.length;x++) {
for (int y=0; y<processedMap[x].length;y++) {
processedMap[x][y] = 0;
}
}
And it’s almots 50% faster! That is, 11 FPS, which is still abysmal. Can we go derper deeper? Arrays.fill() does the exact same thing as the code above. Staring at this piece of code, I just realized something. I have 8 directions. A byte has 8 bits. From Java 7, you can actually use binary literals. I definitely don’t need the int datataype’s range here. This is getting insane, but I can’t just not try this out.
Just changing the processedMap’s type to byte works – I need to do some casting, but that’s all. It is also just as slow as before – 11 FPS, steadily. Looking at VisualVM, it’s still the same problem – filling a huge amount of values just takes too long.
Since I’m working with bytes now, and bytes can represent a character… what if I just store the original (full zero) array value in a String, and use that to re-load the values? Stupid idea is stupid, but I had to try it.
for (byte[] chonk : processedMap) {
chonk = processedMapString.getBytes();
}
As you can see, I take care to use Clean Code principles when naming my variables. Surprisingly, this works, but is actually slower. Okay, back to the drawing board. I can’t just clone another array, because that makes a shallow copy. I did try the SerializationUtils.clone() method mentioned in the article, but it was slow too.
I tried the ObjectInput/Output stream method listed here too. Same outcome: Works, but slower. I found something called FastByteArrayOutputStream, which might be just the tool that I need, but I decided not to go that route. Maybe I can avoid filling this array somehow.
I called for help on the SSCAIT Discord, which is an awesome place if you like to shitpost all day talk about SC AI development.
Our Lord and Savior PurpleWaveJadien had a great suggestion: Since at every run of pathfinding I inspect a relatively small subset of WalkPositions, I’m only curious about those. If I have a global ID that is updated every time I run the pathfinding, I can just compare that to a value stored – which is the last update value of the given WalkPosition. If it equals the global ID, then it has been “reset” during this run.
Upon reading this, I cheered that senpai noticed me concluded that this is something that makes a lot of sense, and it’s not even that hard to implement. I need an array with a similar size for this. The global ID is incremented on every pathfind call – the exact value is not important, it just acts as an unique ID for the pathfinding in question. The modifications:
public static byte[][] processedMapUpdated;
public static int pathFindNumber;
public static void addToProcessedMap(JPSInfo jpsInfo) {
int x = jpsInfo.getWalkPosition().getX();
int y = jpsInfo.getWalkPosition().getY();
byte jVal = (byte) (1 << jpsInfo.getDirection().ordinal());
processedMap[x][y] = (byte) (processedMap[x][y] + jVal);
processedMapUpdated[x][y] = pathFindNumber;
}
public static boolean processedMapContains(JPSInfo jpsInfo) {
int x = jpsInfo.getWalkPosition().getX();
int y = jpsInfo.getWalkPosition().getY();
if (processedMapUpdated[x][y] == pathFindNumber) {
if (processedMap[jpsInfo.getWalkPosition().getX()][jpsInfo.getWalkPosition().getY()] != 0) {
int jVal = processedMap[jpsInfo.getWalkPosition().getX()][jpsInfo.getWalkPosition().getY()];
if ((jVal >> (jpsInfo.getDirection().ordinal()) & 1) == 1) {
return true;
}
}
} else {
processedMap[x][y] = 0;
}
return false;
}
Relatively simple – and after trying it out, I got 18-20 FPS, which is more than an 50% increase! Now more tests are needed, and I suspect that from now on, I’ll only get diminishing returns. I doubt the 50 FPS mark is achievable – but it’s ok, I value consistency more than that. Let’s get the Starcraft AI Measuring Stick, and try to identify pain points.
I consider the results a mixed bag. The fact that our multi-area pathfinding is quicker than getting the threat border in just one area is great, but on the other hand, the getThreatBorder() is horribly slow in this context. I thought it would get called relatively rarely, so I didn’t over-optimize it. Time to do that, I guess.
The performance problem has multiple root causes.
- I scan the whole map, and then compare the areaIds. I wonder if I’d store the tiles before by areaIds, would it be quicker? A HashMap with the areaIds as keys is an obvious choice here, but I need some kind of object to store that, probably. (I need to store x and y coordinates)
- I scan 4 neighbors of every tile. This obviously leads to massive redundancy. Come think of it, the outer loop is by x, and the inner is by y coordinates. This is a top to bottom, and then left to right processing of the tiles. I wonder if there is a more efficient way to do this.
Let’s take a look at the first point. I have a set width and height for the map, and I have X*Y tiles, and this depends on the map itself, I can’t know that in advance. I really would like to avoid storing that amount of WalkPosition objects. How can I represent these tiles with just one primitive value? Well, it’s actually pretty simple: Just store it as X+Y*height (A left-to right numeration of tiles). I can store these in some collection. Of course, an array would be best. Keep in mind, that the bw.mapWidth() is in tiles, not walktiles, so I have to multiply it by 4.
I already loop through the areaDataArray, so that’s a good place to do this grouping.
public static HashMap<Integer, HashSet<Integer>> walkTileNumbersByAreaIds= new HashMap<>();
//
for (int x=0; x<areaDataArray.length; x++ ) {
for (int y = 0; y < areaDataArray[x].length; y++) {
if (areaDataArray[x][y] == -1) {
Cartography.mostCommonNeighbor(x,y);
}
HashSet<Integer> t = walkTileNumbersByAreaIds.getOrDefault(areaDataArray[x][y], new HashSet<>());
t.add(x + (y * mapWidthInWalkTiles));
walkTileNumbersByAreaIds.putIfAbsent(areaDataArray[x][y], t);
}
}
This is the first draft. I wonder how can I change the HashSet here – that should be an array, but I don’t actually know the size before. In the Cartography class, I added a new method for dealing with areaIds, because I need to do a lot of stuff now.
//All the bookkeeping related to walktile Id management
public static void changeWalkTileAreaId(int x, int y, int oldAreaId, int newAreaId) {
Main.areaDataArray[x][y] = newAreaId;
HashSet<Integer> tileNumbers = Main.walkTileNumbersByAreaIds.get(oldAreaId);
if (tileNumbers != null && !tileNumbers.isEmpty()) {
int dnum = x+y*Main.mapWidthInWalkTiles;
tileNumbers.remove(x+y*Main.mapWidthInWalkTiles);
}
tileNumbers = Main.walkTileNumbersByAreaIds.getOrDefault(newAreaId, new HashSet<>());
tileNumbers.add(x+y*Main.mapWidthInWalkTiles);
Main.walkTileNumbersByAreaIds.put(newAreaId, tileNumbers);
}
I’m trying to keep consistent naming policies, because more than once, I was confused about the TilePosition/WalkPosition resolution, and forgot to multiply or divide by the ratios. Oh, well. The new getThreatBorder() looks like this:
public static Set<WalkPosition> getThreatBorder(boolean useActiveThreatMap, boolean useThreatMemory, int areaId) {
HashSet<WalkPosition> border = new HashSet<>();
HashSet<Integer> tileNumbers = Main.walkTileNumbersByAreaIds.get(areaId);
for (Integer tn : tileNumbers) {
int x = tn % Main.mapWidthInWalkTiles;
int y = tn / Main.mapWidthInWalkTiles;
if (Main.areaDataArray[x][y] == areaId) {
if (useActiveThreatMap && Main.activeThreatMapArray[x][y] == null) {
if ((isWalkPositionOnTheMap(x + 1, y) && Main.activeThreatMapArray[x + 1][y] != null)
|| (isWalkPositionOnTheMap(x - 1, y) && Main.activeThreatMapArray[x - 1][y] != null)
|| (isWalkPositionOnTheMap(x, y + 1) && Main.activeThreatMapArray[x][y + 1] != null)
|| (isWalkPositionOnTheMap(x, y - 1) && Main.activeThreatMapArray[x][y - 1] != null)) {
border.add(new WalkPosition(x, y));
}
}
if (useThreatMemory && Main.threatMemoryMapArray[x][y] == null) {
if ((isWalkPositionOnTheMap(x + 1, y) && Main.threatMemoryMapArray[x + 1][y] != null)
|| (isWalkPositionOnTheMap(x - 1, y) && Main.threatMemoryMapArray[x - 1][y] != null)
|| (isWalkPositionOnTheMap(x, y + 1) && Main.threatMemoryMapArray[x][y + 1] != null)
|| (isWalkPositionOnTheMap(x, y - 1) && Main.threatMemoryMapArray[x][y - 1] != null)) {
border.add(new WalkPosition(x, y));
}
}
}
}
return border;
}
I look like this:
Anyway, let’s see if this causes any performance gains. Previously, it was around 18-20. Should be able to add at least a few more, right?
This is – sadly – just the best case scenario. It was around 28-36 for the most part, but I managed to induce great lagspikes, if I blocked the chokepoints fully. It is a good improvement, and now my number-one resource hog is the JPS algorithm itself.
So, maybe the diminishing return thing is just starting. Switching to arrays instead of collections whenever possible seems like a good direction to take. I’ve been at this algorithm for a lot of time now, and you would think I’m getting bored with it, but it’s quite the opposite – it works. Lately, I haven’t seen any bugs, and I just need to perfect it – but it is usable at it’s current state(Basically I can stop whenever I want to, I just don’t want to).
Thanks for reading! If you liked this article, consider subscribing to the mailing list in the sidebar for updates, or to the RSS feed! Also, if you’d like to support me, my blog, or my podcast,
check out my shop here! – I decided to give you free shipping on T-shirts and mugs, just use the code STARCRAFT!