Revamping the test automation of a legacy mobile app

Overview

King, the creator of the Candy Crush franchise, is one of mobile gaming's most successful publishers, with its titles generating over $20 billion in lifetime revenue and hundreds of millions of downloads worldwide.

King recruited me to support the test automation team of one of their flagship titles, a game that is more than 10 years old.
I revamped the notification system to alert only the relevant engineers of test failures, and made the tests faster and less flaky.

Framework: Proprietary
Language: Java
CI/CD: Jenkins
Type: End-to-end tests

The Problem

Few off-the-shelf tools exist for building test automation for video games, so most of the tooling has to be built from scratch.
King is no exception: they built their own proprietary solution based on a client/server architecture.

Wrong feedback from notifications

The first challenge was the amount of noise the engineering team faced. The pipeline ran multiple categories of tests for each platform, turning every change into dozens of Slack notifications with no clear owner.

Flaky tests running for too long

The second challenge was to optimize the tests themselves, both in terms of flakiness and speed.

The test suite regularly raised obscure errors with no clear cause, often only after long execution times.

My approach

Tackling the noise generated by notifications

To reduce the noise, I grouped the notifications into threads: all notifications relevant to a build could be found in a single thread. This made it particularly easy to track the short-term history of failing tests.
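
A minimal Java sketch of the threading idea, not the pipeline's actual code. `ChatClient` is a hypothetical interface standing in for the chat tool's API (in Slack, for instance, a reply is attached to a thread by passing the parent message's `ts` as `thread_ts`). The notifier posts one root message per build and channel, then posts every later result as a reply to that root message.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical chat API: posts a message and returns its id; a non-null threadId makes it a reply. */
interface ChatClient {
    String post(String channel, String threadId, String text);
}

/** Groups every notification of a given build into a single thread per channel. */
class BuildThreadNotifier {
    private final ChatClient chat;
    // Maps "channel#buildId" to the id of the root message of that build's thread.
    private final Map<String, String> rootMessageIds = new HashMap<>();

    BuildThreadNotifier(ChatClient chat) {
        this.chat = chat;
    }

    void send(String channel, String buildId, String text) {
        String key = channel + "#" + buildId;
        String rootId = rootMessageIds.get(key);
        if (rootId == null) {
            // First notification for this build in this channel: start a new thread.
            rootId = chat.post(channel, null, "Build " + buildId);
            rootMessageIds.put(key, rootId);
        }
        // Every test result for the build is posted as a reply inside that thread.
        chat.post(channel, rootId, text);
    }
}
```

Keying the map on both channel and build id keeps exactly one thread per build in each channel. In a real pipeline this mapping would need to be persisted between pipeline stages, which this in-memory sketch does not attempt.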

Not all tests were relevant to the same engineering team. After grouping the notifications into threads, I routed each test category to its relevant channel. This way, only the engineers responsible for a given set of tests saw their results and failures.
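
The routing can be sketched as a lookup from test category to channel. The category names, channel names, and the `CategoryRouter` class below are hypothetical illustrations, not the pipeline's actual configuration. Unmapped categories fall back to a triage channel so that no failure goes unreported.

```java
import java.util.Map;

/** Hypothetical test categories; the real pipeline's categories are not listed in this case study. */
enum TestCategory { GAMEPLAY, UI, PERFORMANCE, SMOKE }

/** Sends each category's results only to the channel of the team that owns it. */
class CategoryRouter {
    private final Map<TestCategory, String> channelByCategory;
    private final String fallbackChannel;

    CategoryRouter(Map<TestCategory, String> channelByCategory, String fallbackChannel) {
        // Defensive immutable copy so the routing table cannot change after construction.
        this.channelByCategory = Map.copyOf(channelByCategory);
        this.fallbackChannel = fallbackChannel;
    }

    String channelFor(TestCategory category) {
        // Unmapped categories go to a fallback channel instead of being dropped silently.
        return channelByCategory.getOrDefault(category, fallbackChannel);
    }
}
```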

This approach required heavy modifications to the CI pipeline (Jenkins). I had to rewrite much of the logic that triggers the tests, and keep track of the different categories so that each notification was sent to the right Slack channel. These skills are often outside what is expected of a test automation engineer.

Optimizing the tests

Video games are unusual apps, especially when it comes to automated testing. Each game has its own custom UI, and there is no underlying standard that lets testers easily write robust test frameworks.
This results in a lot of flakiness.

To tackle this problem, I came up with several patterns that apply to any application, not just video games. I introduced the following best practices to the test automation team:

Screen/Page objects: while there are no pages in video games, you can generalize the pattern at the screen or subscreen level to reuse code efficiently.
Explicit waits: many problems when testing games come from synchronizing the application under test with the tests themselves. Writing custom explicit waits helps a lot with that. It makes explicit what the test is waiting for and helps tests fail early.
Checking the effect of interactions: very often, test automation engineers do not check the effect of the interactions they trigger. How do you know that clicking a button achieved what you expected? Unless you write the code to verify it, your test code can fall out of sync with the application under test.
This is especially a problem in video games, where input handling is often custom, but it also applies to other apps that use custom widgets.
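
The three practices above fit together naturally. The sketch below assumes a hypothetical `GameDriver` interface (the proprietary framework's API is not public), and the element ids such as `main_menu` and `play_button` are invented for illustration. `Wait.until` is a custom explicit wait that polls a named condition and fails with a descriptive message on timeout; `MainMenuScreen` is a screen object that waits for its screen when constructed and checks the effect of a tap before returning.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical driver for the game client; the real proprietary API is not public. */
interface GameDriver {
    boolean isVisible(String elementId);
    void tap(String elementId);
}

/** Custom explicit wait: polls a named condition and fails fast with a clear message. */
final class Wait {
    private Wait() {}

    static void until(String description, BooleanSupplier condition, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("Timed out after " + timeout.toMillis()
                        + " ms waiting for: " + description);
            }
            try {
                Thread.sleep(50); // polling interval, an arbitrary choice for this sketch
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for: " + description, e);
            }
        }
    }
}

/** Screen object: wraps one game screen and exposes intent-level actions. */
class MainMenuScreen {
    private static final Duration TIMEOUT = Duration.ofSeconds(10);
    private final GameDriver driver;

    MainMenuScreen(GameDriver driver) {
        this.driver = driver;
        // Fail early if the screen never appears, rather than on an unrelated later step.
        Wait.until("main menu to be displayed", () -> driver.isVisible("main_menu"), TIMEOUT);
    }

    /** Taps "Play" and verifies that the tap had the expected effect before returning. */
    void startLevel() {
        driver.tap("play_button");
        Wait.until("level screen after tapping Play", () -> driver.isVisible("level_board"), TIMEOUT);
    }
}
```

Because every wait carries a description, a missed tap surfaces as "Timed out after 10000 ms waiting for: level screen after tapping Play" instead of as an obscure error several steps later.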

Beyond flakiness, one of my missions was to improve test speed. After validating a few hypotheses, we decided to gradually move the end-to-end tests away from real devices and run them in virtual machines instead, where they run faster and more reliably.

Results & Outcomes

An improvement for engineers

Revamping the notifications was a huge improvement for engineers. The noise turned into actionable feedback. When a build failed, the engineers concerned were properly made aware of it, and only the relevant teams were notified.
It made the test results clearer, and reduced the time needed to identify bugs.

Since changing the notifications required a lot of work inside the Jenkins infrastructure, I became the third most active committer overall on the CI/CD repository.

Reduction of test execution time

Together, the new best practices and the move away from real devices brought the execution time of the entire test suite under an hour.
Not only did the tests run faster, but when they failed, they failed early and with a clear error message, which made failures easier to investigate.
