Overview
King, the creator of the Candy Crush franchise, is one of mobile gaming's most successful publishers, with its titles generating over $20 billion in lifetime revenue and hundreds of millions of downloads worldwide.
They recruited me to support the test automation team of one of their flagship titles, a game that is more than 10 years old.
I revamped the notification system to alert only the relevant engineers of test failures, and made the tests faster and less flaky.
The Problem
There are not many tools to choose from when building test automation for video games, so almost everything needs to be built from scratch.
King is no exception: the company built its own proprietary solution, following a client/server architecture.
Wrong feedback from notifications
The first challenge was the amount of noise the engineering team was presented with. The pipeline ran multiple categories of tests for each platform, turning every change into dozens of Slack notifications without a clear owner.
Flaky tests running for too long
The second challenge was to optimize the tests themselves, both in terms of flakiness and speed.
The test suite was regularly raising obscure errors without clear causes after long execution times.
My Approach
Tackling the noise generated by notifications
To improve the notifications, I bundled them into threads, so that all notifications relevant to a build could be found in a single thread. This was particularly useful for tracking the short-term history of failing tests.
Not all tests were relevant to the same engineering team. After bundling the notifications into threads, I routed each test category to its relevant channel. This way, only the engineers interested in a particular set of tests saw its results and failures.
This approach required heavy modifications to the CI pipeline (Jenkins). I had to rewrite much of the logic that triggers the tests, and keep track of the different channels so that each notification was sent to the right Slack channel. These skills often fall outside of what is expected of a test automation engineer.
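The routing and threading logic described above can be sketched roughly as follows. This is a minimal illustration, not King's actual pipeline code: the category names, channel names, `NotificationRouter`, and `FakePoster` are all hypothetical. The `post` callable stands in for a real chat client; with Slack, it would typically wrap `chat.postMessage`, passing the returned message `ts` as `thread_ts` on later messages.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical mapping from test category to the owning team's channel.
CATEGORY_CHANNELS: Dict[str, str] = {
    "gameplay": "#gameplay-tests",
    "store": "#store-tests",
}
DEFAULT_CHANNEL = "#test-automation"

# A poster takes (channel, text, thread_id) and returns the new message's id.
Poster = Callable[[str, str, Optional[str]], str]


@dataclass
class NotificationRouter:
    post: Poster
    # Remembers the first message of each (build, channel) pair so that
    # later results for the same build land in the same thread.
    threads: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def notify(self, build_id: str, category: str, text: str) -> str:
        channel = CATEGORY_CHANNELS.get(category, DEFAULT_CHANNEL)
        key = (build_id, channel)
        if key not in self.threads:
            # First result for this build in this channel: open the thread.
            self.threads[key] = self.post(channel, f"Build {build_id}", None)
        return self.post(channel, f"[{category}] {text}", self.threads[key])


class FakePoster:
    """In-memory stand-in for a chat client, used to try the router out."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str, Optional[str]]] = []

    def __call__(self, channel: str, text: str, thread_id: Optional[str]) -> str:
        self.sent.append((channel, text, thread_id))
        return str(len(self.sent))


if __name__ == "__main__":
    poster = FakePoster()
    router = NotificationRouter(post=poster)
    router.notify("1234", "gameplay", "android: 2 failures")
    router.notify("1234", "gameplay", "ios: passed")
    router.notify("1234", "store", "android: passed")
    for message in poster.sent:
        print(message)
```

Keeping the thread lookup keyed by build and channel means a team sees one thread per build in its own channel, and never receives results for categories it does not own.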
Optimizing the tests
Video games are unusual apps, especially when it comes to testing them automatically. The UI is different in every game, and there is no underlying standard that lets testers write robust test frameworks easily.
This results in a lot of flakiness.
To tackle this problem, I came up with multiple patterns that are applicable to any application, not just video games. I introduced the following best practices to the test automation team:
- Screen/Page objects: while there are no pages in video games, you can generalize the pattern at the screen or subscreen level to reuse code efficiently.
- Explicit waits: a lot of problems when testing games come from synchronizing the application under test with the tests themselves. Writing custom explicit waits helps a lot with that. It clarifies what is being waited for and helps tests fail early.
- Checking the effect of interactions: very often, test automation engineers do not check the effect of the interactions they trigger. How do you know that clicking on a button achieved what you expected? Unless you write the proper code for that, your testing code might fall out of sync with the application under test.
This is especially a problem in video games, where input handling is often custom, but it also applies to other apps whose custom widgets can behave the same way.
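The three practices above can be combined in a small sketch. Everything named here is an assumption made for illustration: `FakeGame` is a toy stand-in for a real game driver, and `wait_until`, `MainMenuScreen`, and `LevelSelectScreen` are hypothetical names, not part of King's framework.

```python
import time
from typing import Callable


class WaitTimeout(AssertionError):
    """Raised when an expected game state is not reached in time."""


def wait_until(condition: Callable[[], bool], description: str,
               timeout: float = 10.0, interval: float = 0.1) -> None:
    """Explicit wait: poll `condition`, or fail naming what was awaited."""
    deadline = time.monotonic() + timeout
    while not condition():
        if time.monotonic() >= deadline:
            raise WaitTimeout(
                f"Timed out after {timeout}s waiting for {description}")
        time.sleep(interval)


class FakeGame:
    """Toy stand-in for the game client; a real driver would query the game."""

    def __init__(self) -> None:
        self.current_screen = "main_menu"

    def tap(self, element: str) -> None:
        if self.current_screen == "main_menu" and element == "play_button":
            self.current_screen = "level_select"


class LevelSelectScreen:
    """Screen object returned once the level select screen is confirmed open."""

    def __init__(self, game: FakeGame) -> None:
        self.game = game


class MainMenuScreen:
    """Screen object: groups the interactions available on one screen."""

    def __init__(self, game: FakeGame) -> None:
        self.game = game

    def open_level_select(self, timeout: float = 10.0) -> LevelSelectScreen:
        self.game.tap("play_button")
        # Check the effect of the tap instead of assuming it worked.
        wait_until(lambda: self.game.current_screen == "level_select",
                   "the level select screen to open", timeout=timeout)
        return LevelSelectScreen(self.game)


if __name__ == "__main__":
    game = FakeGame()
    screen = MainMenuScreen(game).open_level_select()
    print(type(screen).__name__)
```

Because each interaction returns the next screen object only after its effect is confirmed, a missed tap fails at the step that caused it, with a message naming the expected state, rather than several steps later with an unrelated error.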
Beyond flakiness, one of my missions was to improve test speed. After validating a few hypotheses, we decided to gradually move away from end-to-end tests running on real devices, and to run them in VMs instead, where they are faster and more stable.
Results & Outcomes
An improvement for engineers
Revamping notifications was a huge improvement for engineers. The noise turned into proper feedback they could act on. When a build failed, engineers were clearly made aware of the failure, and only the relevant teams were notified.
It made the test results clearer, and reduced the time needed to identify bugs.
Since changing notifications required a lot of work inside the Jenkins infrastructure, I became the 3rd top committer overall on the CI/CD repository.
Reduction of test execution time
As for the introduction of best practices and the move away from real devices, they reduced the execution time of the entire test suite to under an hour.
Not only did the tests take less time to execute, but when they failed, they failed early and with a proper error message, which made failures easier to investigate.