R.I.P. Device Farm

I’ve been an Android developer at three different companies in my career. At each company I tried to set up a device farm and run instrumentation tests on that farm. After three attempts, I’m finally giving up.

At Big Nerd Ranch, we connected a spare laptop through a USB hub to a shelf of spare Android phones. At the time (2013) there was no great software to coordinate multiple devices, run tests on each, and collate the results. Some of the other developers built an in-house tool. Eventually I learned about Spoon and started running tests through it.

However, there was no good way to hook our Spoon server up to our more general C.I. server (Jenkins at the time), so all testing had to be triggered manually. It wasn’t a great setup, mostly for structural reasons. The device wall was prone to being pilfered whenever a developer needed to run tests on a specific device. There isn’t a whole lot of value in running instrumentation tests on a device farm if half the devices are missing at any given moment.

When I moved to Stable Kernel I tried again to set up a device farm. This time I only used a few phones at my desk, which no one else had access to. I ran the Spoon tests from my personal laptop. I was now using BuddyBuild as my C.I. server, and again I had no way to tie the device farm into the C.I. server.

At Orion I tried once more. This time I built my own wall. Orion has been great about supplying me personally with many phones. So I was able to run tests on lots of dedicated phones. I even built a custom board to hold all the phones and easily run USB cables to them. The wall itself is just peg board with 1/8 in. bungee cord woven through it. This way I can attach any phone of any size by slipping two opposite corners under a few loops of bungee. It was very satisfying to see the whole test suite running on fifteen different devices.

I once again started off using Spoon. Sometime in 2017, the Android tools team released Android Test Orchestrator, which solved some major testing problems. Specifically, it allowed each test to run in a fresh instance of the application without reusing local storage, databases, or permission settings from previous tests. Support for Android Test Orchestrator never made it to Spoon, and I jumped ship sometime thereafter.

I found Composer, which seemed very promising. It was under active development by some very well-respected Android open source contributors. Most importantly, Composer had support for Android Test Orchestrator.

Being a remote employee, my personal device farm was not accessible to other members of the team, so I was eager to hook it up to the C.I. pipeline. However, Composer suffered from the same problem as Spoon: it was not easy to hook up to an existing C.I. server. Like Spoon, Composer generated a rather nice-looking HTML report, so I spent some time hacking together a script that uploaded Composer results to an AWS instance. However, I couldn’t find a good way to host that HTML securely for all of my teammates to view. Instead, they had to download a zip from AWS and open the HTML locally on their own machines.

One thing that made both Spoon and Composer great was that they captured screenshots throughout a test and displayed them in the resulting report. Over time, though, Composer fell into disrepair and the screenshot feature broke. It was never fixed before the authors gave up on the project and archived it entirely.

By this time I had been forced to move on from BuddyBuild after it was purchased by Apple. I switched to Bitrise, which had existing support for running instrumentation tests on Firebase Test Lab (FTL). I had seen a demo of Firebase’s device server farms at Google I/O 2018 but had not yet invested time to learn more about them.

Luckily, Bitrise’s integration handles all of the FTL configuration out of the box. At the time, Bitrise only supported the emulators available through FTL, not any of its physical devices. This was a great start. Running tests on emulators with my previous C.I. tools (Jenkins, Circle, Travis) was a massive pain that deserves a separate rant. So I was grateful that Bitrise supported instrumentation testing at all, but I still didn’t have the physical device farm I was hoping for.

Eventually I learned about Flank and its Gradle plugin, Fladle. At first, I was a bit reluctant. Flank was the side project of a few engineers at Walmart Labs, in the same way that Spoon and Composer had been side projects. I did not want to invest in a new test aggregator only to have it fall into disrepair. After some investigation and experimenting, I realized how easily Flank (via Fladle) integrates with my existing C.I. process. I also realized that it is designed to integrate with Firebase Test Lab’s device farm, not my in-house one. Lastly, I learned that Flank was being spun off into its own GitHub “organization” and that its maintainers had a working relationship with engineers at Firebase. I was instantly hooked: Google manages the physical devices, billing, and report generation. Flank integrates with FTL and handles APK uploading and result monitoring. Fladle reduces the whole thing to a single Gradle task, and Bitrise easily runs any arbitrary Gradle task. With Bitrise’s additional notification integrations, I can easily share a link to the results in a Slack channel.

I now had just about everything I needed: C.I. integration, good report creation, reliability, and confidence that these tools would see regular improvement. I’ve been using this setup for over a year and I am really happy with the results. I use Bitrise’s Virtual Device Testing task to run unit and integration instrumentation tests, since I’m fine if these only run on a few common emulators. I then use Bitrise’s generic Gradle Runner task to run a Fladle command, which manages the Flank settings; Flank in turn uploads the APKs to FTL.

Regarding cost, I get indirect access to FTL through Bitrise’s Virtual Device Testing task, which is included in the Bitrise bill at no extra charge. I did have to set up separate billing directly with Firebase to run tests through Flank. I share my API key with Flank/Fladle and get billed directly by Firebase. The emulators are really inexpensive (which is probably why Bitrise “gives” them away through its task). The physical devices are understandably more expensive. One great thing about Flank is that it does “smart sharding,” which continually updates which tests get run on which shard in order to minimize both run time and overall cost. My setup only triggers instrumentation tests when a PR is merged (not on every commit pushed to a PR), so total costs are approximately $2 per PR merge.
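
To give a feel for what duration-based sharding means, here is a minimal sketch of one common heuristic: sort tests by how long they took on a previous run, then always hand the next test to the shard with the least work so far. This is only an illustration of the general idea, not Flank’s actual algorithm. The function name `shard_tests`, the test names, and the timings are all made up for the example.

```python
import heapq


def shard_tests(durations, shard_count):
    """Split tests across shards using recorded durations (in seconds).

    Hypothetical helper: a greedy "longest test first" heuristic,
    not the algorithm Flank itself uses.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")

    # Min-heap of (total seconds assigned so far, shard index).
    heap = [(0.0, index) for index in range(shard_count)]
    heapq.heapify(heap)
    shards = [[] for _ in range(shard_count)]

    # Longest tests first; ties broken by name so the result is deterministic.
    ordered = sorted(durations.items(), key=lambda item: (-item[1], item[0]))
    for name, seconds in ordered:
        total, index = heapq.heappop(heap)  # currently lightest shard
        shards[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return shards


# Made-up timings from an imaginary previous run.
previous_run = {
    "LoginTest": 40,
    "CheckoutTest": 30,
    "SearchTest": 20,
    "ProfileTest": 10,
}
shards = shard_tests(previous_run, 2)
for number, tests in enumerate(shards):
    print(number, tests, sum(previous_run[t] for t in tests))
```

With these invented numbers both shards end up with 50 seconds of work. Feeding each run’s timings back into the next run is what lets the split keep adjusting as the test suite changes.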

In the end, I don’t think I will ever build another local device farm. The upkeep time was not worth it. Eventually most of my older phones experienced battery swelling from being permanently connected to a power supply. I think most newer high-end phones have circuitry to prevent battery swelling. But those older phones are exactly the type of phone I want to test against: phones that are not available on Firebase Test Lab. At Orion, we explicitly support some rarer phones and it was helpful to test on them. However, there are many companies that will manage a device farm for you. You simply supply them with phones (or just model numbers) and they provide you a server URL to upload test APKs to. Their service takes care of the rest. I don’t know how easy it is to integrate an existing C.I. server with these services, but it should be possible.