Engineering at MindLink

Improving our CI tests

June 28, 2018

This is the story of how we:

Enabled parallel NUnit test execution
Migrated our CI test stack to a reproducible and self-contained configuration
Reduced our CI test run time by 65%

Background

The MindLink .NET CI/test stack looks like:

TFS 2018
NUnit
Moq

For our main product repository, we have a CI build that runs on every server-side change. This builds our .NET components and then runs our tests. The tests are split between what we call “Unit Tests” and “Behaviour Tests”. The “Behaviour Tests” are still isolated in-memory tests, but run over chunky bits of the stack across component-level test seams. I’ll argue with you about what these should be called another time.

Our timings looked like this:

Unit Tests - 18982 tests - 7:56 minutes
Behaviour Tests - 1769 tests - 3:34 minutes

Whilst I’m sure there are worse build times out there, a total build time of 16 minutes has plenty of room for improvement.

And yes, we can no doubt improve the 4:30 time for the core build, but that’s a different story.

Parallelising Take 1

Time to look at our test run configuration.

I was aware that the TFS test runner will run tests in parallel, at the granularity of each test DLL. Our build servers have 2 processor cores. My presumption was that we could just enable parallelisation on our test task config, the test runner would farm out our 8 or so test DLLs to run in parallel (2 at a time), and we’d get a 2x speed-up.

To my surprise/dismay - the “Run tests in parallel” box was already checked. So why are the tests still so slow? Is the parallelism not actually working?

I unchecked the box and queued another run. Same test timings. So what’s going on?

This blog post helped me on my way (sort of). The parallel test run behaviour is a convoluted function of both the TFS GUI settings and the test .runsettings file.

Our .runsettings looked like

<MaxCpuCount>0</MaxCpuCount>

which apparently enables parallelisation by default, whatever the checkbox in the TFS GUI says.
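
For reference, a complete .runsettings file carrying this setting would look something like the sketch below. The RunSettings and RunConfiguration wrapper elements are the standard layout for the Visual Studio Test Platform; that MaxCpuCount was the only thing in our file is an assumption made for illustration.

```xml
<?xml version="1.0" encoding="utf-8"?>
<RunSettings>
  <RunConfiguration>
    <!-- 0 = use as many test host processes as there are available cores;
         1 = run test containers one at a time -->
    <MaxCpuCount>0</MaxCpuCount>
  </RunConfiguration>
</RunSettings>
```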

Removing the .runsettings file from the test config altogether gave me what I was expecting - the unit tests alone now took 13 minutes (over one and a half times as long as before).

In hindsight, I could observe the tests running in parallel as the test runner launched. This is the log with parallelisation enabled:

And this is the log with parallelisation disabled:

In the first log, you can see it processing two DLL test containers at a time on test launch, and then the subsequent test result logs overlap.

So - good news is: I’ve got to the bottom of what’s making the tests run in parallel. Bad news is: our test runs are taking a long time even though they’re already running in parallel.

Time for some housekeeping

All this talk of test configuration got me thinking about our build dependencies.

We’re on the same path as a lot of teams with this one: we have several build servers that each have some global build dependencies installed. This means:

When we make changes to the build, we need to make the same changes across all build servers
If we upgrade a build dependency for one branch, typically all the other branches start failing to build

Our end goal however is:

Remove as many global build agent dependencies as possible
Define all build dependencies in source code as versioned package dependencies
Dynamically install and configure the build environment as part of the build itself

So, a couple of things to fix with our test configurations:

.runsettings file - This guy was defined at a well-known path on the build server, and had to be kept in sync between all servers. He seems to have been superseded by the TFS test task configuration, and he wasn’t doing much anyway (other than making things more confusing). So time for him to go. Parallel execution of tests is now enabled by the TFS task config.

Visual Studio Test Platform - This was installed as part of Visual Studio on the build server. Ultimately we want to reduce our dependency on needing Visual Studio on the build server, and let the agents upgrade themselves as required. We switched this to be installed by the TFS build agent.

This means adding another bootstrap step in the build tasks flow. In practice this task caches the installation between runs, so little extra overhead is added to the build process.

NUnit Test Adaptors - We had these installed globally on each build server using the Visual Studio extension installer. I think this was the right decision when we initially set up the build servers, but installing them as a NuGet dependency seems like the right way for us to go now. This will give us one less global dependency.

To do this, you just need to add the adaptor package to your test project’s NuGet package dependencies. Something does feel a little odd about adding another dependency just because the build server needs it, but that’s ultimately the shift in thinking towards self-describing builds.

I’m also not in love with the fact that you technically only need to do this for one project in the solution (I assume just so that when NuGet solution-restores everything it’s present in the packages folder), but I’ll let that one go for simplicity.
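
As a rough sketch, the extra entry in a test project’s packages.config might look like this. The package id NUnit3TestAdapter is the adaptor’s NuGet package; the version and targetFramework values are placeholders rather than what we actually use, and a project using PackageReference would declare it in the project file instead.

```xml
<?xml version="1.0" encoding="utf-8"?>
<packages>
  <!-- Placeholder version and target framework: substitute your own -->
  <package id="NUnit3TestAdapter" version="3.10.0" targetFramework="net461" />
</packages>
```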

Upgrading NUnit

At this point I also decided to upgrade our test dependencies. We were using NUnit 3.6.0 at this stage. I updated our NuGet package config to bump this to the latest 3.10.0 libraries.

This did cause a couple of compilation errors, but nothing that wasn’t easily fixed with equivalent NUnit API constructs. I pushed and waited (16 minutes) for the build to finish.

But the build only took 9 minutes

That’s right. The Unit tests now ran in 100 seconds, and the Behaviour tests were down to 2 minutes. You can see the difference between the build times from the build summary graph (the first two builds are before we upgraded NUnit).

After some digging, I think it’s this issue that was the culprit. Indeed, even when I run from ReSharper in Visual Studio, the tests now run lightning fast.

Concluding

I didn’t expect this investigation to turn out like this - our problems were solved just by upgrading our NUnit dependency.

On the other hand, we did clean up our test configuration and dependencies, and double-checked that we’re running our tests in parallel.

Moral of the story is:

Make sure you’re on at least NUnit 3.8 so your tests run correctly!

Next time: Further test parallelisation

Written by Ben Osborne.

Craft beer and cats.