One of the staple technical practices of agile software development is Continuous Integration (CI). A continuous integration environment provides fast feedback on the state of your code. In a typical configuration, the CI tool is configured with a "job" that monitors the source code repository for commits. When a commit is made, the job:
- Pulls the latest code
- Performs a build
- Runs a suite of automated tests
- Informs the team of the results
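The four steps above can be sketched as a small script. This is only an illustration, not how any particular CI tool works: the `git` and `make` commands are hypothetical stand-ins for whatever your project uses, and `run_job`, `run_command` and `DEFAULT_STEPS` are names invented for this example.

```python
import subprocess
from typing import Callable, List, Tuple

# Hypothetical commands; substitute whatever your project actually uses.
DEFAULT_STEPS: List[Tuple[str, List[str]]] = [
    ("pull", ["git", "pull", "--ff-only"]),
    ("build", ["make", "build"]),
    ("test", ["make", "test"]),
]


def run_command(cmd: List[str]) -> bool:
    """Run one command and report whether it exited successfully."""
    return subprocess.run(cmd).returncode == 0


def run_job(
    steps: List[Tuple[str, List[str]]],
    run: Callable[[List[str]], bool] = run_command,
    notify: Callable[[str], None] = print,
) -> bool:
    """Run the steps in order, stop at the first failure, then inform the team."""
    for name, cmd in steps:
        if not run(cmd):
            notify(f"FAILED at step: {name}")
            return False
    notify("SUCCESS: all steps passed")
    return True
```

Here `notify` defaults to `print`; in practice it would be whatever channel the team actually watches.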
One way of thinking about the CI job is as your "canary in a coal mine". The job of the canary is to warn you (by getting sick or keeling over dead) when there are life-threatening conditions (the code fails to compile, or automated tests fail). Thinking about the CI job with this metaphor, we can imagine some useful guidelines:
A healthy canary requires less effort
A sick canary that gets a little sicker may go unnoticed, or it takes a lot of effort every time you look at the canary to see whether it has a new disease. If every run of the job has a failure of some kind (e.g. some tests fail), then additional effort and attention are needed to discover whether something new is broken. Typical human behavior is to stop checking after a while. Then, by the time anyone notices the new failures, so much time has elapsed that it takes more time and effort to find out what went wrong and take care of it. Keeping the canary healthy (keeping the job failure-free) takes much less attention and effort over the long haul.
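To make the extra effort concrete, here is a minimal sketch of the bookkeeping a permanently failing job forces on you: you have to remember which tests were already failing and diff each run against that list. The function and test names are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List


def new_failures(previous_failed: Iterable[str], current_failed: Iterable[str]) -> List[str]:
    """Return the tests that fail now but were not failing in the previous run."""
    return sorted(set(current_failed) - set(previous_failed))


# Sick canary: two tests were already failing, so the new one hides among them
# unless somebody keeps the old list around and compares.
known = {"test_login", "test_export"}
latest = {"test_login", "test_export", "test_checkout"}
print(new_failures(known, latest))  # ['test_checkout']

# Healthy canary: nothing was failing before, so any failure at all is news.
print(new_failures(set(), {"test_checkout"}))  # ['test_checkout']
```

With a failure-free baseline the comparison is unnecessary: a red job always means something new broke.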
A visible canary alerts you sooner
A canary that is not visible (out of sight, out of mind) doesn't serve as an early warning. If the team sees instantly when the canary gets sick (a new failure occurs in the job), then the new issue can be taken care of immediately. If the state of the canary is only seen occasionally (someone has to go and look at the job state), then long delays can occur after an issue has been introduced, requiring more time and effort to find and fix the problem.
A fast-to-respond canary keeps you healthier
A canary that is slow to respond to a hazardous condition doesn't help the miners stay healthy. When a build-and-test job takes more than 10 minutes to run, the developer's attention has turned to something else by the time failure feedback arrives, and it takes an expensive context switch to come back and resolve the newly created problem. A job that fails fast when something is broken reduces the cost and cycle time of making fixes. Fast automated tests also make it easier (and thus more likely) for the developer to ensure the tests pass before committing new code.
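One way to picture "fail fast" is a runner that executes checks in order, stops at the first failure, and reports how long the feedback took. This is a sketch under assumptions: the check names and the `run_fail_fast` and `within_budget` helpers are hypothetical, and the budget simply encodes the 10-minute figure mentioned above.

```python
import time
from typing import Callable, List, Optional, Tuple

FEEDBACK_BUDGET_SECONDS = 10 * 60  # the 10-minute threshold discussed above


def run_fail_fast(
    checks: List[Tuple[str, Callable[[], bool]]],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Optional[str], float]:
    """Run checks in order and stop at the first failure.

    Returns (name of the failed check or None, elapsed seconds).
    """
    start = clock()
    for name, check in checks:
        if not check():
            # Skip the remaining checks: the developer already has an answer.
            return name, clock() - start
    return None, clock() - start


def within_budget(elapsed: float, budget: float = FEEDBACK_BUDGET_SECONDS) -> bool:
    """True if feedback arrived within the budget."""
    return elapsed <= budget
```

Putting the quickest checks first (compilation, then unit tests, then slower suites) means a broken commit is usually reported long before the budget is used up.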
A canary is better than a barn cat
If the barn cat has gone missing, it might be a few days or weeks before someone notices… and it is unlikely that a search party will be sent out in pursuit. If your CI job is a barn cat, it takes a while before a failure is noticed, and no one feels compelled to go fix it. If this is the case for your CI job, it is time to have a conversation about shared ownership and fast feedback, and about what needs to change to make the CI job valuable.