In this article, we’ll explore some of the different ways a team building a mobile app can manage the release of a new feature into the hands of their users. There are varying levels of sophistication available, from the very simple (and very risky) “release and pray” approach through to fancy systems which perform a staged rollout of a feature, adjusting the pace of the rollout based on live telemetry data which measures how the feature is performing. I’ll cover some of the ways in which these feature rollout strategies can vary, and introduce a Maturity Model for mobile feature rollout.
Anatomy of a feature rollout
Looking at the many variations in how feature rollouts are managed, a few distinct facets stand out. Let’s look at a couple of them.
Rollout strategies differ in terms of how many stages or environments a feature moves through on its way to your users. In the simplest case a mobile dev might be pushing a build from her laptop directly to the app store, and then marking the new version of the app for immediate release to any user who wants it.
For larger teams it’s more common for new features to be initially staged into some sort of beta release. This beta is distributed on a regular basis to testers using a mechanism such as TestFlight. Quite often these beta builds will actually contain a set of distinct features which have been bundled together to form a potential new app release. Each individual feature will receive some amount of focused testing via these beta builds.
As a team gets close to an app store release they might formally declare a given beta build as a “Release Candidate” build, which will often have a final round of general “smoke” testing. Assuming all is well that Release Candidate will then be pushed to the app store and made available for users to download.
The most sophisticated engineering organizations use techniques like Feature Flags to decouple the distinct features which have been bundled into a given release. These feature flags can be used for fine-grained control of individual features, turning a feature on or off even after the release has been made available on the app store. A good feature flagging system will go even further and allow a product manager to gradually rollout a new feature – first to 5% of users, say, then 10%, then 50%, and eventually to 100%.
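As a concrete illustration, percentage-based rollouts are commonly implemented by hashing a stable user identifier into a bucket, so the same user always falls on the same side of the cutoff. Here is a minimal sketch in Python – the names and hashing scheme are illustrative, not any particular vendor’s implementation:

```python
import hashlib

def is_feature_enabled(feature_name: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically bucket a user into [0, 100) using a stable hash.

    The same user always lands in the same bucket for a given feature, so
    raising rollout_percent from 5 to 10 only *adds* users -- nobody who
    already has the feature loses it.
    """
    digest = hashlib.sha256(f"{feature_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent

# Raising the percentage is monotonic: the 5% cohort is a subset of the 10% cohort.
users = [f"user-{i}" for i in range(1000)]
cohort_5 = {u for u in users if is_feature_enabled("new-checkout", u, 5)}
cohort_10 = {u for u in users if is_feature_enabled("new-checkout", u, 10)}
assert cohort_5 <= cohort_10
```

Note that “un-releasing” a feature with this scheme is simply a matter of lowering the percentage back to 0 – no new build is required.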
This extremely granular control from a feature flagging system allows a given feature release to essentially be staged through each individual app installation that’s live in a user’s hands. We’ll see later in this article that an advanced feature flagging capability can sometimes obviate the need for a formal beta stage.
The other powerful benefit of a feature flagging system is that it also allows you to “un-release” a feature – rolling the release back down to 0% of users – when it has made it into users’ hands but turns out to have issues. When available this is a much better option than rushing a new build through the app store and hoping folks who got a bad build will upgrade. Identifying whether a given feature might be causing issues brings us to the next aspect of feature rollout: Feature Telemetry.
Telemetry is the ability to measure how a feature is performing in real users’ hands.
If you don’t invest anything into building telemetry capability then your only real option for visibility on how a feature is performing is checking app reviews and perhaps monitoring social media traffic.
The first step for many app teams in getting some basic telemetry is via a crash-reporting service. An uptick in error reports or crashes after a feature release is a clear sign that something might not be right.
If your mobile app is integrated against backend services that are owned in-house, those services’ API metrics provide indirect telemetry on the app. If you were hoping that a new feature would increase engagement in part of your app then you will perhaps be able to confirm that via increased traffic in the corresponding part of the backend API.
Data-driven product teams will often want more detailed insights into how users are interacting with their apps. These teams will directly instrument the app’s UI, tracking individual taps and swipes and streaming these interactions into an analytics solution. That data becomes a very rich source of information on how a new feature is performing based on how each user is interacting with it.
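This kind of UI instrumentation typically amounts to buffering events client-side and flushing them in batches to an analytics backend. A minimal sketch of the idea – the class and event names here are hypothetical, and a real app would use its analytics vendor’s SDK:

```python
import json
import time

class AnalyticsTracker:
    """Minimal sketch of client-side event instrumentation (names are illustrative)."""

    def __init__(self):
        self.buffer = []

    def track(self, event_name, **properties):
        # Record each interaction with a timestamp and arbitrary properties.
        self.buffer.append({
            "event": event_name,
            "ts": time.time(),
            "properties": properties,
        })

    def flush(self):
        # In a real app this would POST the batch to an analytics backend;
        # here we just serialize it and clear the buffer.
        batch = json.dumps(self.buffer)
        self.buffer = []
        return batch

tracker = AnalyticsTracker()
tracker.track("tap", screen="checkout", control="buy_button")
tracker.track("swipe", screen="gallery", direction="left")
payload = tracker.flush()
```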
UI-level metrics provide low-level insight into user behavior and are relatively easy to set up. Business KPIs such as conversion rates, in-app purchase events, or broader engagement metrics provide an additional layer of telemetry. These are more product-specific metrics which require custom work to set up. In return, they provide a great deal of value by correlating a feature release directly to business impact.
When a team has both feature flags and some sort of feature telemetry in place they have the necessary components for a feedback loop that allows for a controlled release, where telemetry feedback on the feature’s performance can be used to control the progress of a release. In contrast, the default situation is an uncontrolled release – once the release is pushed to the app store it’s essentially out of your control.
An A/B test is one variant of a controlled release which is usually used in an attempt to optimize against a business goal. A Canary release is another variant – releasing the feature to 10% of users, monitoring for any issues, and then performing a general rollout.
For most teams, the control in a controlled release is provided by a human, perhaps a product manager or a tech lead. We’ll discuss later in this article how automation can help augment or even replace human involvement.
Completing the feedback loop requires correlating user/app behavior to features. Being able to segment telemetry analytics based on the state of feature flags is critical. Integrating different tools to enable this can be a frustrating exercise, but it is absolutely necessary for teams that want to manage feature release in a data-driven fashion.
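One common way to make that correlation possible is to stamp every analytics event with the feature-flag states that were active for the user when the event fired; downstream analysis can then slice any metric by flag cohort. A hypothetical sketch of both halves of that pattern:

```python
class FlaggedTracker:
    """Sketch: stamp each analytics event with current flag state (illustrative API)."""

    def __init__(self, flag_states: dict):
        self.flag_states = flag_states  # e.g. {"new-checkout": True}
        self.events = []

    def track(self, event_name, **props):
        self.events.append({
            "event": event_name,
            "properties": props,
            # Correlation key: which flags were on for this user at event time.
            "flags": dict(self.flag_states),
        })

def conversion_rate_by_flag(events, flag_name):
    """Segment a simple conversion metric by whether the flag was enabled."""
    segments = {True: [0, 0], False: [0, 0]}  # flag state -> [conversions, total]
    for e in events:
        enabled = e["flags"].get(flag_name, False)
        segments[enabled][1] += 1
        if e["event"] == "purchase_completed":
            segments[enabled][0] += 1
    return {k: (conv / total if total else 0.0)
            for k, (conv, total) in segments.items()}
```

With the flag state riding along on every event, comparing the flagged-on cohort against the flagged-off cohort becomes a straightforward query rather than a painful cross-tool join.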
A Feature Release Maturity Model
Having identified these different aspects of a feature release strategy we can start to think about a Maturity Model for mobile feature release, ranging from the very basic to the highly sophisticated. It’s important to note that a maturity level isn’t a value judgment. It doesn’t make sense for every app delivery team to be sitting at Level 4. The appropriate level for a team depends on many factors – the relative maturity and size of the application code base, the size of the team and its current capabilities, and importantly also the culture and values of the team and the broader engineering organization.
Level 0 – fully uncontrolled release
A team – quite likely a single-person team – operating at this level does not really have any feature staging or feature telemetry, and no formal control mechanisms for a release. When a feature is deemed ready a developer will create a production build on their laptop and push it to the app store, thus releasing it to (eventually) all users.
Level 1 – uncontrolled release with beta testing
At this level a team is creating Release Candidates of some kind and has some sort of release testing process (perhaps not formally defined) involving more than the developer who wrote the feature. This release testing is done before a release candidate is declared ready for the app store. Once the release has gone live the team is generally monitoring the quality of the release by watching crash metrics. If something goes wrong their only option is to patch the issue and push a new release to the app store – often bypassing the release testing phase and thus risking the introduction of new bugs while attempting to fix the original issue.
Level 2 – controlled release, limited visibility
At this level, a team has some sort of feature flagging capability in place and is able to use it to perform a controlled rollout for important or high-risk features. If they see issues like a spike in error reporting or reduced engagement based on API metrics they will abort a rollout and un-release the feature while they investigate the root cause.
Level 3 – data-driven release
Teams at this level have more advanced analytics capabilities embedded directly into their apps and are able to use this to perform A/B tests of a new feature as part of a rollout strategy. At this level teams are also able to actively manage the progress of a controlled rollout based on data coming in from feature telemetry, pausing or rolling back a release if data indicates it may have issues.
Teams at this level of maturity may also be moving away from staging a release using a separate build or environment and leveraging feature flags instead. Rather than creating a release candidate in order to test a feature before release, they will instead deploy a new feature to production, protected by a feature flag which is only enabled for Beta testers or Internal users. These users are able to perform manual testing and stakeholder acceptance testing within production infrastructure without exposing the unproven feature to the rest of the user base.
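Such a targeting rule can be as simple as checking group membership before consulting the general rollout state. A hypothetical sketch – the domain and tester list are made up for illustration:

```python
INTERNAL_DOMAIN = "@example.com"      # hypothetical company domain
BETA_TESTERS = {"user-42", "user-99"}  # hypothetical opted-in beta cohort

def feature_enabled_for(user_id: str, email: str, general_rollout: bool) -> bool:
    """Sketch: a latent feature ships in the production build but is only
    enabled for internal staff and beta testers until general rollout begins."""
    if general_rollout:
        return True
    if email.endswith(INTERNAL_DOMAIN):
        return True  # internal users always see the unreleased feature
    return user_id in BETA_TESTERS
```

The key property is that the unproven feature rides in the production binary from day one, and only the flag evaluation decides who can see it.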
Organizations at this level will discover a need to introduce more structure to their release management process. This might start with a wiki page or shared spreadsheet which tracks which features are in each release, who owns each feature and who is managing the release, what stage a release is at, whether a feature will be getting a canary release treatment, etc. There will often also be a categorization scheme for features and releases, used to decide how conservative the rollout of a release should be.
Level 4 – automated release
The highest level of maturity is reached when teams are able to close the feedback loop and use data collected from feature telemetry to control the progress of a release. Organizations operating at this level have automated some or all of the mundane aspects of managing a feature release. A feature release will be initiated by a product manager but monitored and managed by software, with humans only involved in the process if telemetry indicates that the release may be causing issues.
At this level, pre-release and beta testing are done in production for any significant feature, using feature flags to manage its exposure.
Appropriate levels of automation
Maturity models have the unfortunate tendency of appearing far more cut-and-dried than the messy, nuanced reality they describe. This certainly applies to the gap between levels 3 and 4 of our maturity model. “Automation” is not a binary concept – a task is never entirely manual or entirely automated. Instead, the level of automation around a task constitutes a rich spectrum, described very well by John Allspaw in this article. This is a rich field of academic research – the papers referenced in that article are highly recommended – but for the sake of this discussion I’ll simply point out that moving from a data-driven but manual release process to a fully automated push-button experience is not a single step.
Teams might start along that journey by adding some simple alerting heuristics on top of their feature telemetry: “ALERT: Error counts appear to be trending badly and correlate to a recent canary release cohort”. The next step towards automation might be “ALERT: Error counts appear to be trending badly. This release will automatically roll back to 0% unless you tell me otherwise”. Next, we might have “The 5% canary release for Feature X has hit statistical significance and appears to be performing within tolerances. I am automatically increasing the canary cohort to 50%”. And so on.
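The escalating heuristics above could be expressed as a small decision function that sits between the telemetry feed and the flagging system. This is purely an illustrative sketch – the tolerance multiplier and rollout schedule are made-up values a real team would tune:

```python
def next_rollout_action(current_percent, baseline_error_rate, canary_error_rate,
                        tolerance=1.5):
    """Sketch of automated rollout control (thresholds are illustrative).

    - If the canary cohort's error rate exceeds `tolerance` times the
      baseline, roll back to 0% (and, in practice, page a human).
    - Otherwise advance the rollout along a fixed schedule.
    """
    if baseline_error_rate > 0 and canary_error_rate > tolerance * baseline_error_rate:
        return ("rollback", 0)
    schedule = [5, 10, 50, 100]
    for step in schedule:
        if step > current_percent:
            return ("advance", step)
    return ("hold", current_percent)

# A healthy 5% canary advances to 10%; an unhealthy one rolls back to 0%.
```

Each rung of the automation ladder just shifts where the human sits relative to this loop: first reading its alerts, then vetoing its decisions, then merely auditing them.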
Most organizations will not keep following this path all the way to a fully automated release process, and appropriately so. That level of automation imposes a significant cost and will not align with the cultural values of many engineering organizations.
Feature release in the context of mobile
While this article has been focused on feature release for mobile apps, a lot of the concepts apply broadly. Let’s talk about what is specific to feature release for native mobile apps.
With web applications, it’s extremely common to stage a new release by first deploying it to a pre-production (and/or staging, test, dev, QA) environment. A given build might travel through several environments before eventually being deployed to production. With native apps this is inverted – rather than the same build traveling through multiple environments we instead have multiple builds, one for each stage.
This can be slightly more cumbersome to manage and introduces a risk that there are unexpected differences between a production build and test build. Feature flagging helps with this by allowing testing to be done against the exact same binary that will be pushed to the app store.
There is sometimes a reluctance to frequently push new versions of native apps for fear of inducing “update fatigue”. This can lead to a desire to batch up a set of features into a formal release. However, the trend towards automated app updates in both iOS and Android seems to be reducing this phenomenon somewhat.
Controlling release of a new version
The main difference between release for mobile vs. web is the relative lack of control around deploying a new application build to users. With a web application, you can deploy new versions of your app 500 times a day if you really want. The turnaround time for each of those deployments can also be very short (as long as you’re willing to put the effort into building robust deployment tooling). And each deployment of a web application is (almost) immediately rolled out to 100% of your users.
In contrast, with iOS, you have to deal with a relatively slow app store review process and with both iOS and Android you don’t have any reasonable way to force a new build into users’ devices. This adds up to much longer cycle times between a feature being deployed and that feature being in a user’s hands. With a web application, it’s feasible – although not necessarily advisable – for a development team to test out a release by simply deploying it into production, rolling back to the previous release if something bad happens.
Taking this approach with a native mobile app would require a much greater degree of confidence. If the release does turn out to have issues it takes much longer for that rollback to take effect, and you have no good way of ensuring that all users actually roll back. This is why mobile teams tend to have a more formal release candidate test process than their web dev counterparts.
Even for mobile dev teams operating at the highest feature release maturity, this slow deployment cycle time must still be taken into account. There will always be a lag between a feature being out of development and that feature being available for release via a feature flag – you have to wait for the build containing that latent feature to be fully deployed into the hands of enough users before you can do any sort of controlled release. Depending on the popularity of your app and the behavior of your users this can add days or weeks to your feature release cycle.
There are myriad ways to manage feature release for mobile applications. To get some element of control into your release process requires a feature flagging capability, and to complete the feedback loop a team needs to invest in analytics capabilities. However, going all the way to an automated release management process is not required. Many teams will get the most benefit by building out a defined, lightweight feedback process and running it by hand.
About the Author / Pete Hodgson
Pete Hodgson is a consultant to Rollout.io, a software engineer and an architect. His clients range from early-stage San Francisco startups to Fortune 50 enterprises, focused on enabling continuous delivery of software at a sustainable pace via practices like test automation, feature flags, trunk-based development and DevOps. Pete is a frequent podcast panelist, regular conference speaker in the US and Europe, and a contributing author.
Rollout is an advanced feature management solution that gives engineering and product teams feature control, post-deployment. It’s the most effective way to roll out new features to the right audience while protecting customers from failure and improving KPIs.