Release: Decisions, Decisions
Concepts like continuous delivery, time to market, and innovation are central at eDreams ODIGEO (eDO). Working on a product like ours keeps us focused on the customer, which means we want to bring new features to them as soon as we can. This article covers how the front-end team manages each release and the challenges we face when making decisions around it.
How do we do a release at eDO?
When preparing a new release at eDO, we run many automated and manual tests against the release candidate. Even then, we want to be sure the release works as expected before deploying it to production, where it becomes reachable to every user.
Because of that, we first serve the new version to a portion of our users to check that everything is working well. We refer to this portion of our production platform as “Beta”.
Then, after a while, each release is compared to the previous one to see how it is performing. Since we are focused on the product and its users, this comparison is key to delivering value to the customer and ensuring everything works as expected. The goal is to ship new features as soon as we can without introducing a memory leak or a NullPointerException along the way.
How are both versions compared?
We have several KPIs based on performance: response time, memory usage, thread count, and so on. And, of course, as a leading eCommerce company we have several KPIs based on conversion rates. That data is then split according to different criteria such as device, browser, and country. All of this gives us the insight to detect problems in specific combinations that would otherwise look like deviation errors (e.g. problems on one particular browser version).
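To illustrate the idea, comparing per-segment conversion rates between the stable release and the Beta release could look like the sketch below (segment names, data shapes, and the deviation threshold are all hypothetical, not our production code):

```javascript
// Sketch (hypothetical names): compare conversion rates per segment
// between the stable release and the Beta release.
function conversionRate(bookings, visits) {
  return visits === 0 ? 0 : bookings / visits;
}

// Flag segments where Beta deviates from stable by more than a
// relative threshold; a real system would also check significance.
function flagSuspectSegments(stable, beta, threshold = 0.1) {
  return Object.keys(stable).filter((segment) => {
    const base = conversionRate(stable[segment].bookings, stable[segment].visits);
    const candidate = conversionRate(beta[segment].bookings, beta[segment].visits);
    return base > 0 && Math.abs(candidate - base) / base > threshold;
  });
}

const stable = {
  'mobile/es': { bookings: 120, visits: 4000 },
  'desktop/fr': { bookings: 300, visits: 6000 },
};
const beta = {
  'mobile/es': { bookings: 60, visits: 4000 }, // conversion halved: suspect
  'desktop/fr': { bookings: 310, visits: 6100 },
};
console.log(flagSuspectSegments(stable, beta)); // → [ 'mobile/es' ]
```

Splitting first and comparing per segment is what lets a regression affecting only one device or browser stand out instead of being diluted in the aggregate.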
How much time is needed to make good decisions?
Regarding the performance comparison, many indicators are taken into account, as mentioned before, and they should be reliable enough to reveal useful and trustworthy information. This means we need not only to calculate them but also to know how many samples are needed before we can trust the data. To achieve this, mathematics can be useful! The standard error of the data reveals whether the amount of data we have is sufficient to make decisions upon. Defining a good confidence level is also important; after some testing we found that the one best fitting our needs was 95%.
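For a conversion rate, which is a proportion, the standard error and the width of the 95% confidence interval can be sketched like this (a simplified textbook illustration, not our exact production formulas; the tolerance value is made up):

```javascript
// Standard error of a proportion p estimated from n samples: sqrt(p(1-p)/n).
function standardError(p, n) {
  return Math.sqrt((p * (1 - p)) / n);
}

// Half-width of the 95% confidence interval (z ≈ 1.96 for 95%).
function ci95HalfWidth(p, n) {
  return 1.96 * standardError(p, n);
}

// Do we have enough samples for the interval to be narrower than our tolerance?
function enoughSamples(p, n, tolerance) {
  return ci95HalfWidth(p, n) <= tolerance;
}

const p = 0.03; // ~3% conversion rate (illustrative)
console.log(enoughSamples(p, 1000, 0.005));   // false: interval still too wide
console.log(enoughSamples(p, 100000, 0.005)); // true
```

In other words, we keep the release in Beta until the confidence interval around each KPI is narrow enough to compare it meaningfully with the previous release.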
The amount of data for us is mostly a function of time (which is closely related to traffic), so we can analyze how the error described above evolves over time. An example of this evolution is illustrated in the graph below:
This curve oscillates depending on the traffic source, the season, and many other factors. Nevertheless, an important lesson is that the first hours have an unacceptably high error, while after some hours further reduction becomes insignificant. This means we cannot guarantee a low standard error after a given amount of time, but an approximate minimum can be extrapolated.
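The shape of that curve follows from the standard error shrinking with the square root of the sample count: with roughly constant traffic, each extra hour buys less and less precision. A quick sketch with made-up traffic figures:

```javascript
// Standard error shrinks as 1/sqrt(n): the first hours cut it sharply,
// later hours barely move it. Assume ~5000 samples/hour (made-up figure).
const samplesPerHour = 5000;
const p = 0.03; // illustrative conversion rate
const se = (n) => Math.sqrt((p * (1 - p)) / n);

for (const hours of [1, 4, 10, 18, 24]) {
  const n = hours * samplesPerHour;
  console.log(`${hours}h -> SE ${se(n).toExponential(2)}`);
}
// Going from 1h to 4h halves the error; from 18h to 24h it only
// drops by ~13%, which is why the curve flattens out.
```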
With these learnings in mind, the code can be released with a good confidence level after less than a day in Beta. In particular, a stable point is reached after 18-20 hours.
Wrapping everything up, the process to evaluate a release is shown in the picture below.
Can this be automated?
Yes. Indeed, it has clearly defined rules, and we have coded a robot to make these decisions with no supervision.
Now the process is as follows:
- The robot that makes the deploy (deploy bot) starts the process.
- The deploy bot asks the robot that checks the KPIs (decision bot) for a decision.
- The decision bot checks the data and returns a decision.
- The deploy bot acts according to that decision, either aborting the deploy or proceeding with it.
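The steps above can be sketched as a pair of functions (names and the health rule are hypothetical; the real bots talk over HTTP):

```javascript
// Sketch of the deploy-bot / decision-bot exchange (hypothetical names).
// The decision bot inspects the KPIs and returns a verdict.
function decisionBot(kpis) {
  const enoughData = kpis.hoursInBeta >= 18; // stable point from the article
  const healthy = kpis.betaConversion >= kpis.stableConversion * 0.95; // hypothetical rule
  if (!enoughData) return 'wait';
  return healthy ? 'proceed' : 'abort';
}

// The deploy bot acts on the verdict.
function deployBot(kpis) {
  const verdict = decisionBot(kpis);
  if (verdict === 'proceed') return 'promoting beta to production';
  if (verdict === 'abort') return 'rolling back beta';
  return 'staying in beta';
}

console.log(deployBot({ hoursInBeta: 20, betaConversion: 0.031, stableConversion: 0.030 }));
// → promoting beta to production
```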
On the decision bot side, we used Node.js to make this possible. In particular, several processes fetch data on an hourly basis and store it on disk as a cache, which lets the decision bot reply to the deploy bot immediately. Also, the bot is served with Express. All of this is managed with pm2, which includes useful features like zero-downtime reload, making it really easy for us to keep it running 24/7. On the other hand, the deploy bot is nicely written in Python by the DevOps team, and the integration was really quick. Finally, it is worth mentioning that the backend of the application being tested is written in Java. Programming languages for every taste!
Conclusions (aka TL;DR)
Our releases need time to prove they work well in production. Nevertheless, this time can be narrowed down to 18-20 hours on average. Evaluating the performance of a release is a key process and should be carried out with clockwork precision. Beyond that, the process is recurrent, and automating it is a good idea.