Friday, 21 February 2014

Defect discovery and risk mitigation in Continuous Delivery

Barry Boehm, the creator of the spiral model of software development and of the Constructive Cost Model (COCOMO), argued in 1976 that defects are more expensive to fix when they are found in the later stages of software development. He developed the idea further into what became known as the Cone of Uncertainty in his book “Software Engineering Economics” (1981). The concept is represented in the diagram below:
The basic premise of the concept is that uncertainty is greatest at the start of a project and narrows as the project moves from one phase to the next, while the cost of fixing a defect grows the later it is found. Boehm’s spiral model and COCOMO are meant to anticipate risk and uncertainty in software projects. For many years, software development has revolved around Boehm's premise of software development economics.
On the other hand, Laurent Bossavit, author of The Leprechauns of Software Engineering (2012), disagrees. Bossavit argues that the “underlying evidence justifying Boehm’s curve…just isn’t up to any reasonable standard of ‘research.’” (see "What does it really cost to fix a software). Bossavit, who is also the head of Agile Institute, noticed that Boehm misinterpreted the results of his own studies. We can therefore question the validity of the concept. However, we know that risk and uncertainty do exist in software projects, so the most important thing is to deal with them.


I suggest not arguing about the exact shape of the curve, or about whether any scientific method was involved, but rather identifying the impact of a change and starting to work out workarounds and mitigations, especially for software defects (also known as bugs). As in other fields, we cannot avoid risk in software development, but we must be prepared for it: we have to be able to assess the risks we face and mitigate them. The major risk in software development lies in the transition from testing to deployment. To ensure a successful deployment to the production server, we have to minimize the risk of error. A code error becomes a defect once deployed and can lead to system failure in the production environment. We can minimize such failures by eliminating the defects that code errors introduce.


Let us begin by assessing the risk correctly, using the defect impact matrix below. The matrix represents the impact of a defect. Instead of plotting it over time, as Barry Boehm did, I use the delivery stages on the horizontal axis.
Indeed, defects can be detected at different development and release stages, and the impact differs depending on how early or late a defect is discovered. A defect is the outcome of risks that were not properly identified and assessed, so when the risk materializes the organization is not prepared to monitor and mitigate it.
The content of the matrix should be adapted to your organization. The one I propose below describes an organization with rather average software delivery processes:
(Scores range from A to F, A being the best and F the worst.)
We could use continuous delivery techniques such as canary deployment or blue/green deployment to improve the lowest scores:
  • F(1) score could be mitigated by offering the new version of the product to only a subset of your user base (e.g. 5%). This is also called canary deployment.
  • F(2) score could be mitigated using an advanced production deployment strategy from the 4th or 5th level of Continuous Delivery maturity, such as canary deployment or blue/green deployment.
Let us talk a bit about the two deployment methods mentioned above: canary deployment and blue/green deployment.
Canary deployment tests software in production by routing a subset of users to the new functionality. It shows how a change affects real users without affecting the entire system, since only some of the users, not all of them, are exposed to the canary version.
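To make the routing idea concrete, here is a minimal sketch in Python. It is only an illustration: the hash-based bucketing, the names bucket_for and route, and the 5% default are my assumptions, not the mechanism of any particular load balancer or feature-flag tool.

```python
import hashlib

CANARY_PERCENT = 5  # assumed share of users routed to the new version


def bucket_for(user_id: str) -> int:
    """Map a user id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def route(user_id: str, canary_percent: int = CANARY_PERCENT) -> str:
    """Return which version ("canary" or "stable") should serve this user."""
    return "canary" if bucket_for(user_id) < canary_percent else "stable"


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(10_000)]
    share = sum(route(u) == "canary" for u in users) / len(users)
    print(f"share of users on the canary: {share:.1%}")
```

Hashing the user id keeps each user on the same version between requests, and raising canary_percent step by step widens the rollout once the canary looks healthy.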
Blue/green deployment creates two identical production environments. This gives you the capability to roll back rapidly to the other environment when anything goes wrong with one of them. You switch production from one environment to the other by rerouting the application requests. The following diagram explains the blue/green environment.
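The switch and the rollback can also be sketched in a few lines of Python. This is a toy model under stated assumptions: Router stands in for whatever actually reroutes requests (load balancer, DNS entry, proxy), and health_check is a hypothetical callback supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Router:
    """Hypothetical stand-in for the component that reroutes requests."""
    live: str = "blue"   # environment currently serving production traffic
    idle: str = "green"  # identical environment holding the new release

    def switch(self) -> None:
        """Send all traffic to the other environment."""
        self.live, self.idle = self.idle, self.live


def release(router: Router, health_check: Callable[[str], bool]) -> bool:
    """Cut over to the idle environment and roll back if it is unhealthy."""
    router.switch()
    if health_check(router.live):
        return True
    router.switch()  # rollback: the previous environment is still intact
    return False


if __name__ == "__main__":
    router = Router()
    succeeded = release(router, health_check=lambda env: True)
    print(succeeded, router.live)
```

The point of the design is that a rollback is just a second switch: nothing has to be redeployed, because the previous environment was never touched.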


In their book Continuous Delivery, Jez Humble and his co-author David Farley suggested a Continuous Delivery maturity model. The maturity model ranges from 0 to 5, where 5 is achieved only by industry leaders such as Netflix, Twitter, GitHub and a few more.
The matrix above could represent an organization that has a well-defined and reliable software development process. However, organizations can improve their maturity level by automating more steps and using advanced deployment and release processes, combined with Agile methodology. In the production stage, they can improve product quality through automation. Using canary deployment, blue/green deployment or both, we can correctly assess the risks and mitigate them, hence improving the process.
We could imagine an organization going from CD level 3 to 5 in a matter of a year and reaching the following result:
As you can see, the organization is now performing much better. Discovering defects late, even in production, is not catastrophic and has an almost insignificant impact.
Therefore, companies should not hesitate to embrace those risks and keep up with fast-paced innovation, while having a way to mitigate and remediate every risk they may encounter. We cannot avoid risk, because risk is already there; consequently, we have to focus on the job and on progress, and be prepared to deal with the risks associated with software development.


Once again, the content of the matrix is subjective. I just want to point out the benefits of Continuous Delivery, automation of processes and Agile methodology.
Combining the three will allow an organization to improve drastically, keep the customer happy, innovate, decrease risks and increase revenues through well-managed technologies.

Wednesday, 12 February 2014

Do we really need task estimation in Scrum?

I came across this question on the LinkedIn Scrum Practitioner group, asked by a Certified Scrum Master:
Why we have two level of estimation in scrum? What purpose do task estimates serve?
This topic struck a chord with me, as I have been pondering it for a little while:
why do we do task estimation, and is it a good idea?


I also came across this blog post recently, from one of my acquaintances.
The author explains the process of calculating team capacity for a given Sprint using Focus Factor, instead of total number of hours available.
For 5 people working 8 hours a day for 2 weeks:
That means the total working hours will be 5x8x10 = 400 hours!
Estimated planning for this capacity will be a disaster. It will lead to team working over time, rushing towards the end, quality cuts and low team morale.
Now let us take into account the Focus Factor of 6 hours per day:
Traditionally, project managers used 6-6.5 hours as the planned hours in a day for project execution. Focus factor is the team's ability to remain focussed on the sprint goals without any other distractions.
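The arithmetic behind both capacity figures can be sketched as follows. The function name sprint_capacity is mine, and the 300-hour result is simply what the numbers above give when 6 focused hours replace the 8 nominal ones; it is not quoted from the blog post.

```python
def sprint_capacity(people: int, hours_per_day: float, working_days: int) -> float:
    """Total team hours available in a sprint."""
    return people * hours_per_day * working_days


# 5 people, 2 weeks = 10 working days
nominal = sprint_capacity(people=5, hours_per_day=8, working_days=10)
focused = sprint_capacity(people=5, hours_per_day=6, working_days=10)

if __name__ == "__main__":
    print(f"nominal capacity: {nominal} hours")
    print(f"capacity with a focus factor of 6 hours/day: {focused} hours")
    print(f"hours that planning on 8 hours/day would overcommit: {nominal - focused}")
```

With these inputs the plan shrinks from 400 to 300 hours, a quarter of the nominal capacity.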
The idea here is to sum up all the hours and find out how big the user story is.
Based on the capacity calculated above, you find out how many tasks and user stories can fit into a given Sprint. Except that such estimates are far from accurate. On top of this, estimating tasks instead of user stories does not make the result more reliable.
Now comes the question: is estimating tasks in hours a good idea at all?


Let's see what people on the Scrum LinkedIn Practitioner group have to say.


So here is my top list of arguments from those in favor of task estimation:
  • Task level estimation is used in the daily burndown. It is a great way to show the team where they are each day. It may take more than a day to complete a user story and there isn't a way to see the progress on that story unless you track the task hours.
  • Tasking helps the team to plan the actual work and provides transparency.
  • Task estimation allows progress tracking, and supports decision making. Will the stories really fit in the iteration (i.e. a sanity check)?
  • When it comes to sprints, burning down story points may lead to very large steps where it appears nothing is done, then suddenly a large drop. This frightens people and makes them feel uneasy as generally people want to know if they are somewhat on track.
  • If the developers are highly experienced and mature as an agile team, they can get away with relative sizing using story points. But as this is seldom the case (most teams consisting of mixed abilities/experience levels), task breakdown is welcome.
  • From a team perspective, the main reason to have estimates is to be able to detect when things start going off track. It tells us that we need to look and possibly make a decision to go back on track


And here is my top list of arguments against task estimation:
  • Dual estimation using points and time is overkill (most of the time). You have to keep in mind that both of them are estimates, thus a guess on how big or much time/effort will be taken up with them.
  • The major risk of task estimation is that it suggests a false direct relationship: people start to equate story points with time (sum of the time of all tasks = story point value of the story).
  • We already have a time measurement in Scrum: the sprint timebox. To find out how many story points fit into a sprint, we have to find the answer by doing, not by guessing.
  • In order to start a sprint you only need enough work for a few days in advance. You don't need to break down everything from the beginning
  • There's a reason many scrum masters forego the time estimations (on any level): they are imprecise. If your story point estimation is not useful enough to draw a reliable velocity, then definitely the time estimates won't provide more details.
  • As a very rough indicator I would say each person in the team should be completing 2-4 items a day and marking them as done. More than that is too much administration, which will slow the team down; fewer than that means the items are too big, which does not facilitate good communication and getting stuff done.
  • The Scrum Guide never uses the word task. It does use the word estimate in combination with product backlog items. So yes, according to the Scrum Guide there is estimation of Sprint Backlog Items
  • Scrum actually does not define how or when estimation should be done, but merely states that there is an estimation.
  • The more time you spend on estimation, the more it highlights the dysfunction of pressure which results in technical debt.
From the above, we cannot be convinced that estimating tasks is something developers should systematically do. Let's see if there are alternative solutions to support story sizing and sprint forecasting.


Here are some ideas, presented by LinkedIn members, that can be taken into consideration:
  • What I favour doing is breaking tasks into similar size, approximately half-day to day-long chunks (day long is definitely the upper limit IMHO). Then I look at burn-down by counting the number of tasks yet to complete. I don't care whether they are 4 hour or 8 hour; I don't even want to try and estimate to that level. It all averages out.
  • If you have a lot of very small stories where each one takes a couple hours and there are not many tasks on each, then by all means burn down the stories. If however, you have only a few stories and lots of task, burn down the count of tasks.
  • Task estimates are about developers planning technical work. They are less an out-facing estimate and more a way to plan how to approach each task, what "normal time spent" would be, and when to go for help.
  • Story sizing is not really about creating estimates. Ideal days, story points and relative sizes do not directly translate into hours and days. These are more for creating an understanding for the PO and the rest of the organization how much work each piece of the backlog entails.
  • We live, eat and breathe by the clock, and estimating time is so ingrained in us that we naturally want to know and predict the future. I would say the majority of people, and I count myself here, cannot resist estimating in some form or another, even if it is a 2 second gut feeling.
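The idea of burning down by counting tasks, rather than summing hours, can be sketched as follows. The function name and the sample numbers are made up for illustration; the only assumption carried over from the comments above is that tasks are cut to roughly similar sizes.

```python
def burndown_by_task_count(total_tasks: int, done_per_day: list[int]) -> list[int]:
    """Remaining task count at sprint start and after each day."""
    remaining = [total_tasks]
    for done in done_per_day:
        # Never go below zero, even if more tasks are closed than planned.
        remaining.append(max(remaining[-1] - done, 0))
    return remaining


if __name__ == "__main__":
    # Hypothetical sprint: 20 tasks, four days of progress so far.
    print(burndown_by_task_count(20, [2, 3, 1, 4]))  # [20, 18, 15, 14, 10]
```

No hours are involved at any point: the chart still moves every day, which answers the "large steps" objection without a second level of estimation.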


Below is what I answered to my peers on the topic. This is what I learned from three years of doing task estimates:
In my opinion, when we do task estimation, we tend to put too much power into the estimates. The maths are so easy that we start believing in them: 4 tasks of 2 hours + 3 tasks of 4 hours + 1 task of 8 hours = 28 hours! Easy, right? No. Most of the time it does not happen that way; even when we try to use buffer techniques, it still never works out.
And then the PO or SM says:
so that fits into the sprint, right? if we look at the hours...
The answer is no. Let us instead put the question to the team: do you believe you and the team are able to achieve all this within a sprint? Put away the hours and tasks and use gut feeling. The sprint length is the only valid time period.
Although I must say breaking down a story into tasks has been a useful exercise to brainstorm and think about the solution. It might not be something we want to do all together though. Another approach is for a couple of developers to spend a couple of hours doing solution design.
I've been doing task estimation for so long (3 years) that I've stopped believing in it. It is often a waste of developers' time, and leads to frustration:
is it 3 hours, is it 2.5 hours? or maybe 3.5 hours???
Ok, let's simplify: only 4 and 8 hour tasks are allowed. Even that does not feel right: we ask for estimates in hours and then start to round up and put constraints on what should be a very simple task.
One thing I understood: if you depend on task estimations, maybe your user stories are too big and need to be broken down.
In the end I recommend no more than 5 user story sizes. Something like:
  • small
  • large
  • xlarge
  • unknown : might lead to a spike for investigation


Here is my take on the topic, based on my experience with task estimation:
  • Do task break down as a team if it helps brainstorming and designing the solution
  • Do not systematically do task estimation (putting too much power into task estimations might actually lead to technical debt and lower quality)
  • Use small user stories, not epics: a user story is an entity that can be tested independently. For example, replace "As a user I want to manage products" with 3 stories: "as a user I want to be able to create products" + "as a user I want to be able to edit products" + " as a user I want to be able to delete products"
What about you, would you use task estimation?