Planning Poker relies on relative estimation, in which the item being estimated is compared to one or more previously estimated items. It is the ratio between items that is important. An item estimated as 10 units of work (generally, story points) is estimated to take twice as long to complete as an item estimated as five units of work.
An advantage of relative estimating is that it becomes easier to do as a team estimates more items.
Estimating a new item becomes a matter of looking at the previously estimated items and finding something requiring a similar amount of work. This is easier to do when the team has already estimated 100 items than when they’ve only estimated 10.
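To make the ratio idea concrete, here is a minimal sketch of the arithmetic. The function name `relative_effort` is a hypothetical helper chosen for illustration, not part of any Planning Poker tool.

```python
def relative_effort(points_a: float, points_b: float) -> float:
    """Return how many times longer item A is expected to take than item B.

    Story points are relative, so only the ratio between two estimates
    carries meaning. (Hypothetical helper for illustration.)
    """
    if points_a <= 0 or points_b <= 0:
        raise ValueError("story points must be positive")
    return points_a / points_b


# A 10-point item is estimated to take twice as long as a 5-point item.
print(relative_effort(10, 5))  # 2.0
```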
How to Select Initial Estimates for Comparison
But relative estimating suffers from a bootstrapping problem: How does a team select the initial estimates against which they’ll compare?
My recommendation is that when a team first starts playing Planning Poker, team members identify two values that will establish their baseline. They do this through discussion alone, without playing Planning Poker. After the baseline is established, team members can use Planning Poker to estimate additional items.
Ideally, the team is able to identify both a two-point story and a five-point story. There is evidence that humans estimate most reliably when sticking within one order of magnitude.
Identifying a two-point product backlog item and a five-point item does a good job of spanning this order of magnitude. Many other items can then be more reliably compared against the two and the five.
If finding a two and a five proves difficult, look instead for a two and an eight, or a three and an eight. Anything that spans the one to 10 range where we’re good estimators will work.
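The guidance above can be sketched as a simple check. This is only an illustration: the function name `is_reasonable_baseline` is hypothetical, and the `min_ratio` threshold of 2.0 is my assumption about what counts as "spanning" the range, not a rule from the research.

```python
def is_reasonable_baseline(low: int, high: int,
                           floor: int = 1, ceiling: int = 10,
                           min_ratio: float = 2.0) -> bool:
    """Check a candidate pair of baseline values.

    Both values must fall inside one order of magnitude (floor..ceiling),
    and the larger must be at least `min_ratio` times the smaller so the
    pair actually spans the range. `min_ratio` is an assumed threshold.
    """
    within_range = floor <= low < high <= ceiling
    return within_range and (high / low) >= min_ratio


# The pairs suggested in the text all pass.
for pair in [(2, 5), (2, 8), (3, 8)]:
    print(pair, is_reasonable_baseline(*pair))
```

A pair such as two and three would fail this check because the values sit too close together, and five and 20 would fail because 20 falls outside the one to 10 range.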
Avoid Starting with a One-Point Story
I like to avoid starting with a one-point story. It doesn’t leave room for anything smaller without resorting to fractions, and those are harder to work with later.
Additionally, comparing all subsequent stories to a one-point story is difficult. Saying one product backlog item will take two or three times longer than another seems intuitively easier and more accurate than saying something will take 10 times longer.
I made this point in my 2005 Agile Estimating and Planning book (now also an on-demand video course). In 2013, it was confirmed by Magne Jørgensen of the Simula Research Lab. Jørgensen, a highly respected researcher, conducted experiments involving 62 developers. He found that “using a small user story as the reference tends to make the stories to be estimated too small due to an assimilation effect.”
Why Use Two Values for a Baseline?
Establishing a baseline of two values allows for even the first stories being estimated to be compared to two other items. This is known as triangulating and helps achieve more consistent estimates.
If a team has established a baseline with two- and five-point stories, team members can validate a three-point estimate by thinking whether it will take longer than the two and less time than the five.
Jørgensen’s research also provides evidence that the direction of comparison matters. Comparing the item being estimated to one story that will take less time to develop and another that will take longer is likely to improve the estimate.
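Triangulating against a two-value baseline can be sketched as a bracketing check. The function name `is_bracketed` is a hypothetical helper, and in practice the judgment is made by the team in discussion rather than by code.

```python
def is_bracketed(estimate: int, smaller_ref: int, larger_ref: int) -> bool:
    """Return True if the estimate sits strictly between the two references.

    Models the question "will this take longer than the smaller reference
    story and less time than the larger one?" (Hypothetical helper.)
    """
    if smaller_ref >= larger_ref:
        raise ValueError("smaller_ref must be less than larger_ref")
    return smaller_ref < estimate < larger_ref


# A three-point estimate checked against the two- and five-point baseline.
print(is_bracketed(3, smaller_ref=2, larger_ref=5))  # True
```

An estimate that falls outside the pair, such as an eight, is not wrong. It simply cannot be triangulated against these two stories and needs a larger reference item.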
Don’t Establish a New Baseline Every Project
Some teams establish a new baseline at the start of each project. Because this results in losing all historical velocity data, I don’t recommend it when both of the following are true:
- The team members developing the new system will be largely those involved in the prior system. The team doesn’t need to stay entirely the same; as long as about half the team remains, you’re better off using the same baseline.
- The team will be building a somewhat similar system. If a team is switching from developing a website to embedded firmware, for example, they should establish a new baseline. But if the systems being built are somewhat similar in either the domain or technologies used, don’t establish a new baseline.
Whenever possible, retain the value of historical data by keeping a team’s baseline consistent from sprint to sprint.
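The two conditions for keeping a baseline can be expressed as a small decision rule. This is a sketch under stated assumptions: the function name `should_keep_baseline` is hypothetical, and the 0.5 default for `min_overlap` is my reading of "about half," which a team should treat as a rough guide rather than a hard cutoff.

```python
def should_keep_baseline(returning_members: int, team_size: int,
                         similar_system: bool,
                         min_overlap: float = 0.5) -> bool:
    """Decide whether to carry the existing baseline into a new project.

    Keep the baseline only when both conditions hold:
      1. roughly half or more of the team carries over (assumed 0.5 cutoff)
      2. the new system is somewhat similar in domain or technology
    """
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    overlap = returning_members / team_size
    return overlap >= min_overlap and similar_system


# Four of seven people stay and the new system is similar: keep the baseline.
print(should_keep_baseline(4, 7, similar_system=True))   # True
# Same team, but moving from a website to embedded firmware: start fresh.
print(should_keep_baseline(7, 7, similar_system=False))  # False
```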