Do You Have a KPI Data Standardisation Problem?

by Stacey Barr

Getting the right performance measure and KPI data is often hard, but it’s unnecessarily hard when you can’t navigate existing datasets because your KPI data has no formal or deliberate standardisation.

[Image: a compass. Credit: number1411 from iStock]

My business insurer recently asked me for an update on my top three customers over the past few years. It’s not something I track in my business dashboard, so I needed to run a special report to find out. It was then I realised that I didn’t have a standard way of capturing customers in my accounting system, so I had to work it out manually. What should have taken five minutes took three hours.

In my line of work, I should know better. And apparently, so should the vast majority of organisations. According to HBR, only 3% of companies have more than 97% accuracy in their data records. A lack of data standardisation is one of the big data management mistakes that contributes to this data quality problem.

What standardisation of data means

Data standardisation means how consistently we capture and record specific data items, like dates or customer account names or product codes or locations. Take my problem with customer account names: without a standard, my team captured the account names in a different way for each new project, like these:

  • Client’s name e.g. Micky Maverick
  • Client’s department or service area e.g. Airforce
  • Client’s organisation e.g. Defence

Data standardisation is one of the important pieces of an organisation’s data architecture, which determines how much integrity the data has, and ultimately how useful and usable the organisation’s data assets are.

Of course, it’s a waste of time and energy to aim for perfect data integrity. But if you have poor data standardisation, you’ll struggle to bring to life the performance measures you need when they depend on linking two or more data sources together.

What happens when data isn’t standardised

Rail networks are usually segmented into line sections. These are sections of track varying from a few hundred metres up to a hundred kilometres long. Line sections are aligned to the signalling system, which helps train controllers know where trains are at any given time. They are used as reference points to log the locations of things like points and crossings, curves, gradients, level crossings, derailments, breakdowns, track defects and track infrastructure.

To improve safety performance, one railway asked: “Is the risk of a derailment affected by track defects, curves or gradients?” To answer this question, they needed data from several databases.

Their safety incidents database logged the date and location of derailments. Their track maintenance database logged the location and type of track defects. And track characteristics, like the location of curves and gradients, were logged in a track description database. But each of these databases used a different definition of line section.

No-one knew that ‘line section’ 6101 from the safety database corresponded to ‘line number’ B445 from the track maintenance database or ‘line code’ NE124D456 from the track description database. Their data couldn’t be merged or joined, and the question therefore couldn’t be answered.
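
To make the fix concrete, here is a minimal sketch in Python using pandas. All the table and column names are hypothetical illustrations, not the railway’s actual schemas: a one-off ‘crosswalk’ mapping table records which differently coded identifiers refer to the same stretch of track, after which the data joins cleanly.

```python
import pandas as pd

# Hypothetical extracts from two of the databases; the identifiers and
# column names here are illustrative, not the railway's actual schemas.
incidents = pd.DataFrame({
    "line_section": ["6101", "6102"],
    "derailments": [3, 1],
})
defects = pd.DataFrame({
    "line_number": ["B445", "B446"],
    "track_defects": [12, 4],
})

# The one-off fix: a manually built crosswalk, where each row records
# that these differently coded identifiers mean the same stretch of track.
crosswalk = pd.DataFrame({
    "line_section": ["6101", "6102"],
    "line_number": ["B445", "B446"],
})

# With the crosswalk in place, the previously un-joinable data merges cleanly.
merged = (incidents
          .merge(crosswalk, on="line_section")
          .merge(defects, on="line_number"))
print(merged)
```

Building that crosswalk is exactly the manual matching effort costed in the next section.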

Is it worth fixing the data standardisation problem?

The railway could do what most organisations do, and that’s to have a bleat about the data problem for a bit, then give up on answering their question. Fixing a data problem like that would be too costly, right?

It would certainly take some time for a team of people to sift through thousands of records to manually match up the line section data across these several databases. With some ‘find and replace’ coding and some manual effort on the exceptions, it might take an average of 15 seconds to standardise each record. If we have 50,000 records of data, that’s about 200 hours of effort. At an analyst’s pay rate of $30 per hour, correcting the line section data standardisation problem could cost around $6,000.
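
For anyone who wants to check those figures, the back-of-envelope arithmetic looks like this (the record count, seconds per record, and pay rate are the assumptions stated above):

```python
# Back-of-envelope cost of the fix, using the figures above.
records = 50_000
seconds_per_record = 15
hours = records * seconds_per_record / 3600  # ~208 hours, i.e. about 200
cost = hours * 30                            # ~$6,250, i.e. around $6,000
print(f"{hours:.0f} hours, ${cost:,.0f}")
```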

But what’s the cost of ignoring the data problem and never learning how derailments might be prevented? What if the data revealed that just one or two things could be fixed, to reduce derailments by even one per year? Derailments damage rollingstock, mangle massive lengths of track infrastructure, destroy their payload, and cause flow-on delays while the track is repaired; a single derailment can cost tens or hundreds of millions of dollars. Heaven forbid anyone is also injured or killed.

The cost of fixing data is rarely put into context. And it should be. We choose our performance measures because they monitor strategically important results. We need to take the data for our strategically important performance measures more seriously.

Three steps to start fixing your data standardisation

The long-term solution to data standardisation comes back to your organisation’s data architecture design and implementation. But waiting for that to happen wastes time. You can do a few things right away, by focusing on the data required for your most important performance measures:

  1. Centrally decide what data standards to set, and communicate these using the performance measure definitions in your corporate KPI dictionary.
  2. Audit the existing data for your important performance measures to find anomalies and mismatches, and just do the work to ‘clean’ that existing data (a sketch of this kind of audit follows this list).
  3. Modify the existing forms and data collection tools that gather that data: clarify ambiguous questions or instructions, add dropdown lists or pre-coded option lists, provide examples of correct data formats, and so on.
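
As a rough illustration of step 2, here is a minimal Python sketch of that kind of audit, flagging records whose values fall outside an agreed standard list. The records, account names, and standard list are invented for illustration; a real audit would run against an extract from your own systems.

```python
import pandas as pd

# Hypothetical customer records; in practice, an extract from your
# accounting or CRM system.
records = pd.DataFrame({
    "project": ["P01", "P02", "P03", "P04"],
    "account_name": ["Defence", "Micky Maverick", "Airforce", "Defence"],
})

# The agreed standard from the KPI dictionary: account names must be
# the client's organisation name.
standard_accounts = {"Defence", "Transport", "Health"}

# Flag every record whose account name isn't in the standard list.
anomalies = records[~records["account_name"].isin(standard_accounts)]
print(anomalies)  # these are the records that need cleaning
```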

When you can’t link related sources of data together to build a bigger picture of performance, it’s like trying to navigate without coordinates. You can only see what’s immediately around you; what’s obvious. You’ll never see the bigger picture, and never find the insights you should.

You can’t navigate your way to insights with performance data that isn’t standardised, because if you can’t link data sources together, you can’t see what isn’t already obvious.
