Validation techniques to improve your data quality

How often have you been asked a question on a form that you either can’t answer or don’t want to answer? How do you react when you see this on a form:
Would you prefer to be contacted by phone or email?
or this:
What is your date of birth?
I often think “Neither” to the first and “none of your business” to the second, but it’s not uncommon to find that you cannot proceed without answering. At that point the solution is simple: lie. Designers require an answer, users don’t want to provide one, and so users lie. From a data perspective, we now have a “known” answer, so we can carry on with our processes. But the business didn’t want a lie, and even though it is more likely to be called an “inaccurate answer” than “a lie,” the data is still wrong.
The problem in the data world is that people like mandatory fields in the mistaken belief that there must be a “right” answer.
Consider a field that holds two possible values. It might store the result of a Yes/No question, a True/False, or a checkbox that is either checked or unchecked, and the implication is that there are only two acceptable answers.
For the form developer, such a field is much simpler: there are only two possible ways to proceed, and people design accordingly. Unfortunately, data is seldom that simple, and more thought needs to go into it if we are going to get real value from it.
Let’s consider a scenario where you want to sign up to use a website. To gain access you need to answer some simple questions such as “Are you over 18?”
On any given date, the factual answer to this question is either Yes or No. However, the data world has to deal with more possibilities than that.
The recorded answer could be Yes, No, “no answer was provided,” “the respondent declined to answer,” Yes (although that isn’t true), No (although that isn’t true), or even “I don’t know,” which may not apply to most of us but is certainly possible!
When designing databases, input screens, and processes, it’s important to consider all the possible answers, the choices people make when faced with questions, and the requirement for data to be useful if you are going to bother to collect and store it.
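One way to keep these possibilities apart in code is to store the answer as an enumeration rather than a boolean. The following is a minimal Python sketch; the names `Over18Response` and `record_response` and the stored string values are hypothetical, chosen only for illustration. Note that a deliberately false Yes or No cannot be told apart from a truthful one by the form itself, so those cases are not modelled as separate values.

```python
from enum import Enum
from typing import Optional


class Over18Response(Enum):
    """Hypothetical stored values for an "Are you over 18?" field.

    A false Yes or No looks identical to a true one on the form,
    so it cannot be represented as a distinct value here.
    """
    YES = "yes"
    NO = "no"
    NOT_ANSWERED = "not_answered"  # the field was left blank
    DECLINED = "declined"          # the respondent explicitly chose not to answer
    DONT_KNOW = "dont_know"        # the respondent said they did not know


def record_response(raw: Optional[str]) -> Over18Response:
    """Map raw form input to a stored value without guessing.

    A missing answer is recorded as NOT_ANSWERED instead of silently
    defaulting to YES or NO; unrecognised input is rejected.
    """
    if raw is None or raw.strip() == "":
        return Over18Response.NOT_ANSWERED
    try:
        # Enum lookup by value, e.g. "Yes" -> "yes" -> Over18Response.YES
        return Over18Response(raw.strip().lower())
    except ValueError:
        raise ValueError(f"Unrecognised answer: {raw!r}") from None
```

For example, `record_response("Yes")` gives `Over18Response.YES`, while a blank field gives `Over18Response.NOT_ANSWERED`, so later reports can count unanswered questions instead of mistaking them for a No.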

[Figure: two example form boxes for the same question]

The first box requires less effort to create, is much quicker to read, and takes up a lot less space on the page, but it may not give you accurate (or valuable) data.
When designing the database, a programmer should remember to include options such as “no answer was given,” or to give the user an option to “decline to answer.” Both options are better than collecting factually incorrect data. We should avoid data based on “I didn’t want to give you the right answer, so I made something up,” because that data can no longer be trusted.
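At the database level, the same idea can be enforced with a constraint. The sketch below uses Python’s built-in `sqlite3` module with an in-memory database; the `signup` table, its columns, and the allowed values are hypothetical. The `CHECK` constraint admits explicit “not answered” and “declined” values alongside yes and no, and `NOT NULL` stops a blank from being stored without anyone deciding what it means.

```python
import sqlite3

# Hypothetical table: explicit "no answer" states are allowed values,
# and anything outside the list is rejected by the CHECK constraint.
SCHEMA = """
CREATE TABLE signup (
    user_id  INTEGER PRIMARY KEY,
    over_18  TEXT NOT NULL
             CHECK (over_18 IN ('yes', 'no', 'not_answered', 'declined'))
)
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database containing the signup table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def save_answer(conn: sqlite3.Connection, user_id: int, answer: str) -> None:
    """Insert one answer; raises sqlite3.IntegrityError for invalid values."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO signup (user_id, over_18) VALUES (?, ?)",
            (user_id, answer),
        )


if __name__ == "__main__":
    conn = create_db()
    save_answer(conn, 1, "yes")
    save_answer(conn, 2, "declined")
    try:
        save_answer(conn, 3, "maybe")
    except sqlite3.IntegrityError as exc:
        print("Rejected:", exc)
    for row in conn.execute("SELECT user_id, over_18 FROM signup ORDER BY user_id"):
        print(row)
```

Using a distinct `'declined'` value rather than `NULL` keeps “the user chose not to answer” separate from “the question was never answered,” which is the distinction argued for above.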
Organizations must also consider how easy it is for users to provide the data they want; a tedious process will also get you the wrong answer. I was recently asked by an app to enter my date of birth using a calendar. The image below shows, on the left, the screen as first presented and, on the right, the same screen after 100 clicks on the one-month-back symbol “<”!

With (sadly) a lot more clicks to go to reach the right month and year, I exited the application.

To collect useful data, it’s essential to help your users understand why you need their data. You should also let them know how their data will be used and what measures you have taken to protect it. Data is supposed to be freely given: if users don’t want to give you their data, that needs to be OK too. The evolving view of data recognizes that it belongs to the customer. The choices about how it is provided and how it is used are theirs, and restricting those choices is likely to lead to “silly answers.”

To learn more about our solutions, please feel free to reach out to me at stephen.timbers@capgemini.com.

Authored by: Stephen Timbers