As you know, missing data is a problem for statisticians. It is like working a puzzle with missing pieces: you have a rough idea of the full picture, but you are forced to guess at what the missing pieces look like. In the purest sense, records with missing data would be eliminated. In the real world of compiled and sourced data, that would remove practically every record, because the vast majority never have 100% of their fields populated. So how is missing data handled?
There are many schools of thought. The most popular way is to impute using an overall average. However, that approach tends to be misleading and can produce false positives. Instead, mitigate risk by treating missing data as a “worst case” scenario based on the client’s needs. For example, if a client is targeting affluent prospects and a prospect record is missing an income value, then we will assume a low income value. There will likely be other indicators of affluence that are not missing, and those will improve the probability score.
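To make the difference concrete, here is a minimal Python sketch of the two fills. The records, the `fill_missing` helper, and the choice of the lowest observed income as the "worst case" are all assumptions made for illustration; in practice the worst-case value would be set according to the client's needs.

```python
from statistics import mean

# Hypothetical prospect records; None marks a missing income value.
prospects = [
    {"id": 1, "income": 150_000},
    {"id": 2, "income": None},
    {"id": 3, "income": 40_000},
    {"id": 4, "income": 200_000},
]

def fill_missing(records, field, strategy):
    """Return copies of the records with missing `field` values filled in.

    strategy="mean"  -> overall average of the observed values
    strategy="worst" -> lowest observed value (assumes "low" is the worst
                        case, as it would be when targeting affluent prospects)
    """
    observed = [r[field] for r in records if r[field] is not None]
    fill = mean(observed) if strategy == "mean" else min(observed)
    return [
        {**r, field: fill if r[field] is None else r[field]}
        for r in records
    ]

mean_filled = fill_missing(prospects, "income", "mean")
worst_filled = fill_missing(prospects, "income", "worst")
```

With the average fill, prospect 2 looks like a typical earner in this list; with the worst-case fill, the same record is treated as the lowest earner until its other fields say otherwise.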
With an approach like this, we would rather lower the calculated probability than create a false positive that gets the prospect record selected for a marketing campaign. Although we may overlook a viable prospect, we would rather save the client’s money by chasing only the prospects that still score well under the worst-case assumption.
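The trade-off can be sketched with a toy score. Everything here is hypothetical: the factor names, the weights, the selection threshold, and the two fill values are assumptions chosen only to show the effect, not an actual scoring model.

```python
# Hypothetical scoring rule: each factor is scaled to 0..1 and weighted.
WEIGHTS = {"income": 0.5, "home_value": 0.3, "investments": 0.2}
THRESHOLD = 0.5  # hypothetical cut-off for campaign selection

AVERAGE_INCOME = 0.7  # assumed fill under average imputation
WORST_INCOME = 0.2    # assumed fill under the worst-case approach

def score(factors):
    """Weighted sum of the scaled factors."""
    return sum(WEIGHTS[name] * value for name, value in factors.items())

def is_selected(factors):
    """True if the record would be picked for the campaign."""
    return score(factors) >= THRESHOLD

# Two prospects with income missing: one with weak remaining factors,
# one with strong remaining factors.
weak = {"home_value": 0.4, "investments": 0.3}
strong = {"home_value": 1.0, "investments": 1.0}

avg_weak = is_selected({"income": AVERAGE_INCOME, **weak})
worst_weak = is_selected({"income": WORST_INCOME, **weak})
worst_strong = is_selected({"income": WORST_INCOME, **strong})
```

Under these assumed numbers, the average fill alone pushes the weak prospect over the threshold, which is the false positive described above. The worst-case fill keeps that record out, while the prospect with strong remaining factors is still selected.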