Faculty of Mathematics and Science Dean Ejaz Ahmed and two colleagues, Dursun Aydin and Ersin Yilmaz from Turkey’s Muğla Sıtkı Koçman University, were the recipients of the Grand Prize Advancement Award.
The award was presented at the 13th annual International Conference on Management Science and Engineering Management at Brock University earlier this month.
The paper, titled “Nonparametric Regression Estimates based on Imputation Techniques for Right-Censored Data,” analyzes models and mathematical choices that impact our daily lives. The paper was selected by an international awards team whose members span 11 countries.
A common problem with making predictive decisions in life is only having access to a portion of the total data. However, the quality of an analysis is largely dependent on the quality of its data.
For example, if someone is choosing between cars and has access only to disparate sets of features across multiple websites, each listing the key purchase points of a car, it can be challenging to compare them.
Perhaps the data says both cars will last a minimum of 10 years, but what the customer really wants to know is which will last the longest after that period of time. The paper offers a comparison of models to choose from in order to make the best choices involving Right-Censored Data.
A Right-Censored Data point is one known to lie above a certain value, but by how much is unknown. Another example would be a soup that has too much salt, where it is unclear how much more than the ideal amount was added.
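As a rough illustration of the idea, the short Python sketch below shows how a car's true lifetime might be recorded in a study that stops after 10 years. The function name `right_censor` and the sample lifetimes are hypothetical and are not taken from the paper.

```python
def right_censor(true_years, study_length):
    """Return (recorded value, censored flag) for one car.

    Hypothetical helper: if the car outlasts the study, only the
    lower bound (the study length) is recorded.
    """
    if true_years > study_length:
        return study_length, True   # we only know it lasted "more than" this
    return true_years, False        # lifetime fully observed


# Three made-up cars with true lifetimes of 7, 12.5 and 16 years.
records = [right_censor(t, 10.0) for t in (7.0, 12.5, 16.0)]
print(records)
```

The last two cars end up with the same recorded value even though their true lifetimes differ, which is exactly the information that censoring hides.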
What makes a car last a long time is variable. Could it be engine wear, adherence to regular maintenance schedules or exposure to weather? If we only know the answer to one or two of these variables, the missing information is called Censored Data.
The models Ahmed chose could be used to predict which of these combinations produces the longest lasting vehicles when not all of the data is available. They attempt to “guess best” and smooth out errors to find the combinations that work best, most often.
Ahmed and his team selected a kNN (k-nearest neighbour) model and a PM (Predictive Model) to test which could produce the most accurate results.
kNN uses real values taken from examples in the data set.
Imagine that four vehicles lasted more than 10 years, and that we know a large portion of their maintenance schedules, total kilometres driven and other factors. The k in kNN is the number of neighbouring examples used, in this case four. kNN would fill in the Censored Data (unknowns) with the average of the values from the four “test cars,” which act as the nearest neighbours among the data points. kNN totals the influence from all the factors and attempts to tell us how these factors, all combined, contribute to the longevity of the car.
The PM instead would predict the values based on the four test cars and replace the value instead of averaging it. Our total longevity is then dependent on how important each factor is to determine what makes a car run long. The issue with PM models, says Ahmed, is how dependent each variable is on the total outcome.
Overall, both methods would provide key information to make a purchase decision, but Ahmed and his team recommend using kNN in cases where Right-Censored Data is present to give the best estimate.
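The averaging step described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the estimator from the paper: the feature values (a maintenance score and annual distance in tens of thousands of kilometres), the function name `knn_impute` and the rule of only borrowing from cars that outlasted the censoring point are all assumptions made for the example.

```python
import math


def knn_impute(cars, k=4):
    """Replace each right-censored lifetime with the mean lifetime of its
    k nearest fully observed neighbours (Euclidean distance on features)."""
    observed = [c for c in cars if not c["censored"]]
    imputed = []
    for car in cars:
        if not car["censored"]:
            imputed.append(car["years"])
            continue
        # Assumption: only cars that lasted at least as long as the
        # censoring point are plausible donors.
        donors = [o for o in observed if o["years"] >= car["years"]]
        donors.sort(key=lambda o: math.dist(o["features"], car["features"]))
        nearest = donors[:k]
        if nearest:
            estimate = sum(o["years"] for o in nearest) / len(nearest)
        else:
            estimate = car["years"]  # no donor available: keep the lower bound
        imputed.append(estimate)
    return imputed


# Made-up data: (maintenance score 0-1, annual distance in 10,000 km).
cars = [
    {"features": (0.9, 1.5), "years": 14.0, "censored": False},
    {"features": (0.8, 1.8), "years": 12.0, "censored": False},
    {"features": (0.4, 2.5), "years": 11.0, "censored": False},
    {"features": (0.7, 2.0), "years": 13.0, "censored": False},
    {"features": (0.3, 3.0), "years": 7.0, "censored": False},
    {"features": (0.85, 1.6), "years": 10.0, "censored": True},  # "more than 10"
]

print(knn_impute(cars, k=4)[-1])  # average of the four cars that outlasted 10 years
```

With k set to 4, the censored car's lifetime is replaced by the average of the four observed cars that lasted longer than 10 years. A smaller k relies only on the most similar cars.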
When choosing a model to solve a problem, Ahmed recommends recalling a notable quote from statistician George E. P. Box: “All models are wrong, but some are useful.”