Compiling and processing information was once relatively straightforward.
A person would gather facts and figures from a given sample and use a machine to record the data collected.
Analyzing the data to better understand patterns and predict emerging trends was fairly clear-cut, as the variables, the things that could be measured and counted, were relatively few. Such ‘small data’ could be processed by a standard software package and easily understood.
Enter today’s era of big data, defined through the basic ‘three Vs’: volume (lots of it), velocity (pouring in at record speeds), and variety (from traditional databases, documents, e-mails, video, audio and many, many other sources).
Professor of Mathematics and Statistics Ejaz Ahmed researches big data, specifically high-dimensional data analysis.
Data is said to be ‘high-dimensional’ when a small or moderate number of observations comes with a very large number of measured variables. Advances in technology mean the types and level of detail of data are increasing at breakneck speed, to the point where the number of variables far exceeds the sample size.
Ahmed, who is also Dean of the Faculty of Mathematics and Science, explains that this makes it difficult to understand patterns or predict future trends from the data gathered. What often results is statistical bias, a distortion of the true meaning of the numbers.
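To make the high-dimensional setting concrete, here is a minimal Python sketch with hypothetical numbers (not drawn from Ahmed’s research): 1,000 measured variables but only 50 observations, so the classical least-squares machinery cannot single out one best-fitting model.

```python
import numpy as np

# Hypothetical high-dimensional setting: far more variables (p) than
# observations (n), i.e. p >> n.
rng = np.random.default_rng(0)
n, p = 50, 1000                          # 50 samples, 1,000 variables
X = rng.standard_normal((n, p))          # matrix of measured predictors
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]   # only a handful of true signals
y = X @ beta + rng.standard_normal(n)    # noisy response

# Ordinary least squares needs X'X to be invertible, but with p > n its
# rank can be at most n, so no unique estimate of the p coefficients exists.
rank = np.linalg.matrix_rank(X.T @ X)
print(f"rank of X'X: {rank} (a unique OLS solution would need rank {p})")
```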
But cutting down the number of variables to make predictions is also problematic.
“You may have selected some variable that may not be important and you may have deleted some variables that are important,” says Ahmed.
“You’re trying to pick strong signals and you’re deleting the weak signals; individually they may not be important, but together, they could be useful.”
If done improperly, reducing a large data model to a small one will result in statistical bias, he says.
“In my research, we work with much larger models and we try to find smaller models with the best variables for prediction of the future,” says Ahmed. “My work is to reduce the bias that results in prediction error.”
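The trade-off Ahmed describes can be sketched in code. The example below is an illustration only, not his estimator: it assumes a lasso penalty as a stand-in for hard variable selection and a ridge penalty as a stand-in for shrinkage, simulates a few strong signals alongside many individually weak ones, and compares prediction error on held-out data.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split

# Illustrative sketch (hypothetical data, not Ahmed's method): a few strong
# signals plus many weak signals that matter mostly in combination.
rng = np.random.default_rng(1)
n, p = 200, 400
beta = np.zeros(p)
beta[:5] = 2.0        # strong signals
beta[5:105] = 0.15    # 100 weak signals, individually negligible
X = rng.standard_normal((n, p))
y = X @ beta + rng.standard_normal(n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Lasso performs hard selection: small coefficients are set exactly to zero,
# so weak signals tend to be discarded.
lasso = Lasso(alpha=0.3, max_iter=5000).fit(X_tr, y_tr)
# Ridge shrinks all coefficients toward zero but keeps every variable.
ridge = Ridge(alpha=10.0).fit(X_tr, y_tr)

print(f"variables kept by lasso: {np.count_nonzero(lasso.coef_)} of {p}")
print(f"lasso test MSE: {np.mean((lasso.predict(X_te) - y_te) ** 2):.2f}")
print(f"ridge test MSE: {np.mean((ridge.predict(X_te) - y_te) ** 2):.2f}")
```

The point is not which penalty wins on a toy simulation, but that discarding weak signals outright throws away information they carry jointly; the strategies Ahmed develops aim to reduce the bias that this kind of discarding introduces.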
And that work has taken Ahmed far. Earlier this year, he led the International Workshop on Perspectives on High-dimensional Data Analysis in Morocco. He regularly participates in worldwide conferences as an organizer, presenter or speaker and is on the boards of a number of statistical journals.
The 2017 book Ahmed edited, Big and Complex Data Analysis: Methodologies and Applications, has been well received in the field, according to the publisher. He is set to publish two more volumes in 2019.
With funding from the Ontario Centres of Excellence and the Natural Sciences and Engineering Research Council of Canada (NSERC), Ahmed is applying his methods in a partnership with Stathletes, a Niagara-based company that collects, records and verifies hockey data.
The Brock-Stathletes partnership aims to expand the company’s extensive database of hockey performance measures through Ahmed’s research.
Since joining Brock University in 2012, Ahmed has developed statistical and computational strategies to reduce bias in particular statistical models, especially those used in genetics research.
“It’s important to be as accurate as possible,” says Ahmed. “The more you have statistical bias, the more errors come into our models. This could have serious implications for health care, business decisions and many other things.”