Articles from: April 2023

  • Raymond Romaniuk Masters Project Presentation Friday April 21 at 2:00 PM

    Raymond Romaniuk, a Master of Science candidate in the Department of Mathematics and Statistics, will present his Masters Research Project (STAT 5P99), titled "Combatting Imbalanced Data with the Introduction of Synthetic Data with Applications in College Basketball", on Friday, April 21, 2023 from 2:00 pm – 3:00 pm in person in MCJ 404.

    Abstract:

    Data imbalance is an important consideration when working with real-world data. Over/undersampling approaches allow us to gather more insight from the limited data we have on the minority class; however, many such methods have been proposed. The goal of our study is to identify the optimal over/undersampling approach to use with Adaptive Boosting (AdaBoost). Based on a simulation study, we found that combining AdaBoost with various sampling techniques provides an increased weighted accuracy across classes for progressively larger data imbalances. The three Synthetic Minority Oversampling Technique (SMOTE) variants and Jittering with Over/Undersampling (JOUS) performed the best, with the JOUS approach being the most accurate for all levels of data imbalance in the simulation study. We then applied the most effective over/undersampling methods to predict upsets (games where the lower-seeded team wins) in the March Madness College Basketball Tournament.

    Keywords: Imbalanced data, Boosting Methods, AdaBoost, Over/Undersampling, College Basketball

  • Brittany Perry Masters Project Presentation Friday April 21 at 1:00 PM

    Brittany Perry, a Master of Science candidate in the Department of Mathematics and Statistics, will present her Masters Research Project (STAT 5P99), titled "Boosting Methods for Classification with Small Sample Size", on Friday, April 21, 2023 from 1:00 pm – 2:00 pm in person in MCJ 404.

    Abstract:

    AdaBoost is an ensemble method that can be used to boost the performance of machine learning algorithms by combining several weak learners to create a single strong learner. The most popular weak learner is a decision stump (a low-depth decision tree). One limitation of AdaBoost is its reduced effectiveness when working with small sample sizes. This work explores variants of the AdaBoost algorithm such as Real AdaBoost, LogitBoost, and Gentle AdaBoost. These variants all follow a gradient boosting procedure like AdaBoost, with modifications to the weak learners and weights used. We are specifically interested in the accuracy of these boosting algorithms when used with small sample sizes. As an application, we study the link between functional network connectivity (as measured by EEG recordings) and Schizophrenia by testing whether the proposed methods can classify a participant as Schizophrenic or healthy control based on quantities measured from their EEG recording.

    Keywords: AdaBoost, decision trees, small sample size, gradient boosting, Schizophrenia
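
For readers unfamiliar with the idea behind both projects, the following is a minimal sketch of textbook discrete AdaBoost with decision stumps, showing how several weak learners can be combined into a stronger one. It is not the presenters' code or either project's method; the function names (`fit_stump`, `stump_predict`, `adaboost`, `predict`) and the six-point toy dataset are hypothetical and chosen only for illustration. Labels are assumed to be +1 or -1.

```python
import math


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                # A stump predicts `pol` at or below the threshold, `-pol` above it.
                preds = [pol if row[j] <= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def stump_predict(stump, row):
    _, j, thr, pol = stump
    return pol if row[j] <= thr else -pol


def adaboost(X, y, rounds=10):
    """Fit discrete AdaBoost; returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform observation weights
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        # Clamp the error away from 0 and 1 so the logarithm stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified points, down-weight correct ones, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: no single threshold separates these labels.
X = [[0], [1], [2], [3], [4], [5]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])
```

Each stump on its own is a weak classifier here, while the weighted vote can represent the interval pattern in the labels. The variants named in the abstract (Real AdaBoost, LogitBoost, Gentle AdaBoost) change how the weak learners and weights are computed and are not covered by this sketch.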