Student Projects (Selected)
Aaron Lin, Bailey Marlow, Alex Dixon and Dae Wook Kim, "USA Used Car Price Prediction Using Machine Learning", 22nd Posters-at-the-Capitol, Kentucky State Capitol, Frankfort, Kentucky, March 3, 2022.
The used car market is growing due to many economic factors. New car prices are set by the manufacturer, so they are consistent with actual market value. Used car prices, however, are set by the dealer, so they are not necessarily consistent with market value. A model that can accurately predict the actual market value of a used car would therefore be useful to both the seller and the buyer. Our study analyzes and models a dataset of online vehicle auction sales in the United States from the site AuctionExports.com in order to predict the price of a vehicle sold in an online auction. With machine learning and statistical analysis, we can train a model to account for as many variables as the dataset contains. We used visualization tools to examine how the other variables interacted with the auction sale price, and statistical tests to find which variables had a significant effect on the price. We found that brand, age, title status, base color, state, remaining hours, and mileage were the best predictors of price. We fit the data to predict price from the selected factors using simple linear regression models, stepwise methods, and normalization techniques, as well as random forests and neural networks. Comparing metrics such as root mean square error (RMSE) and R-squared showed that the random forest and neural network models were the most effective: they produced satisfactory predictions, defined as an R-squared value of 0.5 or more and a lower RMSE than the other models. An approach to predicting used car sale prices therefore benefits from machine learning.
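The two comparison metrics named in the abstract above can be sketched in a few lines. This is a minimal illustration, not the project's code: the prices and the two sets of predictions (`model_a`, `model_b`) are made-up values, and the function names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical auction prices (USD) and predictions from two made-up models.
actual = [10000.0, 15000.0, 20000.0, 25000.0]
model_a = [12000.0, 14000.0, 19000.0, 27000.0]
model_b = [10500.0, 15500.0, 19500.0, 24500.0]

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    # A lower RMSE and a higher R-squared indicate the better fit.
    print(name, round(rmse(actual, preds), 1), round(r_squared(actual, preds), 3))
```

On these toy numbers `model_b` has both the lower RMSE and the higher R-squared, which is the kind of comparison the abstract describes for selecting among models.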
Collin Crowthers, John Booker, and Dae Wook Kim, "Using Machine Learning for Predicting Customer Engagement on Social Media", Virtual Kentucky Academy Annual Meeting, November 6, 2021.
In the business world, the ability to analyze and predict customer engagement (when a customer shares semantic responses, such as opinions and emotions, to a brand or its products on social media) is becoming increasingly significant, because maintaining current customers, finding potential customers, and promoting product and service innovation are among the best ways to ensure business success. For this purpose, we studied 15,842 brand-themed user-generated Instagram posts to better understand customer engagement behaviors for popular posts on social media. We assessed how each of ten features (user comments, post descriptions, followers, brand categories, premium brands, brand relevance, brand commercials, post users, images, and emoji) affects machine learning predictions of customer engagement. Ten common machine learning models were applied with the proposed feature selection to predict whether or not a customer likes a popular brand-themed user-generated post. Our findings indicate that the Deep Neural Network (DNN) model has the best recall (81%) of all the machine learning models while also maintaining high accuracy (90%). The high recall of the DNN model is important for positively identifying popular posts on social media: because customer retention in a business environment depends on identifying as many of the customers likely to engage as possible, higher recall is a good indicator of model fitness for online marketing applications.
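To make the distinction between recall and accuracy in the abstract above concrete, the sketch below computes both from binary labels. The labels are invented toy data (1 = the customer engages, 0 = does not) and the function names are hypothetical; it does not reproduce the reported 81% and 90% figures.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def recall(y_true, y_pred):
    """Share of actual positives that the model found: tp / (tp + fn)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Share of all predictions that were correct."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


# Toy labels: 4 engaged customers and 6 non-engaged customers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

print("recall:", recall(y_true, y_pred))      # 3 of 4 engaged customers found
print("accuracy:", accuracy(y_true, y_pred))  # 8 of 10 predictions correct
```

Recall ignores the non-engaged customers entirely, which is why a model can have high accuracy yet still miss many of the customers a retention campaign cares about.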
Emoji-Comment Relationship and User's Brand Engagement in Social Media (John Booker, Thomas Morris, Jessica Haeckler, Eun Hee Ko, and Dae Wook Kim)
The purpose of the project is to analyze the impact of the emoji-comment relationship on users' brand engagement by assessing the popularity of product-related Instagram posts that use emojis, measured by the volume of likes and comments, using data analytics and machine learning.
GPU-accelerated Malicious Domain Detection (Trevor Rice, Allen Roberts, Mengkun Yang, and Dae Wook Kim)
The project aims to develop a fully functional system that allows a user to upload data sets of malicious domains into a local database and then test single domains or lists of domains to discover whether they are malicious. We have implemented speed tests that compare computation times between the GPU and CPU, and built a web crawler with Python Scrapy that lets the user scrape domains from websites that list known malicious domains. We have also implemented five of the most widely used string matching algorithms, which lets us compare the speeds of algorithms with varying time complexities against the number of domains, both on the GPU and on our sample.
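The description above does not name the five string matching algorithms, so the following CPU-only sketch uses two common ones, naive search and Knuth-Morris-Pratt (KMP), purely as an illustration of comparing algorithms with different time complexities. The blocklist entries, the `is_listed` helper, and the newline-delimited layout are assumptions made for this example and are not taken from the project.

```python
import time


def naive_find(text, pattern):
    """Naive search, O(n * m): try every alignment of pattern in text."""
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        if text[i:i + m] == pattern:
            return i
    return -1


def kmp_find(text, pattern):
    """Knuth-Morris-Pratt search, O(n + m), using a failure table."""
    if not pattern:
        return 0
    # fail[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


# Hypothetical blocklist stored as one newline-delimited string.
blocklist = ["malware.example", "phish.example", "bad-ads.example"]
haystack = "\n" + "\n".join(blocklist) + "\n"


def is_listed(domain, find):
    """True if domain appears as a whole entry (delimited by newlines)."""
    return find(haystack, "\n" + domain + "\n") != -1


for find in (naive_find, kmp_find):
    start = time.perf_counter()
    hits = [d for d in ["phish.example", "safe.example"] if is_listed(d, find)]
    elapsed = time.perf_counter() - start
    print(find.__name__, hits, f"{elapsed:.6f}s")
```

With a blocklist this small the timings are dominated by noise; differences between algorithms only become visible as the number of domains grows, which is the scaling comparison the project describes.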
Ethen Holzapfel, David Cannon II, Alex Dixon, Eun Hee Ko, and Dae Wook Kim, "Developing Web-based Predictive Application for Crowdfunding Campaigns", 20th Posters-at-the-Capitol, Kentucky State Capitol, Frankfort, Kentucky, March 5, 2020.
The popularity of crowdfunding platforms like Kickstarter, GoFundMe, and Indiegogo has increased the need for campaign creators and backers to predict whether campaigns are likely to succeed before launching them on the Internet. Whether a campaign succeeds, meaning that its funding goal is reached, can be strongly related to campaign attributes on the crowdfunding platform, including USD pledged, number of backers, goal amount, duration in days, and campaign category. Therefore, we studied 300,000 real Kickstarter campaigns from 2009-2016, taken from a Kaggle competition, to find the features that determine a campaign’s success, and applied a machine learning algorithm to develop a web-based application that enables a user to predict whether a campaign will succeed or fail. Our application can provide insights that help creators and backers better understand the practical impact of a crowdfunding campaign.
Alex Dixon, Dae Wook Kim and Eun Hee Ko, "Efficient Clustering for User’s Brand Sentiments Analysis on Online Social Media", 54th National Collegiate Honors Council (NCHC) Annual Conference, New Orleans, Louisiana, November 6-10, 2019.
User-generated social media content that is related to brands can have considerable effects on public opinion about brands. Sentiment expressed by images can provide insight into the effects on users’ opinions. Understanding how this social media content can affect brand perception is desirable for companies and marketing teams. Convolutional Neural Networks (CNNs) are an effective way to extract features from images for tasks such as clustering. Therefore, to better understand sentiment expressed by images, we explore techniques for clustering images based on image features extracted by CNNs.
Corey Robinson and Dae Wook Kim, "MapReduce Design and Implementation of DNS Fingerprints for Transient User Identification", The University Presentation Showcase of Scholars Week, EKU, John Grant Crabbe Main Library, Richmond, April 12, 2019.
The goal of the project is to use DNS data describing an anonymous user’s internet browsing behavior and match that behavior to that of a previously characterized user. To do this, we extract a set of features believed to be useful for identifying a user from their internet browsing behavior, then compare an anonymous session with known sessions to see whether its browsing behavior matches that of a user already in the dataset. This project focuses on the feature engineering stage, as well as on feature extraction using MapReduce, a programming model for big data processing.
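The MapReduce-style feature extraction described above can be sketched in a single process as follows. Everything specific here is an assumption made for illustration: the record layout `(user_id, queried_domain)`, the choice of per-domain query counts as the feature, and cosine similarity as the matching rule. The description does not state which features or which comparison the project used, and a real deployment would run the map and reduce phases on a cluster framework rather than in plain Python.

```python
import math
from collections import defaultdict

# Hypothetical DNS log records: (user_id, queried_domain).
records = [
    ("alice", "news.example"), ("alice", "mail.example"), ("alice", "news.example"),
    ("bob", "games.example"), ("bob", "mail.example"),
]


def map_phase(record):
    """Map: emit ((user, domain), 1) for each DNS query."""
    user, domain = record
    yield (user, domain), 1


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts for one (user, domain) key."""
    return key, sum(values)


def extract_profiles(records):
    """Run map, shuffle, and reduce, then collect counts per user."""
    mapped = (pair for rec in records for pair in map_phase(rec))
    reduced = (reduce_phase(k, v) for k, v in shuffle(mapped).items())
    profiles = defaultdict(dict)
    for (user, domain), count in reduced:
        profiles[user][domain] = count
    return dict(profiles)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(a[d] * b.get(d, 0) for d in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


profiles = extract_profiles(records)

# An anonymous session is profiled the same way, then matched to known users.
anonymous = extract_profiles(
    [("anon", "news.example"), ("anon", "mail.example"), ("anon", "news.example")]
)["anon"]
best_match = max(profiles, key=lambda user: cosine(anonymous, profiles[user]))
print(best_match)
```

Keying the map output on the (user, domain) pair keeps the reduce step a simple sum, so the same pattern extends to other count-based features by changing only the emitted key.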
Eun Hee Ko, Douglas Bowman, Sierra Chug, and Dae Wook Kim, "A Study of Chief Marketing Officer (CMO) Tenure with Competitive Sorting Model", In Proceedings of the ACMSE 2018 Conference, p.44. ACM, 2018.
With average CMO tenure increasing significantly over the past decade, the business press has speculated about reasons for this climb, while the academic literature has been relatively silent and remains undecided about the contributions of the CMO to firm performance. These mixed results have led to calls for more systematic inquiry into the performance consequences of the CMO. This proposal investigates factors associated with CMO tenure. It develops theory based on a competitive sorting model whose underlying intuition is that job tenure increases when the competitiveness of an individual's talents aligns with a firm's strategic direction.