Frank Novak
Innovative, self-motivated data professional eager to tackle today's most important data-based challenges. With award-winning, out-of-the-box problem-solving and proven quality control acumen, I strive to bring reliability, clarity, efficiency, and solutions to the table.
Projects
Sommeliers and BERT: Grape Variety Classification
Using BeautifulSoup, I scraped 20,000+ sommelier wine reviews from winemag.com to classify 10 grape varieties. I applied a pre-trained BERT model wrapped in TensorFlow, which reached 90% accuracy, and used it to power a prototype Streamlit app that recommends wine bottles based on user-input tasting notes and price range.
NIJ Recidivism Forecasting Group Project
Processed and analyzed parole data using Pandas to classify individuals at high risk of recidivism. Used SKLearn machine learning models and Keras neural nets to optimize the Brier score and identify the features most predictive of high risk.
Reddit API and NLP Analysis
Collected and preprocessed text data using the Pushshift API, NLTK, Pandas, NumPy, and SKLearn text vectorizers. Built classification models that predicted a post's subreddit from its vocabulary and sentiment scores with 95% accuracy.
Ames Housing Price Prediction
Cleaned and preprocessed 100+ features from Ames, Iowa home sales data using SKLearn. Reduced multicollinearity with a custom variance inflation factor (VIF) function to improve model performance. Grid-searched regression models to determine the most important home features and identify the most profitable regions for investment.