"All the models are wrong, but some are useful"

Hi, this is Liying!
I'm a Data Scientist with previous experience in economics and the banking industry. I am passionate about wrestling with complex data and applying machine learning to tell stories and solve business problems.
I will soon graduate from the Master of Science in Data Science (MSDS) program at the University of San Francisco, where I have developed strong programming and statistics skills for tackling business problems involving big data.
As a Data Scientist at Beam Solutions, my primary role is to help build the infrastructure for our products and to help the company's clients detect abnormal transactions using unsupervised machine learning algorithms.

Below, I'd like to share some of the projects I have completed so far that I found interesting.

Featured projects

Spark ML Air Quality Prediction

A data pipeline that processes, stores, and models big data using distributed computing, a distributed database, and Spark ML.
Tools: AWS EMR, AWS S3, MongoDB, SparkSQL, SparkML, PySpark

Machine learning

Predicting Purchase Behavior in App

A machine learning classification project that predicts the purchase behavior of app users.
Tools & Methods: AWS, Random Forest, XGBoost, LightGBM

Spam Classification

A model that predicts whether an email is spam.

Methods: AdaBoost, Gradient Boosting, XGBoost

Movie Recommendation

A recommendation model that predicts the potential ratings of movies using embeddings and matrix factorization.
Methods: item/user embedding, matrix factorization, stochastic gradient descent optimization
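
To illustrate the general idea behind matrix factorization with stochastic gradient descent, here is a minimal pure-Python sketch. It is not the project's code: the function names (`train_mf`, `mse`), the hyperparameters, and the toy ratings are all assumptions chosen for illustration. Each user and each item gets a small embedding vector, and a rating is predicted as their dot product.

```python
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit user/item embeddings to (user, item, rating) triples with SGD.

    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Small random initial embeddings, one row per user / item.
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - dot(U[u], V[i])
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                # Gradient step on squared error with L2 regularization.
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V


def mse(ratings, U, V):
    """Mean squared error of the predictions on the given triples."""
    return sum((r - dot(U[u], V[i])) ** 2 for u, i, r in ratings) / len(ratings)


# Toy data (hypothetical): 3 users, 3 movies, 6 observed ratings.
toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U, V = train_mf(toy_ratings, n_users=3, n_items=3)
```

After training, `dot(U[u], V[i])` gives the predicted rating of movie `i` for user `u`, including pairs that were never observed.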

Statistics

Bankruptcy Rate Forecast with Time Series

A time series forecast of the Canadian bankruptcy rate using macroeconomic indicators.
Tools & Methods: R, Holt-Winters, SARIMA, VARX

Python applications & algorithms

TF-IDF

A Python tf-idf (term frequency–inverse document frequency) project that computes tf-idf scores and ranks documents using the GloVe data as the corpus.
Tools & Methods: TF-IDF, Python, scikit-learn, xml, GloVe data
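
As a rough sketch of what a tf-idf score is, the snippet below computes the textbook (unsmoothed) form: term frequency within a document multiplied by the log of the inverse document frequency. The function names `tfidf` and `rank_documents` are hypothetical, and this form differs from scikit-learn's default, which smooths the idf term, so the numbers will not match that library's output.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return scores


def rank_documents(docs, term):
    """Return document indices sorted by descending tf-idf score for `term`."""
    scores = tfidf(docs)
    return sorted(range(len(docs)), key=lambda i: scores[i].get(term, 0.0), reverse=True)


# Hypothetical toy corpus, already tokenized.
corpus = [["cat", "sat"], ["cat", "ran"], ["dog", "ran", "ran"]]
ranking = rank_documents(corpus, "ran")
```

A term that appears in every document gets a score of zero under this definition, which is why common words contribute little to the ranking.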

Search Engine Implementation

An implementation of a search engine using different search methods, including linear search, index search, and hashtable search.
Tools & Methods: linear search, index search, hashtable search, html, jinja2
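
The difference between linear search and index-based search can be sketched as follows. This is an illustrative example, not the project's implementation: the function names and the toy documents are assumptions, and Python's built-in `dict` stands in for a hand-written hashtable. Linear search scans every document for every query, while the index maps each word to the set of documents containing it, so a query only touches the relevant entries.

```python
from collections import defaultdict


def linear_search(docs, terms):
    """Scan every document; return sorted ids of docs containing all terms."""
    if not terms:
        return []
    return sorted(doc_id for doc_id, words in docs.items()
                  if all(t in words for t in terms))


def build_index(docs):
    """Build an inverted index: word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            index[word].add(doc_id)
    return index


def index_search(index, terms):
    """Intersect the posting sets of all query terms."""
    if not terms:
        return []
    postings = [index.get(t, set()) for t in terms]
    return sorted(set.intersection(*postings))


# Hypothetical toy documents: id -> list of words.
documents = {
    "a.txt": ["spark", "data", "pipeline"],
    "b.txt": ["data", "science", "python"],
    "c.txt": ["python", "search", "engine"],
}
idx = build_index(documents)
```

Both functions return the same results for the same query; the index trades a one-time build cost and extra memory for faster lookups.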

Twitter Sentiment Analysis

A scraper and sentiment analysis project that uses the Twitter API to scrape tweets and vaderSentiment to score their sentiment.
Tools & Methods: tweepy, tweepy API, vaderSentiment, flask, html, jinja2

Isolation Forest Implementation

An anomaly detection algorithm implemented in Python, based on the Isolation Forest method.
Tools & Methods: Python, object oriented programming
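
For readers unfamiliar with the method, here is a compact sketch of the Isolation Forest idea: points that can be separated from the rest with few random splits get short average path lengths and therefore high anomaly scores. This is a simplified illustration and not the project's code; the function names, the dictionary-based tree representation, and the defaults (`n_trees=100`, `sample_size=256`) are assumptions.

```python
import math
import random


def c(n):
    """Average path length of an unsuccessful binary search tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(X, depth, max_depth, rng):
    """Recursively split X on a random feature at a random value."""
    if depth >= max_depth or len(X) <= 1:
        return {"size": len(X)}
    q = rng.randrange(len(X[0]))
    lo, hi = min(x[q] for x in X), max(x[q] for x in X)
    if lo == hi:  # cannot split on a constant feature
        return {"size": len(X)}
    p = rng.uniform(lo, hi)
    return {
        "q": q,
        "p": p,
        "left": build_tree([x for x in X if x[q] < p], depth + 1, max_depth, rng),
        "right": build_tree([x for x in X if x[q] >= p], depth + 1, max_depth, rng),
    }


def path_length(x, node, depth=0):
    """Depth at which x lands, plus a correction for the unsplit leaf size."""
    if "size" in node:
        return depth + c(node["size"])
    child = node["left"] if x[node["q"]] < node["p"] else node["right"]
    return path_length(x, child, depth + 1)


def fit(X, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample of X."""
    rng = random.Random(seed)
    psi = min(sample_size, len(X))
    if psi < 2:
        raise ValueError("need at least two points")
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(X, psi), 0, max_depth, rng) for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """Score in (0, 1]; values closer to 1 suggest an anomaly."""
    mean_path = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c(psi))


# Hypothetical toy data: a Gaussian cluster plus one far-away point.
_rng = random.Random(1)
points = [[_rng.gauss(0, 1), _rng.gauss(0, 1)] for _ in range(200)] + [[10.0, 10.0]]
forest, psi = fit(points)
```

Calling `anomaly_score([10.0, 10.0], forest, psi)` and `anomaly_score([0.0, 0.0], forest, psi)` on this toy data shows the far-away point scoring higher than a point in the middle of the cluster.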

BBC Article Recommendation

A Python application that recommends BBC articles based on the similarity between articles.
Tools & Methods: Python, doc2vec, html, jinja2, BeautifulSoup, requests, flask

Deep learning

Handwriting Recognition

A vanilla neural network that classifies digits from images, trained on the MNIST dataset.
Tools & Methods: PyTorch, MNIST dataset, Neural Network

NLP Text Classification

Three NLP models for text classification: a neural language model, a skip-gram model, and a CNN for text.
Tools & Methods: PyTorch, Convolutional Neural Network, Matrix Factorization