RiskEx-News and Their Effect on Bitcoin Price

Toward a Better RSS Recommendation System

Introduction and Motivation

News events are of the utmost importance in the price discovery process of any asset. Some news events, such as press releases, have been found to represent a potential explanation for up to 24% of an asset's price movements. We seek to understand and classify what causes a news event to have a statistically significant effect on price. In particular, we aim to apply machine learning (ML) and Natural Language Processing (NLP) techniques to extract what traders intuitively know when they choose which news articles, publications, and news beats to follow and, consequently, to base their trading decisions on.

The following represents a collaborative project for identifying, processing, predicting, and recommending relevant information from news events to sophisticated traders.

Methodology

For this project we decided to split our overall scope into three steps.

I. Data Collection

1) Web Scraping and API utilization for pulling news articles
2) Metadata Extraction from the articles
3) Preprocessing and cleaning the articles
4) Collecting and cleaning Bitcoin price data
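
As an illustration of step 3, the following is a minimal sketch of stripping markup from a scraped article using only the Python standard library. It is not the code from our notebooks; the names `_TextExtractor` and `clean_article_html` are hypothetical, and real pages will usually need site-specific handling on top of this.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped tag.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return " ".join(self._chunks)


def clean_article_html(raw_html):
    """Return the article text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return re.sub(r"\s+", " ", parser.text()).strip()
```

Calling `clean_article_html` on each scraped page yields plain text that can be passed on to the Data Processing step.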

II. Data Processing

1) Filtering of relevant news content with relevant keywords
2) Preprocessing of content for modeling purposes (vectorization, text cleaning etc.)
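
The two steps above can be sketched as follows. This is a simplified illustration, not our notebook code: the keyword set is a hypothetical example, and a plain bag-of-words count stands in for the vectorization step (our clustering notebook uses Word2Vec instead).

```python
import re
from collections import Counter

# Hypothetical keyword list; the list used in the project may differ.
KEYWORDS = {"bitcoin", "btc", "cryptocurrency"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_relevant(text, keywords=KEYWORDS):
    """True if the text contains at least one of the keywords."""
    return any(token in keywords for token in tokenize(text))


def bag_of_words(texts):
    """Return (vocabulary, count vectors) for a list of texts."""
    tokenized = [tokenize(t) for t in texts]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors
```

For example, `[a for a in articles if is_relevant(a)]` keeps only the articles mentioning a keyword, and `bag_of_words` turns the survivors into equal-length count vectors.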

III. Analysis and Classification

1) Perform Clustering Analysis on articles
2) Classify events based on sentiment analysis
3) Identify significant price movements through event study methodology
4) Establish a causal relationship between a news event and its effect on the price of Bitcoin
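
To make step 3 concrete, here is a minimal event study sketch under a constant-mean-return model, which is an assumption on our part for illustration; other benchmark models are possible. The function names and the `estimation_window` and `event_window` parameters are hypothetical. A large score only flags an unusual move around the event; it does not by itself establish causation.

```python
from statistics import mean, stdev


def simple_returns(prices):
    """Period-over-period simple returns from a list of prices."""
    return [(p1 - p0) / p0 for p0, p1 in zip(prices, prices[1:])]


def abnormal_returns(returns, event_index, estimation_window=5, event_window=1):
    """Cumulative abnormal return (CAR) and a z-like score for one event.

    The expected return is the mean return over the estimation window
    that ends just before the event (constant-mean-return assumption).
    """
    start = event_index - estimation_window
    if start < 0:
        raise ValueError("not enough history before the event")
    baseline = returns[start:event_index]
    event = returns[event_index:event_index + event_window]
    if not event:
        raise ValueError("event window is empty")
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline returns have zero variance")
    car = sum(r - mu for r in event)
    # Scaling by sqrt(n) assumes independent returns in the event window.
    z_score = car / (sigma * len(event) ** 0.5)
    return car, z_score
```

An event could then be flagged as significant when the absolute score exceeds a chosen cutoff, for instance 2.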

Results and Findings

Next Steps

Further steps will involve the development of a learning recommender system built around both supervised and unsupervised learning methods. In particular, based on the foundation established by our team, we believe the following projects could be implemented to increase the strength and capacity of our model.

1. Achieve better granularity of data - create a series of crawlers that extract the desired granular data.
   1.a Examples include: pricing data, Google Trends data, etc.
2. Create two tiers of data acquisition processes.
   2.a Broad-spectrum crawlers should be scheduled to stay up to date with current events while minimizing load.
   2.b Event-driven crawlers should be developed around predictive modeling and/or event detection, running when a significant price movement occurs.
3. Implement a predictive pricing model for asset classes.
   3.a Use pricing predictions to trigger searches for relevant articles, and to evaluate pricing and recommender models.
   3.b Predictive pricing models could also be used for arbitrage purposes and for optimization of business processes.
4. Implement a distributed computing architecture.
   4.a In order to implement just-in-time analysis, a future team will need to move processes to the cloud.
5. Increase the quality of prediction by adding a magnitude/directional component to our time series modeling.
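
As one possible starting point for item 5, a return could be labeled with both a direction and a magnitude bucket before modeling. The sketch below is only a suggestion; the function name `label_move` and the `small` and `large` thresholds are hypothetical and would need to be tuned to Bitcoin's volatility.

```python
def label_move(ret, small=0.01, large=0.05):
    """Label a return with a (direction, magnitude) pair.

    `small` and `large` are illustrative thresholds on the absolute return.
    """
    if ret > 0:
        direction = "up"
    elif ret < 0:
        direction = "down"
    else:
        direction = "flat"

    size = abs(ret)
    if size >= large:
        magnitude = "large"
    elif size >= small:
        magnitude = "small"
    else:
        magnitude = "negligible"
    return direction, magnitude
```

These labels could replace a binary "significant or not" target, giving a classifier more information about the kind of move a news event preceded.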

Start Here:

Python was used for all data gathering, cleaning, and modeling purposes.

Multiple Python notebooks (Jupyter) were written to carry out the project. To reproduce our procedure, run the notebooks in the following order:

Scraping (in order):

news_to_features(1o3)

url_to_contents(2o3)

many_df_to_one(3o3)

Time Series Modeling and Event Detection (in order):

price_change_marker

filter_news_by_marker

Natural Language Processing and Formulations (in order):

clean_articles_final(1o2)

clean_articles_final(2o2)

Classification and Modeling:

Word2Vec_and_KMeans_clustering

Final_Classification

A brief consideration:

The majority of the labor associated with this project was consumed by data gathering and cleaning. Should you choose to expand on our efforts, please reach out to any member directly and we can share the preprocessed and cleaned data sets we used to train our models. Or, if you find a better, faster way to get and clean data, go with that (and let us know!).
