Using Text Features to Predict the Great Stock Market Crash of 1929

from bs4 import BeautifulSoup
from urllib.request import urlopen
# Define URL.
url = ""
# Send GET request and process result using BeautifulSoup.
html = urlopen(url)
# The original call is truncated here; the parser argument is an assumption.
soup = BeautifulSoup(html, "html.parser")
Citrus fruits in both Florida and California are reported to be making satisfactory progress.
Southern peach orchards, however, were damaged by the March freeze and crop prospects
have been somewhat impaired.
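As a rough illustration of how a passage like the one above might be pulled out of a downloaded page, the sketch below parses a small inline HTML string instead of a live URL. The HTML snippet, the use of `<p>` tags, and the `html.parser` choice are all assumptions made for the example; the real page layout is not shown in the article.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page; the real URL and
# page structure are not given in the article.
html = """
<html><body>
  <h1>Sample report</h1>
  <p>Citrus fruits are reported to be making satisfactory progress.</p>
  <p>Peach orchards were damaged by the March freeze.</p>
</body></html>
"""

# Parse with Python's built-in HTML parser (an assumed choice).
soup = BeautifulSoup(html, "html.parser")

# Collect the text of each paragraph, stripping surrounding whitespace.
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]

# Join the paragraphs into a single document string.
text = " ".join(paragraphs)
print(text)
```

Swapping the inline string for the result of `urlopen(url)` would apply the same steps to a real page, provided its body text also lives in `<p>` tags.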
# Install and import pysentiment2.
!pip install pysentiment2
# Import modules.
import pysentiment2 as ps
import pandas as pd
import matplotlib.pyplot as plt
# Instantiate tokenizer for LM dictionary.
lm = ps.LM()
# Tokenize texts. `texts` is assumed to be the list of scraped document strings.
tokens = [lm.tokenize(t) for t in texts]
# Compute sentiment for each document.
sentiment = [lm.get_score(p)['Polarity'] for p in tokens]
# Convert to DataFrame and plot rolling mean. `dates` is assumed to hold one date per document.
sentiment = pd.DataFrame(sentiment, index = dates)
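To make the polarity and rolling-mean steps concrete, here is a minimal sketch on toy data. It assumes polarity is defined as (positive - negative) / (positive + negative), with a small constant to avoid division by zero. The word lists, documents, dates, and window length are all hypothetical stand-ins, not the LM dictionary or the article's data.

```python
import pandas as pd

# Hypothetical word lists standing in for a sentiment dictionary.
POSITIVE = {"satisfactory", "progress", "improved"}
NEGATIVE = {"damaged", "impaired", "decline"}

def polarity(tokens, eps=1e-6):
    """Return (pos - neg) / (pos + neg + eps) for a list of tokens."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / (pos + neg + eps)

# Toy tokenized documents, one per month (illustrative only).
toy_docs = [
    ["crops", "making", "satisfactory", "progress"],
    ["orchards", "damaged", "prospects", "impaired"],
    ["output", "improved", "after", "decline"],
    ["trade", "improved"],
]
toy_dates = pd.date_range("1929-01-01", periods=len(toy_docs), freq="MS")

# One polarity score per document, indexed by date.
scores = pd.Series([polarity(d) for d in toy_docs], index=toy_dates)

# Smooth with a 2-period rolling mean (the window length is an assumption).
smoothed = scores.rolling(window=2).mean()
print(smoothed)
```

Calling `smoothed.plot()` would then draw the smoothed series. A longer window gives a steadier line at the cost of reacting more slowly to changes in tone.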
from sklearn.feature_extraction.text import CountVectorizer
# Instantiate vectorizer.
vectorizer = CountVectorizer(max_features=100)
# Transform texts into a term-count matrix (CountVectorizer returns raw counts, not tf-idf weights).
tf = vectorizer.fit_transform([' '.join(token) for token in tokens])
# Recover feature names.
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out().
feature_names = vectorizer.get_feature_names_out()
# Plot word count time series.
feature_matrix = pd.DataFrame(tf.toarray(), columns=feature_names,
                              index=dates)
feature_matrix.plot(figsize=(15,7), legend = None)



ODSC - Open Data Science
