Stock Market Prediction using Deep Learning

If a human investor can be successful, why can’t a machine?

Machine learning and deep learning have become effective strategies that quantitative hedge funds commonly use to maximise their profits.

This article is an introduction to using neural networks to predict the stock market, in particular to decide whether to buy or sell a stock.

Algorithmic trading has revolutionised the stock market and its surrounding industry. By some estimates, over 70% of all trades in the US are now handled by bots. Gone are the days of the packed stock exchange, with suited people waving sheets of paper and shouting into telephones.

This got me thinking about how I could develop my own algorithm for trading stocks, or at least try to predict their prices accurately.

Financial markets are highly nonlinear, and stock price data can sometimes seem completely random. Traditional time series methods such as ARIMA, SARIMA and GARCH are effective only when the series is stationary, a restrictive assumption that requires preprocessing the series by taking log returns (or applying other transforms). The main issue, however, arises when these models run in a live trading system, as there is no guarantee of stationarity as new data is added.
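To make the preprocessing step concrete, here is a minimal sketch of the log-return transform mentioned above, r_t = ln(P_t / P_{t-1}). The function name and the price values are made up for illustration.

```python
import numpy as np

def log_returns(prices):
    """Return r_t = ln(P_t / P_{t-1}) for a 1-D sequence of positive prices."""
    prices = np.asarray(prices, dtype=float)
    # Differencing the log prices works on relative changes instead of price levels.
    return np.diff(np.log(prices))

# Illustrative prices only, not real market data.
example_prices = [100.0, 102.0, 101.0, 105.0]
example_returns = log_returns(example_prices)
```

The result has one element fewer than the input, and the log returns sum to the log of the overall price ratio.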

Neural networks, in particular sequential models such as LSTM and GRU, address this because they do not require the series to be stationary. Furthermore, neural networks are by nature effective at finding relationships in data and using them to predict (or classify) new data.

Machine learning and deep learning have found their place in financial institutions because of their power to predict time series data with a high degree of accuracy, and research to improve the models is ongoing.

Agenda of our project:

  • First, we will scrape the NIFTY 50 wiki page to collect the ticker symbols of all the companies in the NIFTY 50 list.
  • Then we will use the Quandl API to fetch stock data for the past 7 years.
  • NOTE: To get an API key, create an account on the Quandl website. An API key lets you make more than 50 API calls a day.
  • Label the training data as 0 (sell) and 1 (buy).
  • Scale the data using the sklearn preprocessing library.
  • Since this is time series data and we will be creating sequences out of the fetched data, an LSTM model is a better fit than a simple MLP.
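The scaling step in the agenda can be sketched as follows. The article does not say which scaler was used, so MinMaxScaler is an assumption here, as are the toy values. The scaler is fitted on the earlier rows only, so that no information from later rows leaks into the scaling.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix: rows are days, columns are (close, volume). Values are made up.
features = np.array([
    [100.0, 2000.0],
    [110.0, 1500.0],
    [105.0, 3000.0],
    [120.0, 2500.0],
])

# Fit on the earlier rows only, then reuse the same scaling for the later rows.
train, test = features[:3], features[3:]
scaler = MinMaxScaler()  # assumed choice of scaler, maps each column to [0, 1]
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)
```

Values in the later rows can fall outside [0, 1] when they exceed the range seen during fitting, which is expected with this approach.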

Let’s hop into the code to understand it better.

  • Web-scraping the NIFTY 50 wiki page:

We need to scrape the ticker symbol of each company (underlined in red in the figure below).

Viewing a few rows of the data
  • Now we are able to fetch the ticker symbols:
Displaying the ticker symbols of the 50 companies
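The scraping step can be sketched with the standard library alone. This is not necessarily how the original code did it: the parser class, the function name and the assumption that the table has a header cell named "Symbol" are all illustrative, and the HTML is an inline sample instead of the live page so that the example runs offline.

```python
from html.parser import HTMLParser

class SymbolColumnParser(HTMLParser):
    """Collect the cells that sit under a 'Symbol' header in HTML tables."""

    def __init__(self):
        super().__init__()
        self.symbols = []
        self._row = []            # cell texts of the row being read
        self._cell = None         # text fragments of the open cell, or None
        self._symbol_idx = None   # column index of the 'Symbol' header, once seen

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row:
            if "Symbol" in self._row:
                # Header row: remember which column holds the tickers.
                self._symbol_idx = self._row.index("Symbol")
            elif self._symbol_idx is not None and len(self._row) > self._symbol_idx:
                self.symbols.append(self._row[self._symbol_idx])
        elif tag == "table":
            self._symbol_idx = None  # the next table needs its own header


def extract_symbols(html):
    parser = SymbolColumnParser()
    parser.feed(html)
    return parser.symbols


# Inline sample standing in for the downloaded page (made-up companies).
sample_html = """
<table>
  <tr><th>Company name</th><th>Symbol</th><th>Sector</th></tr>
  <tr><td>Company A</td><td><a href="#">AAA</a></td><td>Sector 1</td></tr>
  <tr><td>Company B</td><td>BBB</td><td>Sector 2</td></tr>
</table>
"""
symbols = extract_symbols(sample_html)
```

Passing the downloaded HTML of the wiki page to `extract_symbols` instead of `sample_html` should work the same way, provided the header assumption holds for that page.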
  • Next, we fetch the stock prices from Quandl and store them in CSV files to avoid making duplicate calls to the Quandl API.
Left: the CSV files generated; right: an example of a generated CSV
  • Let’s make a DataFrame from the generated files.
  • We are interested in only 2 columns: the stock closing price and Total Trade Quantity.
The DataFrame we need
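A minimal sketch of the caching idea, assuming pandas: a CSV on disk is reused when present, and the API is only called on a cache miss. The helper names are hypothetical. `fetch_from_quandl` assumes the `quandl` Python package and an `NSE/<symbol>` dataset code; it is not called in the demo, which uses a fake fetcher so that it runs offline.

```python
import os
import tempfile
import pandas as pd

def load_or_fetch(symbol, fetch, cache_dir):
    """Return the price history for `symbol`, calling `fetch` only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"{symbol}.csv")
    if os.path.exists(path):
        return pd.read_csv(path, index_col=0, parse_dates=True)
    frame = fetch(symbol)
    frame.to_csv(path)
    return frame

def fetch_from_quandl(symbol, api_key, start_date, end_date):
    """Hypothetical fetcher; the 'NSE/<symbol>' dataset code is an assumption."""
    import quandl  # imported lazily so the caching logic runs without the package
    quandl.ApiConfig.api_key = api_key
    return quandl.get(f"NSE/{symbol}", start_date=start_date, end_date=end_date)

# Offline demo with a fake fetcher that records how often it is called.
calls = []

def fake_fetch(symbol):
    calls.append(symbol)
    index = pd.date_range("2020-01-01", periods=3, name="Date")
    return pd.DataFrame(
        {"Close": [1.0, 2.0, 3.0], "Total Trade Quantity": [10, 20, 30]},
        index=index,
    )

with tempfile.TemporaryDirectory() as cache_dir:
    first = load_or_fetch("DEMO", fake_fetch, cache_dir)   # fetches and writes the CSV
    second = load_or_fetch("DEMO", fake_fetch, cache_dir)  # reads the CSV instead
```

Passing the fetcher in as an argument keeps the caching logic independent of the data source, which also makes it easy to test.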
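One way to build such a DataFrame, sketched with pandas. The column name pattern `<ticker>_Close` follows the `NIFTY_50_Close` column used below; the `<ticker>_Volume` name, the function name and the demo values are assumptions for illustration.

```python
import pandas as pd

def combine_frames(frames):
    """Join per-ticker frames on date, keeping only close and traded quantity."""
    columns = ["Close", "Total Trade Quantity"]
    parts = []
    for ticker, frame in frames.items():
        part = frame[columns].rename(columns={
            "Close": f"{ticker}_Close",
            "Total Trade Quantity": f"{ticker}_Volume",  # assumed column name
        })
        parts.append(part)
    # Align all tickers on the date index; dates missing for a ticker become NaN.
    return pd.concat(parts, axis=1).sort_index()

# Made-up data standing in for two of the cached CSV files.
dates = pd.date_range("2020-01-01", periods=3)
raw = pd.DataFrame(
    {
        "Open": [1.0, 2.0, 3.0],
        "Close": [1.5, 2.5, 3.5],
        "Total Trade Quantity": [10, 20, 30],
    },
    index=dates,
)
df = combine_frames({"NIFTY_50": raw, "DEMO": raw * 2})
```

In practice each entry of `frames` would be loaded from one of the generated CSV files.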
  • SERIES_LENGTH: We group the data into overlapping 30-day windows, e.g. day1-day30, day2-day31, day3-day32 and so on. Each window is used to predict the future data.
  • PREDICT_LENGTH: How far ahead of today we predict, here 7 days.
  • df["nifty_future_price"]: This will contain the value of df["NIFTY_50_Close"] 7 days later.
  • For example, if df["NIFTY_50_Close"] covers day1-day7, then df["nifty_future_price"] holds the data from day8-day14, and so on.
  • This is done to generate sequential data, so that each target depends on the data that precedes it.
  • Shifting the column this way leaves NaNs at the end of df["nifty_future_price"], so we will delete those rows.
  • We will use sell=0 and buy=1 as labels.
  • If df["nifty_future_price"] > df["NIFTY_50_Close"], then buy (1).
  • If df["nifty_future_price"] < df["NIFTY_50_Close"], then sell (0).
  • Then we will count the sells and buys to balance the data.
    Algorithm: whichever class has the smaller count, we keep only that many samples from each class.
  • Finally, generate the data and labels as numpy arrays.
  • Train-test split: we will do an 80:20 split.
Train-Test Split
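The labelling, windowing, balancing and splitting steps above can be sketched as follows, assuming pandas and numpy. Several details are assumptions, since the steps above leave them open: ties are labelled as sell, each window takes the label of its last day, balancing keeps the earliest samples of the larger class, and the split is chronological. The demo data is synthetic.

```python
import numpy as np
import pandas as pd

SERIES_LENGTH = 30   # days per input window
PREDICT_LENGTH = 7   # days ahead to compare against

def add_labels(df, close_col="NIFTY_50_Close"):
    out = df.copy()
    # Row t receives the close from t + PREDICT_LENGTH; the last rows become NaN.
    out["nifty_future_price"] = out[close_col].shift(-PREDICT_LENGTH)
    out = out.dropna(subset=["nifty_future_price"]).copy()
    # 1 = buy when the future close is higher, otherwise 0 = sell (ties: assumption).
    out["label"] = (out["nifty_future_price"] > out[close_col]).astype(int)
    return out

def make_sequences(df, feature_cols):
    values = df[feature_cols].to_numpy(dtype=float)
    labels = df["label"].to_numpy()
    X, y = [], []
    for end in range(SERIES_LENGTH, len(df) + 1):
        X.append(values[end - SERIES_LENGTH:end])
        y.append(labels[end - 1])  # label of the last day in the window (assumption)
    return np.array(X), np.array(y)

def balance(X, y):
    buys, sells = np.where(y == 1)[0], np.where(y == 0)[0]
    n = min(len(buys), len(sells))
    keep = np.sort(np.concatenate([buys[:n], sells[:n]]))  # preserve time order
    return X[keep], y[keep]

def split(X, y, train_fraction=0.8):
    cut = int(len(X) * train_fraction)
    return X[:cut], X[cut:], y[:cut], y[cut:]

# Synthetic stand-in data (not real prices), just to exercise the pipeline.
days = np.arange(120)
demo = pd.DataFrame({
    "NIFTY_50_Close": 100 + 10 * np.sin(days / 5),
    "NIFTY_50_Volume": 1000.0 + days,
})
labeled = add_labels(demo)
X, y = make_sequences(labeled, ["NIFTY_50_Close", "NIFTY_50_Volume"])
X_bal, y_bal = balance(X, y)
X_train, X_test, y_train, y_test = split(X_bal, y_bal)
```

Each sample in `X` has the shape (SERIES_LENGTH, number of features), which is the input shape an LSTM layer expects.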
  • Shapes after split:
  • Defining and training our LSTM model: this is a binary classification task. We need to predict whether to buy or sell for future data points.
  • Plotting the predictions against the test set: I trained the model for only a few epochs, so the results are not great, but they are acceptable.
  • Results: validation accuracy of 60% and validation loss of 0.78.
  • This can surely be improved with a better architecture, more training epochs and better tuning of the hyperparameters.
  • Machine learning is constantly evolving with new methods being developed every day. It is crucial that we update our knowledge constantly and the best way to do so is by building models for fun projects like stock price prediction. Although the LSTM model above is not good enough to be used in live trading, the foundations built by developing such a model can help us build better models that might one day be used in our trading system.
  • Neural networks are well suited to time series data and, when coupled with sentiment data, can make for a practical model. The results here are modest, so I am still looking for ways to improve them, and maybe to develop a full trading strategy from the model.
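For reference, the LSTM binary classifier described in the model-definition step above could look like the following sketch, assuming TensorFlow's Keras API. The layer sizes, dropout rate and optimizer are assumptions, not the architecture used for the reported results, and the training data here is random stand-in data that only demonstrates the expected shapes.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_model(series_length, n_features):
    model = keras.Sequential([
        keras.Input(shape=(series_length, n_features)),
        layers.LSTM(64),                         # assumed size
        layers.Dropout(0.2),                     # assumed rate
        layers.Dense(1, activation="sigmoid"),   # probability of "buy"
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_model(series_length=30, n_features=2)

# Random stand-in data, only to show the expected input and output shapes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(16, 30, 2)).astype("float32")
y_demo = rng.integers(0, 2, size=16)
model.fit(X_demo, y_demo, epochs=1, batch_size=8, verbose=0)
probs = model.predict(X_demo, verbose=0)
```

A probability above 0.5 would be read as buy (1) and anything else as sell (0), matching the labels defined earlier.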

Where can you find my code?
