ML approach for Bitcoin swing trading — Part 3— Classification with Trend Labelling | by Omer Cohen | Feb, 2022 | Medium

Omer Cohen · Feb 22 · 5 min read

We aim to establish a swing trading strategy for Bitcoin (BTC) that performs better than buy-and-hold (HODL) and random-walk-based strategies. The final model should output a keep/stay-out signal every 12 hours.

The main metric we look at is the final return on investment (ROI) on our validation and test data. The models were also evaluated with classical classification metrics, e.g. recall and precision, to further tune candidate models. For example, we aim not to lose money, so we would like the “stay-out” predictions (class 0) to have higher recall. Time-series forecasting models were evaluated with Root Mean Squared Error (RMSE).
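To make these metrics concrete, here is a minimal sketch of how the ROI of a keep/stay-out signal and the recall of a class could be computed. The function names and the convention that signals[i] covers the step from prices[i] to prices[i + 1] are assumptions made for illustration, not the project's actual code.

```python
def strategy_roi(prices, signals):
    """ROI (as a fraction) of holding BTC only while the signal is 1.

    prices  : n + 1 prices sampled at a fixed interval (e.g. every 12 hours)
    signals : n values; signals[i] == 1 means "keep" from prices[i] to
              prices[i + 1], 0 means "stay out" (hold cash) for that step
    """
    if len(prices) != len(signals) + 1:
        raise ValueError("expected exactly one more price than signals")
    equity = 1.0
    for i, signal in enumerate(signals):
        if signal == 1:
            # While holding, equity moves with the price.
            equity *= prices[i + 1] / prices[i]
    return equity - 1.0


def buy_and_hold_roi(prices):
    """ROI (as a fraction) of buying at the first price and never selling."""
    return prices[-1] / prices[0] - 1.0


def recall(y_true, y_pred, cls):
    """Share of the true `cls` samples that were also predicted as `cls`."""
    relevant = [pred for true, pred in zip(y_true, y_pred) if true == cls]
    if not relevant:
        return float("nan")
    return sum(1 for pred in relevant if pred == cls) / len(relevant)
```

With this convention, recall(y_true, y_pred, 0) measures how many of the real stay-out steps the model actually stayed out of, which is the quantity we want to keep high.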

We evaluated performance on three validation sets and one hold-out test set:

Diverse market (1-Jan-21–1-Jul-21): a market that contains both bearish and bullish periods

Bearish market (15-Mar-21–1-Jul-21)

Bullish market (15-Jul-21–1-Nov-21)

Hold-out test set (1-Nov-21–15-Feb-22)

If you haven’t read Part 1, where we tried to build a trading strategy using time-series forecasting, we recommend checking it out:

Part 1 — Time Series Forecast

In Part 2 we recast the problem as time-series classification with naive labelling and collected ROI results for baseline models:

Part 2 — Classification with Naive Labelling

Trend-based Labelling

We tested a different labelling method, based on the following article:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7597331/pdf/entropy-22-01162.pdf

The method labels trend periods according to a sensitivity parameter w. We implemented the labelling algorithm with w=0.3.

w is the fluctuation index, which determines how sensitive the labelling is to fluctuations in price. For volatile assets like crypto it should be higher than for less volatile assets such as stock market indices.
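The idea of threshold-based trend labelling can be sketched as follows. This is a simplified illustration and not the exact algorithm from the article or our implementation: it assumes a symmetric threshold (a relative move of w in either direction confirms a trend change), and the function name trend_labels is hypothetical.

```python
def trend_labels(prices, w):
    """Label points 1 (uptrend) or 0 (downtrend); None where undecided.

    A reversal is confirmed once the price moves by a fraction w against
    the running extreme of the current trend. Simplified sketch only.
    """
    n = len(prices)
    labels = [None] * n
    direction = 0   # +1 uptrend, -1 downtrend, 0 not yet known
    start = 0       # first index of the segment still waiting for a label
    ext_i = 0       # index of the highest (or lowest) price of the trend
    for i in range(1, n):
        p = prices[i]
        if direction == 0:
            # Wait for a move of size w away from the first price.
            if p >= prices[0] * (1 + w):
                direction, ext_i = 1, i
            elif p <= prices[0] * (1 - w):
                direction, ext_i = -1, i
        elif direction == 1:
            if p > prices[ext_i]:
                ext_i = i                      # new high, trend continues
            elif p <= prices[ext_i] * (1 - w):
                # Drop of w from the high: the uptrend ended at ext_i.
                labels[start:ext_i] = [1] * (ext_i - start)
                start, direction, ext_i = ext_i, -1, i
        else:
            if p < prices[ext_i]:
                ext_i = i                      # new low, trend continues
            elif p >= prices[ext_i] * (1 + w):
                # Rise of w from the low: the downtrend ended at ext_i.
                labels[start:ext_i] = [0] * (ext_i - start)
                start, direction, ext_i = ext_i, 1, i
    return labels
```

Note two properties of this sketch: the label at time t is only known after looking at later prices, and the last segment stays unlabelled (None) because its end has not been confirmed by a move of size w.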

See our labelling per period:

Remark: w was chosen in the same manner as the window size λ in Part 2.

Mean return for w=0.3 is the largest of all tested values of w

Remark regarding data leakage: labelling considers both past and future prices, meaning the label at time t depends on t+i, in order to mark where each trend begins and ends. However, this is an auxiliary phase that does not cause any leakage in predictions: it is applied to historical data and used only in the training phase. The labels describe the ideal case, which is to hold when the label is 1 and stay out when the label is 0.

Labels distribution:

Creating a random-walk baseline model here is tricky. A label can only be assigned once the price has changed by as much as w, which explains the white region in the graph at the end of the period.

Model Selection

To find the best models, we performed cross-validation (CV) and looked at returns on all 3 validation sets. XGBoost showed better results than SVM, LogisticRegression and AdaBoost, so we decided to tune its hyperparameters, focusing on n_estimators, min_child_weight, max_depth and scale_pos_weight.

Remark regarding the scale_pos_weight hyperparameter of XGBClassifier: it gives some control over the weighting of the classes. We used it to increase the recall of class 0 (not holding), even at the expense of that class's precision. We believe a false positive is the worst scenario, as we do not want to hold the coin while its price drops. It also helps with the unbalanced dataset, as the BTC price has increased dramatically since the COVID-19 outbreak (2020 was almost entirely an uptrend with w=0.3).

The best value found for this parameter was 0.2, obtained with cross-validation.
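To see why a value below 1 pushes the model towards class 0, here is a minimal sketch of a class-weighted log loss. It assumes, as we understand XGBoost's binary logistic objective, that scale_pos_weight scales the contribution of the positive (class 1, "keep") rows; the helper names are hypothetical and this is not XGBoost's actual code.

```python
import math


def row_loss(y, p_keep, scale_pos_weight=1.0):
    """Log loss of one sample; class-1 rows are scaled by scale_pos_weight."""
    weight = scale_pos_weight if y == 1 else 1.0
    return -weight * (y * math.log(p_keep) + (1 - y) * math.log(1 - p_keep))


def weighted_log_loss(y_true, p_keep, scale_pos_weight=1.0):
    """Mean of the per-row weighted losses."""
    losses = [row_loss(y, p, scale_pos_weight) for y, p in zip(y_true, p_keep)]
    return sum(losses) / len(losses)


# Two equally confident mistakes, evaluated with scale_pos_weight = 0.2:
missed_uptrend = row_loss(1, 0.1, scale_pos_weight=0.2)  # said "stay out", price rose
false_positive = row_loss(0, 0.9, scale_pos_weight=0.2)  # said "keep", price fell
```

Under this weighting, holding through a drop costs about five times more than missing a rise of the same confidence, which matches our preference for high recall on class 0.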

As in the previous part, we compared returns on 3 validation sets: diverse, bearish and bullish periods.

ROI for the different validation periods

XGEnsemble: a soft-voting ensemble of 24 XGBoost classifiers, built from different permutations of multiple hyperparameters.
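Soft voting over a grid of hyperparameter combinations can be sketched like this. The grid values below are hypothetical placeholders chosen only so that the product has 24 entries; they are not the values we used (only scale_pos_weight=0.2 comes from this post), and the model training itself is left out.

```python
from itertools import product

# HYPOTHETICAL grid, for illustration only: 3 * 2 * 2 * 2 = 24 combinations.
grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [3, 5],
    "min_child_weight": [1, 5],
    "scale_pos_weight": [0.2, 0.3],
}

# One parameter dict per ensemble member.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]


def soft_vote(probabilities, threshold=0.5):
    """Average P(keep) across models, then threshold the average.

    probabilities[m][t] is the probability of class 1 ("keep") that
    model m assigns to time step t.
    """
    n_models = len(probabilities)
    averaged = [sum(column) / n_models for column in zip(*probabilities)]
    return [1 if p >= threshold else 0 for p in averaged]
```

Each entry of configs would be used to train one classifier, and the per-model probabilities would then be passed to soft_vote. Averaging probabilities, rather than counting hard votes, lets a confident model outweigh several hesitant ones.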

Validation Results Discussion

We can see that even though we are not able to beat a bullish market, we lose less in bearish periods, and that allows us to outperform the market in diverse (swinging) periods. Based on the results table, we decided to select the XGEnsemble.

Investing with this model over the entire year of 2021 would have yielded an ROI of 134.5%, compared to 59.8% for the buy-and-hold strategy.

White regions represent areas where our model predicts fluctuating keep/stay-out signals

Holdout Test Set Results

Now it’s time to test our chosen model on our holdout set:

Our trading strategy lost 1.76% while the price of Bitcoin dropped 27.93% over the tested period. Nice.

Summary

To summarize: we examined 3 ML approaches to tackle the problem of beating the buy-and-hold strategy: time-series forecasting, time-series classification with naive labelling, and classification with trend labelling. The latter seems to outperform buy-and-hold over longer, diverse periods. It probably won’t beat a monotonic bullish market. Our model is lagging, as can be seen in the results plots above: don’t expect to buy at the lowest price and sell at the highest, but do expect to ride trends. We also did not take trading fees into account, which will reduce the ROI. A stop-loss can also be added to the strategy and might improve results.
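As a rough illustration of the fee effect, the sketch below charges a proportional fee every time the position changes. The fee rate is a hypothetical placeholder, the function name is made up for this example, and the sketch does not charge for closing a position that is still open at the end.

```python
def roi_with_fees(prices, signals, fee=0.001):
    """ROI (as a fraction) of a keep/stay-out strategy with trading fees.

    prices  : n + 1 prices; signals[i] covers the step prices[i] -> prices[i + 1]
    signals : 1 = keep (hold BTC), 0 = stay out (hold cash)
    fee     : HYPOTHETICAL proportional fee paid on every buy or sell
    """
    if len(prices) != len(signals) + 1:
        raise ValueError("expected exactly one more price than signals")
    equity = 1.0
    position = 0  # start in cash
    for i, signal in enumerate(signals):
        if signal != position:
            equity *= 1.0 - fee  # pay the fee when entering or leaving
            position = signal
        if position == 1:
            equity *= prices[i + 1] / prices[i]
    return equity - 1.0
```

Because the fee is paid on every switch, a model whose signal flips often loses more to fees than one that rides long trends, so the gap to buy-and-hold narrows as the number of trades grows.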

Future Work

Turn this problem into a multivariate one by including more features that might help in predicting the trading strategy. Originally, we planned to add sentiment data from social networks. However, in our short time we weren’t able to acquire enough reliable data.

Explore other cryptocurrencies. Option: use the Bitcoin price as an additional predictor for the trading strategy.

Deploy an automated trading bot on an exchange and track its ROI.

This project is our final project in the Israel Tech Challenge Data Science Oct ’21 cohort.
