"If I have seen further it is by standing on the shoulders of Giants." - Isaac Newton
With Sir Isaac’s words in mind, I thought: what better place to start than existing research papers? Hopefully they would give me some unique knowledge that I could build on when writing my own strategies.
How wrong I was.
This is part of a multi-part series, links below:
Around 6 months ago I stumbled across a research paper that, on the face of it, seemed very promising. In short, the technique goes something like this:
Now, the paper doesn’t clearly state whether the wavelet transform is applied to just the close price or to every input time series separately. It uses the phrase “multivariate denoising using wavelet”, which I’d take to mean it was applied to every time series. To be safe, I tried both methods.
Thankfully, the issue starts to become quite apparent from here. I’m sure you’ve heard many times that whenever you normalise a time series for an ML model, you should fit your normaliser on the train set first and then apply it to the test set. The reason is quite simple: our ML model behaves like a mean reverter, so if we normalise our entire dataset in one go, we’re basically giving the model the mean value it needs to revert to. I’ll give you a little clue: if we knew the future mean value of a time series, we wouldn’t need machine learning to tell us what trades to make ;)
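As a minimal NumPy sketch of that train/test rule, here is a synthetic upward-trending series (illustrative data, not from the paper) normalised two ways. The helper names `fit_zscore` and `apply_zscore` are made up for this example.

```python
import numpy as np

def fit_zscore(train):
    """Learn normalisation parameters from the training slice only."""
    return train.mean(), train.std()

def apply_zscore(x, mean, std):
    """Apply previously fitted parameters to any slice (train or test)."""
    return (x - mean) / std

# Synthetic upward-trending "price" series: illustrative data only.
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0.5, 1.0, size=1000))
split = 800
train, test = prices[:split], prices[split:]

# Correct: the scaler never sees the test period.
mu, sd = fit_zscore(train)
test_ok = apply_zscore(test, mu, sd)

# Leaky: statistics computed over the whole series, test period included.
mu_all, sd_all = fit_zscore(prices)
test_leaky = apply_zscore(test, mu_all, sd_all)

print(f"mean of normalised test set, train-only fit:  {test_ok.mean():.2f}")
print(f"mean of normalised test set, full-series fit: {test_leaky.mean():.2f}")
```

On a trending series the leaky fit pulls the normalised test set back towards zero. In other words, it hands the model the very level it should be reverting to, which it could never know in live trading.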
So, back to our wavelet transform. Take a look at this line:
sigma = mad(coeffs[-1], center=0)
Here we’re calculating the median absolute deviation of the finest-level (noisiest) detail coefficients, taken over the entire series. Then:
(pywt.threshold(i, value=uthresh, mode="soft") for i in coeffs[1:])
We’re thresholding every detail coefficient of the entire time series with uthresh, which is derived from that sigma value.
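To see why this matters, here is a minimal NumPy sketch of that style of denoiser. Assumptions, all mine: a hand-rolled multi-level Haar transform stands in for whichever pywt wavelet the paper’s code used, the threshold is the common “universal” one, sigma·sqrt(2·ln n), and the function names are illustrative. Because sigma is estimated from coefficients spanning the whole series, changing only the second half of the data still moves the denoised first half.

```python
import numpy as np

def haar_dec(x, level):
    """Multi-level orthonormal Haar DWT, ordered like pywt.wavedec: [cA_n, cD_n, ..., cD_1]."""
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(level):
        if len(a) % 2:
            raise ValueError("len(x) must be divisible by 2**level")
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))  # detail (high-pass) coefficients
        a = (even + odd) / np.sqrt(2)              # approximation (low-pass) coefficients
    return [a] + details[::-1]

def haar_rec(coeffs):
    """Inverse of haar_dec."""
    a = coeffs[0]
    for d in coeffs[1:]:
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a

def soft_threshold(c, t):
    """Shrink coefficients towards zero by t (same rule as pywt.threshold(mode="soft"))."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def wavelet_denoise(x, level=3):
    coeffs = haar_dec(x, level)
    # Noise scale from the finest detail level: median absolute deviation about 0,
    # divided by 0.6745 (what statsmodels' mad(..., center=0) computes by default).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    # Assumed universal threshold; sigma comes from the WHOLE series, future included.
    uthresh = sigma * np.sqrt(2 * np.log(len(x)))
    coeffs[1:] = [soft_threshold(c, uthresh) for c in coeffs[1:]]
    return haar_rec(coeffs)

rng = np.random.default_rng(42)
t = np.arange(256)
clean = np.where(t < 37, 0.0, 6.0)            # one step, deliberately off the Haar grid
history = clean + rng.normal(0, 0.5, t.size)

# Same past, different "future": only samples 128..255 are altered.
alt_future = history.copy()
alt_future[128:] += rng.normal(0, 3.0, 128)

a = wavelet_denoise(history)
b = wavelet_denoise(alt_future)
print(np.abs(a[:128] - b[:128]).max())  # non-zero: the past moved because the future did
```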
Notice something a little bit wrong with this? It’s basically the exact same issue as normalising your train and test sets in one go. You’re leaking future information into every time step, and not in a small way. In fact, you can run a little experiment yourself: the higher the level of wavelet transform you apply, the more “accurate” your ML model’s output miraculously becomes. A basic LSTM classification model without the WT will get you directional accuracy just over 50%, but applying a WT across the whole time series will erroneously give you accuracy in the mid-to-high 60s.
I thought perhaps I’d misinterpreted the paper. Perhaps what they did was apply the WT at each time step, using only the data available up to that point, before feeding it into the LSTM. So I tried that. Yep, accuracy dips below 50%. We don’t even need to get as far as the auto-encoder part to spot a pretty huge mistake here. We’re here though, so we might as well finish up to be sure.
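A leak-free version of the “apply the WT at each time step” experiment could look like this sketch: at each time t the transform only sees a trailing window ending at t, and only the last denoised value is kept. Assumptions, all mine: a single-level Haar transform instead of the paper’s wavelet, the same MAD-based universal soft threshold, an arbitrary window of 32, and hypothetical function names.

```python
import numpy as np

def haar_denoise_1level(x):
    """Single-level Haar denoise with a MAD-based universal soft threshold (len(x) must be even)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2)
    detail = (even - odd) / np.sqrt(2)
    sigma = np.median(np.abs(detail)) / 0.6745        # noise estimate from this window only
    thresh = sigma * np.sqrt(2 * np.log(len(x)))
    detail = np.sign(detail) * np.maximum(np.abs(detail) - thresh, 0.0)
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

def causal_denoise(x, window=32):
    """Walk-forward denoising: the value at time t only uses x[t-window+1 : t+1]."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)                     # no estimate until a full window exists
    for t in range(window - 1, len(x)):
        out[t] = haar_denoise_1level(x[t - window + 1 : t + 1])[-1]
    return out

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, 300))
alt = prices.copy()
alt[200:] += 50.0                                     # change only the "future"

c1, c2 = causal_denoise(prices), causal_denoise(alt)
print(np.allclose(c1[:200], c2[:200], equal_nan=True))  # True: no look-ahead
```

Keeping only the last point of each window is what guarantees no look-ahead. The cost is that every estimate sits at the edge of its window, where the transform has the least context to work with.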
Disclaimer: This doesn’t constitute investment advice. Seek advice from an authorised financial advisor before making any investments. Past performance is not indicative of future returns.