Something wrong in train/valid split #10
Comments
Also, looking at the dataset, my guess is that the data is normalized using all of the train/valid/test data together. Isn't there some risk of looking into the future?
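The usual way to avoid this kind of look-ahead is to compute normalization statistics on the training rows only and then apply them to every split. A minimal sketch (the array shapes and the `valid_index` split point here are hypothetical, not from the repository):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=(1000, 4))  # toy feature series

valid_index = 700  # hypothetical split point: rows before it are training data

# Fit normalization statistics on the training rows only...
mean = data[:valid_index].mean(axis=0)
std = data[:valid_index].std(axis=0)

# ...then apply them to every split, so valid/test rows never influence
# the statistics the model was trained with.
normalized = (data - mean) / std
```

This way the validation and test rows are transformed with statistics the model could actually have known at training time.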
I tried something like this: the validation performance is not as good as what the original code gives (on the NASDAQ dataset).
I also tried strict rolling-window normalization (normalizing within each lookback_length + steps window to ensure no future-data leakage), which also made things worse.
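For reference, the rolling-window variant described above might look like the sketch below, under the assumption that each sample is a window of `lookback_length + steps` rows normalized only by its own statistics (the function name and epsilon guard are my own, not from the repository):

```python
import numpy as np

def rolling_window_normalize(series, lookback_length, steps):
    """Normalize each (lookback_length + steps)-row window by its own
    mean/std, so no statistic ever uses data beyond the window itself."""
    window = lookback_length + steps
    samples = []
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        mu, sigma = chunk.mean(), chunk.std()
        samples.append((chunk - mu) / (sigma + 1e-8))  # guard flat windows
    return np.stack(samples)

series = np.arange(100, dtype=float)
batch = rolling_window_normalize(series, lookback_length=16, steps=4)
```

Each window then has (approximately) zero mean and unit variance, at the cost of throwing away the series-wide scale information, which may explain the worse results the comment reports.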
I think the reasonable approaches are:
Hello,
the author makes a list of offsets for training and shuffles it:
```python
batch_offsets = np.arange(start=0, stop=valid_index, dtype=int)
...
np.random.shuffle(batch_offsets)
```
After that, to avoid overlapping with the validation data, the author does this:
```python
for j in range(valid_index - lookback_length - steps + 1):
    ....batch_offsets[j]....
```
But since batch_offsets is shuffled, it is likely that some offsets belonging to the validation data will end up in the training set.
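The overlap can be shown with a small reproduction (the concrete numbers are hypothetical; the pattern mirrors the snippets above). The loop bound only limits how many offsets are used, not which ones, so after shuffling, `batch_offsets[j]` can be any value in `[0, valid_index)`:

```python
import numpy as np

valid_index = 100
lookback_length = 20
steps = 5

np.random.seed(0)
batch_offsets = np.arange(start=0, stop=valid_index, dtype=int)
np.random.shuffle(batch_offsets)

# Offsets actually consumed by the training loop: the first
# (valid_index - lookback_length - steps + 1) entries of the shuffled array.
used = batch_offsets[:valid_index - lookback_length - steps + 1]

# An offset whose window [offset, offset + lookback_length + steps)
# crosses valid_index pulls validation rows into a training sample.
leaking = used[used > valid_index - lookback_length - steps]
```

With virtually any shuffle, `leaking` is non-empty, confirming that training windows can extend into the validation region.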