
[<Library component: Models|Core|etc...>] Inconsistency in fit/predict API? #1200

Open

kevinleahy-switchdin opened this issue Nov 14, 2024

kevinleahy-switchdin commented Nov 14, 2024
Description

Hi, I had a general question/comment, and also want to check my own understanding.

  • In NeuralForecast.fit, we can pass in a df, which can be a dataframe containing multiple time series, including exogenous variables.
    • If there are any future exogenous variables, these are handled internally via self.futr_exog_list.
    • The dataframe gets converted to a TimeSeriesDataset in self._prepare_fit, which is then passed to each model's fit.
    • Similarly, we can pass in a list of file paths, in which case the above process happens for each individual file through LocalFilesTimeSeriesDataset.
  • However, for NeuralForecast.predict, we must pass in separate dataframes - one for historical variables (df) and one for future exogenous variables (futr_df); see the sketch after this list.
    • Internally, the predict function once again calls self._prepare_fit, but this time just for the "historical" df, producing a TimeSeriesDataset.
    • We then append futr_df to this TimeSeriesDataset and pass it to model.predict.
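
For concreteness, here is a minimal sketch of the asymmetry. The schema follows the standard long format (unique_id, ds, y); the toy NHITS configuration and the `price` column are just illustrative assumptions:

```python
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import NHITS

# One long-format dataframe: history plus an exogenous column.
dates = pd.date_range("2024-01-01", periods=36, freq="D")
df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": dates,
    "y": range(36),
    "price": range(36),  # exogenous variable, known into the future
})

nf = NeuralForecast(
    models=[NHITS(h=7, input_size=14, futr_exog_list=["price"], max_steps=10)],
    freq="D",
)

# fit: a single dataframe carries the series and its exogenous columns together.
nf.fit(df=df)

# predict: the future exogenous values must arrive in a *separate* futr_df
# covering the h steps after the end of df.
futr_df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range(dates[-1] + pd.Timedelta(days=1), periods=7, freq="D"),
    "price": range(36, 43),
})
preds = nf.predict(futr_df=futr_df)
```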

The problem with having two different code paths is that any change to the library's data handling has to be made in both places. For example, we can't simply reuse the logic from fit to let predict accept a list of file paths via LocalFilesTimeSeriesDataset. Furthermore, in predict, each unique_id must refer to one single "sample", as opposed to a time series over which we can predict with sliding windows.

Another downside is that when generating data, we must reshape it specifically for predict (splitting it into df and futr_df), rather than reusing the same data generation process for training, evaluation, and live data.
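
To illustrate the reshaping cost, here is a minimal sketch of that split, assuming a hypothetical full_df that spans both the history and the forecast horizon (future rows have y missing but known exogenous values):

```python
import pandas as pd

# Hypothetical long-format data covering history *and* the 7-step horizon.
full_df = pd.DataFrame({
    "unique_id": "series_1",
    "ds": pd.date_range("2024-01-01", periods=43, freq="D"),
    "y": list(range(36)) + [None] * 7,
    "price": range(43),
})

cutoff = full_df["ds"].max() - pd.Timedelta(days=7)

hist_df = full_df[full_df["ds"] <= cutoff]                     # becomes df
futr_df = full_df[full_df["ds"] > cutoff].drop(columns=["y"])  # becomes futr_df

# preds = nf.predict(df=hist_df, futr_df=futr_df)  # nf as in the sketch above
```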

If both interfaces were aligned, then ideally we could pass the same data-related arguments to both, so that ultimately the same "samples" would be passed to the underlying fit() and predict() methods.
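
Purely as a hypothetical illustration of what "aligned" could mean (none of this is the current NeuralForecast API):

```python
# Hypothetical, for discussion only - not the current API.
# predict would accept the same inputs as fit: one long dataframe whose
# trailing rows carry the future exogenous values, or a list of file
# paths handled via LocalFilesTimeSeriesDataset, exactly as fit does.
preds = nf.predict(df=full_df)                       # future rows inferred
preds = nf.predict(df=["s1.parquet", "s2.parquet"])  # file paths, like fit
```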

Looking for thoughts/feedback on the above - cheers!

Use case

No response
