Forecasting in EVA #969

Merged
merged 19 commits into from
Sep 5, 2023

americast
Member

@americast americast commented Aug 26, 2023

Implemented standalone forecasting in EVA (using the statsforecast package). You can run it via the following commands:

DROP TABLE IF EXISTS AirData;

CREATE TABLE AirData (
    unique_id TEXT(30),
    ds TEXT(30),
    y INTEGER);

LOAD CSV 'data/forecasting/air-passengers.csv' INTO AirData;

DROP UDF IF EXISTS Forecast;

CREATE UDF Forecast
FROM (SELECT unique_id, ds, y FROM AirData)
TYPE Forecasting
'predict' 'y';

SELECT Forecast(12) FROM AirData;

Here Forecast(12) signifies a horizon length of 12.
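
To make the notion of a horizon concrete, here is a minimal sketch in plain Python. It uses a seasonal naive forecast as a stand-in, which is not necessarily the model that the Forecast UDF fits through statsforecast; the function name, the default season length of 12, and the sample values are all assumptions made for illustration.

```python
from typing import List


def seasonal_naive_forecast(
    history: List[float], horizon: int, season_length: int = 12
) -> List[float]:
    """Forecast `horizon` future steps by repeating the last full season.

    Illustrative stand-in only: this is not the statsforecast model used by the UDF.
    """
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    # Step i of the forecast reuses the value seen at the same position
    # in the most recent season, wrapping around for long horizons.
    return [last_season[i % season_length] for i in range(horizon)]


# Two years of made-up monthly values (not the air-passengers data).
monthly = [float(v) for v in range(1, 25)]

# A horizon of 12 yields exactly 12 future values, one per month.
forecast = seasonal_naive_forecast(monthly, horizon=12)
print(len(forecast), forecast[0], forecast[-1])
```

In this sketch, a horizon of 12 on monthly data corresponds to one year of predicted values.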

Thanks!

@xzdandy
Collaborator

xzdandy commented Aug 27, 2023

> Implemented standalone forecasting in EVA (using statsforecast package). You can run it via the following commands:
>
> DROP TABLE IF EXISTS AirData;
>
> CREATE TABLE AirData (
>     unique_id TEXT(30),
>     ds TEXT(30),
>     y INTEGER);
>
> LOAD CSV 'data/forecasting/air-passengers.csv' INTO AirData;
>
> DROP UDF IF EXISTS Forecast;
>
> CREATE UDF Forecast IMPL 'evadb/udfs/forecast.py';
>
> SELECT Forecast(unique_id, ds, y) FROM AirData;
>
> I plan to add more features to this. Tests and documentation are still pending.
>
> Thanks!

Hi @americast, the design looks great. Is the idea that training will be implicit here? When the user runs SELECT Forecast(unique_id, ds, y) FROM AirData;, EvaDB will train and forecast together under the hood. The rationale is that a trained forecast model can only be applied to the same data source.

@xzdandy xzdandy added this to the v0.3.3 milestone Aug 27, 2023
@americast
Member Author

Thanks @xzdandy for your review. As of now, yes, the training is implicit. Since statsforecast trains very fast, that should be fine. However, if we were to incorporate DL-based forecasting, we might want to train explicitly in the background.

@gaurav274 gaurav274 modified the milestones: v0.3.3, v0.3.4 Aug 29, 2023
@jyotigoyal09

Hi @americast, I see you are using Forecast(unique_id, ds, y) to forecast a single time series. I was wondering whether you have functionality to forecast panel data. If so, how would you make the final model selection if the groups in the panel do not share the same seasonality/trend pattern?

@gaurav274
Member

Hi Jyoti, thanks for showing interest in EvaDB. We are in the early phases of adding forecasting. @americast @xzdandy Thoughts?

@americast
Member Author

Hi @jyotigoyal09, thanks for your interest! Support for forecasting panel data would be very useful. In the case of multivariate forecasting, we can find the lowest common seasonality and use that. We could also turn to deep learning models that do not require a seasonality to be specified upfront, such as transformer-based models.

It would be great to discuss this in detail in an issue.
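
The thread does not define "lowest common seasonality". One possible reading, sketched below purely as an assumption, is the least common multiple of the season lengths of the individual series, so that a single period covers a whole number of cycles for every group. The function name and the example season lengths are hypothetical.

```python
from functools import reduce
from math import gcd
from typing import Dict


def common_season_length(season_lengths: Dict[str, int]) -> int:
    """Return the least common multiple of per-series season lengths.

    Assumption: "lowest common seasonality" is read as an LCM here;
    the PR discussion does not specify how it would be computed.
    """
    if not season_lengths:
        raise ValueError("at least one series is required")
    if any(length <= 0 for length in season_lengths.values()):
        raise ValueError("season lengths must be positive")

    def lcm(a: int, b: int) -> int:
        return a * b // gcd(a, b)

    return reduce(lcm, season_lengths.values())


# Hypothetical panel: each unique_id has its own season length.
panel = {"store_a": 4, "store_b": 6, "store_c": 12}
print(common_season_length(panel))
```

Under this reading the shared period can grow quickly when the season lengths have few common factors (7 and 12 give 84), which is one reason models that need no upfront seasonality are attractive.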

@americast
Member Author

@xzdandy I have pushed some commits, and the training now follows the Ludwig style (#935). Training occurs when the UDF is created, and the trained model is stored under a unique model file name generated from the data. Whenever the UDF is called, the user only needs to specify the horizon. I have updated the commands in #969 (comment) accordingly.
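
The train-at-creation flow described above can be sketched roughly as follows. This is not the PR's implementation: the hashing scheme, the file naming, the pickle format, and every function name here are assumptions, and a trivial mean model stands in for statsforecast.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable, List, Tuple

Row = Tuple[str, str, int]  # (unique_id, ds, y)


def model_path_for(rows: List[Row], model_dir: Path) -> Path:
    """Derive a deterministic model file name from the training rows (hypothetical scheme)."""
    digest = hashlib.sha256(repr(rows).encode("utf-8")).hexdigest()
    return model_dir / f"forecast_{digest[:16]}.pkl"


def train_or_load(
    rows: List[Row], model_dir: Path, train_fn: Callable[[List[Row]], Any]
) -> Tuple[Any, Path]:
    """Train once per distinct data set; later calls reuse the stored model."""
    path = model_path_for(rows, model_dir)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f), path
    model = train_fn(rows)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return model, path


train_calls = []  # records how often training actually runs


def mean_model(rows: List[Row]) -> dict:
    """Stand-in for a real forecasting model: stores the mean of y."""
    train_calls.append(1)
    values = [y for _, _, y in rows]
    return {"mean": sum(values) / len(values)}


# Made-up rows, not the air-passengers data.
rows = [("series_a", "2020-01", 10), ("series_a", "2020-02", 20)]

with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    first, first_path = train_or_load(rows, model_dir, mean_model)    # trains and stores
    second, second_path = train_or_load(rows, model_dir, mean_model)  # loads from disk

print(first == second, len(train_calls))
```

Keying the file name on the data means that a UDF created over different rows gets its own model file, while repeated calls over the same rows skip training.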

@xzdandy
Collaborator

xzdandy commented Sep 1, 2023

Thanks for the contribution! The implementation looks good to me at a high level.

We can:

  • Add a long integration test case to verify that it works end-to-end.
  • Clean up the code (remove redundant prints and imports).

@americast americast marked this pull request as ready for review September 2, 2023 09:19
@americast
Member Author

@xzdandy Thanks. I have added the test and the docs.

Collaborator

@xzdandy xzdandy left a comment

Thanks for adding the tests and documentation. They look great.

Minor improvements:

  1. Fix the linter: bash script/test/test.sh -m LINTER
  2. Add the statsforecast dependency to setup.py.
  3. Install the statsforecast dependency in .circleci/config.yml, so that the test runs as part of the long integration tests.
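
Declaring the dependency and installing it in CI covers the build side. A complementary pattern, sketched here as an assumption rather than as what this PR does, is to check for an optional package at runtime and fail with an actionable message. The helper names and the install hint are hypothetical.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require(module_name: str, install_hint: str) -> None:
    """Raise a descriptive ImportError when an optional dependency is missing."""
    if not is_available(module_name):
        raise ImportError(
            f"Could not import '{module_name}'. Install it with: {install_hint}"
        )


# `json` ships with Python, so this check passes silently.
require("json", "n/a (standard library)")

# For an optional package, the same call either passes or explains how to fix it.
try:
    require("statsforecast", "pip install statsforecast")
    print("statsforecast is available")
except ImportError as exc:
    print(exc)
```

A check like this could also let a test skip itself when the package is absent, instead of failing on import.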

@americast
Member Author

Thanks. I have updated them. Please let me know if this looks all good.

Collaborator

@xzdandy xzdandy left a comment

Great work! Verified the long integration test works on python3.10.

@xzdandy xzdandy merged commit 0f88555 into georgia-tech-db:staging Sep 5, 2023
@americast
Member Author

Awesome, thanks!

jiashenC pushed a commit that referenced this pull request Sep 5, 2023
Implemented standalone forecasting in EVA (using
[statsforecast](https://nixtla.github.io/statsforecast) package). You
can run it via the following commands:

```sql
DROP TABLE IF EXISTS AirData;

CREATE TABLE AirData (
    unique_id TEXT(30),
    ds TEXT(30),
    y INTEGER);

LOAD CSV 'data/forecasting/air-passengers.csv' INTO AirData;

DROP UDF IF EXISTS Forecast;

CREATE UDF Forecast
FROM (SELECT unique_id, ds, y FROM AirData)
TYPE Forecasting
'predict' 'y';

SELECT Forecast(12) FROM AirData;
```
Here `Forecast(12)` signifies a horizon length of `12`.

Thanks!

---------

Co-authored-by: xzdandy <[email protected]>