Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to OpenAI v1.0 #3

Merged
merged 69 commits into from
Nov 22, 2023
Merged

Migrate to OpenAI v1.0 #3

merged 69 commits into from
Nov 22, 2023

Conversation

a0x8o
Copy link

@a0x8o a0x8o commented Nov 20, 2023

Remove empty evadb.db file.
Move github test into long intergration test so they are run on the
Clean up test_relational_api.py
Update link in the github data source documentation
Fix doc links
Merge remote-tracking branch 'origin/roadmap' into staging
checkpoint
checkpoint
Merge remote-tracking branch 'origin/fix_staging' into staging
docs: Update README.md
docs: Update README.md
Add the validation score and training time for create_function in XGB…
Add test_eva_db to gitignore (georgia-tech-db#1336)
Fix georgia-tech-db#1333 dependency and CMD in DockerFile (georgia-te…
CREATE INDEX IF NOT EXISTS is broken. (georgia-tech-db#1337)
Support semicolon and escaped strings in lark (georgia-tech-db#1339)
feat: third party app support in EVADB (georgia-tech-db#1033)
[WIP] Improving error handling messages for Custom Functions (georgia…
logging an error message for invalid files while loading (georgia-tec…
Verified that issue georgia-tech-db#1067 is resolved and added docume…
Add train scores for ludwig in the create function handler. (georgia-…
Make Logic Operators Case Insensitve (georgia-tech-db#1352)
Job scheduler implementation (georgia-tech-db#1308)
Fix python3.8 failing testcases due to type hint (georgia-tech-db#1364)
Add feedback for forecasting (georgia-tech-db#1258)
Adding changes for Flaml Sklearn integration (georgia-tech-db#1361)
Migrate ChatGPT function to openai v1.0 (georgia-tech-db#1368)

xzdandy and others added 30 commits October 4, 2023 02:48
Bump Version to v0.3.9+dev

---------

Co-authored-by: Jiashen Cao <[email protected]>
Co-authored-by: Gaurav Tarlok Kakkar <[email protected]>
This PR aims to solve the following issues:

- [x] Throwing error when non-numeric characters are in the data
(partially fixes #1243)
- [x] Math domain error with `statsforecast`.
- [x] Fix GPU support for `neuralforecast`.
- ~Neuralforecast support for directly using batched data.~
- ~Auto frequency determination ( #1279).~

Will create separate PRs for the last two points.

---------

Co-authored-by: Andy Xu <[email protected]>
1. Removed `config.yml` file. Users can directly use `SET` command. 
2. Moved `OPENAI_KEY` to `OPENAI_API_KEY`

---------

Co-authored-by: hershd23 <[email protected]>
Co-authored-by: Andy Xu <[email protected]>
Please suggest if this feature needs more test cases

---------

Co-authored-by: Lohith K S <[email protected]>
We shall add XGBoost classification support in EVADB.

---------

Co-authored-by: Jineet Desai <[email protected]>
Co-authored-by: Andy Xu <[email protected]>
Profiling on Vector Scan showed that we are spending a lot of time in
the post-processing logic doing a Nested Join. This is an initial commit
to change that into a Join using Pandas. Change showed ~50% improvement
in Similarity Queries.
Moved Function Expression Binder to a separate file
Integrated Milvus vector store into EvaDB. Added a `MilvusVectorStore`
class and Milvus type for query parsing and execution.
Below are environment values for the use of the Milvus index:

* `MILVUS_URI` is the URI of the Milvus instance (which would be
http://localhost:19530 when running locally). **This value is required**
* `MILVUS_USER` is the name of the user for the Milvus instance.
* `MILVUS_PASSWORD` is the password of the user for the Milvus instance.
* `MILVUS_DB_NAME` is the name of the database to be used. This will
default to the `default` database if not provided.
* `MILVUS_TOKEN` is the authorization token for the Milvus instance.

---------

Co-authored-by: Andy Xu <[email protected]>
Example notebook added for XGBoost regression and classification.

---------

Co-authored-by: Jineet Desai <[email protected]>
xzdandy and others added 29 commits October 28, 2023 19:04
…oost (#1327)

Let us show the validation score and training time for the XGBoost
AutoML model trained. This shall give us fair enough idea on how the
model trained on the training data set.

---------

Co-authored-by: Jineet Desai <[email protected]>
The environment created in the setup instructions in the documentation
calls the environment `test_eva_db`
Update the DockerFile in order to resolve dependency issues along with
fixing the invalid CMD that was previously passed in.
This PR fixes an issue in CREATE INDEX IF NOT EXISTS command wherein if
'IF NOT EXISTS' is passed, we had an unreferenced variable issue. Added
Unit Tests to check the correctness of both the cases.

Also reverted the index changes while merging dataframes after vector
scan, as it's failing for some cases where indexes can be undefined.
Support semi-colons in string literals for queries of the form:
```
"""SELECT ChatGPT("Here's a; question", "This is the context") FROM TAIPAI;"""
```

Also support string escape to run ChatGPT queries more easily:
```
"""SELECT ChatGPT('Here\\'s a question', 'This is the context') FROM TAIPAI;"""
```
This PR introduces a generic interface to support 3rd party apps in
EVADB. As an example. the template for integrating slack has been added.
In a subsequent PR the integration with slack will be completed.

---------

Co-authored-by: Gaurav Tarlok Kakkar <[email protected]>
Co-authored-by: Joy Arulraj <[email protected]>
Co-authored-by: Joy Arulraj <[email protected]>
Co-authored-by: Kaushik Ravichandran <[email protected]>
Added separate error handling for ModuleNotFoundError and
FileNotFoundError
	modified:   evadb/utils/generic_utils.py
Issue - [721](#721)

Currently, we abort the entire process when the load executor encounters
a corrupted file.
…d pdf functionality. (#1343)

Issue #1067 about not being able to load pdf files, was verified to be
working with evadb documentation pdf and a new page for loading pdf is
added to the documentation.
<img width="1310" alt="Screenshot 2023-11-07 at 1 33 01 AM"
src="https://github.com/georgia-tech-db/evadb/assets/32676813/af2fa40b-c8c1-4f3d-b93f-98d0bf278a5b">

Co-authored-by: Lohith K S <[email protected]>
In the previous commit, we added the changes for displaying the train
scores and train times for XGBoost. We plan to add similar changes to
Ludwig integration as well.

---------

Co-authored-by: Jineet Desai <[email protected]>
Co-authored-by: Andy Xu <[email protected]>
- Fix the following queries:

```
SELECT * FROM postgres_data.home_rentals where neighborhood='downtown' and number_of_rooms=2;
```

- Improve the error message: Instead of throwing arbitrary mask error,
now we raise `Unsupported Logical Operator: ...`.
This PR adds support for creating and dropping jobs in evadb based on
this [task](#1248).

1.  Jobs can be created using the create job query:

   
> CREATE JOB {job_name} AS {
>             {job_queries; ...}
>     }
>     START {start_time}
>     END {end_time}
>     EVERY {repeat_period} {repeat_unit}

2. Created jobs can be dropped using:

> DROP JOB {job_name}

3. The scheduled jobs will only be triggered if the job scheduler
process is started explicitly using:

> EvaDBConnection.start_jobs()

4. The job scheduler process can be stopped using:

> EvaDBConnection.stop_jobs()

---------

Co-authored-by: Gaurav Tarlok Kakkar <[email protected]>
Provide feedback when `Forecasting` UDF is called in the following ways:

- [x] Reporting confidence intervals
- [x] Returning a metric for the forecasting performance.
- [x] Providing suggestions in simple special cases, such as during Flat
predictions.

Eg:
```sql
SELECT HomeForecast();
```

```
SUGGESTION: Predictions are flat. Consider using LIBRARY 'neuralforecast' for more accrate predictions.
```

Partially fixes #1257 and #1243.

---------

Co-authored-by: Andy Xu <[email protected]>
Flaml provides support for Sklearn models like Random Forests, KNN,
Extra Trees Regressor, and Logistic Regression with regularization. We
plan to integrate these ML models into EVADB.
Link for Flaml documentation:
https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML

---------

Co-authored-by: Jineet Desai <[email protected]>
Migrate ChatGPT function to openai v1.0.

The test is skipped in circleCI because we must supply the
`OPENAI_API_KEY`. The test passes on local machine.

- [x] Upgrade ChatGPT function.
- [x] Upgrade Dall-e function.
- [x] Update unit test cases.
- [x] Verify that notebooks work correctly.
@a0x8o a0x8o merged commit 8a8fc46 into alexxx-db:master Nov 22, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issues with EvaDB dockerfile.