docs: updated overview and usecases #951

Merged
merged 22 commits into from
Aug 25, 2023
2 changes: 1 addition & 1 deletion docs/_static/js/top-navigation.js
@@ -126,7 +126,7 @@ topNavContent.append(topNavContentLeft);

topNavContentRight = document.createElement("div");
topNavContentRight.setAttribute("class", "image-header");
topNavContentRight.innerHTML = "<a href='https://github.com/georgia-tech-db/evadb'><img class='icon-hover' src='_static/icons/github.png' width='25px' height='25px'></a> <a href='https://join.slack.com/t/eva-db/shared_invite/zt-1i10zyddy-PlJ4iawLdurDv~aIAq90Dg'><img class='icon-hover' src='_static/icons/slack.png' width='25px' height='25px'> </a><a href='https://twitter.com/evadb_ai'> <img class='icon-hover' src='_static/icons/twitter.png' width='25px' height='25px'> </a>"
topNavContentRight.innerHTML = "<a href='https://github.com/georgia-tech-db/evadb'><img class='icon-hover' src='https://raw.githubusercontent.com/georgia-tech-db/evadb/master/docs/_static/icons/github.png' width='25px' height='25px'></a> <a href='https://join.slack.com/t/eva-db/shared_invite/zt-1i10zyddy-PlJ4iawLdurDv~aIAq90Dg'><img class='icon-hover' src='https://raw.githubusercontent.com/georgia-tech-db/evadb/master/docs/_static/icons/slack.png' width='25px' height='25px'> </a><a href='https://twitter.com/evadb_ai'> <img class='icon-hover' src='https://raw.githubusercontent.com/georgia-tech-db/evadb/master/docs/_static/icons/twitter.png' width='25px' height='25px'> </a>"

topNavContent.append(topNavContentRight)

11 changes: 6 additions & 5 deletions docs/_toc.yml
@@ -4,23 +4,24 @@ parts:
- caption: Overview
chapters:
- file: source/overview/getting-started
- file: source/overview/faq
- file: source/overview/concepts
# - file: source/overview/faq

- caption: Use Cases
chapters:
- file: source/usecases/image-classification.rst
title: Image Classification
- file: source/usecases/similar-image-search.rst
title: Image Search [FAISS]
- file: source/usecases/qa-video.rst
title: Q&A from Videos [ChatGPT + HuggingFace]
- file: source/usecases/02-object-detection.ipynb
title: Object Detection
- file: source/usecases/03-emotion-analysis.ipynb
title: Emotion Analysis
- file: source/usecases/07-object-segmentation-huggingface.ipynb
title: Image Segmentation [HuggingFace]
- file: source/usecases/similar-image-search.rst
title: Image Search [FAISS]
- file: source/usecases/qa-video.rst
title: Q&A from Videos [ChatGPT + HuggingFace]


- caption: User Reference
chapters:
6 changes: 3 additions & 3 deletions docs/index.rst
@@ -45,7 +45,7 @@ Getting Started
.. raw:: html

<div class="grid-container">
<a class="no-underline" href="source/overview/getting-started" target="_blank"> <div class="info-box" >
<a class="no-underline" href="source/overview/getting-started.html" target="_blank"> <div class="info-box" >
<div class="image-header" style="padding:0px;">
<img src="_static/icons/code.png" width="44px" height="44px" />
<h3 style="font-size:20px;">Learn basics</h3>
@@ -54,7 +54,7 @@ Getting Started
<p class="only-dark" style="color:#FFFFFF;">Understand how to use EvaDB to build AI apps.</p>
<p style="font-weight:600;">Learn more > </p>
</div> </a>
<a class="no-underline" href="source/overview/concepts" target="_blank">
<a class="no-underline" href="source/overview/concepts.html" target="_blank">
<div class="info-box" >
<div class="image-header" style="padding:0px;">
<img src="_static/icons/download.png" width="44px" height="44px" />
@@ -74,7 +74,7 @@ Getting Started
<p class="only-light" style="color:#000000;">Have a question? Join our Slack community.</p>
<p class="only-dark" style="color:#FFFFFF;">Have a question? Join our Slack community.</p>
<p style="color:#515151;"></p>
<p style="font-weight:600;">Open the notebook></p>
<p style="font-weight:600;">Support > </p>
</div></a>
</div>

95 changes: 91 additions & 4 deletions docs/source/overview/concepts.rst
@@ -2,13 +2,100 @@
Concepts
=========

These are some high-level concepts related to EvaDB.
These are some high-level concepts related to EvaDB. If you still have questions after reading this document, ping us on `our Slack <https://join.slack.com/t/eva-db/shared_invite/zt-1i10zyddy-PlJ4iawLdurDv~aIAq90Dg>`__!


Quickly build AI-Powered Apps
---------------------------------

EvaDB supports a simple SQL-like query language designed to make it easier for users to leverage AI models. It is easy to chain multiple models in a single query to accomplish complicated tasks with minimal programming.

Here is an illustrative EvaDB app for ChatGPT-based question answering on videos. The app loads a collection of news videos into EvaDB and runs a query for extracting audio transcripts from the videos using a HuggingFace model, followed by question answering using ChatGPT.

.. code-block:: python

    # pip install evadb and import it
    import os

    import evadb

    # Grab an EvaDB cursor to load data and run queries
    cursor = evadb.connect().cursor()

    # Load a collection of news videos into the 'news_videos' table
    # This command returns a Pandas DataFrame with the query's output
    # In this case, the output indicates the number of loaded videos
    cursor.load(
        file_regex="news_videos/*.mp4",
        format="VIDEO",
        table_name="news_videos"
    ).df()

    # Define a function that wraps around a speech-to-text (Whisper) model
    # After creating the function, we can use the function in any future query
    cursor.create_function(
        udf_name="SpeechRecognizer",
        type="HuggingFace",
        task='automatic-speech-recognition',
        model='openai/whisper-base'
    ).df()

    # EvaDB automatically extracts the audio from the video
    # We only need to run the SpeechRecognizer UDF on the 'audio' column
    # to get the transcript and persist it in a table called 'transcripts'
    cursor.query(
        """CREATE TABLE transcripts AS
           SELECT SpeechRecognizer(audio) from news_videos;"""
    ).df()

    # We next incrementally construct the ChatGPT query using EvaDB's Python API
    # The query is based on the 'transcripts' table
    # This table has a column called 'text' with the transcript text
    query = cursor.table('transcripts')

    # Since ChatGPT is a built-in function, we don't have to define it
    # We can just directly use it in the query
    # We need to set the OPENAI_KEY as an environment variable
    OPENAI_KEY = "sk-..."  # placeholder: replace with your own OpenAI API key
    os.environ["OPENAI_KEY"] = OPENAI_KEY
    query = query.select("ChatGPT('Is this video summary related to LLM', text)")

    # Finally, we run the query to get the results as a dataframe
    response = query.df()



The same AI query can also be written directly in SQL and run on EvaDB.

.. code-block:: sql

-- Query for asking a question using ChatGPT
SELECT ChatGPT('Is this video summary related to LLM',
SpeechRecognizer(audio)) FROM news_videos;

EvaDB's declarative query language reduces the complexity of the app, leading to more maintainable code that allows users to build on top of each other's queries.

EvaDB comes with a wide range of models for analyzing unstructured data, including image classification, object detection, OCR, and face detection. It is fully implemented in Python and `licensed under the Apache license <https://github.com/georgia-tech-db/evadb>`__. It already contains integrations with widely used AI pipelines based on Hugging Face, PyTorch, and OpenAI.

The high-level SQL API allows even beginners to use EvaDB in a few lines of code. Advanced users can define custom user-defined functions that wrap around any AI model or Python library.
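
The idea of a function that wraps around a model can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use EvaDB's actual function interface, and the names ``ThinWrapper``, ``forward``, and ``brightness_model`` are hypothetical. It shows how a wrapper might adapt an arbitrary Python callable so it can be applied to one column of every row.

```python
from typing import Any, Callable, Dict, List


class ThinWrapper:
    """Hypothetical wrapper: adapts any Python callable to a batch interface."""

    def __init__(self, name: str, model: Callable[[Any], Any]) -> None:
        self.name = name
        self.model = model

    def forward(self, rows: List[Dict[str, Any]], column: str) -> List[Dict[str, Any]]:
        # Apply the wrapped model to one column of every row and
        # return a single-column result, like a projection in a query.
        return [{"label": self.model(row[column])} for row in rows]


def brightness_model(pixels: List[int]) -> str:
    # Stand-in for a real AI model: labels a frame by its mean pixel value.
    return "bright" if sum(pixels) / len(pixels) >= 128 else "dark"


classifier = ThinWrapper("BrightnessClassifier", brightness_model)
frames = [{"id": 0, "data": [250, 240, 230]}, {"id": 1, "data": [10, 20, 30]}]
labels = classifier.forward(frames, "data")
print(labels)
```

The wrapper itself holds no model logic, so swapping ``brightness_model`` for another callable would not change the calling code.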

Save time and money
----------------------

EvaDB automatically optimizes the queries to save inference cost and query execution time using its Cascades-style extensible query optimizer. EvaDB's optimizer is tailored for AI pipelines. The Cascades query optimization framework has worked well in SQL database systems for several decades. Query optimization in EvaDB is the bridge that connects the declarative query language to efficient execution.

EvaDB accelerates AI pipelines using a collection of optimizations inspired by SQL database systems including function caching, sampling, and cost-based operator reordering.
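
Two of the optimizations named above, function caching and cost-based reordering, can be illustrated with a small plain-Python sketch. It is not EvaDB's optimizer: the per-row costs are made-up numbers, ``expensive_model`` is a hypothetical stand-in for model inference, and the ordering rule here looks only at cost, whereas a real cost-based optimizer would likely also weigh selectivity.

```python
from typing import Callable, Dict, List, Tuple

call_count = 0


def expensive_model(frame_id: int) -> str:
    # Stand-in for costly model inference; counts how often it really runs.
    global call_count
    call_count += 1
    return "even" if frame_id % 2 == 0 else "odd"


cache: Dict[int, str] = {}


def cached_model(frame_id: int) -> str:
    # Function caching: reuse a stored result instead of re-running inference.
    if frame_id not in cache:
        cache[frame_id] = expensive_model(frame_id)
    return cache[frame_id]


# Each predicate carries an assumed per-row cost (hypothetical numbers).
Predicate = Tuple[str, float, Callable[[int], bool]]
predicates: List[Predicate] = [
    ("model_says_even", 100.0, lambda i: cached_model(i) == "even"),
    ("id_below_4", 1.0, lambda i: i < 4),
]

# Simplified reordering: evaluate cheaper predicates first so the expensive
# model only sees rows that survive the cheap filter.
ordered = sorted(predicates, key=lambda p: p[1])


def run(frame_ids: List[int]) -> List[int]:
    # all() short-circuits, so later predicates are skipped once one fails.
    return [i for i in frame_ids if all(pred(i) for _, _, pred in ordered)]


first = run(list(range(10)))
second = run(list(range(10)))  # the model results now come from the cache
print(first, second, call_count)
```

In this toy run the model executes only for the four rows that pass the cheap filter, and the second run triggers no further model calls.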

EvaDB supports an AI-oriented query language for analyzing both structured and unstructured data. Here are some illustrative apps:


The `Getting Started <source/overview/installation.html>`__ page shows how you can use EvaDB for different AI tasks and how you can easily extend EvaDB to support your custom deep learning model through user-defined functions.

The `User Guides <source/tutorials/index.html>`__ section contains Jupyter Notebooks that demonstrate how to use various features of EvaDB. Each notebook includes a link to Google Colab, where you can run the code yourself.



If you still have questions after reading this document, ping us on
`our Slack <https://join.slack.com/t/eva-db/shared_invite/zt-1i10zyddy-PlJ4iawLdurDv~aIAq90Dg>`__!

User-Defined Function (UDF) or Function
======================================
------------------------------------------

User-defined functions are thin wrappers around deep learning models. They
allow us to use deep learning models in AI queries.
52 changes: 27 additions & 25 deletions docs/source/usecases/image-classification.rst
@@ -1,4 +1,4 @@
Implementing an Image Classification Pipeline using EvaDB on a Video
Image Classification Pipeline using EvaDB
====

Assume the database has loaded a video ``mnist_video``.
@@ -28,23 +28,25 @@ Create an image classification function from python source code.

After the function is registered with EvaDB, it can be called directly in SQL queries.

.. code-block:: python

query = cursor.table("mnist_video").select("MnistImageClassifier(data).label")
.. tab-set::
.. tab-item:: Python

.. note::
.. code-block:: python

SQL statement
query = cursor.table("mnist_video").select("MnistImageClassifier(data).label")

# Get results in a DataFrame.
query.df()

.. code-block:: sql

SELECT MnistImageClassifier(data).label FROM mnist_video
.. tab-item:: SQL

Get results in a ``DataFrame``.
.. code-block:: sql

.. code-block:: python
SELECT MnistImageClassifier(data).label FROM mnist_video;

query.df()


The result contains a projected ``label`` column, which indicates the digit of a particular frame.

@@ -69,26 +71,26 @@ The result contains a projected ``label`` column, which indicates the digit of a

Like normal SQL, you can also specify conditions to filter out some frames of the video.

.. code-block:: python
.. tab-set::

.. tab-item:: Python

query = cursor.table("mnist_video") \
.filter("id < 2") \
.select("MnistImageClassifier(data).label")
.. code-block:: python

.. note::
query = cursor.table("mnist_video") \
.filter("id < 2") \
.select("MnistImageClassifier(data).label")

# Return results in a DataFrame.
query.df()

SQL statement
.. tab-item:: SQL

.. code-block:: sql
.. code-block:: sql

SELECT MnistImageClassifier(data).label FROM mnist_video
WHERE id < 2

Return results in a ``DataFrame``.

.. code-block:: python
SELECT MnistImageClassifier(data).label FROM mnist_video
WHERE id < 2

query.df()

Now, the ``DataFrame`` only contains 2 rows after filtering.
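
The effect of the filter can be mimicked in plain Python. This is only a sketch of the semantics, assuming a toy four-frame table; ``classify_digit`` is a hypothetical stand-in for ``MnistImageClassifier``, and the digit values are invented for illustration.

```python
from typing import Dict, List


def classify_digit(frame: Dict[str, int]) -> int:
    # Hypothetical stand-in for MnistImageClassifier: the "digit" is stored
    # directly in the toy frame instead of being predicted from pixels.
    return frame["digit"]


# Toy table standing in for mnist_video: one row per frame.
mnist_video: List[Dict[str, int]] = [
    {"id": 0, "digit": 6},
    {"id": 1, "digit": 6},
    {"id": 2, "digit": 2},
    {"id": 3, "digit": 2},
]

# WHERE id < 2: keep only the matching frames.
filtered = [frame for frame in mnist_video if frame["id"] < 2]

# SELECT ...label: run the classifier on the surviving frames only.
result = [{"label": classify_digit(frame)} for frame in filtered]
print(len(result), result)
```

Because the filter runs before the classifier in this sketch, the classifier is invoked only for the frames with ``id`` 0 and 1.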

9 changes: 3 additions & 6 deletions docs/source/usecases/qa-video.rst
@@ -12,8 +12,7 @@ Q&A Application on Videos
2. Register Functions
----

Whisper
****
Register the speech-to-text **Whisper** model from ``HuggingFace``.

.. code-block:: python

@@ -28,8 +27,7 @@ Whisper

EvaDB allows users to register any model in HuggingFace as a function.

ChatGPT
****
Register the **OpenAI** LLM.

.. code-block:: python

@@ -49,8 +47,7 @@ ChatGPT
3. Summarize Video in Text
----

Create a table with text summary of the video.
Text summarization is generated by running audio-to-text ``Whisper`` model from ``HuggingFace``.
Create a table with a text summary of the video. The summary is generated by running the speech-to-text ``Whisper`` model from ``HuggingFace``.

.. code-block:: python

6 changes: 2 additions & 4 deletions docs/source/usecases/similar-image-search.rst
@@ -1,9 +1,7 @@
Implementing a Similar Image Search Pipeline using EvaDB on Images
Image Similarity Search Pipeline using EvaDB on Images
====

In this use case, we want to search similar images based on an image provided by the user.
To implement this use case, we leverage EvaDB's capability of easily expressing feature extraction pipeline.
Additionally, we also leverage EvaDB's capability of building a similarity search index and searching the index to
In this use case, we want to search for similar images based on an image provided by the user. To implement this use case, we leverage EvaDB's capability of easily expressing a feature extraction pipeline. Additionally, we leverage EvaDB's capability of building a similarity search index and searching the index to
locate similar images through the ``FAISS`` library.
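
What "searching the index" means can be sketched with a brute-force nearest-neighbor search in plain Python. This is a stand-in, not FAISS: the feature vectors, file names, and the ``search`` helper are hypothetical. A real index avoids scanning every vector, but it answers the same question of which stored vectors are closest to the query.

```python
import math
from typing import Dict, List, Tuple


def l2_distance(a: List[float], b: List[float]) -> float:
    # Euclidean distance between two feature vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(index: Dict[str, List[float]], query: List[float], k: int) -> List[Tuple[str, float]]:
    # Score every stored vector against the query and keep the k closest.
    scored = [(name, l2_distance(vec, query)) for name, vec in index.items()]
    return sorted(scored, key=lambda item: item[1])[:k]


# Toy "index": image name -> feature vector (made-up values).
index = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "car_1.jpg": [0.0, 0.1, 0.9],
}
query_features = [1.0, 0.0, 0.0]  # features of the user-provided image
top = search(index, query_features, k=2)
print([name for name, _ in top])
```

The feature vectors would normally come from a feature extraction model applied to each image; here they are hard-coded to keep the sketch self-contained.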

For this use case, we use a Reddit image dataset that can be downloaded from `here <https://www.dropbox.com/scl/fo/fcj6ojmii0gw92zg3jb2s/h\?dl\=1\&rlkey\=j3kj1ox4yn5fhonw06v0pn7r9>`_.