Fix benchmark documentation (#931)
👋 Thanks for submitting a Pull Request to EvaDB!

🙌 We want to make contributing to EvaDB as easy and transparent as
possible. Here are a few tips to get you started:

- 🔍 Search existing EvaDB
[PRs](https://github.com/georgia-tech-db/eva/pulls) to see if a similar
PR already exists.
- 🔗 Link this PR to an EvaDB
[issue](https://github.com/georgia-tech-db/eva/issues) to help us
understand what bug fix or feature is being implemented.
- 📈 Provide before and after profiling results to help us quantify the
improvement your PR provides (if applicable).

👉 Please see our ✅ [Contributing
Guide](https://evadb.readthedocs.io/en/stable/source/contribute/index.html)
for more details.
xzdandy authored Aug 14, 2023
1 parent 8c89ae1 commit 6d0998b
Showing 1 changed file with 12 additions and 14 deletions.
26 changes: 12 additions & 14 deletions docs/source/benchmarks/text_summarization.rst
@@ -1,11 +1,11 @@
Text summarization benchmark
====
In this benchmark, we compare the performance of text summarization between EvaDB and MindsDB on `CNN-DailyMail News <https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail>`.
In this benchmark, we compare the performance of text summarization between EvaDB and MindsDB on `CNN-DailyMail News <https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail>`_.

1. Prepare dataset
----

.. code-block: bash
.. code-block:: bash
cd benchmark/text_summarization
bash download_dataset.sh
@@ -17,7 +17,7 @@ In this benchmark, we compare the performance of text summarization between EvaD

Install ray in your EvaDB virtual environment. ``pip install "ray>=1.13.0,<2.5.0"``

.. code-block: bash
.. code-block:: bash
cd benchmark/text_summarization
python text_summarization_with_evadb.py
@@ -26,12 +26,10 @@ In this benchmark, we compare the performance of text summarization between EvaD
3. Using MindsDB to summarize the CNN DailyMail News
----

.. _sqlite database:

Prepare sqlite database for MindsDB
****

.. code-block: bash
.. code-block:: bash
sqlite3 cnn_news_test.db
> .mode csv
@@ -41,24 +39,24 @@ Prepare sqlite database for MindsDB
Install MindsDB
****
Follow the `Setup for Source Code via pip <https://docs.mindsdb.com/setup/self-hosted/pip/source>` to install mindsdb.
Follow the `Setup for Source Code via pip <https://docs.mindsdb.com/setup/self-hosted/pip/source>`_ to install mindsdb.

.. note::

At the time of this documentation, we need to manually ``pip install evaluate`` for huggingface model to work in MindsDB.

After the installation, we use mysql cli to connect to MindsDB. Replace the port number as needed.

.. code-block: bash
.. code-block:: bash
mysql -h 127.0.0.1 --port 47335 -u mindsdb -p
Run Experiment
****

Connect the sqlite database we created before: :ref:`sqlite database`.
Connect the sqlite database we created before.

.. code-block: sql
.. code-block:: sql
CREATE DATABASE sqlite_datasource
WITH ENGINE = 'sqlite',
@@ -68,7 +66,7 @@ Connect the sqlite database we created before: :ref:`sqlite database`.
Create text summarization model and wait for its readiness.

.. code-block: sql
.. code-block:: sql
CREATE MODEL mindsdb.hf_bart_sum_20
PREDICT PRED
@@ -82,9 +80,9 @@ Create text summarization model and wait for its readiness.
DESCRIBE mindsdb.hf_bart_sum_20;
Use the model to summarize the CNN DailyMail news
Use the model to summarize the CNN DailyMail news.

.. code-block: sql
.. code-block:: sql
CREATE OR REPLACE TABLE sqlite_datasource.cnn_news_summary (
SELECT PRED
@@ -95,7 +93,7 @@ Use the model to summarize the CNN DailyMail news
4. Experiment results
----
Below are numbers from a server with 56 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz and two Quadro P6000 GPU
Below are numbers from a server with 56 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz and two Quadro P6000 GPU.

.. list-table:: Text summarization with ``sshleifer/distilbart-cnn-12-6`` on CNN-DailyMail News
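
The two recurring fixes in this diff (``.. code-block:`` changed to ``.. code-block::``, and inline links given their trailing ``_``) can be caught mechanically. The following is a minimal sketch of such a check, not part of the EvaDB repository; the function name `find_rst_issues` and both regular expressions are assumptions made for illustration, and the patterns only cover the two cases seen in this commit.

```python
import re

# A directive written with a single colon, e.g. ".. code-block: bash".
# Hyperlink targets such as ".. _name:" start with "_" and are not matched.
SINGLE_COLON_DIRECTIVE = re.compile(r"^\s*\.\. [A-Za-z][\w-]*:(?!:)")

# An inline external link without the trailing underscore, e.g. "`text <url>`".
LINK_WITHOUT_UNDERSCORE = re.compile(r"`[^`<]+<[^`>]+>`(?!_)")


def find_rst_issues(text):
    """Return (line_number, message) pairs for the two patterns above.

    Hypothetical helper for illustration; it is a heuristic, not a full
    reStructuredText parser.
    """
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if SINGLE_COLON_DIRECTIVE.match(line):
            issues.append((lineno, "directive needs '::'"))
        if LINK_WITHOUT_UNDERSCORE.search(line):
            issues.append((lineno, "link needs trailing '_'"))
    return issues
```

Running `find_rst_issues` on the contents of a file such as `docs/source/benchmarks/text_summarization.rst` would list the line numbers to review; because it is a heuristic, a flagged line may still be intentional (for example, an rst comment that happens to look like a directive).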
