Merge pull request blakeblackshear#1 from georgia-tech-db/master
Changes with local repo
swati21 authored Feb 6, 2020
2 parents fd44559 + 88c1996 commit 7192805
Showing 84 changed files with 2,594 additions and 767 deletions.
4 changes: 3 additions & 1 deletion .travis.yml
@@ -2,7 +2,8 @@
# The VMs have 2 cores and 8 GB of RAM
dist: trusty
sudo: required

services:
- mysql
language: python
python:
- "3.6"
@@ -16,6 +17,7 @@ before_install:
- conda config --set always_yes yes --set changeps1 no
- conda update -q conda
- conda info -a
- mysql -e 'CREATE DATABASE IF NOT EXISTS eva_catalog;'

install:
- conda env create -f environment.yml
139 changes: 77 additions & 62 deletions README.md
# EVA (Exploratory Video Analytics)

[![Build Status](https://travis-ci.org/georgia-tech-db/eva.svg?branch=master)](https://travis-ci.com/georgia-tech-db/eva)
[![Coverage Status](https://coveralls.io/repos/github/georgia-tech-db/eva/badge.svg?branch=master)](https://coveralls.io/github/georgia-tech-db/eva?branch=master)

EVA is an end-to-end video analytics engine that allows users to query a database of videos and return results based on machine learning analysis.

## Table of Contents
* [Installation](#installation)
* [Development](#development)
* [Architecture](#architecture)

## Installation

Installing EVA involves setting up a virtual environment with [miniconda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) and configuring git hooks.

1. Clone the repository.
```shell
git clone https://github.com/georgia-tech-db/eva.git
```

2. Install [miniconda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) and update the `PATH` environment variable.
```shell
export PATH="$HOME/miniconda/bin:$PATH"
```

3. Install the dependencies in a miniconda virtual environment. Virtual environments keep dependencies in separate sandboxes, so you can switch between `eva` and other Python applications without conflicts.
```shell
cd eva/
conda env create -f environment.yml
```

   Note: the environment file is known to work on Ubuntu 16.04; there are known installation issues on macOS.
4. Activate the `eva` environment.
```shell
conda activate eva
```
5. Run the following command to configure git hooks.
```shell
git config core.hooksPath .githooks
```

## Development

We invite you to help us build the future of visual data management DBMSs.

1. Ensure that all the unit test cases (including the ones you have added) run successfully.

```shell
python -m pytest test/
```

2. Ensure that the coding style conventions are followed.

```shell
pycodestyle --select E test src/loaders
```

3. Run the formatter script to automatically fix most of the coding style issues.

```shell
python script/formatting/formatter.py
```

Please see the [contributing guide](https://github.com/georgia-tech-db/eva/blob/master/CONTRIBUTING.md#development) for details.

## Architecture

The EVA visual data management system consists of four core components:

* Query Parser
* Query Optimizer
* Query Execution Engine (Filters + UDFs)
* Storage Engine (Loaders)

#### Query Optimizer
The query optimizer rewrites a given query into an equivalent form that is cheaper to execute.

Module location: *src/query_optimizer*
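As an illustration only, one classic optimization an optimizer like this can apply is reordering the predicates of a conjunctive query so the cheapest checks run first; the `Predicate` class and the cost numbers below are hypothetical, not EVA's actual API.

```python
# Toy sketch: order conjunctive predicates by estimated per-frame cost,
# so cheap filters screen frames before expensive deep models run.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Predicate:
    name: str
    cost: float                        # estimated evaluation cost per frame
    fn: Callable[[Dict], bool]         # frame -> keep?


def optimize(predicates: List[Predicate]) -> List[Predicate]:
    """Return an execution order with the cheapest predicates first."""
    return sorted(predicates, key=lambda p: p.cost)


preds = [
    Predicate("deep_model", cost=100.0, fn=lambda f: f["label"] == "car"),
    Predicate("color_filter", cost=1.0, fn=lambda f: f["color"] == "red"),
]
plan = optimize(preds)
print([p.name for p in plan])  # ['color_filter', 'deep_model']
```

Running the cheap color check first means most frames never reach the expensive model, which is the same intuition behind EVA's filters.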

#### Filters
The filters module performs preliminary filtering of video frames using cheap machine learning models.
It also outputs statistics, such as the reduction rate and cost, that are used by the query optimizer.

The following filters are currently implemented:
* Random Forest
* SVM

Module location: *src/filters*
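A minimal sketch of the filter contract described above: a cheap model screens frames before an expensive one runs, and reports the reduction rate and cost that the optimizer consumes. The threshold-based "model" here is a stand-in for the real Random Forest / SVM filters.

```python
# Hypothetical cheap filter: keep frames scored above a threshold and
# report the statistics (reduction rate, cost) the optimizer uses.
import time
from typing import Dict, List, Tuple


def cheap_filter(frames: List[Dict], threshold: float = 0.5) -> Tuple[List[Dict], Dict]:
    start = time.perf_counter()
    kept = [f for f in frames if f["score"] >= threshold]
    cost = time.perf_counter() - start            # wall-clock filtering cost
    reduction_rate = 1 - len(kept) / len(frames)  # fraction of frames dropped
    return kept, {"reduction_rate": reduction_rate, "cost": cost}


frames = [{"id": i, "score": i / 10} for i in range(10)]  # scores 0.0 .. 0.9
kept, stats = cheap_filter(frames)
print(len(kept), stats["reduction_rate"])  # 5 0.5
```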

#### UDFs
This module will contain the imported deep learning models. It is currently a work in progress; the current status is described in detail [here](src/udfs/README.md).

Module location: *src/udfs*

#### Loaders
The loaders load the dataset with the attributes specified in *Accelerating Machine Learning Inference with Probabilistic Predicates* by Lu et al.

Module location: *src/loaders*
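A hedged sketch of the loader contract: iterate over a video and yield one record per frame carrying the attributes that the filters and UDFs consume downstream. The record fields here are illustrative, not the exact UA-DETRAC schema.

```python
# Hypothetical loader: yield per-frame records with attribute slots that
# later stages (annotations, filters, UDFs) fill in.
from typing import Dict, Iterator, List


def load_frames(video: List[bytes]) -> Iterator[Dict]:
    """Yield one record per frame of the (already-decoded) video."""
    for frame_id, data in enumerate(video):
        yield {
            "frame_id": frame_id,
            "data": data,          # raw frame bytes
            "vehicle_type": None,  # populated later from dataset annotations
        }


records = list(load_frames([b"\x00", b"\x01"]))
print([r["frame_id"] for r in records])  # [0, 1]
```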

## Status

_Technology preview_: currently unsupported, possibly due to incomplete functionality or unsuitability for production use.

## Contributors

See the [people page](https://github.com/georgia-tech-db/eva/graphs/contributors) for the full listing of contributors.

## License

Copyright (c) 2018-2020 [Georgia Tech Database Group](http://db.cc.gatech.edu/)
Licensed under the [Apache License](LICENSE).
6 changes: 6 additions & 0 deletions environment.yml
@@ -3,6 +3,7 @@ channels:
- conda-forge
- anaconda
- defaults
- pytorch
dependencies:
- python=3.7
- pip
@@ -18,8 +19,13 @@ dependencies:
- autoflake
- torchvision
- pytorch
- tensorflow
- tensorboard
- pillow=6.1
- sqlalchemy
- pymysql
- sqlalchemy-utils
- mock
- pip:
- antlr4-python3-runtime==4.8
- petastorm
45 changes: 0 additions & 45 deletions src/catalog/catalog_dataframes.py
@@ -12,48 +12,3 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

from src.configuration.dictionary import DATASET_DATAFRAME_NAME

from src.storage.dataframe import create_dataframe
from src.storage.dataframe import DataFrameMetadata

from src.catalog.schema import Column
from src.catalog.schema import ColumnType
from src.catalog.schema import Schema


def get_dataset_schema():
    column_1 = Column("dataset_id", ColumnType.INTEGER, False)
    column_2 = Column("dataset_name", ColumnType.STRING, False)

    dataset_df_schema = Schema("dataset_df_schema",
                               [column_1, column_2])
    return dataset_df_schema


def load_catalog_dataframes(catalog_dir_url: str,
                            catalog_dictionary):

    dataset_file_url = os.path.join(catalog_dir_url, DATASET_DATAFRAME_NAME)
    dataset_df_schema = get_dataset_schema()
    dataset_catalog_entry = DataFrameMetadata(dataset_file_url,
                                              dataset_df_schema)

    catalog_dictionary.update({DATASET_DATAFRAME_NAME: dataset_catalog_entry})


def create_catalog_dataframes(catalog_dir_url: str,
                              catalog_dictionary):

    dataset_df_schema = get_dataset_schema()
    dataset_file_url = os.path.join(catalog_dir_url, DATASET_DATAFRAME_NAME)
    dataset_catalog_entry = DataFrameMetadata(dataset_file_url,
                                              dataset_df_schema)

    create_dataframe(dataset_catalog_entry)

    # dataframe name : (schema, petastorm_schema, pyspark_schema)
    catalog_dictionary.update({DATASET_DATAFRAME_NAME: dataset_catalog_entry})
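The deleted helpers above follow a simple pattern: build a `Schema` of `Column`s, then wrap it in metadata keyed by the dataframe name. A self-contained sketch of that pattern, using minimal stand-in classes (the real ones lived in `src/catalog/schema`):

```python
# Minimal stand-ins for Column/ColumnType/Schema, purely to demonstrate the
# schema-construction pattern of get_dataset_schema; not EVA's real classes.
from dataclasses import dataclass
from enum import Enum
from typing import List


class ColumnType(Enum):
    INTEGER = 1
    STRING = 2


@dataclass
class Column:
    name: str
    type: ColumnType
    nullable: bool


@dataclass
class Schema:
    name: str
    columns: List[Column]


def get_dataset_schema() -> Schema:
    return Schema("dataset_df_schema", [
        Column("dataset_id", ColumnType.INTEGER, False),
        Column("dataset_name", ColumnType.STRING, False),
    ])


schema = get_dataset_schema()
print([c.name for c in schema.columns])  # ['dataset_id', 'dataset_name']
```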
