Merge pull request blakeblackshear#1 from georgia-tech-db/master
Changes with local repo
swati21 authored Feb 6, 2020
2 parents fd44559 + 88c1996 commit 7192805
Showing 84 changed files with 2,594 additions and 767 deletions.
4 changes: 3 additions & 1 deletion .travis.yml
@@ -2,7 +2,8 @@
# The VMs have 2 cores and 8 GB of RAM
dist: trusty
sudo: required

services:
- mysql
language: python
python:
- "3.6"
@@ -16,6 +17,7 @@ before_install:
- conda config --set always_yes yes --set changeps1 no
- conda update -q conda
- conda info -a
- mysql -e 'CREATE DATABASE IF NOT EXISTS eva_catalog;'

install:
- conda env create -f environment.yml
139 changes: 77 additions & 62 deletions README.md
# EVA (Exploratory Video Analytics)

[![Build Status](https://travis-ci.org/georgia-tech-db/eva.svg?branch=master)](https://travis-ci.com/georgia-tech-db/eva)
[![Coverage Status](https://coveralls.io/repos/github/georgia-tech-db/eva/badge.svg?branch=master)](https://coveralls.io/github/georgia-tech-db/eva?branch=master)

EVA is an end-to-end video analytics engine that allows users to query a database of videos and return results based on machine learning analysis.

## Table of Contents
* [Installation](#installation)
* [Development](#development)
* [Architecture](#architecture)

## Installation

Installing EVA involves setting up a virtual environment with [miniconda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) and configuring git hooks.

1. Clone the repository.
```shell
git clone https://github.com/georgia-tech-db/eva.git
```

2. Install [miniconda](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) and update the `PATH` environment variable.
```shell
export PATH="$HOME/miniconda/bin:$PATH"
```

3. Install the dependencies in a miniconda virtual environment. Virtual environments keep dependencies in separate sandboxes, so you can switch between `eva` and other Python applications without conflicts.
```shell
cd eva/
conda env create -f environment.yml
```

   Note: the environment file is known to work on Ubuntu 16.04; there are known installation issues on macOS.
4. Activate the `eva` environment.
```shell
conda activate eva
```
5. Run the following command to configure git hooks.
```shell
git config core.hooksPath .githooks
```

## Development

We invite you to help us build the future of visual data management DBMSs.

1. Ensure that all the unit test cases (including the ones you have added) run successfully.

```shell
python -m pytest test/
```

2. Ensure that the coding style conventions are followed.

```shell
pycodestyle --select E test src/loaders
```

3. Run the formatter script to automatically fix most of the coding style issues.

```shell
python script/formatting/formatter.py
```

Please see the [contributing guide](https://github.com/georgia-tech-db/eva/blob/master/CONTRIBUTING.md#development) for details.

## Architecture

The EVA visual data management system consists of four core components:

* Query Parser
* Query Optimizer
* Query Execution Engine (Filters + UDFs)
* Storage Engine (Loaders)

#### Query Optimizer
The query optimizer rewrites a given query into an equivalent form that is cheaper to execute.

Module location: *src/query_optimizer*
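As an illustration only, one classic optimization an optimizer like this can apply is reordering the predicates of a conjunctive query so the cheapest checks run first; the `Predicate` class and the cost numbers below are hypothetical, not EVA's actual API.

```python
# Toy sketch: order conjunctive predicates by estimated per-frame cost,
# so cheap filters screen frames before expensive deep models run.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Predicate:
    name: str
    cost: float                        # estimated evaluation cost per frame
    fn: Callable[[Dict], bool]         # frame -> keep?


def optimize(predicates: List[Predicate]) -> List[Predicate]:
    """Return an execution order with the cheapest predicates first."""
    return sorted(predicates, key=lambda p: p.cost)


preds = [
    Predicate("deep_model", cost=100.0, fn=lambda f: f["label"] == "car"),
    Predicate("color_filter", cost=1.0, fn=lambda f: f["color"] == "red"),
]
plan = optimize(preds)
print([p.name for p in plan])  # ['color_filter', 'deep_model']
```

Running the cheap color check first means most frames never reach the expensive model, which is the same intuition behind EVA's filters.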

#### Filters
The filters module performs preliminary filtering of video frames using cheap machine learning models.
It also outputs statistics, such as the reduction rate and cost, that are used by the query optimizer.

The following filters are currently implemented:
* Random Forest
* SVM

Module location: *src/filters*
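A minimal sketch of the filter contract described above: a cheap model screens frames before an expensive one runs, and reports the reduction rate and cost that the optimizer consumes. The threshold-based "model" here is a stand-in for the real Random Forest / SVM filters.

```python
# Hypothetical cheap filter: keep frames scored above a threshold and
# report the statistics (reduction rate, cost) the optimizer uses.
import time
from typing import Dict, List, Tuple


def cheap_filter(frames: List[Dict], threshold: float = 0.5) -> Tuple[List[Dict], Dict]:
    start = time.perf_counter()
    kept = [f for f in frames if f["score"] >= threshold]
    cost = time.perf_counter() - start            # wall-clock filtering cost
    reduction_rate = 1 - len(kept) / len(frames)  # fraction of frames dropped
    return kept, {"reduction_rate": reduction_rate, "cost": cost}


frames = [{"id": i, "score": i / 10} for i in range(10)]  # scores 0.0 .. 0.9
kept, stats = cheap_filter(frames)
print(len(kept), stats["reduction_rate"])  # 5 0.5
```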

#### UDFs
This module will contain the imported deep learning models. It is currently a work in progress; the current status is described in detail [here](src/udfs/README.md).

Module location: *src/udfs*

#### Loaders
The loaders load the dataset with the attributes specified in *Accelerating Machine Learning Inference with Probabilistic Predicates* by Lu et al.

Module location: *src/loaders*
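A hedged sketch of the loader contract: iterate over a video and yield one record per frame carrying the attributes that the filters and UDFs consume downstream. The record fields here are illustrative, not the exact UA-DETRAC schema.

```python
# Hypothetical loader: yield per-frame records with attribute slots that
# later stages (annotations, filters, UDFs) fill in.
from typing import Dict, Iterator, List


def load_frames(video: List[bytes]) -> Iterator[Dict]:
    """Yield one record per frame of the (already-decoded) video."""
    for frame_id, data in enumerate(video):
        yield {
            "frame_id": frame_id,
            "data": data,          # raw frame bytes
            "vehicle_type": None,  # populated later from dataset annotations
        }


records = list(load_frames([b"\x00", b"\x01"]))
print([r["frame_id"] for r in records])  # [0, 1]
```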

## Status

_Technology preview_: currently unsupported, possibly due to incomplete functionality or unsuitability for production use.

## Contributors

See the [people page](https://github.com/georgia-tech-db/eva/graphs/contributors) for the full listing of contributors.

## License

Copyright (c) 2018-2020 [Georgia Tech Database Group](http://db.cc.gatech.edu/)
Licensed under the [Apache License](LICENSE).
6 changes: 6 additions & 0 deletions environment.yml
@@ -3,6 +3,7 @@ channels:
- conda-forge
- anaconda
- defaults
- pytorch
dependencies:
- python=3.7
- pip
@@ -18,8 +19,13 @@ dependencies:
- autoflake
- torchvision
- pytorch
- tensorflow
- tensorboard
- pillow=6.1
- sqlalchemy
- pymysql
- sqlalchemy-utils
- mock
- pip:
- antlr4-python3-runtime==4.8
- petastorm
45 changes: 0 additions & 45 deletions src/catalog/catalog_dataframes.py
@@ -12,48 +12,3 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

from src.configuration.dictionary import DATASET_DATAFRAME_NAME

from src.storage.dataframe import create_dataframe
from src.storage.dataframe import DataFrameMetadata

from src.catalog.schema import Column
from src.catalog.schema import ColumnType
from src.catalog.schema import Schema


def get_dataset_schema():
    column_1 = Column("dataset_id", ColumnType.INTEGER, False)
    column_2 = Column("dataset_name", ColumnType.STRING, False)

    dataset_df_schema = Schema("dataset_df_schema",
                               [column_1, column_2])
    return dataset_df_schema


def load_catalog_dataframes(catalog_dir_url: str,
                            catalog_dictionary):

    dataset_file_url = os.path.join(catalog_dir_url, DATASET_DATAFRAME_NAME)
    dataset_df_schema = get_dataset_schema()
    dataset_catalog_entry = DataFrameMetadata(dataset_file_url,
                                              dataset_df_schema)

    catalog_dictionary.update({DATASET_DATAFRAME_NAME: dataset_catalog_entry})


def create_catalog_dataframes(catalog_dir_url: str,
                              catalog_dictionary):

    dataset_df_schema = get_dataset_schema()
    dataset_file_url = os.path.join(catalog_dir_url, DATASET_DATAFRAME_NAME)
    dataset_catalog_entry = DataFrameMetadata(dataset_file_url,
                                              dataset_df_schema)

    create_dataframe(dataset_catalog_entry)

    # dataframe name : (schema, petastorm_schema, pyspark_schema)
    catalog_dictionary.update({DATASET_DATAFRAME_NAME: dataset_catalog_entry})
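The deleted helpers above follow a simple pattern: build a `Schema` of `Column`s, then wrap it in metadata keyed by the dataframe name. A self-contained sketch of that pattern, using minimal stand-in classes (the real ones lived in `src/catalog/schema`):

```python
# Minimal stand-ins for Column/ColumnType/Schema, purely to demonstrate the
# schema-construction pattern of get_dataset_schema; not EVA's real classes.
from dataclasses import dataclass
from enum import Enum
from typing import List


class ColumnType(Enum):
    INTEGER = 1
    STRING = 2


@dataclass
class Column:
    name: str
    type: ColumnType
    nullable: bool


@dataclass
class Schema:
    name: str
    columns: List[Column]


def get_dataset_schema() -> Schema:
    return Schema("dataset_df_schema", [
        Column("dataset_id", ColumnType.INTEGER, False),
        Column("dataset_name", ColumnType.STRING, False),
    ])


schema = get_dataset_schema()
print([c.name for c in schema.columns])  # ['dataset_id', 'dataset_name']
```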
