open-dslib

Open Source Data Science Modules

Installation

Install using the following command:

pip install git+https://github.com/bluelabsio/open-dslib

Usage

Environment Setup

The package picks up database credentials from the environment. A reference to the database name - eg. DATABASE - is required, and all environment variables must be named accordingly:

- {DATABASE}_DB - refers to the name of the database to be queried
- {DATABASE}_HOST - refers to the database host
- {DATABASE}_PORT - refers to the database port
- {DATABASE}_USER - refers to the database user making the connection
- {DATABASE}_PW - refers to the password for the database user who is making the connection
- {DATABASE}_TYPE - refers to the database dialect

Creating Crosstabs

Crosstabs can be created in two ways:

Passing all the crosstab parameters inline:

import dslib.assessment.crosstabs as xt

xtabs = xt.CrossTabs(
    database_name = '{DATABASE}',
    universe="(SELECT * FROM schema.table_name LIMIT 100000)",
    metrics=[
        "count(*)",
        "COUNT(*)::FLOAT/NULLIF(SUM(COUNT(*)) OVER(), 0) prop",
    ],
    splits=[
        "gender",
        "race",
        ["gender", "race"] #(This can be used to cross two variables together)
    ]
)

xtabs.df

Passing crosstab parameters through a yaml config:

import dslib.assessment.crosstabs as xt

xtabs = xt.CrossTabs(
    database_name = '{DATABASE}',
    yaml_file='example_config.yaml'
)

xtabs.df

The config should be formatted like:

universes:
    'table1': (SELECT * FROM schema.table_name_1)
    'table2': (SELECT * FROM schema.table_name_2)
    
metrics:
    - count(*)
    - COUNT(*)::FLOAT/NULLIF(SUM(COUNT(*)) OVER(), 0) prop
    
splits:
    - gender
    - race
    - - gender
      - race  #(This can be used to cross two variables together)

Naming a Universe

When providing multiple universes, provide a name for each universe that it can be referenced by. Ensure that you do not use any reserved words that might have some meaning in the SQL dialect you are working with.

Examples of reserved words:

ALL
FULL
COLUMN
GROUP

You can find a list of reserved words for Amazon Redshift here.

Please do not provide any aliases for your universe in the sql that you enter, this will be handled by the tool itself.

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
dslib		dslib
.gitignore		.gitignore
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

open-dslib

Installation

Usage

Environment Setup

Creating Crosstabs

Naming a Universe

About

Releases

Packages

Languages

bluelabsio/open-dslib

Folders and files

Latest commit

History

Repository files navigation

open-dslib

Installation

Usage

Environment Setup

Creating Crosstabs

Naming a Universe

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages