The goal of this project is to transcompile a subset of the SAS/Base language to the SciPy stack.
The tests can be run directly inside your git clone (without having to install stan) by typing:

```shell
nosetests stan
```
Some differences from SAS to note:

- A data step `merge` will not require the data to be sorted beforehand. Data will be implicitly sorted (similar to the SPDE engine).
- Dates will be supported in a different manner (coming soon).
- `format`, `length`, and `informats` will not be necessary (we shall use `dtype` in `numpy`).
- Pandas supports column names with spaces in them. This may cause issues, since SAS automatically changes spaces to `_`.
- Pandas is case sensitive; SAS is not.
- There is no `macro` facility. It can be replicated (to a degree) using IPython.
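Since pandas allows spaces in column names and is case sensitive while SAS is neither, a small normalization pass can keep the two conventions aligned. This is a minimal sketch (the helper name `sasify_columns` is hypothetical, not part of the package):

```python
from pandas import DataFrame

def sasify_columns(df):
    # hypothetical helper: rename columns the way SAS would --
    # spaces become underscores, and lower-casing sidesteps
    # pandas' case sensitivity
    df = df.copy()
    df.columns = [c.replace(" ", "_").lower() for c in df.columns]
    return df

df = DataFrame({"My Col": [1, 2], "Other": [3, 4]})
print(sasify_columns(df).columns.tolist())  # ['my_col', 'other']
```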
```python
from stan.transcompile import transcompile
import stan.stan_magic
from pandas import DataFrame
import numpy as np
import pkgutil
from numpy import nan
import stan.proc_functions as proc_func

mod_name = ["from stan.proc_functions import %s" % name
            for _, name, _ in pkgutil.iter_modules(proc_func.__path__)]
exec("\n".join(mod_name))
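The loop above builds import statements as strings and `exec`s them. The same effect can be had without `exec` via `importlib`; this is a standalone sketch, demonstrated on a stdlib package so it does not depend on stan being installed:

```python
import importlib
import pkgutil

def import_submodules(package_name):
    """Import every submodule of a package and return them keyed by name."""
    package = importlib.import_module(package_name)
    return {
        name: importlib.import_module(package_name + "." + name)
        for _, name, _ in pkgutil.iter_modules(package.__path__)
    }

# e.g. procs = import_submodules("stan.proc_functions")
print(sorted(import_submodules("json")))
```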
```python
# create an example data frame
df = DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
df
```
 | a | b | c | d | e |
---|---|---|---|---|---|
0 | -1.245481 | -1.609963 | 0.442550 | -0.056406 | -0.213349 |
1 | -1.118754 | 0.116146 | -0.032579 | -0.556940 | 0.270678 |
2 | 0.864960 | -0.479118 | 2.370390 | 2.090656 | -0.475426 |
3 | 0.434934 | -2.510176 | 0.122871 | 0.077915 | 0.597477 |
4 | 0.689308 | 0.042817 | 0.217040 | -1.424120 | -0.214721 |
5 | -0.432170 | -1.344882 | -0.055934 | 1.921247 | 1.519922 |
6 | -0.837277 | 0.944802 | -0.650114 | -0.297314 | 1.432118 |
7 | 1.488292 | -1.236296 | 0.128023 | 2.886408 | -0.560200 |
8 | -0.510566 | -1.736577 | 0.066769 | -0.735257 | 0.178167 |
9 | 2.540022 | 0.034493 | -0.521496 | -2.189938 | 0.111702 |
```
%%stan
data test;
set df (drop = a);
run;
```

The magic returns the generated Python source, which `exec(_)` then runs:

```python
u"test=df.drop(['a'],1)\n"
```

```python
exec(_)
test
```
 | b | c | d | e |
---|---|---|---|---|
0 | -1.609963 | 0.442550 | -0.056406 | -0.213349 |
1 | 0.116146 | -0.032579 | -0.556940 | 0.270678 |
2 | -0.479118 | 2.370390 | 2.090656 | -0.475426 |
3 | -2.510176 | 0.122871 | 0.077915 | 0.597477 |
4 | 0.042817 | 0.217040 | -1.424120 | -0.214721 |
5 | -1.344882 | -0.055934 | 1.921247 | 1.519922 |
6 | 0.944802 | -0.650114 | -0.297314 | 1.432118 |
7 | -1.236296 | 0.128023 | 2.886408 | -0.560200 |
8 | -1.736577 | 0.066769 | -0.735257 | 0.178167 |
9 | 0.034493 | -0.521496 | -2.189938 | 0.111702 |
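The generated `df.drop(['a'],1)` uses the positional-axis form, which recent pandas versions no longer accept; the `columns` keyword is the current idiom. A minimal equivalent sketch:

```python
from pandas import DataFrame
import numpy as np

df = DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])

# explicit axis keyword (what the positional form means)
test_old = df.drop(['a'], axis=1)
# columns keyword: clearer, and the only form in pandas 2.x
test_new = df.drop(columns=['a'])

print(test_new.columns.tolist())  # ['b', 'c', 'd', 'e']
```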
```
%%stan
data df_if;
set df;
if b < 0.3 then x = 0;
else if b < 0.6 then x = 1;
else x = 2;
run;
```

```python
u"df_if=df\nfor el in ['x']:\n    if el not in df_if.columns:\n        df_if[el] = np.nan\ndf_if.ix[((df_if[u'b']<0.3)), 'x'] = (0)\nfor el in ['x']:\n    if el not in df_if.columns:\n        df_if[el] = np.nan\ndf_if.ix[((~((df_if[u'b']<0.3))) & (df_if[u'b']<0.6)), 'x'] = (1)\ndf_if.ix[((~((df_if[u'b']<0.6))) & (~((df_if[u'b']<0.3)))), 'x'] = (2)\n"
```

```python
exec(_)
df_if
```
 | a | b | c | d | e | x |
---|---|---|---|---|---|---|
0 | -1.245481 | -1.609963 | 0.442550 | -0.056406 | -0.213349 | 0 |
1 | -1.118754 | 0.116146 | -0.032579 | -0.556940 | 0.270678 | 0 |
2 | 0.864960 | -0.479118 | 2.370390 | 2.090656 | -0.475426 | 0 |
3 | 0.434934 | -2.510176 | 0.122871 | 0.077915 | 0.597477 | 0 |
4 | 0.689308 | 0.042817 | 0.217040 | -1.424120 | -0.214721 | 0 |
5 | -0.432170 | -1.344882 | -0.055934 | 1.921247 | 1.519922 | 0 |
6 | -0.837277 | 0.944802 | -0.650114 | -0.297314 | 1.432118 | 2 |
7 | 1.488292 | -1.236296 | 0.128023 | 2.886408 | -0.560200 | 0 |
8 | -0.510566 | -1.736577 | 0.066769 | -0.735257 | 0.178167 | 0 |
9 | 2.540022 | 0.034493 | -0.521496 | -2.189938 | 0.111702 | 0 |
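The generated code uses `.ix`, which has been deprecated and removed in modern pandas. The same ordered if/else-if chain can be expressed with `np.select`; this is a sketch of the equivalent logic, not what stan currently generates:

```python
import numpy as np
from pandas import DataFrame

df_if = DataFrame({'b': [-1.6, 0.12, 0.94, 0.5]})

# conditions are checked in order, like the SAS if/else-if chain;
# the first matching condition wins, and `default` plays the role of else
conditions = [df_if['b'] < 0.3, df_if['b'] < 0.6]
choices = [0, 1]
df_if['x'] = np.select(conditions, choices, default=2)

print(df_if['x'].tolist())  # [0, 0, 2, 1]
```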
Procs can be added manually; they can be thought of as Python functions. You can define your own, though the parser needs more work to make this smooth.

```python
df1 = DataFrame({'a': [1, 0, 1], 'b': [0, 1, 1]}, dtype=bool)
df1
```
 | a | b |
---|---|---|
0 | True | False |
1 | False | True |
2 | True | True |
```
%%stan
proc describe data = df1 out = df2;
by a;
run;
```

```python
u"df2=describe.describe(data=df1,by='a')"
```

```python
exec(_)
df2
```
a |  | a | b |
---|---|---|---|
False | count | 1 | 1 |
 | mean | 0 | 1 |
 | std | NaN | NaN |
 | min | False | True |
 | 25% | False | True |
 | 50% | 0 | 1 |
 | 75% | False | True |
 | max | False | True |
True | count | 2 | 2 |
 | mean | 1 | 0.5 |
 | std | 0 | 0.7071068 |
 | min | True | False |
 | 25% | 1 | 0.25 |
 | 50% | 1 | 0.5 |
 | 75% | 1 | 0.75 |
 | max | True | True |
The proc actually isn't difficult to write. For the above example it is just this:

```python
def describe(data, by):
    return data.groupby(by).describe()
```

This functionality allows you to handle most of the `by` and `retain` cases. For languages like Python and R, the normal way to handle data is through the split-apply-combine methodology.
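As a concrete illustration of split-apply-combine, using pandas directly (a standalone sketch, independent of the stan magic):

```python
from pandas import DataFrame

df = DataFrame({'key': ['a', 'a', 'b'], 'val': [1.0, 3.0, 5.0]})

# split on 'key', apply the mean to each group, combine into one result
result = df.groupby('key')['val'].mean()
print(result.to_dict())  # {'a': 2.0, 'b': 5.0}
```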
Merges can be achieved in a similar way, by creating a `proc`:
```
%%stan
proc merge out = df2;
dt_left left;
dt_right right;
on = 'key';
run;
```

```python
u"df2=merge.merge(dt_left=left,dt_right=right,on='key')"
```

```python
left = DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
right = DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})
exec(_)
df2
```
 | key | lval | rval |
---|---|---|---|
0 | foo | 1 | 4 |
1 | foo | 1 | 5 |
2 | foo | 2 | 4 |
3 | foo | 2 | 5 |
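The generated call wraps pandas' own `merge`; the same four-row many-to-many join on `'key'` can be produced directly (a standalone sketch):

```python
from pandas import DataFrame

left = DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})
right = DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})

# every 'foo' on the left pairs with every 'foo' on the right: 2 x 2 = 4 rows
df2 = left.merge(right, on='key')
print(len(df2))  # 4
```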
Here's an example showing how you can define your own function and run it (not a function that came with the package):

```python
def sum_mean_by(data, by):
    return data.groupby(by).agg([np.sum, np.mean])
```
```
%%stan
proc sum_mean_by data = df_if out = df_sum;
by x;
run;
```

```python
u"df_sum=sum_mean_by(data=df_if,by='x')"
```

```python
exec(_)
df_sum
```
x | a_sum | a_mean | b_sum | b_mean | c_sum | c_mean | d_sum | d_mean | e_sum | e_mean |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2.710545 | 0.301172 | -8.723557 | -0.969284 | 2.737635 | 0.304182 | 2.013566 | 0.223730 | 1.214251 | 0.134917 |
2 | -0.837277 | -0.837277 | 0.944802 | 0.944802 | -0.650114 | -0.650114 | -0.297314 | -0.297314 | 1.432118 | 1.432118 |
`proc sql` is supported through the `pandasql` library, so the above table could have been produced via SQL as well.
```python
import pandasql

q = """
select
    sum(a) as sum_a,
    sum(b) as sum_b,
    sum(c) as sum_c,
    sum(d) as sum_d,
    sum(e) as sum_e,
    avg(a) as avg_a,
    avg(b) as avg_b,
    avg(c) as avg_c,
    avg(d) as avg_d,
    avg(e) as avg_e
from
    df_if
group by x
"""

df_sum_sql = pandasql.sqldf(q, locals())
df_sum_sql
```
 | sum_a | sum_b | sum_c | sum_d | sum_e | avg_a | avg_b | avg_c | avg_d | avg_e |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2.710545 | -8.723557 | 2.737635 | 2.013566 | 1.214251 | 0.301172 | -0.969284 | 0.304182 | 0.223730 | 0.134917 |
1 | -0.837277 | 0.944802 | -0.650114 | -0.297314 | 1.432118 | -0.837277 | 0.944802 | -0.650114 | -0.297314 | 1.432118 |