Implement frequency table function a la table in R #170

wesm · 2011-09-25T14:53:47Z

No description provided.

wesm · 2011-09-25T14:54:08Z

A cut function would also be nice
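
For context, here is a minimal sketch of how a cut-style function pairs with a frequency table. It assumes the `pandas.cut` available in current pandas, which did not yet exist when this comment was written. The sample values and bin edges are made up for illustration.

```python
import pandas as pd

# Illustrative values and bin edges (assumptions, not data from a real analysis).
values = pd.Series([26, 30, 54, 25, 70, 52, 51, 26, 67])

# Bin into the right-closed intervals (0, 25], (25, 50], (50, 75].
binned = pd.cut(values, bins=[0, 25, 50, 75])

# Count observations per bin, similar in spirit to R's table(cut(x, breaks)).
counts = binned.value_counts(sort=False)
print(counts)
```

Here `sort=False` is meant to keep the bins in interval order rather than ordering them by frequency.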

gregglind · 2012-01-13T21:23:33Z

What is the right way to do a simple counts crosstab with marginals? For bonus points, all variables vs. all variables?

wesm · 2012-01-13T21:33:10Z

Use pivot_table (it has margins, too). I'm pretty sure this issue can be closed; I just need to look at the functionality R provides and verify that it's addressed by an analogous pivot_table call.

gregglind · 2012-01-13T21:59:48Z

So, supposing I have columns 'a','b', what is the simplest call to get
the crosstab table? For some reason, pivot_table is tough for me!


wesm · 2012-01-13T22:06:40Z

example:


In [10]: wp
Out[10]: 
    breaks wool tension
1   26     A    L      
2   30     A    L      
3   54     A    L      
4   25     A    L      
5   70     A    L      
6   52     A    L      
7   51     A    L      
8   26     A    L      
9   67     A    L      
10  18     A    M      
11  21     A    M      
12  29     A    M      
13  17     A    M      
14  12     A    M      
15  18     A    M      
16  35     A    M      
17  30     A    M      
18  36     A    M      
19  36     A    H      
20  21     A    H      
21  24     A    H      
22  18     A    H      
23  10     A    H      
24  43     A    H      
25  28     A    H      
26  15     A    H      
27  26     A    H      
28  27     B    L      
29  14     B    L      
30  29     B    L      
31  19     B    L      
32  29     B    L      
33  31     B    L      
34  41     B    L      
35  20     B    L      
36  44     B    L      
37  42     B    M      
38  26     B    M      
39  19     B    M      
40  16     B    M      
41  39     B    M      
42  28     B    M      
43  21     B    M      
44  39     B    M      
45  29     B    M      
46  20     B    H      
47  21     B    H      
48  24     B    H      
49  17     B    H      
50  13     B    H      
51  15     B    H      
52  15     B    H      
53  16     B    H      
54  28     B    H      

In [11]: wp.pivot_table('breaks', rows='wool', cols='tension', aggfunc='count')
Out[11]: 
tension  H  L  M
wool            
A        9  9  9
B        9  9  9

I'll have a look at R's table function and add a simple crosstab function or something similar.
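
Note for readers on current pandas: the `rows` and `cols` keywords used in the session above are now spelled `index` and `columns`. The sketch below repeats the same count table with the current keyword names and adds `margins=True`. The DataFrame is a small hand-picked subset of the rows shown above, so the counts differ from the full 9-per-cell result.

```python
import pandas as pd

# A few rows copied from the wp table above (subset chosen for brevity).
wp = pd.DataFrame({
    "breaks": [26, 30, 18, 36, 27, 42, 26, 20],
    "wool": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "tension": ["L", "L", "M", "H", "L", "M", "M", "H"],
})

# Count observations per (wool, tension) cell; margins=True adds the "All" totals.
counts = wp.pivot_table(
    "breaks", index="wool", columns="tension", aggfunc="count", margins=True
)
print(counts)
```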

wesm · 2012-01-14T21:40:36Z

Just wrote a blog post here: http://wesmckinney.com/blog/?p=443. I don't think it's necessary to add any more functions.

wesm · 2012-01-16T17:48:15Z

OK Gregg, I'll bite:

In [7]: a
Out[7]: 
array([1, 2, 6, 6, 4, 0, 2, 0, 4, 3, 5, 1, 1, 2, 6, 3, 4, 4, 5, 4, 4, 5, 5,
       2, 1, 1, 6, 3, 5, 2, 5, 6, 2, 2, 5, 1, 1, 3, 1, 4, 1, 6, 0, 1, 3, 3,
       1, 4, 2, 1, 0, 5, 0, 5, 1, 1, 5, 0, 2, 4, 2, 4, 2, 2, 2, 6, 2, 0, 1,
       4, 6, 1, 4, 0, 5, 5, 3, 5, 5, 6, 0, 6, 6, 5, 0, 2, 4, 2, 2, 0, 5, 0,
       5, 6, 5, 6, 4, 5, 0, 4])

In [8]: b
Out[8]: 
array([0, 0, 0, 2, 0, 0, 2, 1, 1, 1, 2, 2, 0, 1, 0, 0, 2, 2, 1, 0, 0, 2, 1,
       1, 0, 2, 2, 1, 2, 1, 1, 1, 2, 1, 2, 0, 2, 1, 1, 0, 0, 0, 0, 2, 1, 1,
       2, 0, 0, 1, 1, 1, 2, 2, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 2, 0, 2, 0,
       0, 1, 1, 2, 0, 1, 2, 1, 1, 2, 0, 1, 0, 1, 1, 1, 2, 1, 2, 2, 0, 2, 1,
       2, 0, 1, 1, 2, 0, 0, 0])

In [9]: c
Out[9]: 
array([3, 3, 4, 1, 1, 3, 4, 4, 1, 0, 2, 2, 4, 2, 3, 0, 1, 0, 2, 0, 4, 1, 3,
       1, 0, 1, 1, 0, 1, 4, 1, 4, 2, 3, 3, 0, 3, 3, 1, 3, 0, 1, 4, 4, 3, 1,
       3, 1, 1, 4, 1, 0, 0, 3, 1, 3, 3, 3, 2, 2, 1, 2, 3, 4, 0, 3, 1, 3, 3,
       0, 4, 3, 0, 3, 0, 2, 4, 3, 1, 0, 4, 1, 3, 0, 1, 1, 4, 0, 0, 3, 2, 1,
       4, 2, 3, 2, 2, 1, 2, 0])

In [10]: result = crosstab(a, [b, c], rownames=['a'], colnames=('b', 'c'),
                          margins=True)

In [11]: result
Out[11]: 
b    0               1               2              All
c    0  1  2  3   4  0  1  2  3   4  0  1  2  3  4     
0    0  0  1  4   1  0  3  0  0   2  1  0  0  1  0  13 
1    3  0  0  3   1  0  2  0  1   1  0  1  1  2  1  16 
2    0  3  1  1   0  1  1  1  2   2  2  1  1  0  1  17 
3    1  0  0  0   0  2  1  0  2   1  0  0  0  0  0  7  
4    3  2  2  1   1  0  1  0  0   1  2  1  1  0  0  15 
5    0  1  0  0   0  3  1  1  4   0  0  3  3  2  1  19 
6    1  2  1  1   1  0  0  1  1   2  0  2  0  1  0  13 
All  8  8  5  10  4  6  9  3  10  9  5  8  6  6  3  100

I think that's pretty slick.
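
As a rough sketch of the same call in current pandas, where the function is exposed as `pandas.crosstab`: the short arrays below are made up for illustration and are not the 100-element arrays from the session above. The last two lines show the simple two-column case asked about earlier in the thread.

```python
import numpy as np
import pandas as pd

# Small illustrative arrays (assumptions, much shorter than those in the session above).
a = np.array([0, 0, 1, 1, 2, 2, 0, 1])
b = np.array([0, 1, 0, 1, 0, 1, 0, 1])
c = np.array([0, 0, 1, 1, 0, 0, 1, 1])

# Rows keyed by a, hierarchical columns keyed by (b, c), plus "All" margins.
result = pd.crosstab(a, [b, c], rownames=["a"], colnames=["b", "c"], margins=True)
print(result)

# Simplest case: counts of column 'a' against column 'b' with marginals.
df = pd.DataFrame({"a": a, "b": b})
simple = pd.crosstab(df["a"], df["b"], margins=True)
print(simple)
```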

wesm closed this as completed Jan 14, 2012

wesm reopened this Jan 15, 2012

wesm added a commit that referenced this issue Jan 16, 2012

ENH: crosstab prototype function, API needs fleshing out, GH #170

908cae5

wesm closed this as completed Jan 16, 2012

dan-nadler pushed a commit to dan-nadler/pandas that referenced this issue Sep 23, 2019

Fix for issue pandas-dev#169 (pandas-dev#170)

ca429c3

* Fix for issue pandas-dev#169

DavidToneian mentioned this issue Jul 20, 2023

BUG: Doc build fails locally due to rstjinja issue with doc/source/user_guide/style.nbconvert.ipynb #54212

Closed

xythu mentioned this issue Sep 13, 2023

BUG: to_parquet set schema metadata to datetime64 instead of datetime64[us] when dtype of the column is datetime64[us] #55118

Open
