Vfrdtyky

Question

What is pivot?

How do I pivot?

Is this a pivot?

Long format to wide format?

I've seen a lot of questions that ask about pivot tables. Even if they don't know that they are asking about pivot tables, they usually are. It is virtually impossible to write a canonical question and answer that encompasses all aspects of pivoting....

... But I'm going to give it a go.

The problem with existing questions and answers is that the question is often focused on a nuance that the OP has trouble generalizing in order to use any of the existing good answers. However, none of the answers attempt to give a comprehensive explanation (because it's a daunting task).

Look at a few examples from my Google search:

How to pivot a dataframe in Pandas?
- Good question and answer. But the answer only answers the specific question with little explanation.

pandas pivot table to data frame
- In this question, the OP is concerned with the output of the pivot. Namely how the columns look. OP wanted it to look like R. This isn't very helpful for pandas users.

pandas pivoting a dataframe, duplicate rows
- Another decent question but the answer focuses on one method, namely pd.DataFrame.pivot

So whenever someone searches for pivot they get sporadic results that are likely not going to answer their specific question.

Setup

You may notice that I conspicuously named my columns and relevant column values to correspond with how I'm going to pivot in the answers below. Pay attention so that you get familiar with which column names go where to get the results you're looking for.

import numpy as np
import pandas as pd

np.random.seed([3,1415])
n = 20

cols = np.array(['key', 'row', 'item', 'col'])
arr1 = (np.random.randint(5, size=(n, 4)) // [2, 1, 2, 1]).astype(str)

# np.char.add concatenates strings element-wise, e.g. 'key' + '0' -> 'key0'
df = pd.DataFrame(
    np.char.add(cols, arr1), columns=cols
).join(
    pd.DataFrame(np.random.rand(n, 2).round(2)).add_prefix('val')
)
print(df)



     key   row   item   col  val0  val1

0   key0  row3  item1  col3  0.81  0.04

1   key1  row2  item1  col2  0.44  0.07

2   key1  row0  item1  col0  0.77  0.01

3   key0  row4  item0  col2  0.15  0.59

4   key1  row0  item2  col1  0.81  0.64

5   key1  row2  item2  col4  0.13  0.88

6   key2  row4  item1  col3  0.88  0.39

7   key1  row4  item1  col1  0.10  0.07

8   key1  row0  item2  col4  0.65  0.02

9   key1  row2  item0  col2  0.35  0.61

10  key2  row0  item2  col1  0.40  0.85

11  key2  row4  item1  col2  0.64  0.25

12  key0  row2  item2  col3  0.50  0.44

13  key0  row4  item1  col4  0.24  0.46

14  key1  row3  item2  col3  0.28  0.11

15  key0  row3  item1  col1  0.31  0.23

16  key0  row0  item2  col3  0.86  0.01

17  key0  row4  item0  col3  0.64  0.21

18  key2  row2  item2  col0  0.13  0.45

19  key0  row2  item0  col4  0.37  0.70

Question(s)

Why do I get ValueError: Index contains duplicate entries, cannot reshape

How do I pivot df such that the col values are columns, row values are the index, and mean of val0 are the values?

col   col0   col1   col2   col3  col4

row                                  

row0  0.77  0.605    NaN  0.860  0.65

row2  0.13    NaN  0.395  0.500  0.25

row3   NaN  0.310    NaN  0.545   NaN

row4   NaN  0.100  0.395  0.760  0.24

How do I pivot df such that the col values are columns, row values are the index, mean of val0 are the values, and missing values are 0?

col   col0   col1   col2   col3  col4

row                                  

row0  0.77  0.605  0.000  0.860  0.65

row2  0.13  0.000  0.395  0.500  0.25

row3  0.00  0.310  0.000  0.545  0.00

row4  0.00  0.100  0.395  0.760  0.24

Can I get something other than mean, like maybe sum?

col   col0  col1  col2  col3  col4

row                               

row0  0.77  1.21  0.00  0.86  0.65

row2  0.13  0.00  0.79  0.50  0.50

row3  0.00  0.31  0.00  1.09  0.00

row4  0.00  0.10  0.79  1.52  0.24

Can I do more than one aggregation at a time?

       sum                          mean                           

col   col0  col1  col2  col3  col4  col0   col1   col2   col3  col4

row                                                                

row0  0.77  1.21  0.00  0.86  0.65  0.77  0.605  0.000  0.860  0.65

row2  0.13  0.00  0.79  0.50  0.50  0.13  0.000  0.395  0.500  0.25

row3  0.00  0.31  0.00  1.09  0.00  0.00  0.310  0.000  0.545  0.00

row4  0.00  0.10  0.79  1.52  0.24  0.00  0.100  0.395  0.760  0.24

Can I aggregate over multiple value columns?

      val0                             val1                          

col   col0   col1   col2   col3  col4  col0   col1  col2   col3  col4

row                                                                  

row0  0.77  0.605  0.000  0.860  0.65  0.01  0.745  0.00  0.010  0.02

row2  0.13  0.000  0.395  0.500  0.25  0.45  0.000  0.34  0.440  0.79

row3  0.00  0.310  0.000  0.545  0.00  0.00  0.230  0.00  0.075  0.00

row4  0.00  0.100  0.395  0.760  0.24  0.00  0.070  0.42  0.300  0.46

Can I subdivide by multiple columns?

item item0             item1                         item2                   

col   col2  col3  col4  col0  col1  col2  col3  col4  col0   col1  col3  col4

row                                                                          

row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.605  0.86  0.65

row2  0.35  0.00  0.37  0.00  0.00  0.44  0.00  0.00  0.13  0.000  0.50  0.13

row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.000  0.28  0.00

row4  0.15  0.64  0.00  0.00  0.10  0.64  0.88  0.24  0.00  0.000  0.00  0.00

Or

item      item0             item1                         item2                  

col        col2  col3  col4  col0  col1  col2  col3  col4  col0  col1  col3  col4

key  row                                                                         

key0 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.86  0.00

     row2  0.00  0.00  0.37  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.50  0.00

     row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.00  0.00  0.00

     row4  0.15  0.64  0.00  0.00  0.00  0.00  0.00  0.24  0.00  0.00  0.00  0.00

key1 row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.81  0.00  0.65

     row2  0.35  0.00  0.00  0.00  0.00  0.44  0.00  0.00  0.00  0.00  0.00  0.13

     row3  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.28  0.00

     row4  0.00  0.00  0.00  0.00  0.10  0.00  0.00  0.00  0.00  0.00  0.00  0.00

key2 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.40  0.00  0.00

     row2  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.13  0.00  0.00  0.00

     row4  0.00  0.00  0.00  0.00  0.00  0.64  0.88  0.00  0.00  0.00  0.00  0.00

Can I aggregate the frequency in which the column and rows occur together, aka "cross tabulation"?

col   col0  col1  col2  col3  col4

row                               

row0     1     2     0     1     1

row2     1     0     2     1     2

row3     0     1     0     2     0

row4     0     1     2     2     1

score 148 · Accepted Answer · 2017-11-14 17:13:35Z

We start by answering the first question:

Question 1

Why do I get ValueError: Index contains duplicate entries, cannot reshape

This occurs because pandas is attempting to reindex either a columns or an index object with duplicate entries. There are several methods that can perform a pivot, and some of them are not well suited to cases where the keys being pivoted on contain duplicates. For example, consider pd.DataFrame.pivot. I know there are duplicate entries that share the row and col values:

df.duplicated(['row', 'col']).any()



True

So when I pivot using

df.pivot(index='row', columns='col', values='val0')

I get the error mentioned above. In fact, I get the same error when I try to perform the same task with:

df.set_index(['row', 'col'])['val0'].unstack()
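To see the failure in isolation, here is a minimal sketch on a small frame. The name toy and its four rows are assumptions for illustration (a hand-picked subset of the rows printed in Setup), chosen so that the pair ('row0', 'col1') appears twice.

```python
import pandas as pd

# Hypothetical toy frame: a hand-picked subset of the rows printed in Setup.
# ('row0', 'col1') appears twice, so the (row, col) keys are not unique.
toy = pd.DataFrame({
    'row':  ['row0', 'row0', 'row0', 'row2'],
    'col':  ['col0', 'col1', 'col1', 'col2'],
    'val0': [0.77, 0.81, 0.40, 0.44],
})

has_dupes = toy.duplicated(['row', 'col']).any()

# pivot only reshapes; it has no rule for combining the two duplicate values
try:
    toy.pivot(index='row', columns='col', values='val0')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates (mean by default) instead of failing
table = toy.pivot_table(index='row', columns='col', values='val0')
print(has_dupes, pivot_failed)
print(table)
```

The duplicated pair is averaged by pivot_table, while combinations that never occur stay NaN.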

Here is a list of idioms we can use to pivot

pd.DataFrame.groupby + pd.DataFrame.unstack
- Good general approach for doing just about any type of pivot
- You specify all columns that will constitute the pivoted row levels and column levels in one group by. You follow that by selecting the remaining columns you want to aggregate and the function(s) you want to perform the aggregation. Finally, you unstack the levels that you want to be in the column index.

pd.DataFrame.pivot_table
- A glorified version of groupby with a more intuitive API. For many people, this is the preferred approach, and it is the approach intended by the developers.
- Specify row level, column levels, values to be aggregated, and function(s) to perform aggregations.

pd.DataFrame.set_index + pd.DataFrame.unstack
- Convenient and intuitive for some (myself included). Cannot handle duplicate grouped keys.
- Similar to the groupby paradigm, we specify all columns that will eventually be either row or column levels and set those to be the index. We then unstack the levels we want in the columns. If either the remaining index levels or column levels are not unique, this method will fail.

pd.DataFrame.pivot
- Very similar to set_index in that it shares the duplicate key limitation. The API is very limited as well. In older pandas versions it only takes scalar values for index, columns, values; newer versions also accept lists of column names.
- Similar to the pivot_table method in that we select rows, columns, and values on which to pivot. However, we cannot aggregate, and if the combinations of rows and columns are not unique, this method will fail.

pd.crosstab
- This is a specialized version of pivot_table and in its purest form is the most intuitive way to perform several tasks.

pd.factorize + np.bincount
- This is a highly advanced technique that is very obscure but is very fast. It cannot be used in all circumstances, but when it can be used and you are comfortable using it, you will reap the performance rewards.

pd.get_dummies + pd.DataFrame.dot
- I use this for cleverly performing cross tabulation.
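As a rough illustration of how the first few idioms relate, the sketch below asks for the same mean table three ways and checks that the results agree. The frame toy and the helper same_table are hypothetical names introduced here, not part of the original setup.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: a hand-picked subset of the rows printed in Setup
toy = pd.DataFrame({
    'row':  ['row0', 'row0', 'row0', 'row2', 'row2', 'row3'],
    'col':  ['col0', 'col1', 'col1', 'col2', 'col2', 'col3'],
    'val0': [0.77, 0.81, 0.40, 0.44, 0.35, 0.81],
})

# 1. group by every key, aggregate, then move 'col' into the columns
via_groupby = toy.groupby(['row', 'col'])['val0'].mean().unstack()

# 2. the same request spelled with pivot_table
via_pivot_table = toy.pivot_table(
    index='row', columns='col', values='val0', aggfunc='mean')

# 3. and with crosstab, which takes the Series themselves
via_crosstab = pd.crosstab(
    index=toy['row'], columns=toy['col'],
    values=toy['val0'], aggfunc='mean')

def same_table(a, b):
    """Hypothetical helper: labels match and values agree (NaN equals NaN)."""
    return (
        list(a.index) == list(b.index)
        and list(a.columns) == list(b.columns)
        and np.allclose(a.to_numpy(), b.to_numpy(), equal_nan=True)
    )

all_agree = (same_table(via_groupby, via_pivot_table)
             and same_table(via_groupby, via_crosstab))
print(all_agree)
```

Which spelling to prefer is mostly a matter of taste; groupby makes the individual steps visible, while pivot_table and crosstab bundle them into one call.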

Examples

What I'm going to do for each subsequent answer and question is to answer it using pd.DataFrame.pivot_table. Then I'll provide alternatives to perform the same task.

Question 3

How do I pivot df such that the col values are columns, row values are the index, mean of val0 are the values, and missing values are 0?

pd.DataFrame.pivot_table

fill_value is not set by default. I tend to set it appropriately; in this case I set it to 0. Notice I skipped question 2, as it's the same as this answer without the fill_value.

aggfunc='mean' is the default and I didn't have to set it. I included it to be explicit.

df.pivot_table(

    values='val0', index='row', columns='col',

    fill_value=0, aggfunc='mean')



col   col0   col1   col2   col3  col4

row                                  

row0  0.77  0.605  0.000  0.860  0.65

row2  0.13  0.000  0.395  0.500  0.25

row3  0.00  0.310  0.000  0.545  0.00

row4  0.00  0.100  0.395  0.760  0.24

pd.DataFrame.groupby

df.groupby(['row', 'col'])['val0'].mean().unstack(fill_value=0)

pd.crosstab

pd.crosstab(

    index=df['row'], columns=df['col'],

    values=df['val0'], aggfunc='mean').fillna(0)

Question 4

Can I get something other than mean, like maybe sum?

pd.DataFrame.pivot_table

df.pivot_table(

    values='val0', index='row', columns='col',

    fill_value=0, aggfunc='sum')



col   col0  col1  col2  col3  col4

row                               

row0  0.77  1.21  0.00  0.86  0.65

row2  0.13  0.00  0.79  0.50  0.50

row3  0.00  0.31  0.00  1.09  0.00

row4  0.00  0.10  0.79  1.52  0.24

pd.DataFrame.groupby

df.groupby(['row', 'col'])['val0'].sum().unstack(fill_value=0)

pd.crosstab

pd.crosstab(

    index=df['row'], columns=df['col'],

    values=df['val0'], aggfunc='sum').fillna(0)

Question 5

Can I do more than one aggregation at a time?

Notice that for pivot_table and crosstab I passed a list of callables. groupby.agg, on the other hand, also accepts strings for a limited number of special functions. It would have taken the same callables we passed to the others, but the string names are often more efficient because pandas can dispatch them to its built-in group-wise implementations.
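A small sketch of the difference between string names and callables in groupby.agg, assuming a hypothetical toy frame (a hand-picked subset of the Setup rows). It only checks that both spellings give the same means; it does not measure speed.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: a hand-picked subset of the rows printed in Setup
toy = pd.DataFrame({
    'row':  ['row0', 'row0', 'row0', 'row2', 'row2', 'row3'],
    'col':  ['col0', 'col1', 'col1', 'col2', 'col2', 'col3'],
    'val0': [0.77, 0.81, 0.40, 0.44, 0.35, 0.81],
})

grouped = toy.groupby(['row', 'col'])['val0']

# String names are resolved to pandas' own group-wise implementations
by_name = grouped.agg(['size', 'mean'])

# An arbitrary callable is applied to each group in Python
by_callable = grouped.agg(lambda s: s.mean())

means_match = np.allclose(by_name['mean'].to_numpy(), by_callable.to_numpy())

# Unstacking 'col' yields one block of columns per aggregation
wide = by_name.unstack(fill_value=0)
print(means_match)
print(wide)
```

The result has a two-level column index: the aggregation name on top and the col values underneath, which matches the layout shown for this question.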

pd.DataFrame.pivot_table

df.pivot_table(

    values='val0', index='row', columns='col',

    fill_value=0, aggfunc=[np.size, np.mean])



     size                      mean                           

col  col0 col1 col2 col3 col4  col0   col1   col2   col3  col4

row                                                           

row0    1    2    0    1    1  0.77  0.605  0.000  0.860  0.65

row2    1    0    2    1    2  0.13  0.000  0.395  0.500  0.25

row3    0    1    0    2    0  0.00  0.310  0.000  0.545  0.00

row4    0    1    2    2    1  0.00  0.100  0.395  0.760  0.24

pd.DataFrame.groupby

df.groupby(['row', 'col'])['val0'].agg(['size', 'mean']).unstack(fill_value=0)

pd.crosstab

pd.crosstab(

    index=df['row'], columns=df['col'],

    values=df['val0'], aggfunc=[np.size, np.mean]).fillna(0, downcast='infer')

Question 6

Can I aggregate over multiple value columns?

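pd.DataFrame.pivot_table

We pass values=['val0', 'val1']. We could have left that off, in which case pivot_table tries to aggregate every remaining column; being explicit is safer because df also holds the non-numeric columns key and item.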

df.pivot_table(

    values=['val0', 'val1'], index='row', columns='col',

    fill_value=0, aggfunc='mean')



      val0                             val1                          

col   col0   col1   col2   col3  col4  col0   col1  col2   col3  col4

row                                                                  

row0  0.77  0.605  0.000  0.860  0.65  0.01  0.745  0.00  0.010  0.02

row2  0.13  0.000  0.395  0.500  0.25  0.45  0.000  0.34  0.440  0.79

row3  0.00  0.310  0.000  0.545  0.00  0.00  0.230  0.00  0.075  0.00

row4  0.00  0.100  0.395  0.760  0.24  0.00  0.070  0.42  0.300  0.46

pd.DataFrame.groupby

df.groupby(['row', 'col'])[['val0', 'val1']].mean().unstack(fill_value=0)

Question 7

Can I subdivide by multiple columns?

pd.DataFrame.pivot_table

df.pivot_table(

    values='val0', index='row', columns=['item', 'col'],

    fill_value=0, aggfunc='mean')



item item0             item1                         item2                   

col   col2  col3  col4  col0  col1  col2  col3  col4  col0   col1  col3  col4

row                                                                          

row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.605  0.86  0.65

row2  0.35  0.00  0.37  0.00  0.00  0.44  0.00  0.00  0.13  0.000  0.50  0.13

row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.000  0.28  0.00

row4  0.15  0.64  0.00  0.00  0.10  0.64  0.88  0.24  0.00  0.000  0.00  0.00

pd.DataFrame.groupby

df.groupby(
    ['row', 'item', 'col']
)['val0'].mean().unstack(['item', 'col']).fillna(0).sort_index(axis=1)

Question 8

Can I subdivide by multiple columns?

pd.DataFrame.pivot_table

df.pivot_table(

    values='val0', index=['key', 'row'], columns=['item', 'col'],

    fill_value=0, aggfunc='mean')



item      item0             item1                         item2                  

col        col2  col3  col4  col0  col1  col2  col3  col4  col0  col1  col3  col4

key  row                                                                         

key0 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.86  0.00

     row2  0.00  0.00  0.37  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.50  0.00

     row3  0.00  0.00  0.00  0.00  0.31  0.00  0.81  0.00  0.00  0.00  0.00  0.00

     row4  0.15  0.64  0.00  0.00  0.00  0.00  0.00  0.24  0.00  0.00  0.00  0.00

key1 row0  0.00  0.00  0.00  0.77  0.00  0.00  0.00  0.00  0.00  0.81  0.00  0.65

     row2  0.35  0.00  0.00  0.00  0.00  0.44  0.00  0.00  0.00  0.00  0.00  0.13

     row3  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.28  0.00

     row4  0.00  0.00  0.00  0.00  0.10  0.00  0.00  0.00  0.00  0.00  0.00  0.00

key2 row0  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.40  0.00  0.00

     row2  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.13  0.00  0.00  0.00

     row4  0.00  0.00  0.00  0.00  0.00  0.64  0.88  0.00  0.00  0.00  0.00  0.00

pd.DataFrame.groupby

df.groupby(
    ['key', 'row', 'item', 'col']
)['val0'].mean().unstack(['item', 'col']).fillna(0).sort_index(axis=1)

pd.DataFrame.set_index works here because the set of keys is unique for both rows and columns

df.set_index(
    ['key', 'row', 'item', 'col']
)['val0'].unstack(['item', 'col']).fillna(0).sort_index(axis=1)
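Since set_index plus unstack only works when the keys are unique, it can help to check that first. The sketch below wraps the check in a helper; safe_unstack and the frame toy are hypothetical names introduced for illustration, not pandas API.

```python
import pandas as pd

# Hypothetical toy frame: four of the rows printed in Setup
toy = pd.DataFrame({
    'key':  ['key0', 'key1', 'key1', 'key2'],
    'row':  ['row0', 'row0', 'row0', 'row0'],
    'item': ['item2', 'item1', 'item2', 'item2'],
    'col':  ['col3', 'col0', 'col1', 'col1'],
    'val0': [0.86, 0.77, 0.81, 0.40],
})

def safe_unstack(frame, keys, to_columns, value):
    """Hypothetical helper: reshape without aggregating, only if keys are unique."""
    if frame.duplicated(keys).any():
        raise ValueError(f'{keys} do not identify rows uniquely; aggregate first')
    return (frame.set_index(keys)[value]
                 .unstack(to_columns)
                 .fillna(0)
                 .sort_index(axis=1))

# All four keys together are unique, so no aggregation is needed
wide = safe_unstack(toy, ['key', 'row', 'item', 'col'], ['item', 'col'], 'val0')

# ('row', 'col') alone is not unique: row0/col1 occurs twice
try:
    safe_unstack(toy, ['row', 'col'], ['col'], 'val0')
    rejected = False
except ValueError:
    rejected = True

print(wide)
print(rejected)
```

When the check fails, falling back to groupby or pivot_table with an explicit aggregation is the usual way out.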

Question 9

Can I aggregate the frequency in which the column and rows occur together, aka "cross tabulation"?

pd.DataFrame.pivot_table

df.pivot_table(index='row', columns='col', fill_value=0, aggfunc='size')



col   col0  col1  col2  col3  col4

row                               

row0     1     2     0     1     1

row2     1     0     2     1     2

row3     0     1     0     2     0

row4     0     1     2     2     1

pd.DataFrame.groupby

df.groupby(['row', 'col'])['val0'].size().unstack(fill_value=0)

pd.crosstab
```
pd.crosstab(df['row'], df['col'])
```

pd.factorize + np.bincount

# get integer factorization `i` and unique values `r`

# for column `'row'`

i, r = pd.factorize(df['row'].values)

# get integer factorization `j` and unique values `c`

# for column `'col'`

j, c = pd.factorize(df['col'].values)

# `n` will be the number of rows

# `m` will be the number of columns

n, m = r.size, c.size

# `i * m + j` is a clever way of counting the 

# factorization bins assuming a flat array of length

# `n * m`.  Which is why we subsequently reshape as `(n, m)`

b = np.bincount(i * m + j, minlength=n * m).reshape(n, m)

# BTW, whenever I read this, I think 'Bean, Rice, and Cheese'

pd.DataFrame(b, r, c)



      col3  col2  col0  col1  col4

row3     2     0     0     1     0

row2     1     2     1     0     2

row0     1     0     1     2     1

row4     2     2     0     1     1
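The flat-index trick is easier to follow on a handful of records. This is a sketch on a hypothetical six-row frame toy (values picked from the Setup rows), with the intermediate arrays written out in comments and the result checked against pd.crosstab.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: six (row, col) pairs taken from the Setup rows
toy = pd.DataFrame({
    'row': ['row3', 'row2', 'row0', 'row0', 'row2', 'row0'],
    'col': ['col3', 'col2', 'col0', 'col1', 'col2', 'col1'],
})

# Labels are numbered in order of first appearance
i, r = pd.factorize(toy['row'].values)   # i = [0, 1, 2, 2, 1, 2]
j, c = pd.factorize(toy['col'].values)   # j = [0, 1, 2, 3, 1, 3]
n, m = r.size, c.size                    # 3 row labels, 4 column labels

# Cell (i, j) of an n-by-m grid sits at position i * m + j once the grid
# is flattened row by row, so every (row, col) pair gets its own bin
flat = i * m + j                         # [0, 5, 10, 11, 5, 11]
counts = np.bincount(flat, minlength=n * m).reshape(n, m)
table = pd.DataFrame(counts, r, c)

# crosstab sorts its labels, so align the label order before comparing
expected = pd.crosstab(toy['row'], toy['col'])
aligned = table.loc[expected.index, expected.columns]
matches = (aligned.to_numpy() == expected.to_numpy()).all()
print(table)
print(matches)
```

Note that the labels come out in order of first appearance rather than sorted, which is why the output above lists col3 and row3 first.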

pd.get_dummies

pd.get_dummies(df['row'], dtype=int).T.dot(pd.get_dummies(df['col'], dtype=int))



      col0  col1  col2  col3  col4

row0     1     2     0     1     1

row2     1     0     2     1     2

row3     0     1     0     2     0

row4     0     1     2     2     1
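Why the dot product counts co-occurrences: each get_dummies frame holds one 0/1 indicator column per label, and multiplying the transposed row indicators by the column indicators sums, over all records, the products of those indicators. A minimal sketch on a hypothetical frame toy, assuming a pandas version where get_dummies accepts dtype:

```python
import pandas as pd

# Hypothetical toy frame: six (row, col) pairs taken from the Setup rows
toy = pd.DataFrame({
    'row': ['row3', 'row2', 'row0', 'row0', 'row2', 'row0'],
    'col': ['col3', 'col2', 'col0', 'col1', 'col2', 'col1'],
})

# One indicator column per label; dtype=int keeps the entries as 0/1 numbers
# (recent pandas returns booleans by default, which would not add up to counts)
row_ind = pd.get_dummies(toy['row'], dtype=int)   # shape (6, 3)
col_ind = pd.get_dummies(toy['col'], dtype=int)   # shape (6, 4)

# Entry (r, c) sums indicator_r * indicator_c over all records; the product
# is 1 exactly when a record carries both labels, so the sum is a count
counts = row_ind.T.dot(col_ind)

expected = pd.crosstab(toy['row'], toy['col'])
matches = (counts.to_numpy() == expected.to_numpy()).all()
print(counts)
print(matches)
```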

Woah! I missed this 'better than documents' answer all together. Good thing is, I actually get to upvote twice;) — Nov 13 '17 at 19:04

