Frequency of repetitive position in pandas data frame

up vote
1
down vote

favorite

Hi I am working to find out repetitive position of the following data frame:

data = pd.DataFrame()

data ['league'] =['A','A','A','A','A','A','B','B','B']

data ['Team'] = ['X','X','X','Y','Y','Y','Z','Z','Z']

data ['week'] =[1,2,3,1,2,3,1,2,3]

data ['position']= [1,1,2,2,2,1,2,3,4]

I will compare the data for position from previous row, it is it the same, I will assign one. If it is different previous row, I will assign as 1

My expected outcome will be as follow:

enter image description here

It means I will group by (League, Team and week) and work out the frequency.
Can anyone advise how to do that in Pandas

Thanks,

Zep

edited Nov 12 at 9:19

asked Nov 12 at 9:00

Zephyr

42310

add a comment |

up vote
1
down vote

favorite

Hi I am working to find out repetitive position of the following data frame:

data = pd.DataFrame()

data ['league'] =['A','A','A','A','A','A','B','B','B']

data ['Team'] = ['X','X','X','Y','Y','Y','Z','Z','Z']

data ['week'] =[1,2,3,1,2,3,1,2,3]

data ['position']= [1,1,2,2,2,1,2,3,4]

I will compare the data for position from previous row, it is it the same, I will assign one. If it is different previous row, I will assign as 1

My expected outcome will be as follow:

enter image description here

It means I will group by (League, Team and week) and work out the frequency.
Can anyone advise how to do that in Pandas

Thanks,

Zep

edited Nov 12 at 9:19

asked Nov 12 at 9:00

Zephyr

42310

add a comment |

up vote
1
down vote

favorite

Hi I am working to find out repetitive position of the following data frame:

data = pd.DataFrame()

data ['league'] =['A','A','A','A','A','A','B','B','B']

data ['Team'] = ['X','X','X','Y','Y','Y','Z','Z','Z']

data ['week'] =[1,2,3,1,2,3,1,2,3]

data ['position']= [1,1,2,2,2,1,2,3,4]

I will compare the data for position from previous row, it is it the same, I will assign one. If it is different previous row, I will assign as 1

My expected outcome will be as follow:

enter image description here

It means I will group by (League, Team and week) and work out the frequency.
Can anyone advise how to do that in Pandas

Thanks,

Zep

edited Nov 12 at 9:19

asked Nov 12 at 9:00

Zephyr

42310

Hi I am working to find out repetitive position of the following data frame:

data = pd.DataFrame()

data ['league'] =['A','A','A','A','A','A','B','B','B']

data ['Team'] = ['X','X','X','Y','Y','Y','Z','Z','Z']

data ['week'] =[1,2,3,1,2,3,1,2,3]

data ['position']= [1,1,2,2,2,1,2,3,4]

I will compare the data for position from previous row, it is it the same, I will assign one. If it is different previous row, I will assign as 1

My expected outcome will be as follow:

enter image description here

It means I will group by (League, Team and week) and work out the frequency.
Can anyone advise how to do that in Pandas

Thanks,

Zep

python pandas

edited Nov 12 at 9:19

asked Nov 12 at 9:00

Zephyr

42310

edited Nov 12 at 9:19

asked Nov 12 at 9:00

Zephyr

42310

edited Nov 12 at 9:19

asked Nov 12 at 9:00

Zephyr

42310

asked Nov 12 at 9:00

Zephyr

42310

asked Nov 12 at 9:00

Zephyr

42310

add a comment |

2 Answers
2

active

oldest

votes

up vote
1
down vote

accepted

Use diff, and compare against 0:

v = df.position.diff()

v[0] = 0

df['frequency'] = v.ne(0).astype(int)



print(df)

  league Team  week  position  frequency

0      A    X     1         1          0

1      A    X     2         1          0

2      A    X     3         2          1

3      A    Y     1         2          0

4      A    Y     2         2          0

5      A    Y     3         1          1

6      B    Z     1         2          1

7      B    Z     2         3          1

8      B    Z     3         4          1

For performance reasons, you should try to avoid a fillna call.

df = pd.concat([df] * 100000, ignore_index=True)



%timeit df['frequency'] = df['position'].diff().abs().fillna(0,downcast='infer')

%%timeit

v = df.position.diff()

v[0] = 0

df['frequency'] = v.ne(0).astype(int)



83.7 ms ± 1.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

10.9 ms ± 217 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

To extend this answer to work in a groupby, use

v = df.groupby(['league', 'Team', 'week']).position.diff()

v[np.isnan(v)] = 0



df['frequency'] = v.ne(0).astype(int)

answered Nov 12 at 9:20

coldspeed

115k18106185

@ coldspeed, what if simple data['Freq'] = data.position.diff().fillna("0")
– pygo
Nov 12 at 9:27

@pygo Simple but slower, hence avoided.
– coldspeed
Nov 12 at 9:28

Thanks coldspeed. How about if position in week 1 has to be zero as it doesn’t hv any previous value if we group by. I mean I am tracking position changes from week one(this is just start of tracking)
– Zephyr
Nov 12 at 9:29

@Zephyr week 1 is zero by default for all groups (as per my understanding of your problem).
– coldspeed
Nov 12 at 9:34

@coldspeed, what is v[0] = 0 as assiging it zero
– pygo
Nov 12 at 9:35

|
show 5 more comments

up vote
1
down vote

Use diff and abs with fillna:

data['frequency'] = data['position'].diff().abs().fillna(0,downcast='infer')



print(data)

  league Team  week  position  frequency

0      A    X     1         1          0

1      A    X     2         1          0

2      A    X     3         2          1

3      A    Y     1         2          0

4      A    Y     2         2          0

5      A    Y     3         1          1

6      B    Z     1         2          1

7      B    Z     2         3          1

8      B    Z     3         4          1

Using groupby gives all zeros, since you are comparing within groups not on whole dataframe.

data.groupby(['league', 'Team', 'week'])['position'].diff().fillna(0,downcast='infer')



0    0

1    0

2    0

3    0

4    0

5    0

6    0

7    0

8    0

Name: position, dtype: int64

edited Nov 12 at 9:28

answered Nov 12 at 9:03

Sandeep Kadapa

5,642427

Thanks Sandeep. If I want to groupby with League,team, week then work out the frequency, how would I add that. The sample data frame is already sorted but actual data is random.
– Zephyr
Nov 12 at 9:21

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53258755%2ffrequency-of-repetitive-position-in-pandas-data-frame%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
1
down vote

accepted

Use diff, and compare against 0:

v = df.position.diff()

v[0] = 0

df['frequency'] = v.ne(0).astype(int)



print(df)

  league Team  week  position  frequency

0      A    X     1         1          0

1      A    X     2         1          0

2      A    X     3         2          1

3      A    Y     1         2          0

4      A    Y     2         2          0

5      A    Y     3         1          1

6      B    Z     1         2          1

7      B    Z     2         3          1

8      B    Z     3         4          1

For performance reasons, you should try to avoid a fillna call.

df = pd.concat([df] * 100000, ignore_index=True)



%timeit df['frequency'] = df['position'].diff().abs().fillna(0,downcast='infer')

%%timeit

v = df.position.diff()

v[0] = 0

df['frequency'] = v.ne(0).astype(int)



83.7 ms ± 1.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

10.9 ms ± 217 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

To extend this answer to work in a groupby, use

v = df.groupby(['league', 'Team', 'week']).position.diff()

v[np.isnan(v)] = 0



df['frequency'] = v.ne(0).astype(int)

answered Nov 12 at 9:20

coldspeed

115k18106185

@ coldspeed, what if simple data['Freq'] = data.position.diff().fillna("0")
– pygo
Nov 12 at 9:27

@pygo Simple but slower, hence avoided.
– coldspeed
Nov 12 at 9:28

Thanks coldspeed. How about if position in week 1 has to be zero as it doesn’t hv any previous value if we group by. I mean I am tracking position changes from week one(this is just start of tracking)
– Zephyr
Nov 12 at 9:29

@Zephyr week 1 is zero by default for all groups (as per my understanding of your problem).
– coldspeed
Nov 12 at 9:34

@coldspeed, what is v[0] = 0 as assiging it zero
– pygo
Nov 12 at 9:35

|
show 5 more comments

up vote
1
down vote

accepted

Use diff, and compare against 0:

v = df.position.diff()

v[0] = 0

df['frequency'] = v.ne(0).astype(int)



print(df)

  league Team  week  position  frequency

0      A    X     1         1          0

1      A    X     2         1          0

2      A    X     3         2          1

3      A    Y     1         2          0

4      A    Y     2         2          0

5      A    Y     3         1          1

6      B    Z     1         2          1

7      B    Z     2         3          1

8      B    Z     3         4          1

For performance reasons, you should try to avoid a fillna call.

df = pd.concat([df] * 100000, ignore_index=True)



%timeit df['frequency'] = df['position'].diff().abs().fillna(0,downcast='infer')

%%timeit

v = df.position.diff()

v[0] = 0

df['frequency'] = v.ne(0).astype(int)



83.7 ms ± 1.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

10.9 ms ± 217 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

To extend this answer to work in a groupby, use

v = df.groupby(['league', 'Team', 'week']).position.diff()

v[np.isnan(v)] = 0



df['frequency'] = v.ne(0).astype(int)

answered Nov 12 at 9:20

coldspeed

115k18106185

@ coldspeed, what if simple data['Freq'] = data.position.diff().fillna("0")
– pygo
Nov 12 at 9:27

@pygo Simple but slower, hence avoided.
– coldspeed
Nov 12 at 9:28

Thanks coldspeed. How about if position in week 1 has to be zero as it doesn’t hv any previous value if we group by. I mean I am tracking position changes from week one(this is just start of tracking)
– Zephyr
Nov 12 at 9:29

@Zephyr week 1 is zero by default for all groups (as per my understanding of your problem).
– coldspeed
Nov 12 at 9:34

@coldspeed, what is v[0] = 0 as assiging it zero
– pygo
Nov 12 at 9:35

|
show 5 more comments

up vote
1
down vote

accepted

Use diff, and compare against 0:

v = df.position.diff()

v[0] = 0

df['frequency'] = v.ne(0).astype(int)



print(df)

  league Team  week  position  frequency

0      A    X     1         1          0

1      A    X     2         1          0

2      A    X     3         2          1

3      A    Y     1         2          0

4      A    Y     2         2          0

5      A    Y     3         1          1

6      B    Z     1         2          1

7      B    Z     2         3          1

8      B    Z     3         4          1

For performance reasons, you should try to avoid a fillna call.

df = pd.concat([df] * 100000, ignore_index=True)



%timeit df['frequency'] = df['position'].diff().abs().fillna(0,downcast='infer')

%%timeit

v = df.position.diff()

v[0] = 0

df['frequency'] = v.ne(0).astype(int)



83.7 ms ± 1.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

10.9 ms ± 217 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

To extend this answer to work in a groupby, use

v = df.groupby(['league', 'Team', 'week']).position.diff()

v[np.isnan(v)] = 0



df['frequency'] = v.ne(0).astype(int)

answered Nov 12 at 9:20

coldspeed

115k18106185

Use diff, and compare against 0:

v = df.position.diff()

v[0] = 0

df['frequency'] = v.ne(0).astype(int)



print(df)

  league Team  week  position  frequency

0      A    X     1         1          0

1      A    X     2         1          0

2      A    X     3         2          1

3      A    Y     1         2          0

4      A    Y     2         2          0

5      A    Y     3         1          1

6      B    Z     1         2          1

7      B    Z     2         3          1

8      B    Z     3         4          1

For performance reasons, you should try to avoid a fillna call.

df = pd.concat([df] * 100000, ignore_index=True)



%timeit df['frequency'] = df['position'].diff().abs().fillna(0,downcast='infer')

%%timeit

v = df.position.diff()

v[0] = 0

df['frequency'] = v.ne(0).astype(int)



83.7 ms ± 1.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

10.9 ms ± 217 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

To extend this answer to work in a groupby, use

v = df.groupby(['league', 'Team', 'week']).position.diff()

v[np.isnan(v)] = 0



df['frequency'] = v.ne(0).astype(int)

answered Nov 12 at 9:20

coldspeed

115k18106185

answered Nov 12 at 9:20

coldspeed

115k18106185

answered Nov 12 at 9:20

coldspeed

115k18106185

answered Nov 12 at 9:20

coldspeed

115k18106185

@ coldspeed, what if simple data['Freq'] = data.position.diff().fillna("0")
– pygo
Nov 12 at 9:27

@pygo Simple but slower, hence avoided.
– coldspeed
Nov 12 at 9:28

Thanks coldspeed. How about if position in week 1 has to be zero as it doesn’t hv any previous value if we group by. I mean I am tracking position changes from week one(this is just start of tracking)
– Zephyr
Nov 12 at 9:29

@Zephyr week 1 is zero by default for all groups (as per my understanding of your problem).
– coldspeed
Nov 12 at 9:34

@coldspeed, what is v[0] = 0 as assiging it zero
– pygo
Nov 12 at 9:35

|
show 5 more comments

@ coldspeed, what if simple data['Freq'] = data.position.diff().fillna("0")
– pygo
Nov 12 at 9:27

@pygo Simple but slower, hence avoided.
– coldspeed
Nov 12 at 9:28

Thanks coldspeed. How about if position in week 1 has to be zero as it doesn’t hv any previous value if we group by. I mean I am tracking position changes from week one(this is just start of tracking)
– Zephyr
Nov 12 at 9:29

@Zephyr week 1 is zero by default for all groups (as per my understanding of your problem).
– coldspeed
Nov 12 at 9:34

@coldspeed, what is v[0] = 0 as assiging it zero
– pygo
Nov 12 at 9:35

@ coldspeed, what if simple data['Freq'] = data.position.diff().fillna("0")
– pygo
Nov 12 at 9:27

@pygo Simple but slower, hence avoided.
– coldspeed
Nov 12 at 9:28

Thanks coldspeed. How about if position in week 1 has to be zero as it doesn’t hv any previous value if we group by. I mean I am tracking position changes from week one(this is just start of tracking)
– Zephyr
Nov 12 at 9:29

@Zephyr week 1 is zero by default for all groups (as per my understanding of your problem).
– coldspeed
Nov 12 at 9:34

@coldspeed, what is v[0] = 0 as assiging it zero
– pygo
Nov 12 at 9:35

|
show 5 more comments

up vote
1
down vote

Use diff and abs with fillna:

data['frequency'] = data['position'].diff().abs().fillna(0,downcast='infer')



print(data)

  league Team  week  position  frequency

0      A    X     1         1          0

1      A    X     2         1          0

2      A    X     3         2          1

3      A    Y     1         2          0

4      A    Y     2         2          0

5      A    Y     3         1          1

6      B    Z     1         2          1

7      B    Z     2         3          1

8      B    Z     3         4          1

Using groupby gives all zeros, since you are comparing within groups not on whole dataframe.

data.groupby(['league', 'Team', 'week'])['position'].diff().fillna(0,downcast='infer')



0    0

1    0

2    0

3    0

4    0

5    0

6    0

7    0

8    0

Name: position, dtype: int64

edited Nov 12 at 9:28

answered Nov 12 at 9:03

Sandeep Kadapa

5,642427

Thanks Sandeep. If I want to groupby with League,team, week then work out the frequency, how would I add that. The sample data frame is already sorted but actual data is random.
– Zephyr
Nov 12 at 9:21

add a comment |

up vote
1
down vote

Use diff and abs with fillna:

data['frequency'] = data['position'].diff().abs().fillna(0,downcast='infer')



print(data)

  league Team  week  position  frequency

0      A    X     1         1          0

1      A    X     2         1          0

2      A    X     3         2          1

3      A    Y     1         2          0

4      A    Y     2         2          0

5      A    Y     3         1          1

6      B    Z     1         2          1

7      B    Z     2         3          1

8      B    Z     3         4          1

Using groupby gives all zeros, since you are comparing within groups not on whole dataframe.

data.groupby(['league', 'Team', 'week'])['position'].diff().fillna(0,downcast='infer')



0    0

1    0

2    0

3    0

4    0

5    0

6    0

7    0

8    0

Name: position, dtype: int64

edited Nov 12 at 9:28

answered Nov 12 at 9:03

Sandeep Kadapa

5,642427

Thanks Sandeep. If I want to groupby with League,team, week then work out the frequency, how would I add that. The sample data frame is already sorted but actual data is random.
– Zephyr
Nov 12 at 9:21

add a comment |

up vote
1
down vote

Use diff and abs with fillna:

data['frequency'] = data['position'].diff().abs().fillna(0,downcast='infer')



print(data)

  league Team  week  position  frequency

0      A    X     1         1          0

1      A    X     2         1          0

2      A    X     3         2          1

3      A    Y     1         2          0

4      A    Y     2         2          0

5      A    Y     3         1          1

6      B    Z     1         2          1

7      B    Z     2         3          1

8      B    Z     3         4          1

Using groupby gives all zeros, since you are comparing within groups not on whole dataframe.

data.groupby(['league', 'Team', 'week'])['position'].diff().fillna(0,downcast='infer')



0    0

1    0

2    0

3    0

4    0

5    0

6    0

7    0

8    0

Name: position, dtype: int64

edited Nov 12 at 9:28

answered Nov 12 at 9:03

Sandeep Kadapa

5,642427

Use diff and abs with fillna:

data['frequency'] = data['position'].diff().abs().fillna(0,downcast='infer')



print(data)

  league Team  week  position  frequency

0      A    X     1         1          0

1      A    X     2         1          0

2      A    X     3         2          1

3      A    Y     1         2          0

4      A    Y     2         2          0

5      A    Y     3         1          1

6      B    Z     1         2          1

7      B    Z     2         3          1

8      B    Z     3         4          1

Using groupby gives all zeros, since you are comparing within groups not on whole dataframe.

data.groupby(['league', 'Team', 'week'])['position'].diff().fillna(0,downcast='infer')



0    0

1    0

2    0

3    0

4    0

5    0

6    0

7    0

8    0

Name: position, dtype: int64

edited Nov 12 at 9:28

answered Nov 12 at 9:03

Sandeep Kadapa

5,642427

edited Nov 12 at 9:28

answered Nov 12 at 9:03

Sandeep Kadapa

5,642427

answered Nov 12 at 9:03

Sandeep Kadapa

5,642427

answered Nov 12 at 9:03

Sandeep Kadapa

5,642427

Thanks Sandeep. If I want to groupby with League,team, week then work out the frequency, how would I add that. The sample data frame is already sorted but actual data is random.
– Zephyr
Nov 12 at 9:21

add a comment |

Thanks Sandeep. If I want to groupby with League,team, week then work out the frequency, how would I add that. The sample data frame is already sorted but actual data is random.
– Zephyr
Nov 12 at 9:21

Thanks Sandeep. If I want to groupby with League,team, week then work out the frequency, how would I add that. The sample data frame is already sorted but actual data is random.
– Zephyr
Nov 12 at 9:21

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Vfrdtyky