Creating smaller dataframes from a larger dataframe using multiple filter criteria: using python and pandas
Good morning,
Basically I have 2 pandas dataframes from CSVs:
Dataframe 1: each row is a group where the row index is a geographical area code, and the columns are the top 5 most similar areas. e.g:
0 1 2 3 4 5
Rank
00C 00C 03H 02D 05H 02E 04E
00D 00D 02P 02X 01X 03R 06M
Dataframe 2: This is a larger dataframe with hospital activity numbers broken down by age group, gender and the geographical areas. e.g:
MALE_0-4 MALE_5-9 FEMALE_80-84 FEMALE_85+
06M 75 59 43 48
00C 132 121 173 204
01X 84 63 124 102
03H 127 131 130 83
02P 93 89 208 151
02D 70 62 92 81
05H 96 76 52 32
00C 106 62 123 106
03R 75 59 43 48
02P 10 121 173 204
03R 84 63 124 102
03R 30 131 130 83
02E 93 89 208 151
06M 70 62 92 81
04E 96 76 52 32
00D 106 62 123 106
What I am trying to do is create smaller dataframes from Dataframe 2 - filtered by the groupings from Dataframe 1. Each geographical area code can appear in multiple lookups. I have a basic idea of for loops, but can't quite get it to work.
Dataframe Output 1:
MALE_0-4 MALE_5-9 FEMALE_80-84 FEMALE_85+
00C 132 121 173 204
03H 127 131 130 83
02D 70 62 92 81
05H 96 76 52 32
00C 106 62 123 106
02E 93 89 208 151
04E 96 76 52 32
Dataframe Output 2:
MALE_0-4 MALE_5-9 FEMALE_80-84 FEMALE_85+
06M 75 59 43 48
01X 84 63 124 102
02P 93 89 208 151
03R 75 59 43 48
02P 10 121 173 204
03R 84 63 124 102
03R 30 131 130 83
06M 70 62 92 81
00D 106 62 123 106
...
Hope this makes sense and any help would be appreciated.
python pandas dataframe
add a comment |
Good morning,
Basically I have 2 pandas dataframes from CSVs:
Dataframe 1: each row is a group where the row index is a geographical area code, and the columns are the top 5 most similar areas. e.g:
0 1 2 3 4 5
Rank
00C 00C 03H 02D 05H 02E 04E
00D 00D 02P 02X 01X 03R 06M
Dataframe 2: This is a larger dataframe with hospital activity numbers broken down by age group, gender and the geographical areas. e.g:
MALE_0-4 MALE_5-9 FEMALE_80-84 FEMALE_85+
06M 75 59 43 48
00C 132 121 173 204
01X 84 63 124 102
03H 127 131 130 83
02P 93 89 208 151
02D 70 62 92 81
05H 96 76 52 32
00C 106 62 123 106
03R 75 59 43 48
02P 10 121 173 204
03R 84 63 124 102
03R 30 131 130 83
02E 93 89 208 151
06M 70 62 92 81
04E 96 76 52 32
00D 106 62 123 106
What I am trying to do is create smaller dataframes from Dataframe 2 - filtered by the groupings from Dataframe 1. Each geographical area code can appear in multiple lookups. I have a basic idea of for loops, but can't quite get it to work.
Dataframe Output 1:
MALE_0-4 MALE_5-9 FEMALE_80-84 FEMALE_85+
00C 132 121 173 204
03H 127 131 130 83
02D 70 62 92 81
05H 96 76 52 32
00C 106 62 123 106
02E 93 89 208 151
04E 96 76 52 32
Dataframe Output 2:
MALE_0-4 MALE_5-9 FEMALE_80-84 FEMALE_85+
06M 75 59 43 48
01X 84 63 124 102
02P 93 89 208 151
03R 75 59 43 48
02P 10 121 173 204
03R 84 63 124 102
03R 30 131 130 83
06M 70 62 92 81
00D 106 62 123 106
...
Hope this makes sense and any help would be appreciated.
python pandas dataframe
2
Can't understand what you want! Can you please provide the one of expected output when you give a specific input!!
– Rahul Agarwal
Nov 15 '18 at 12:19
Hi Rahul. I am effectively looking to split the second dataframe into multiple dataframes - filtered so each of the smaller dataframes only include the data for each of the lookups in the first dataframe (with each column in the first dataframe representing the filter criteria). Dataframe 2 is actually much bigger (so includes rows for all the codes in the first dataframe as well as the 07L, 07M....etc already included in the example). Thanks!
– Daviesa
Nov 15 '18 at 14:03
1
Possible duplicate of Filter dataframe rows if value in column is in a set list of values
– sophros
Nov 15 '18 at 14:34
Could you create example with 5 rows and 3 columns with desired output? It's much easier to help you if we have Minimal, Complete, and Verifiable example.
– zipa
Nov 15 '18 at 14:36
Thanks all. I've changed the examples so the ouputs are correct for the input data. Notice that all the codes in Output 1 are in Dataframe 1's first column, and all the codes in Output 2 are in Dataframe 2's second column.
– Daviesa
Nov 15 '18 at 15:37
add a comment |
Good morning,
Basically I have 2 pandas dataframes from CSVs:
Dataframe 1: each row is a group where the row index is a geographical area code, and the columns are the top 5 most similar areas. e.g:
0 1 2 3 4 5
Rank
00C 00C 03H 02D 05H 02E 04E
00D 00D 02P 02X 01X 03R 06M
Dataframe 2: This is a larger dataframe with hospital activity numbers broken down by age group, gender and the geographical areas. e.g:
MALE_0-4 MALE_5-9 FEMALE_80-84 FEMALE_85+
06M 75 59 43 48
00C 132 121 173 204
01X 84 63 124 102
03H 127 131 130 83
02P 93 89 208 151
02D 70 62 92 81
05H 96 76 52 32
00C 106 62 123 106
03R 75 59 43 48
02P 10 121 173 204
03R 84 63 124 102
03R 30 131 130 83
02E 93 89 208 151
06M 70 62 92 81
04E 96 76 52 32
00D 106 62 123 106
What I am trying to do is create smaller dataframes from Dataframe 2 - filtered by the groupings from Dataframe 1. Each geographical area code can appear in multiple lookups. I have a basic idea of for loops, but can't quite get it to work.
Dataframe Output 1:
MALE_0-4 MALE_5-9 FEMALE_80-84 FEMALE_85+
00C 132 121 173 204
03H 127 131 130 83
02D 70 62 92 81
05H 96 76 52 32
00C 106 62 123 106
02E 93 89 208 151
04E 96 76 52 32
Dataframe Output 2:
MALE_0-4 MALE_5-9 FEMALE_80-84 FEMALE_85+
06M 75 59 43 48
01X 84 63 124 102
02P 93 89 208 151
03R 75 59 43 48
02P 10 121 173 204
03R 84 63 124 102
03R 30 131 130 83
06M 70 62 92 81
00D 106 62 123 106
...
Hope this makes sense and any help would be appreciated.
python pandas dataframe
Good morning,
Basically I have 2 pandas dataframes from CSVs:
Dataframe 1: each row is a group where the row index is a geographical area code, and the columns are the top 5 most similar areas. e.g:
0 1 2 3 4 5
Rank
00C 00C 03H 02D 05H 02E 04E
00D 00D 02P 02X 01X 03R 06M
Dataframe 2: This is a larger dataframe with hospital activity numbers broken down by age group, gender and the geographical areas. e.g:
MALE_0-4 MALE_5-9 FEMALE_80-84 FEMALE_85+
06M 75 59 43 48
00C 132 121 173 204
01X 84 63 124 102
03H 127 131 130 83
02P 93 89 208 151
02D 70 62 92 81
05H 96 76 52 32
00C 106 62 123 106
03R 75 59 43 48
02P 10 121 173 204
03R 84 63 124 102
03R 30 131 130 83
02E 93 89 208 151
06M 70 62 92 81
04E 96 76 52 32
00D 106 62 123 106
What I am trying to do is create smaller dataframes from Dataframe 2 - filtered by the groupings from Dataframe 1. Each geographical area code can appear in multiple lookups. I have a basic idea of for loops, but can't quite get it to work.
Dataframe Output 1:
MALE_0-4 MALE_5-9 FEMALE_80-84 FEMALE_85+
00C 132 121 173 204
03H 127 131 130 83
02D 70 62 92 81
05H 96 76 52 32
00C 106 62 123 106
02E 93 89 208 151
04E 96 76 52 32
Dataframe Output 2:
MALE_0-4 MALE_5-9 FEMALE_80-84 FEMALE_85+
06M 75 59 43 48
01X 84 63 124 102
02P 93 89 208 151
03R 75 59 43 48
02P 10 121 173 204
03R 84 63 124 102
03R 30 131 130 83
06M 70 62 92 81
00D 106 62 123 106
...
Hope this makes sense and any help would be appreciated.
python pandas dataframe
python pandas dataframe
edited Nov 15 '18 at 16:51
Daviesa
asked Nov 15 '18 at 11:54
DaviesaDaviesa
32
32
2
Can't understand what you want! Can you please provide the one of expected output when you give a specific input!!
– Rahul Agarwal
Nov 15 '18 at 12:19
Hi Rahul. I am effectively looking to split the second dataframe into multiple dataframes - filtered so each of the smaller dataframes only include the data for each of the lookups in the first dataframe (with each column in the first dataframe representing the filter criteria). Dataframe 2 is actually much bigger (so includes rows for all the codes in the first dataframe as well as the 07L, 07M....etc already included in the example). Thanks!
– Daviesa
Nov 15 '18 at 14:03
1
Possible duplicate of Filter dataframe rows if value in column is in a set list of values
– sophros
Nov 15 '18 at 14:34
Could you create example with 5 rows and 3 columns with desired output? It's much easier to help you if we have Minimal, Complete, and Verifiable example.
– zipa
Nov 15 '18 at 14:36
Thanks all. I've changed the examples so the ouputs are correct for the input data. Notice that all the codes in Output 1 are in Dataframe 1's first column, and all the codes in Output 2 are in Dataframe 2's second column.
– Daviesa
Nov 15 '18 at 15:37
add a comment |
2
Can't understand what you want! Can you please provide the one of expected output when you give a specific input!!
– Rahul Agarwal
Nov 15 '18 at 12:19
Hi Rahul. I am effectively looking to split the second dataframe into multiple dataframes - filtered so each of the smaller dataframes only include the data for each of the lookups in the first dataframe (with each column in the first dataframe representing the filter criteria). Dataframe 2 is actually much bigger (so includes rows for all the codes in the first dataframe as well as the 07L, 07M....etc already included in the example). Thanks!
– Daviesa
Nov 15 '18 at 14:03
1
Possible duplicate of Filter dataframe rows if value in column is in a set list of values
– sophros
Nov 15 '18 at 14:34
Could you create example with 5 rows and 3 columns with desired output? It's much easier to help you if we have Minimal, Complete, and Verifiable example.
– zipa
Nov 15 '18 at 14:36
Thanks all. I've changed the examples so the ouputs are correct for the input data. Notice that all the codes in Output 1 are in Dataframe 1's first column, and all the codes in Output 2 are in Dataframe 2's second column.
– Daviesa
Nov 15 '18 at 15:37
2
2
Can't understand what you want! Can you please provide the one of expected output when you give a specific input!!
– Rahul Agarwal
Nov 15 '18 at 12:19
Can't understand what you want! Can you please provide the one of expected output when you give a specific input!!
– Rahul Agarwal
Nov 15 '18 at 12:19
Hi Rahul. I am effectively looking to split the second dataframe into multiple dataframes - filtered so each of the smaller dataframes only include the data for each of the lookups in the first dataframe (with each column in the first dataframe representing the filter criteria). Dataframe 2 is actually much bigger (so includes rows for all the codes in the first dataframe as well as the 07L, 07M....etc already included in the example). Thanks!
– Daviesa
Nov 15 '18 at 14:03
Hi Rahul. I am effectively looking to split the second dataframe into multiple dataframes - filtered so each of the smaller dataframes only include the data for each of the lookups in the first dataframe (with each column in the first dataframe representing the filter criteria). Dataframe 2 is actually much bigger (so includes rows for all the codes in the first dataframe as well as the 07L, 07M....etc already included in the example). Thanks!
– Daviesa
Nov 15 '18 at 14:03
1
1
Possible duplicate of Filter dataframe rows if value in column is in a set list of values
– sophros
Nov 15 '18 at 14:34
Possible duplicate of Filter dataframe rows if value in column is in a set list of values
– sophros
Nov 15 '18 at 14:34
Could you create example with 5 rows and 3 columns with desired output? It's much easier to help you if we have Minimal, Complete, and Verifiable example.
– zipa
Nov 15 '18 at 14:36
Could you create example with 5 rows and 3 columns with desired output? It's much easier to help you if we have Minimal, Complete, and Verifiable example.
– zipa
Nov 15 '18 at 14:36
Thanks all. I've changed the examples so the ouputs are correct for the input data. Notice that all the codes in Output 1 are in Dataframe 1's first column, and all the codes in Output 2 are in Dataframe 2's second column.
– Daviesa
Nov 15 '18 at 15:37
Thanks all. I've changed the examples so the ouputs are correct for the input data. Notice that all the codes in Output 1 are in Dataframe 1's first column, and all the codes in Output 2 are in Dataframe 2's second column.
– Daviesa
Nov 15 '18 at 15:37
add a comment |
2 Answers
2
active
oldest
votes
Going by the linked duplicate question this is what you should use (sketchy):
for _, row in df1.iterrows():
broken_down = df2[df2['region'].isin(row)]
Many thanks, but this returned an error: 'only list-like objects are allowed to be passed to isin(), you passed a [str]'
– Daviesa
Nov 15 '18 at 15:42
As I wrote the solution was sketchy. With the correction it should no longer throw an error.
– sophros
Nov 15 '18 at 16:24
Fantastic - that's done the trick! Thanks for your help, this is so useful! Now I'm able to append them to a list of dataframes
– Daviesa
Nov 15 '18 at 16:56
I am glad I could help! Best of luck!
– sophros
Nov 15 '18 at 17:00
add a comment |
Just adding the code to append to a list, just for future use. Thanks to sophros for solving:
broken_down =
for _, row in df1.iterrows():
broken_down.append(df2[df2['region'].isin(row)])
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53318939%2fcreating-smaller-dataframes-from-a-larger-dataframe-using-multiple-filter-criter%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
Going by the linked duplicate question this is what you should use (sketchy):
for _, row in df1.iterrows():
broken_down = df2[df2['region'].isin(row)]
Many thanks, but this returned an error: 'only list-like objects are allowed to be passed to isin(), you passed a [str]'
– Daviesa
Nov 15 '18 at 15:42
As I wrote the solution was sketchy. With the correction it should no longer throw an error.
– sophros
Nov 15 '18 at 16:24
Fantastic - that's done the trick! Thanks for your help, this is so useful! Now I'm able to append them to a list of dataframes
– Daviesa
Nov 15 '18 at 16:56
I am glad I could help! Best of luck!
– sophros
Nov 15 '18 at 17:00
add a comment |
Going by the linked duplicate question this is what you should use (sketchy):
for _, row in df1.iterrows():
broken_down = df2[df2['region'].isin(row)]
Many thanks, but this returned an error: 'only list-like objects are allowed to be passed to isin(), you passed a [str]'
– Daviesa
Nov 15 '18 at 15:42
As I wrote the solution was sketchy. With the correction it should no longer throw an error.
– sophros
Nov 15 '18 at 16:24
Fantastic - that's done the trick! Thanks for your help, this is so useful! Now I'm able to append them to a list of dataframes
– Daviesa
Nov 15 '18 at 16:56
I am glad I could help! Best of luck!
– sophros
Nov 15 '18 at 17:00
add a comment |
Going by the linked duplicate question this is what you should use (sketchy):
for _, row in df1.iterrows():
broken_down = df2[df2['region'].isin(row)]
Going by the linked duplicate question this is what you should use (sketchy):
for _, row in df1.iterrows():
broken_down = df2[df2['region'].isin(row)]
edited Nov 15 '18 at 16:24
answered Nov 15 '18 at 14:36
sophrossophros
2,8351932
2,8351932
Many thanks, but this returned an error: 'only list-like objects are allowed to be passed to isin(), you passed a [str]'
– Daviesa
Nov 15 '18 at 15:42
As I wrote the solution was sketchy. With the correction it should no longer throw an error.
– sophros
Nov 15 '18 at 16:24
Fantastic - that's done the trick! Thanks for your help, this is so useful! Now I'm able to append them to a list of dataframes
– Daviesa
Nov 15 '18 at 16:56
I am glad I could help! Best of luck!
– sophros
Nov 15 '18 at 17:00
add a comment |
Many thanks, but this returned an error: 'only list-like objects are allowed to be passed to isin(), you passed a [str]'
– Daviesa
Nov 15 '18 at 15:42
As I wrote the solution was sketchy. With the correction it should no longer throw an error.
– sophros
Nov 15 '18 at 16:24
Fantastic - that's done the trick! Thanks for your help, this is so useful! Now I'm able to append them to a list of dataframes
– Daviesa
Nov 15 '18 at 16:56
I am glad I could help! Best of luck!
– sophros
Nov 15 '18 at 17:00
Many thanks, but this returned an error: 'only list-like objects are allowed to be passed to isin(), you passed a [str]'
– Daviesa
Nov 15 '18 at 15:42
Many thanks, but this returned an error: 'only list-like objects are allowed to be passed to isin(), you passed a [str]'
– Daviesa
Nov 15 '18 at 15:42
As I wrote the solution was sketchy. With the correction it should no longer throw an error.
– sophros
Nov 15 '18 at 16:24
As I wrote the solution was sketchy. With the correction it should no longer throw an error.
– sophros
Nov 15 '18 at 16:24
Fantastic - that's done the trick! Thanks for your help, this is so useful! Now I'm able to append them to a list of dataframes
– Daviesa
Nov 15 '18 at 16:56
Fantastic - that's done the trick! Thanks for your help, this is so useful! Now I'm able to append them to a list of dataframes
– Daviesa
Nov 15 '18 at 16:56
I am glad I could help! Best of luck!
– sophros
Nov 15 '18 at 17:00
I am glad I could help! Best of luck!
– sophros
Nov 15 '18 at 17:00
add a comment |
Just adding the code to append to a list, just for future use. Thanks to sophros for solving:
broken_down =
for _, row in df1.iterrows():
broken_down.append(df2[df2['region'].isin(row)])
add a comment |
Just adding the code to append to a list, just for future use. Thanks to sophros for solving:
broken_down =
for _, row in df1.iterrows():
broken_down.append(df2[df2['region'].isin(row)])
add a comment |
Just adding the code to append to a list, just for future use. Thanks to sophros for solving:
broken_down =
for _, row in df1.iterrows():
broken_down.append(df2[df2['region'].isin(row)])
Just adding the code to append to a list, just for future use. Thanks to sophros for solving:
broken_down =
for _, row in df1.iterrows():
broken_down.append(df2[df2['region'].isin(row)])
answered Nov 15 '18 at 16:57
DaviesaDaviesa
32
32
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53318939%2fcreating-smaller-dataframes-from-a-larger-dataframe-using-multiple-filter-criter%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
2
Can't understand what you want! Can you please provide the one of expected output when you give a specific input!!
– Rahul Agarwal
Nov 15 '18 at 12:19
Hi Rahul. I am effectively looking to split the second dataframe into multiple dataframes - filtered so each of the smaller dataframes only include the data for each of the lookups in the first dataframe (with each column in the first dataframe representing the filter criteria). Dataframe 2 is actually much bigger (so includes rows for all the codes in the first dataframe as well as the 07L, 07M....etc already included in the example). Thanks!
– Daviesa
Nov 15 '18 at 14:03
1
Possible duplicate of Filter dataframe rows if value in column is in a set list of values
– sophros
Nov 15 '18 at 14:34
Could you create example with 5 rows and 3 columns with desired output? It's much easier to help you if we have Minimal, Complete, and Verifiable example.
– zipa
Nov 15 '18 at 14:36
Thanks all. I've changed the examples so the ouputs are correct for the input data. Notice that all the codes in Output 1 are in Dataframe 1's first column, and all the codes in Output 2 are in Dataframe 2's second column.
– Daviesa
Nov 15 '18 at 15:37