Python pandas splitting text and numbers in dataframe
I have a dataframe df1 whose first column is named Acc Number, and the data looks like:
Acc Number
ASC100.1
MJT122
ASC120.4
XTY111
I need to make a new dataframe df2 with two columns, the first holding the text part and the second holding the number, so the desired output is:
Text Number
ASC 100.1
MJT 122
ASC 120.4
XTY 111
How would I go about doing this?
Thanks!
python pandas dataframe
asked Nov 13 '18 at 23:14
Andy
1 Answer
You could do something like this:
import pandas as pd
# sample data from the question
data = ['ASC100.1',
        'MJT122',
        'ASC120.4',
        'XTY111']
df = pd.DataFrame(data=data, columns=['col'])
# group 1: one or more letters; group 2: the run of non-letters that follows
result = df.col.str.extract(r'([a-zA-Z]+)([^a-zA-Z]+)', expand=True)
result.columns = ['Text', 'Number']
print(result)
Output
Text Number
0 ASC 100.1
1 MJT 122
2 ASC 120.4
3 XTY 111
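Applied to the question's own frame, a minimal sketch could look like the following. It assumes df1 has the Acc Number column from the question. Named groups (?P<name>...) become the column names of the extracted frame, which saves the separate renaming step. Because str.extract returns strings, the number part is converted with pd.to_numeric so it can be used in arithmetic.

```python
import pandas as pd

# df1 mirrors the question's data; "Acc Number" is the column name from the question
df1 = pd.DataFrame({"Acc Number": ["ASC100.1", "MJT122", "ASC120.4", "XTY111"]})

# Named groups (?P<Text>...) and (?P<Number>...) become the output column names
df2 = df1["Acc Number"].str.extract(r"(?P<Text>[a-zA-Z]+)(?P<Number>[^a-zA-Z]+)")

# str.extract yields strings; convert the numeric part to floats
df2["Number"] = pd.to_numeric(df2["Number"])

print(df2)
print(df2.dtypes)
```

If some rows might not contain a valid number, pd.to_numeric(..., errors="coerce") turns them into NaN instead of raising.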
The pattern ([a-zA-Z]+)([^a-zA-Z]+) means: match a group of letters, ([a-zA-Z]+), followed by a group of non-letters, ([^a-zA-Z]+).
A safer alternative is the regex ([a-zA-Z]+)(\d+\.?\d+), assuming the numbers contain at most one decimal point.
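To illustrate why the digit-based pattern is safer, here is a small comparison. The last two values are hypothetical and not from the question; they only show that the non-letter group captures whatever follows the letters, stray characters included, while the digit pattern stops at the number. Note that the point is escaped as \.; an unescaped . would match any character.

```python
import pandas as pd

# "ABC12-3" and "XYZ45 " (trailing space) are hypothetical inputs
s = pd.Series(["ASC100.1", "MJT122", "ABC12-3", "XYZ45 "])

# letters, then any run of non-letters (may include "-" or spaces)
loose = s.str.extract(r"([a-zA-Z]+)([^a-zA-Z]+)")
# letters, then digits with at most one escaped decimal point
strict = s.str.extract(r"([a-zA-Z]+)(\d+\.?\d+)")

# Unnamed groups are returned as columns 0 and 1
print(loose[1].tolist())
print(strict[1].tolist())
```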
Further reading:
- The documentation on regex in Python.
- The documentation on extract.
Thanks Daniel, the str.extract worked. Why would the regex be a safer option?
– Andy
Nov 13 '18 at 23:28
Because it matches only digits, with at most one decimal point between them.
– Daniel Mesejo
Nov 13 '18 at 23:30
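One caveat, relevant if your codes can carry single-digit numbers: \d+\.?\d+ needs at least two digits, because both \d+ parts must match something. A variant with an optional fractional part, \d+(?:\.\d+)?, accepts those as well; the (?:...) group is non-capturing, so str.extract still returns two columns. The values "Q7" and "R8.5" below are hypothetical.

```python
import pandas as pd

# "Q7" and "R8.5" are hypothetical codes, not from the question
s = pd.Series(["ASC100.1", "MJT122", "Q7", "R8.5"])

# requires at least two digits in total, so "Q7" yields NaN
two_digit = s.str.extract(r"([a-zA-Z]+)(\d+\.?\d+)")
# integer part, optionally followed by "." and more digits
optional_frac = s.str.extract(r"([a-zA-Z]+)(\d+(?:\.\d+)?)")

print(two_digit[1].tolist())
print(optional_frac[1].tolist())
```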
edited Nov 13 '18 at 23:29
answered Nov 13 '18 at 23:20
Daniel Mesejo