How to read in Chinese text and write Chinese characters to csv - Python 3
I've searched SO but have not been able to find the answer to this specific problem. I am trying to read in from a .txt file of Chinese characters. When I try to write to a .csv, the contents of cells look like this:
b'\xef\xbb\xbf\xe5'
as opposed to:
山西襄汾
How can I output to a .csv the latter format? Snippet of relevant code is below:
infilehandle = open(infilepath, encoding='utf-8')  # open .txt file
txtlines = infilehandle.read().replace('\n', '')
date_pattern = re.compile('(\d{4}.\d{1,2}.\d{1,2})')
date = date_pattern.findall(txtlines)[0]
title = txtlines.split(date)[0]
localrow = []
localrow.append(date.encode("utf-8-sig"))
localrow.append(title.encode("utf_8_sig"))
outfilehandle.writerow(localrow)  # writes to .csv
python
asked Nov 14 '18 at 22:07
steven.m787
Was outfilehandle also created with encoding='utf-8'?
– Peter Wood
Nov 14 '18 at 22:10
If data items for writerow aren't strings, they are converted with str, but str(b'\n') == "b'\\n'".
– Michael Butscher
Nov 14 '18 at 22:12
How are you viewing the contents of the .csv file?
– Peter Wood
Nov 14 '18 at 22:15
Your snippet of code is not as relevant as you say it is. It seems to search for a sequence of digits and do something with them. Are you sure that is an important part of your problem?
– usr2564301
Nov 14 '18 at 22:22
Peter, I am viewing the contents in Excel. I use the default to set outfilehandle, which I believe in Python 3 is utf-8, but I could be wrong.
– steven.m787
Nov 14 '18 at 23:57
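Michael Butscher's comment describes why the cells contain b'...' text. A minimal sketch of that behavior follows, using in-memory buffers instead of real files; the buffer names are made up for this illustration and the sample string is taken from the question.

```python
import csv
import io

title = "山西襄汾"

# Case 1: the cell is a bytes object. csv.writer converts non-string items
# with str(), so the cell receives the repr of the bytes, not the text.
buf_bytes = io.StringIO()
csv.writer(buf_bytes).writerow([title.encode("utf-8-sig")])

# Case 2: the cell is a str. The text is written unchanged; turning it into
# bytes is left to the file object that the writer wraps.
buf_text = io.StringIO()
csv.writer(buf_text).writerow([title])

print(buf_bytes.getvalue())  # begins with b'\xef\xbb\xbf\xe5
print(buf_text.getvalue())   # 山西襄汾
```

The first cell matches the symptom in the question, which suggests the encode() calls are the cause rather than the regular expression.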
1 Answer
First, make sure to create outfilehandle with encoding='utf-8', as suggested by Peter Wood, like so:
outfilehandle = csv.writer(open('outfile.csv', 'w', encoding='utf-8'))
Then there is no need to call date.encode("utf-8-sig"); just change lines 7-8 in your code snippet to:
localrow.append(date)
localrow.append(title)
Also, it may be helpful to read the Python Unicode HOWTO and Processing Text Files in Python 3.
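Put together, the change might look like the sketch below. The sample text, the temporary directory, and the file names are assumptions made so the example runs on its own; the regular expression and variable names follow the question. It also passes newline='' when opening the output file, which the csv module documentation recommends and which the answer's one-liner leaves out.

```python
import csv
import os
import re
import tempfile

# Hypothetical input standing in for the asker's .txt file.
sample_text = "山西襄汾\n2018.11.14\n"

workdir = tempfile.mkdtemp()
infilepath = os.path.join(workdir, "infile.txt")
outfilepath = os.path.join(workdir, "outfile.csv")

with open(infilepath, "w", encoding="utf-8") as f:
    f.write(sample_text)

# Reading with encoding='utf-8' already yields str objects.
with open(infilepath, encoding="utf-8") as infilehandle:
    txtlines = infilehandle.read().replace("\n", "")

date_pattern = re.compile(r"(\d{4}.\d{1,2}.\d{1,2})")
date = date_pattern.findall(txtlines)[0]
title = txtlines.split(date)[0]

# Append plain str values; the file object does the encoding on write.
localrow = [date, title]

with open(outfilepath, "w", encoding="utf-8", newline="") as outfile:
    outfilehandle = csv.writer(outfile)
    outfilehandle.writerow(localrow)

# Read the file back to confirm the characters survived the round trip.
with open(outfilepath, encoding="utf-8", newline="") as check:
    rows = list(csv.reader(check))

print(rows)
```

Using with blocks also closes both files, which the original snippet does not show.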
answered Nov 15 '18 at 0:04
azalea
I modified my code, but the resulting cell contents now look like: 山西è instead of the Chinese characters in the input text file. I have read through the links provided, but am unsure how to apply that information to writing to a csv. Thanks in advance.
– steven.m787
Nov 15 '18 at 13:50
Realizing this is an Excel issue. Opening the file in Notepad displays the "correct" characters.
– steven.m787
Nov 15 '18 at 14:01
@steven.m787 you may need to change Excel's encoding. see this: itg.ias.edu/content/…
– azalea
Nov 15 '18 at 16:57
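Besides changing the import settings in Excel, a commonly used workaround (not taken from this thread) is to open the output file with encoding='utf-8-sig', so that a single byte order mark is written at the start of the file. Excel generally treats that mark as a hint that the file is UTF-8, though this can vary by version and platform. A minimal sketch with an assumed file name and sample row:

```python
import csv
import os
import tempfile

# Hypothetical output path for this illustration.
outfilepath = os.path.join(tempfile.mkdtemp(), "outfile_excel.csv")

# 'utf-8-sig' on the file object writes one BOM at the start of the file.
# Encoding each cell with 'utf-8-sig', as in the question, does not do this.
with open(outfilepath, "w", encoding="utf-8-sig", newline="") as outfile:
    csv.writer(outfile).writerow(["2018.11.14", "山西襄汾"])

# Inspect the raw bytes: the BOM comes first, then ordinary UTF-8 text.
with open(outfilepath, "rb") as raw:
    data = raw.read()

print(data[:3])                  # b'\xef\xbb\xbf'
print(data[3:].decode("utf-8"))  # 2018.11.14,山西襄汾
```

Programs that do not expect a BOM may show it as stray characters in the first cell, so this variant is best kept for files meant for Excel.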