How to read in Chinese text and write Chinese characters to csv - Python 3












I've searched SO but have not been able to find an answer to this specific problem. I am trying to read a .txt file of Chinese characters. When I write the contents to a .csv, the cells look like this:

b'\xef\xbb\xbf\xe5'

as opposed to:

山西襄汾

How can I write the latter format to a .csv? A snippet of the relevant code is below:

infilehandle = open(infilepath, encoding='utf-8')  # open .txt file
txtlines = infilehandle.read().replace('\n', '')
date_pattern = re.compile(r'(\d{4}.\d{1,2}.\d{1,2})')
date = date_pattern.findall(txtlines)[0]
title = txtlines.split(date)[0]
localrow = []
localrow.append(date.encode("utf-8-sig"))
localrow.append(title.encode("utf_8_sig"))
outfilehandle.writerow(localrow)  # writes to .csv

Tags: python
asked Nov 14 '18 at 22:07 by steven.m787

  • Was outfilehandle also created with encoding='utf-8'? – Peter Wood, Nov 14 '18 at 22:10
  • If data items for writerow aren't strings, they are converted with str but str(b'n') == "b'n'" – Michael Butscher, Nov 14 '18 at 22:12
  • How are you viewing the contents of the .csv file? – Peter Wood, Nov 14 '18 at 22:15
  • Your snippet of code is not so relevant as you say it is. It seems to search for a sequence of digits and do something with them. Are you sure that is an important part of your problem? – usr2564301, Nov 14 '18 at 22:22
  • Peter, I am viewing the contents in Excel. I use the default to set outfilehandle, which I believe in Python 3 is utf-8 but I could be wrong. – steven.m787, Nov 14 '18 at 23:57
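
Michael Butscher's comment can be checked with a small sketch. It uses only the standard library and writes to an in-memory buffer instead of a real file; the helper name write_row_to_text is made up for illustration.

```python
import csv
import io


def write_row_to_text(row):
    """Write one row with csv.writer into an in-memory buffer and return the text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerow(row)
    return buffer.getvalue()


text = "山西襄汾"

# Encoding first hands csv.writer a bytes object, which it converts with str(),
# so the cell holds the repr of the bytes rather than the characters.
encoded_cell = write_row_to_text([text.encode("utf-8-sig")])

# Passing the str itself keeps the characters intact.
plain_cell = write_row_to_text([text])

print(encoded_cell)
print(plain_cell)
```

The first cell starts with b'\xef\xbb\xbf, the same shape as the output in the question, which suggests the .encode() calls rather than the file handle are what produce the b'...' text.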














1 Answer

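First, make sure to create outfilehandle with encoding='utf-8', as suggested by Peter Wood. Passing newline='' as well, as the csv module documentation recommends, avoids blank rows between records on Windows:

outfilehandle = csv.writer(open('outfile.csv', 'w', encoding='utf-8', newline=''))

Then there is no need to call date.encode("utf-8-sig"). csv.writer converts any non-string item with str(), which is why the encoded bytes showed up as b'...' in the cells. Just change lines 7-8 in your code snippet to:

localrow.append(date)
localrow.append(title)

Also, it may be helpful to read the Python Unicode HOWTO and Processing Text Files in Python 3.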
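
Putting the two changes together, the whole read-and-write step might look roughly like the sketch below. The function names, the file names and the sample input text are all made up for illustration; only the regular expression and the split logic come from the question.

```python
import csv
import re
import tempfile
from pathlib import Path

# Same pattern as in the question: a date such as 2018.11.14.
DATE_PATTERN = re.compile(r"(\d{4}.\d{1,2}.\d{1,2})")


def extract_row(text):
    """Return [date, title] as plain str objects, with no .encode() calls."""
    flattened = text.replace("\n", "")
    date = DATE_PATTERN.findall(flattened)[0]
    title = flattened.split(date)[0]
    return [date, title]


def convert(infilepath, outfilepath):
    """Read a UTF-8 text file and write one CSV row of decoded strings."""
    with open(infilepath, encoding="utf-8") as infile:
        row = extract_row(infile.read())
    # The encoding belongs on the file object; newline="" follows the csv docs.
    with open(outfilepath, "w", encoding="utf-8", newline="") as outfile:
        csv.writer(outfile).writerow(row)
    return row


# Made-up sample input, used only to exercise the sketch.
workdir = Path(tempfile.mkdtemp())
sample_in = workdir / "sample.txt"
sample_out = workdir / "outfile.csv"
sample_in.write_text("山西襄汾\n2018.11.14\n", encoding="utf-8")

row = convert(sample_in, sample_out)
written = sample_out.read_text(encoding="utf-8")
print(row)
print(written)
```

Using with blocks also closes both files, which the one-line csv.writer(open(...)) form leaves to the interpreter.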







answered Nov 15 '18 at 0:04 by azalea

  • I modified my code, but the resulting cell contents now look like: 山西è instead of the Chinese characters in the input text file. I have read through the links provided, but am unsure how to apply that information to writing to a csv. Thanks in advance. – steven.m787, Nov 15 '18 at 13:50
  • Realizing this is an Excel issue. Opening the file in Notepad displays the "correct" characters. – steven.m787, Nov 15 '18 at 14:01
  • @steven.m787 you may need to change Excel's encoding. see this: itg.ias.edu/content/… – azalea, Nov 15 '18 at 16:57
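
A commonly used workaround for the Excel display problem described in these comments, offered as a sketch and not as what the asker ended up doing: put the utf-8-sig codec on the file object instead of on each cell. That codec writes a byte order mark at the start of the file, which Excel typically relies on to recognise a CSV as UTF-8 when it is opened directly. The file name and the date value below are assumptions for the demonstration.

```python
import codecs
import csv
import tempfile
from pathlib import Path

# Hypothetical output location for the demonstration.
outfilepath = Path(tempfile.mkdtemp()) / "outfile_for_excel.csv"

# "utf-8-sig" on the file object writes the BOM once, at the start of the file.
with open(outfilepath, "w", encoding="utf-8-sig", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["2018.11.14", "山西襄汾"])

raw = outfilepath.read_bytes()
has_bom = raw.startswith(codecs.BOM_UTF8)

# Reading back with the same codec strips the BOM again.
with open(outfilepath, encoding="utf-8-sig", newline="") as infile:
    rows = list(csv.reader(infile))

print(has_bom)
print(rows)
```

Whether this is enough depends on the Excel version and on how the file is opened; importing the file and choosing UTF-8 as its encoding, as the linked page suggests, is the alternative.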


















