How to read in Chinese text and write Chinese characters to csv - Python 3

I've searched SO but haven't found an answer to this specific problem. I am reading Chinese text from a .txt file, and when I write it to a .csv, the contents of the cells look like this:

b'\xef\xbb\xbf\xe5'

as opposed to:

山西襄汾

How can I write the latter to the .csv? The relevant part of my code is below:

infilehandle = open(infilepath, encoding='utf-8')  # open .txt file
txtlines = infilehandle.read().replace('\n', '')
date_pattern = re.compile(r'(\d{4}.\d{1,2}.\d{1,2})')
date = date_pattern.findall(txtlines)[0]
title = txtlines.split(date)[0]
localrow = []
localrow.append(date.encode("utf-8-sig"))
localrow.append(title.encode("utf_8_sig"))
outfilehandle.writerow(localrow)  # writes to .csv
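The bytes at the start of the cell content above can be reproduced directly. A small check, using the title 山西襄汾 as sample input: `str.encode()` returns a `bytes` object, and the "utf-8-sig" codec (of which "utf_8_sig" is just an alias) puts the UTF-8 byte-order mark EF BB BF in front of the encoded text.

```python
# str.encode() turns text into bytes; "utf-8-sig" prepends a BOM.
title = "山西襄汾"
encoded = title.encode("utf-8-sig")

print(type(encoded))   # <class 'bytes'>
print(encoded[:4])     # b'\xef\xbb\xbf\xe5'  (BOM + first byte of 山)

# "utf_8_sig" and "utf-8-sig" name the same codec.
print(encoded == title.encode("utf_8_sig"))  # True
```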

asked Nov 14 '18 at 22:07

steven.m787


Was outfilehandle also created with encoding='utf-8'?

– Peter Wood
Nov 14 '18 at 22:10

If the data items passed to writerow aren't strings, they are converted with str(), but str(b'\n') == "b'\\n'"

– Michael Butscher
Nov 14 '18 at 22:12
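A minimal sketch of that conversion, writing into an in-memory buffer instead of a real file: the `bytes` item ends up as its `repr`-style text, while the `str` item is written as the character itself.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)

# First item is bytes, second is str.
writer.writerow(["山".encode("utf-8"), "山"])

# The bytes item was stringified with str(), giving b'\xe5\xb1\xb1'.
print(repr(buf.getvalue()))
```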


How are you viewing the contents of the .csv file?

– Peter Wood
Nov 14 '18 at 22:15


Your code snippet is not as relevant as you say it is. It seems to search for a sequence of digits and do something with them. Are you sure that is an important part of your problem?

– usr2564301
Nov 14 '18 at 22:22

Peter, I am viewing the contents in Excel. I open outfilehandle with the default encoding, which I believe is utf-8 in Python 3, but I could be wrong.

– steven.m787
Nov 14 '18 at 23:57
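One way to check that assumption: when no `encoding=` is given, `open()` in the Python 3 versions of that era falls back to the locale's preferred encoding, which is often not UTF-8 on Windows (for example cp1252 with a Western locale). This sketch prints both values; the temporary file name `check.csv` is only for illustration.

```python
import locale
import os
import tempfile

# Encoding that open() falls back to when encoding= is omitted.
preferred = locale.getpreferredencoding(False)
print("locale preferred encoding:", preferred)

# Encoding actually used by a text file opened without encoding=.
path = os.path.join(tempfile.mkdtemp(), "check.csv")
with open(path, "w", newline="") as f:
    used = f.encoding
print("open() default:", used)
```

If either line prints something other than a UTF-8 variant, the file is not being written as UTF-8.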

1 Answer

First, make sure to create outfilehandle with encoding='utf-8', as suggested by Peter Wood, like so:

outfilehandle = csv.writer(open('outfile.csv', 'w', encoding='utf-8', newline=''))

Then there is no need to call date.encode("utf-8-sig"); just change lines 7-8 of your code snippet to:

localrow.append(date)

localrow.append(title)

Also, it may be helpful to read the Python Unicode HOWTO and Processing Text Files in Python 3.
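Putting the question's snippet and these changes together, a self-contained sketch could look like the following. The sample input text and the temporary file names are made up for illustration; in the question, `infilepath` points at an existing .txt file. The `newline=''` arguments follow the `csv` module's recommendation for files passed to a writer or reader.

```python
import csv
import os
import re
import tempfile

# Hypothetical sample input standing in for the question's .txt file.
workdir = tempfile.mkdtemp()
infilepath = os.path.join(workdir, "input.txt")
with open(infilepath, "w", encoding="utf-8") as f:
    f.write("山西襄汾\n2018.11.14\n正文")

# Read as text; everything stays str from here on.
with open(infilepath, encoding="utf-8") as infilehandle:
    txtlines = infilehandle.read().replace("\n", "")

date_pattern = re.compile(r"(\d{4}.\d{1,2}.\d{1,2})")
date = date_pattern.findall(txtlines)[0]
title = txtlines.split(date)[0]

# Write str values directly; the file object does the encoding.
outpath = os.path.join(workdir, "outfile.csv")
with open(outpath, "w", encoding="utf-8", newline="") as f:
    outfilehandle = csv.writer(f)
    outfilehandle.writerow([date, title])

# Read the row back to confirm the characters survived.
with open(outpath, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))
print(rows)  # [['2018.11.14', '山西襄汾']]
```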

answered Nov 15 '18 at 0:04

azalea


I modified my code, but the resulting cell contents now look like: å±±è¥¿è instead of the Chinese characters in the input text file. I have read through the links provided, but am unsure how to apply that information to writing to a csv. Thanks in advance.

– steven.m787
Nov 15 '18 at 13:50

Realizing this is an Excel issue. Opening the file in Notepad displays the "correct" characters.

– steven.m787
Nov 15 '18 at 14:01
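That pattern is what UTF-8 bytes look like when decoded with a Windows "ANSI" code page. A quick reproduction, assuming Excel fell back to Windows-1252 (cp1252) because the file had no BOM:

```python
# UTF-8 bytes for 山西, decoded as if they were Windows-1252.
raw = "山西".encode("utf-8")
mojibake = raw.decode("cp1252")
print(mojibake)  # å±±è¥¿
```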

@steven.m787 you may need to change the encoding Excel uses when it opens the file. See this: itg.ias.edu/content/…

– azalea
Nov 15 '18 at 16:57
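Another option, rather than changing Excel's settings, is to write the file with the "utf-8-sig" codec so that it starts with a UTF-8 BOM, which Excel generally uses to recognise a CSV as UTF-8 when it is opened directly. This is a sketch under that assumption; the file name is hypothetical. Note that this applies the codec to the file object, not to individual values with `.encode()` as in the question.

```python
import csv
import os
import tempfile

outpath = os.path.join(tempfile.mkdtemp(), "outfile.csv")

# "utf-8-sig" writes the BOM EF BB BF once, at the start of the file.
with open(outpath, "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerow(["2018.11.14", "山西襄汾"])

with open(outpath, "rb") as f:
    raw = f.read()
print(raw[:3])  # b'\xef\xbb\xbf'

# Reading back with "utf-8-sig" strips the BOM again.
with open(outpath, encoding="utf-8-sig", newline="") as f:
    row = next(csv.reader(f))
print(row)  # ['2018.11.14', '山西襄汾']
```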
