Perl encoding from JSON issue
Apologies if this is a really stupid question or has already been asked elsewhere. I'm reading in some JSON, calling decode_json on it, then extracting text from the result and writing that to a file.
My problem is that Unicode characters are encoded as e.g. \u2019 in the JSON, and decode_json appears to convert this to \x{2019}. When I take this text and write it to a UTF-8-encoded file, it appears as garbage.
Sample code:

    use warnings;
    use strict;
    use JSON qw( decode_json );
    use Data::Dumper;
    open IN, $file or die;
    binmode IN, ":utf8";
    my $data = <IN>;
    my $json = decode_json( $data );
    open OUT, ">$outfile" or die;
    binmode OUT, ":utf8";
    binmode STDOUT, ":utf8";
    foreach my $textdat (@{ $json->{'results'} }) {
        print STDOUT Dumper($textdat);
        my $text = $textdat->{'text'};
        print OUT "$text\n";
    }

The Dumper output shows that the \u encoding has been converted to \x encoding. What am I doing wrong?
json perl unicode
asked Nov 12 at 15:46
Dom Glennon
2 Answers
decode_json needs UTF-8 encoded input, so use from_json instead, which accepts Unicode (already decoded characters):

    my $json = from_json($data);

Another option would be to encode the data yourself:

    use Encode;
    my $encoded_data = encode('UTF-8', $data);
    ...
    my $json = decode_json($encoded_data);

But it makes little sense to encode data just to decode it.
edited Nov 12 at 20:00
answered Nov 12 at 15:57
choroba
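As an illustration of the bytes-versus-characters distinction in this answer, here is a minimal sketch. It assumes the core JSON::PP module (so it runs without installing anything), where `JSON::PP->new->utf8->decode` plays the role of decode_json and `JSON::PP->new->decode` plays the role of from_json. The sample document and the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw( encode decode );
use JSON::PP;

# UTF-8 bytes, as they would arrive from a file opened with '<:raw'.
# E2 80 99 is the UTF-8 encoding of U+2019.
my $json_bytes = qq({"text":"it\xE2\x80\x99s"});

# The same document as decoded characters, as it would arrive
# through an ':encoding(UTF-8)' layer.
my $json_chars = decode('UTF-8', $json_bytes);

my $bytes_decoder = JSON::PP->new->utf8;   # expects UTF-8 bytes
my $chars_decoder = JSON::PP->new;         # expects decoded characters

my $from_bytes = $bytes_decoder->decode($json_bytes)->{text};
my $from_chars = $chars_decoder->decode($json_chars)->{text};

# Encoding the characters again also satisfies the bytes decoder.
my $round_trip = $bytes_decoder->decode(encode('UTF-8', $json_chars))->{text};

# Report the code point of the third character and the string length.
printf "%s: U+%04X, length %d\n",
    $_->[0], ord(substr($_->[1], 2, 1)), length($_->[1])
    for [ bytes => $from_bytes ], [ chars => $from_chars ], [ round_trip => $round_trip ];
```

All three routes should produce the same four-character string, provided each decoder is given the kind of input it expects.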
The first option didn't fix it, but from_json did - thank you! – Dom Glennon, Nov 12 at 17:04
@DomGlennon: Oh, I forgot to include use Encode, sorry. – choroba, Nov 12 at 17:05
decode_json expects UTF-8, but you're passing decoded text (Unicode Code Points) instead.
So, you could remove the existing character decoding.

    use feature qw( say );
    use open ':std', ':encoding(UTF-8)';
    use JSON qw( decode_json );

    my $json_utf8 = do {
       open(my $fh, '<:raw', $in_qfn)
          or die("Can't open \"$in_qfn\": $!\n");
       local $/;
       <$fh>;
    };

    my $data = decode_json($json_utf8);

    {
       open(my $fh, '>', $out_qfn)
          or die("Can't create \"$out_qfn\": $!\n");
       for my $result (@{ $data->{results} }) {
          say $fh $result->{text};
       }
    }

Or, you could use from_json (or JSON->new->decode) instead of decode_json.

    use feature qw( say );
    use open ':std', ':encoding(UTF-8)';
    use JSON qw( from_json );                    # <---

    my $json_ucp = do {
       open(my $fh, '<', $in_qfn)                # <---
          or die("Can't open \"$in_qfn\": $!\n");
       local $/;
       <$fh>;
    };

    my $data = from_json($json_ucp);             # <---

    {
       open(my $fh, '>', $out_qfn)
          or die("Can't create \"$out_qfn\": $!\n");
       for my $result (@{ $data->{results} }) {
          say $fh $result->{text};
       }
    }

The arrows point to the three minor differences between the two snippets.
I made a number of cleanups.
- Added the missing local $/; in case there are line breaks in the JSON.
- Don't use 2-arg open.
- Don't needlessly use global variables.
- Use better names for variables. $data and $json were notably reversed, and $file didn't contain a file.
- Limit the scope of your variables, especially if they use up system resources (e.g. file handles).
- Use :encoding(UTF-8) (the standard encoding) instead of :encoding(utf8) (an encoding only used by Perl). :utf8 is even worse as it uses the internal encoding rather than the standard one, and it can lead to corrupt scalars if provided bad input.
- Get rid of the noisy quotes around identifiers used as hash keys.
edited Nov 13 at 1:06
answered Nov 12 at 17:18
ikegami
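The \x{2019} seen in the question's Dumper output is only how Data::Dumper displays a decoded character. The rough sketch below shows what the :encoding(UTF-8) layer does with that character on output. It is an illustration under assumptions, not part of the answer: the in-memory file `$buffer` stands in for the real output file, and the sample string is made up.

```perl
use strict;
use warnings;
use Data::Dumper;

# One decoded character string: "it", U+2019 (right single quote), "s".
my $text = "it\x{2019}s";

# Dumper shows non-ASCII characters as escapes; the string itself is unchanged.
print Dumper($text);

# Write the string through an encoding layer into an in-memory file.
my $buffer = '';
open(my $fh, '>:encoding(UTF-8)', \$buffer)
    or die("Can't open in-memory file: $!\n");
print {$fh} $text;
close($fh)
    or die("Can't close in-memory file: $!\n");

# Compare the character count with the number of bytes actually written.
printf "characters: %d, bytes written: %d, hex: %s\n",
    length($text), length($buffer), unpack('H*', $buffer);
```

The single character U+2019 is written as the three UTF-8 bytes E2 80 99, so the byte count is higher than the character count even though nothing has been corrupted.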