Perl encoding from JSON issue


























Apologies if this is a really stupid question or has already been asked elsewhere. I'm reading in some JSON, using decode_json on it, then extracting text from it and writing that to a file.



My problem is that Unicode characters are encoded as, e.g., \u2019 in the JSON, and decode_json appears to convert this to \x{2019}. When I grab this text and write it to a UTF-8-encoded file, it appears as garbage.



Sample code:



use warnings;
use strict;
use JSON qw( decode_json );
use Data::Dumper;

open IN, $file or die;
binmode IN, ":utf8";
my $data = <IN>;
my $json = decode_json( $data );
open OUT, ">$outfile" or die;
binmode OUT, ":utf8";
binmode STDOUT, ":utf8";
foreach my $textdat (@{ $json->{'results'} }) {
    print STDOUT Dumper($textdat);
    my $text = $textdat->{'text'};
    print OUT "$text\n";
}


The Dumper output shows that the \u escape has been converted to \x{...} notation. What am I doing wrong?
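A quick way to see what decode_json actually produced is a sketch like the one below. It assumes the core JSON::PP module (the pure-Perl backend JSON can use, which exports an equivalent decode_json) and a made-up input string; it is not the asker's data.

```perl
use strict;
use warnings;
use JSON::PP qw( decode_json );
use Data::Dumper;

# Hypothetical input: the JSON contains the ASCII escape sequence \u2019.
# (Single quotes keep the backslash literal in Perl.)
my $json_bytes = '{"text":"it\u2019s"}';

my $data = decode_json($json_bytes);
my $text = $data->{text};

# The six-character escape became one character: i, t, U+2019, s.
printf "length: %d\n", length $text;                  # length: 4
printf "char 3: U+%04X\n", ord substr($text, 2, 1);   # char 3: U+2019

# Data::Dumper shows characters above 0xFF using Perl's \x{...} syntax.
$Data::Dumper::Indent = 0;
print Dumper($text), "\n";
```

So the \x{2019} in the Dumper output is only Data::Dumper's notation for the single character U+2019, not a second encoding.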










      json perl unicode
















      asked Nov 12 at 15:46









      Dom Glennon

























2 Answers


















decode_json needs UTF-8-encoded input, so use from_json instead, which accepts decoded text (Unicode characters):



use JSON qw( from_json );   # import from_json instead of decode_json

my $json = from_json($data);


            Another option would be to encode the data yourself:



            use Encode;

            my $encoded_data = encode('UTF-8', $data);
            ...
my $json = decode_json($encoded_data);


            But it makes little sense to encode data just to decode it.
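The two options can be compared side by side. This is a sketch, assuming the core JSON::PP module in place of JSON (JSON documents from_json as equivalent to JSON->new->decode, so JSON::PP->new->decode stands in for from_json here) and a hypothetical JSON string that has already been decoded, as it would be after reading it through a :utf8 layer.

```perl
use strict;
use warnings;
use Encode qw( encode );
use JSON::PP qw( decode_json );

# Hypothetical decoded text containing the characters U+00E9 and U+2019.
my $data = qq({"text":"caf\x{e9}\x{2019}"});

# Passing decoded text straight to decode_json fails here: decode_json
# expects UTF-8 bytes, and this string holds a character above 0xFF.
my $direct_ok = eval { decode_json($data); 1 };
print $direct_ok ? "decode_json(text): ok\n" : "decode_json(text) died: $@";

# Option 1: a decoder that accepts text (what from_json does).
my $via_text = JSON::PP->new->decode($data);

# Option 2: encode back to UTF-8 bytes first, then use decode_json.
my $encoded_data = encode('UTF-8', $data);
my $via_bytes    = decode_json($encoded_data);

# Both options yield the same Perl string.
printf "same result: %s\n", $via_text->{text} eq $via_bytes->{text} ? 'yes' : 'no';
```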















            edited Nov 12 at 20:00

























            answered Nov 12 at 15:57









            choroba













            • The first option didn't fix it, but from_json did - thank you!
              – Dom Glennon
              Nov 12 at 17:04










            • @DomGlennon: Oh, I forgot to include use Encode, sorry.
              – choroba
              Nov 12 at 17:05































decode_json expects UTF-8-encoded bytes, but you're passing it decoded text (Unicode code points) instead.



So you could remove the existing character decoding and read the file as raw bytes:



use strict;
use warnings;
use feature qw( say );
use open ':std', ':encoding(UTF-8)';
use JSON qw( decode_json );

my ($in_qfn, $out_qfn) = @ARGV;   # input and output file names

my $json_utf8 = do {
    open(my $fh, '<:raw', $in_qfn)
        or die("Can't open \"$in_qfn\": $!\n");

    local $/;
    <$fh>;
};

my $data = decode_json($json_utf8);

{
    open(my $fh, '>', $out_qfn)
        or die("Can't create \"$out_qfn\": $!\n");

    for my $result (@{ $data->{results} }) {
        say $fh $result->{text};
    }
}


                Or, you could use from_json (or JSON->new->decode) instead of decode_json.



use strict;
use warnings;
use feature qw( say );
use open ':std', ':encoding(UTF-8)';
use JSON qw( from_json );  # <---

my ($in_qfn, $out_qfn) = @ARGV;   # input and output file names

my $json_ucp = do {
    open(my $fh, '<', $in_qfn)  # <---
        or die("Can't open \"$in_qfn\": $!\n");

    local $/;
    <$fh>;
};

my $data = from_json($json_ucp);  # <---

{
    open(my $fh, '>', $out_qfn)
        or die("Can't create \"$out_qfn\": $!\n");

    for my $result (@{ $data->{results} }) {
        say $fh $result->{text};
    }
}


                The arrows point to the three minor differences between the two snippets.
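The first snippet can be tried without any files. This self-contained variant is a sketch that swaps the input and output files for in-memory filehandles and JSON for the core JSON::PP module (both are assumptions for the demo, and the input is made up). It reads raw UTF-8 bytes, decodes them with decode_json, and writes through an explicit :encoding(UTF-8) layer, so U+2019 comes out as the three bytes E2 80 99.

```perl
use strict;
use warnings;
use feature qw( say );
use JSON::PP qw( decode_json );

# Hypothetical input "file": UTF-8 bytes spread over several lines,
# with one \u escape and one raw UTF-8 character (e-acute = C3 A9).
my $in_bytes = qq({"results": [\n  {"text": "it\\u2019s"},\n  {"text": "caf\xC3\xA9"}\n]}\n);

# Read everything as raw bytes (local $/ enables slurp mode).
my $json_utf8 = do {
    open(my $fh, '<:raw', \$in_bytes) or die("Can't open input: $!\n");
    local $/;
    <$fh>;
};

my $data = decode_json($json_utf8);

# Write each text through an encoding layer into an in-memory buffer.
my $out_bytes = '';
{
    open(my $fh, '>:encoding(UTF-8)', \$out_bytes)
        or die("Can't create output: $!\n");
    for my $result (@{ $data->{results} }) {
        say $fh $result->{text};
    }
    close($fh) or die("Can't close output: $!\n");
}

# Dump the bytes that were written, in hex.
say join ' ', map { sprintf('%02X', ord $_) } split //, $out_bytes;
# prints: 69 74 E2 80 99 73 0A 63 61 66 C3 A9 0A
```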





I also made a number of cleanups:




• Add local $/; so the whole file is read, not just its first line, in case the JSON contains line breaks.

                • Don't use 2-arg open.

                • Don't needlessly use global variables.

• Use better names for variables. $data and $json were notably reversed (the JSON text was in $data and the decoded structure in $json), and $file held a file name rather than a file.

                • Limit the scope of your variables, especially if they use up system resources (e.g. file handles).

                • Use :encoding(UTF-8) (the standard encoding) instead of :encoding(utf8) (an encoding only used by Perl). :utf8 is even worse as it uses the internal encoding rather than the standard one, and it can lead to corrupt scalars if provided bad input.

                • Get rid of the noisy quotes around identifiers used as hash keys.
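The first cleanup matters for the original `my $data = <IN>;`: in the default line mode, readline returns only the first line, so pretty-printed JSON would be cut off. A sketch of the difference, using a hypothetical multi-line JSON string and an in-memory filehandle in place of a real file:

```perl
use strict;
use warnings;

# Hypothetical pretty-printed JSON spread over several lines.
my $json_text = qq({\n  "results": [\n    {"text": "hello"}\n  ]\n}\n);

# Line mode (the default $/ is "\n"): readline returns only the first line.
open(my $fh, '<', \$json_text) or die("Can't open: $!\n");
my $first_line = <$fh>;
close($fh);

# Slurp mode: with $/ localized to undef, readline returns everything.
my $whole = do {
    open(my $fh, '<', \$json_text) or die("Can't open: $!\n");
    local $/;
    <$fh>;
};

printf "line mode read %d chars, slurp mode read %d chars\n",
    length $first_line, length $whole;
```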















                edited Nov 13 at 1:06

























                answered Nov 12 at 17:18









                ikegami






























