Parsing JSON with Bash tools





I am trying to extract specific columns from a 3 GB JSON file of Twitter data. The columns I need are "full_text", "created_at", "user.location", and "id".



Loading the file with pandas in a Jupyter notebook hangs my computer for hours, so I am using a bash shell script for faster processing.



My code for extracting the 'full_text' column is as follows.



%%bash -s "$raw_data_path" "$store_file"
grep -Po '"full_text":.*?[^\\]",' < "$1" > "$2"


This is adapted from the question "Parsing JSON with Unix tools".
I need all four columns mentioned above, and I would like to know how to load the result into a DataFrame in a Jupyter notebook.



Note that I am saving the filtered results into a new .json file, but the output is more a collection of string fragments than valid JSON. The extracted results for full_text look like this:



"full_text": "Good news for hockey in Pakistan as Haier Pakistan becomes the main sponsor of the Pakistan Hockey team .......,
"full_text": "RT @GerardBattenMEP: How low we have sunk. Our Govnt cannot give sanctuary to a woman persecuted by ......,
"full_text": "How low we have sunk. Our Govnt cannot give sanctuary to a woman persecuted by moronic savages in Pakistan because we have so many of the same moronic savaged .......,






python json bash twitter jupyter-notebook
















asked Nov 17 '18 at 4:52 by user9917517




















  • Please follow the Minimal, Complete, and Verifiable example guidelines.

    – peak
    Nov 18 '18 at 9:21































1 Answer
The first answer to the question you linked should give you a clue about how to extract the four columns:



https://stackoverflow.com/a/1955555/1542667



jq -r '[.full_text, .created_at, .user.location, .id] | @csv' < "$raw_data_path" > "$store_file"
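
If jq is not available, roughly the same extraction can be sketched in plain Python. This is a minimal sketch, assuming the input holds one tweet object per line (the question does not say how the 3 GB file is laid out) and using the four field names from the question. The helper names `pick` and `tweets_to_csv` and the sample records are made up for illustration. Reading line by line keeps memory use flat, which is the point of avoiding a full load.

```python
import csv
import io
import json

# Field names come from the question; the helper names below are hypothetical.
COLUMNS = ["full_text", "created_at", "user.location", "id"]


def pick(obj, dotted):
    """Follow a dotted path such as 'user.location'; return None if a key is missing."""
    for key in dotted.split("."):
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def tweets_to_csv(src, dst, columns=COLUMNS):
    """Read one JSON object per line from src and write one CSV row per object to dst."""
    writer = csv.writer(dst)
    rows = 0
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        tweet = json.loads(line)
        writer.writerow([pick(tweet, c) for c in columns])
        rows += 1
    return rows


# Tiny in-memory demo with made-up records; for a real file, pass open(...)
# handles instead (open the output with newline="" as the csv module recommends).
sample = io.StringIO(
    '{"full_text": "He said \\"hi\\", twice", "created_at": "Sat Nov 17 04:52:00 +0000 2018", '
    '"user": {"location": "Lahore"}, "id": 1}\n'
    '{"full_text": "no user here", "created_at": "Sat Nov 17 04:53:00 +0000 2018", "id": 2}\n'
)
out = io.StringIO()
count = tweets_to_csv(sample, out)
print(count)
print(out.getvalue())
```

Because each line goes through a real JSON parser, escaped quotes inside a tweet are decoded correctly and then re-quoted by the CSV writer, which a regular expression over the raw text cannot guarantee. A missing field simply becomes an empty cell.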
















answered Nov 17 '18 at 6:24 by Yuri Schimke













  • Could you explain a little more? What does @csv do, and is the output a JSON file or a CSV file?

    – user9917517
    Nov 17 '18 at 7:53











  • CSV. It may be cheaper to read the data frame from CSV or TSV.

    – Yuri Schimke
    Nov 17 '18 at 7:55
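
To make the CSV point concrete: jq's @csv writes each array as one line, with strings wrapped in double quotes and embedded quotes doubled, and with no header row. The sketch below reads such a file back in bounded-size chunks using only the standard library. The function name `iter_chunks`, the column label `user_location`, and the demo rows are assumptions for illustration. In a notebook, `pandas.read_csv(path, header=None, names=..., chunksize=...)` would play the same role and yield DataFrames directly.

```python
import csv
import io
from itertools import islice

# Column order matches the four fields from the question; names are hypothetical labels.
NAMES = ["full_text", "created_at", "user_location", "id"]


def iter_chunks(handle, size, names=NAMES):
    """Yield lists of at most `size` row dicts from a header-less CSV stream."""
    reader = csv.DictReader(handle, fieldnames=names)
    while True:
        chunk = list(islice(reader, size))
        if not chunk:
            return
        yield chunk


# Made-up rows in the shape @csv emits: strings quoted, embedded quotes doubled.
demo = io.StringIO(
    '"first, with a comma","Sat Nov 17 04:52:00 +0000 2018","Lahore",1\n'
    '"second ""quoted"" text","Sat Nov 17 04:53:00 +0000 2018",,2\n'
    '"third","Sat Nov 17 04:54:00 +0000 2018","London",3\n'
)
chunks = list(iter_chunks(demo, size=2))
print([len(c) for c in chunks])
print(chunks[0][1]["full_text"])
```

Processing chunk by chunk means only `size` rows are held in memory at a time, so the 3 GB source never has to fit in RAM at once. Note that every value comes back as a string here, so numeric columns such as the id need an explicit conversion if they are used as numbers.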


















