Parsing JSON with Bash tools
I am trying to extract specific fields from a 3 GB Twitter JSON file: "full_text", "created_at", "user.location", and "id".
Loading the file with pandas in a Jupyter notebook hangs my computer for hours, so I am using a Bash script for faster processing instead.
My code for extracting the "full_text" field is as follows:
%%bash -s "$raw_data_path" "$store_file"
grep -Po '"full_text":.*?[^\\]",' < "$1" > "$2"
This is adapted from the question: Parsing JSON with Unix tools
I need all four fields, and I would like to know how to load the result into a DataFrame in the Jupyter notebook.
Note that although I save the filtered results to a new .json file, the output is not valid JSON; it is just a list of the matched strings. The extracted results for full_text
look like this:
"full_text": "Good news for hockey in Pakistan as Haier Pakistan becomes the main sponsor of the Pakistan Hockey team .......,
"full_text": "RT @GerardBattenMEP: How low we have sunk. Our Govnt cannot give sanctuary to a woman persecuted by ......,
"full_text": "How low we have sunk. Our Govnt cannot give sanctuary to a woman persecuted by moronic savages in Pakistan because we have so many of the same moronic savaged .......,
python json bash twitter jupyter-notebook
asked Nov 17 '18 at 4:52
user9917517
Please follow the Minimal, Complete, and Verifiable example guidelines.
– peak
Nov 18 '18 at 9:21
1 Answer
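The first answer to your linked question shows how to do this with jq, which can pull out all four columns in one pass:
https://stackoverflow.com/a/1955555/1542667
jq -r '[.full_text, .created_at, .user.location, .id] | @csv' < "$raw_data_path" > "$store_file"
This assumes each tweet is a separate top-level JSON object in the file.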
answered Nov 17 '18 at 6:24
Yuri Schimke
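For readers who would rather stay in Python, the same extraction can be sketched with only the standard library. This is not part of the answer above; it is a minimal sketch that assumes the file holds one tweet object per line (JSON Lines). The helper names `get_path` and `tweets_to_csv` and the sample tweets are made up for illustration. Because it reads line by line, it never holds the whole file in memory.

```python
import csv
import io
import json

# Fields named in the question; "user.location" is a nested path.
FIELDS = ("full_text", "created_at", "user.location", "id")


def get_path(obj, dotted):
    """Follow a dotted path such as 'user.location'; return None if a step is missing."""
    for key in dotted.split("."):
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def tweets_to_csv(lines, out, fields=FIELDS):
    """Read one JSON object per line and write the chosen fields as CSV rows."""
    writer = csv.writer(out)
    writer.writerow(fields)  # header row; jq's @csv does not emit one
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        writer.writerow([get_path(tweet, f) for f in fields])
        count += 1
    return count


# Small in-memory demo with made-up tweets (assumption: JSON Lines input).
sample = io.StringIO(
    '{"full_text": "hello, \\"world\\"", "created_at": "Sat Nov 17 04:52:00 +0000 2018", '
    '"user": {"location": "somewhere"}, "id": 1}\n'
    '{"full_text": "no location", "created_at": "Sat Nov 17 04:53:00 +0000 2018", '
    '"user": {}, "id": 2}\n'
)
out = io.StringIO()
n = tweets_to_csv(sample, out)
print(n)
print(out.getvalue())
```

For a real file, the two `io.StringIO` objects would be replaced by files opened with `open(..., encoding="utf-8")`, with `newline=""` on the output file as the `csv` module recommends.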
Could you explain a little more about what @csv does? Is the output a JSON file or a CSV file?
– user9917517
Nov 17 '18 at 7:53
CSV. It may be cheaper to load the data frame from CSV or TSV.
– Yuri Schimke
Nov 17 '18 at 7:55
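To make the CSV point concrete: jq's @csv renders each array as one CSV line, with strings in double quotes, embedded quotes doubled, and numbers left bare. The sketch below parses such a line with Python's standard `csv` module. The sample line and the helper name `read_rows` are made up for illustration, and the column names are supplied by hand because @csv writes no header row.

```python
import csv
import io

# A line shaped like jq's @csv output (made-up values):
# strings are quoted, embedded quotes are doubled, the number is bare.
csv_text = '"hello, ""world""","Sat Nov 17 04:52:00 +0000 2018","somewhere",1\n'

# Column names are an assumption taken from the fields listed in the question.
COLUMNS = ["full_text", "created_at", "user.location", "id"]


def read_rows(stream, columns=COLUMNS):
    """Parse header-less CSV rows into dicts keyed by the given column names."""
    reader = csv.DictReader(stream, fieldnames=columns)
    return list(reader)


rows = read_rows(io.StringIO(csv_text))
print(rows[0]["full_text"])
```

In the notebook, `pandas.read_csv(store_file, header=None, names=COLUMNS)` should load the same file into a DataFrame; passing `chunksize` makes it return an iterator of smaller frames, which may help if the CSV is still large.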