What is the difference between `json.loads()` and `.apply(json.loads)`?
I am quite new to coding, and I am working with the TMDB_5000 dataset from Kaggle.
I ran into a problem when trying to parse JSON-formatted data like this:
[{"cast_id": 242, "character": "Jake Sully", "credit_id": "5602a8a7c3a3685532001c9a", "gender": 2, "id": 65731, "name": "Sam Worthington", "order": 0}, {"cast_id": 3, "character": "Neytiri", "credit_i...}]
I am trying to parse it with json.loads(). My code is:
credits['cast'] = json.loads(credits['cast'])
But it gives me this error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in ()
----> 1 credits['cast'] = json.loads(credits['cast'])

/anaconda3/lib/python3.6/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    346         if not isinstance(s, (bytes, bytearray)):
    347             raise TypeError('the JSON object must be str, bytes or bytearray, '
--> 348                             'not {!r}'.format(s.__class__.__name__))
    349         s = s.decode(detect_encoding(s), 'surrogatepass')
    350 

TypeError: the JSON object must be str, bytes or bytearray, not 'Series'
However, this code works:
credits['cast'] = credits['cast'].apply(json.loads)
So I am very confused, because I thought there was no difference between these two lines of code. Can anyone explain this to me?
python pandas
edited Nov 11 at 7:21 by pygo
asked Nov 11 at 3:53 by Qiaoyi Li
To make it clear, cell number 7 works.
– Qiaoyi Li
Nov 11 at 3:53
When I try to load the JSON-formatted data, this one:
credits['cast'] = json.loads(credits['cast'])
doesn't work and gives me the error "the JSON object must be str, bytes or bytearray, not 'Series'". However, this one works:
credits['cast'] = credits['cast'].apply(json.loads)
I don't understand. Is there any difference between these two lines of code?
– Qiaoyi Li
Nov 11 at 4:04
Wouldn't it be better to load your data first and then do the pandas operations?
– pygo
Nov 11 at 4:32
3 Answers
Accepted answer
The issue is that your `credits` variable is a pandas `DataFrame`, so `credits['cast']` is a `Series`. The `json.loads` function doesn't know how to deal with pandas data types (it only accepts a `str`, `bytes` or `bytearray`, as the error message says), so you get an error when you call `json.loads(credits['cast'])`.
The `Series` type, however, has an `apply` method that accepts a function and calls it on each value the Series contains. That's why `credits['cast'].apply(json.loads)` works: it passes `json.loads` as the argument to `apply`, so each individual value (a JSON string) is decoded on its own.
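A minimal sketch of the difference described above, assuming a small hypothetical `credits` DataFrame whose `cast` column holds JSON strings (the sample rows are made up for illustration, not taken from the real dataset):

```python
import json

import pandas as pd

# Hypothetical stand-in for the TMDB credits table: each 'cast' cell is a JSON string.
credits = pd.DataFrame({
    "cast": [
        '[{"character": "Jake Sully", "name": "Sam Worthington"}]',
        '[{"character": "Neytiri"}]',
    ]
})

# Passing the whole Series fails: json.loads only accepts str, bytes or bytearray.
try:
    json.loads(credits["cast"])
except TypeError as exc:
    error_message = str(exc)

# Series.apply calls json.loads once per cell, handing it one string at a time.
parsed = credits["cast"].apply(json.loads)

print(error_message)
print(parsed[0][0]["name"])
```

The exact wording of the error differs slightly between Python versions (3.6 quotes the class name, later versions do not), but in both cases the rejected type is the `Series`.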
answered Nov 11 at 4:23 by Blckknght
The following code:
credits['cast'] = credits['cast'].apply(json.loads)
applies the function `json.loads` to each row of `credits['cast']` (each row being a string). The result is a Series of decoded objects.
The following code:
credits['cast'] = json.loads(credits['cast'])
attempts to apply the same function to the Series `credits['cast']` as a whole, but the function cannot be applied to a Series: `json.loads` accepts only a `str`, `bytes` or `bytearray`.
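To make "applies `json.loads` to each row" concrete, here is a small sketch (the `cast` Series is hypothetical sample data) showing that `.apply(json.loads)` produces the same values as an explicit loop that decodes each string separately and keeps the original index:

```python
import json

import pandas as pd

# Hypothetical 'cast' column: every row is one JSON-encoded string.
cast = pd.Series(['[{"cast_id": 242}]', '[{"cast_id": 3}]'])

# What .apply(json.loads) does, spelled out: decode each string on its own
# and rebuild a Series with the same index.
by_hand = pd.Series([json.loads(s) for s in cast], index=cast.index)

by_apply = cast.apply(json.loads)

print(by_apply.tolist())
print(by_hand.tolist() == by_apply.tolist())  # True
```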
answered Nov 11 at 4:23 by DYZ
Thank you, that is a very clear explanation~😊
– Qiaoyi Li
Nov 11 at 6:33
Although a detailed explanation has already been provided, I would like to add that if you are using pandas to read and process the data, you can use:
import pandas as pd
d_list = [{"cast_id": 242, "character": "Jake Sully", "credit_id": "5602a8a7c3a3685532001c9a", "gender": 2, "id": 65731, "name": "Sam Worthington", "order": 0}, {"cast_id": 3, "character": "Neytiri"}]
Create a DataFrame using DataFrame.from_dict:
df = pd.DataFrame.from_dict(d_list)
print(df)
   cast_id   character                 credit_id  gender       id             name  order
0      242  Jake Sully  5602a8a7c3a3685532001c9a     2.0  65731.0  Sam Worthington    0.0
1        3     Neytiri                       NaN     NaN      NaN              NaN    NaN
Another way, suited to this purpose, is pd.read_json with orient='records'. Note that pd.read_json expects JSON text, a file path or a file-like object rather than a Python list, so the list is first serialized with json.dumps and wrapped in io.StringIO:
import io
import json
import pandas as pd
d_list = [{"cast_id": 242, "character": "Jake Sully", "credit_id": "5602a8a7c3a3685532001c9a", "gender": 2, "id": 65731, "name": "Sam Worthington", "order": 0}, {"cast_id": 3, "character": "Neytiri"}]
# Serialize the list of dicts to JSON text, then let pandas parse it as records
df = pd.read_json(io.StringIO(json.dumps(d_list)), orient='records')
print(df)
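If the data is read with pandas anyway, another option is to decode the column while the file is being read, using the `converters` parameter of `pd.read_csv`. This is a sketch under assumptions: it supposes the credits data is a CSV with a `cast` column of JSON strings, and the `movie_id` column and the inline `io.StringIO` text are hypothetical stand-ins for the real file:

```python
import io
import json

import pandas as pd

# Hypothetical CSV content: the 'cast' field holds JSON text, with inner quotes
# doubled ("") as the CSV format requires inside a quoted field.
csv_text = '''movie_id,cast
1,"[{""cast_id"": 242, ""character"": ""Jake Sully""}]"
'''

# converters maps a column label to a function that is called on each raw cell
# string, so the 'cast' column arrives already decoded (the same effect as
# running .apply(json.loads) on it after loading).
credits = pd.read_csv(io.StringIO(csv_text), converters={"cast": json.loads})

print(credits.loc[0, "cast"][0]["character"])
```

Decoding at read time keeps the parsing next to the loading code, which is roughly what the comment under the question suggests; calling `.apply(json.loads)` afterwards gives the same column contents.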
edited Nov 11 at 4:52, answered Nov 11 at 4:45 by pygo
You can accept the answer that was most useful in your case by clicking the check mark to the left of it, which turns it green.
– pygo
Nov 11 at 6:41