Convert JSON column in dataframe to simple array of values
I am trying to convert the JSON in the bbox (bounding box) column of a DataFrame into a simple array of values for a deep learning project in Python, in a Jupyter notebook.
The possible labels are the following categories: [glass, cardboard, trash, metal, paper].
I want to go from

    [{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]

to

    ([191 70 183 311], 0)

I'm looking for help converting the bbox column from JSON, for a single CSV that contains all the image names and their related bboxes.

UPDATE
The column is currently a pandas Series, so I keep getting "TypeError: the JSON object must be str, bytes or bytearray, not 'Series'" whenever I try to apply JSON operations to the column as a whole. So far I have tried converting the column into a JSON object and then pulling the values out by key.
[Image: BB_CSV]


python json pandas jupyter-notebook
edited Nov 12 at 3:11
asked Nov 12 at 2:07
cleme001
What is the 0 for?
– W-B
Nov 12 at 2:11
@W-B 0 is the number that represents the label (glass in this example); 1 would be cardboard, 2 would be trash, and so on.
– cleme001
Nov 12 at 2:15
@cleme001, have you tried anything so far? Showing your attempt would also give a clue about what is really needed.
– pygo
Nov 12 at 2:20
2 Answers
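You'll want to use a JSON decoder: https://docs.python.org/3/library/json.html

    import json

    # Parse the JSON text into a Python list containing one dictionary
    li = json.loads('''[{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]''')
    d = dictionary = li[0]
    # Pick the bbox values by key, in a fixed order; 0 is the label index for "glass"
    result = ([d[key] for key in "left top width height".split()], 0)
    print(result)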
Edit:
If you want to map the operation of extracting the values from the dictionary over all elements of the list, you can do:

    extracted = []
    for element in li:
        result = ([element[key] for key in "left top width height".split()], 0)
        extracted.append(result)
    # print(extracted)
    print(extracted[:10])
    # `[:10]` limits the number of items displayed to 10
Similarly, as per my comment, if you do not want commas between the extracted numbers in the list, you can use:

    without_comma = []
    for element, zero in extracted:
        result_string = "([{}], 0)".format(" ".join([str(value) for value in element]))
        without_comma.append(result_string)
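As a hedged extension of the snippets above (not part of the original answer): the label index there is hard-coded to 0, which is only right for "glass". A minimal sketch that looks the index up from the category list in the question could look like this. The names CATEGORIES, BBOX_KEYS and to_bbox_tuple are hypothetical, and the numbering assumes that a label's position in the list is its numeric value.

```python
import json

# Category order taken from the question; the position in this list is
# assumed to be the numeric label (glass -> 0, cardboard -> 1, ...).
CATEGORIES = ["glass", "cardboard", "trash", "metal", "paper"]
BBOX_KEYS = ("left", "top", "width", "height")


def to_bbox_tuple(annotation):
    """Turn one annotation dict into ([left, top, width, height], label_index)."""
    label = annotation["label"]
    if label not in CATEGORIES:
        # Fail loudly on labels outside the known categories
        raise ValueError("unknown label: {!r}".format(label))
    return ([annotation[key] for key in BBOX_KEYS], CATEGORIES.index(label))


li = json.loads('[{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]')
result = to_bbox_tuple(li[0])
print(result)
```

Raising on an unknown label plays the same role as the assert suggested in the comments, but it is not skipped when Python runs with assertions disabled.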
edited Nov 16 at 20:02
answered Nov 12 at 2:20
Mathieu CAROFF
You can add assert dictionary["label"] in "glass cardboard trash metal paper".split() if you want to check that the label value is valid, and get an error if it isn't.
– Mathieu CAROFF
Nov 12 at 2:21
If you do not want commas between the values in the bbox, use print("([{}], 0)".format(" ".join(str(d[key]) for key in "left top width height".split())))
– Mathieu CAROFF
Nov 12 at 2:25
So I managed to get this to work for my use case by tweaking your code a bit, but since .split only does it for the first instance of the JSON object, the pictures that I have with multiple annotations don't get converted properly, e.g. [{"left":191,"top":70,"width":183,"height":311,"label":"glass"}, {"left":200,"top":60,"width":132,"height":318,"label":"glass"}]
– cleme001
Nov 15 at 18:15
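A note on the multi-annotation case raised in the comment above: the restriction to the first annotation most likely comes from indexing only the first element of the parsed list (li[0]) rather than from .split(), which only builds the list of key names. The following is a sketch, not the answer author's code, that parses each cell of a pandas column and keeps every annotation. It assumes each cell holds a JSON string; the DataFrame contents, the image names and the helper parse_bboxes are made up for illustration.

```python
import json

import pandas as pd

# Assumed numbering: position in the question's category list
CATEGORIES = ["glass", "cardboard", "trash", "metal", "paper"]
BBOX_KEYS = ("left", "top", "width", "height")


def parse_bboxes(cell):
    """Parse one JSON string and return a list of ([l, t, w, h], label_index)."""
    annotations = json.loads(cell)
    return [
        ([ann[key] for key in BBOX_KEYS], CATEGORIES.index(ann["label"]))
        for ann in annotations
    ]


# Hypothetical stand-in for the CSV: one JSON string per image.
df = pd.DataFrame({
    "image": ["a.jpg", "b.jpg"],
    "bbox": [
        '[{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]',
        '[{"left":191,"top":70,"width":183,"height":311,"label":"glass"},'
        ' {"left":200,"top":60,"width":132,"height":318,"label":"glass"}]',
    ],
})

# json.loads(df["bbox"]) raises the TypeError from the question because it
# receives the whole Series; apply() calls the parser once per cell instead.
df["bbox_parsed"] = df["bbox"].apply(parse_bboxes)
print(df["bbox_parsed"].tolist())
```

Each row of bbox_parsed then holds a list with one tuple per annotation, so images with a single box and images with several boxes share the same shape of result.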
It looks like each row of your bbox column contains a dictionary inside of a list. I've tried to replicate your problem as follows. Edit: To clarify, the solution below assumes that what you're referring to as a "JSON object" is represented as a list containing a single dictionary, which is what it appears to be per your example and screenshot.

    import pandas as pd

    # Create empty sample DataFrame with one row
    df = pd.DataFrame([None], columns=['bbox'])
    # Assign your sample item to the first row
    # (chained indexing like this may emit a warning in newer pandas versions)
    df['bbox'][0] = [{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]
Now, to simply unpack the row, you can do:

    df['bbox_unpacked'] = df['bbox'].map(lambda x: x[0].values())

This gets you a new column holding the 5 values of each dictionary (as a dict_values view; wrap it in list() or tuple() if you need a real sequence).
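If you want to go further and apply your labels, you'll likely want to create a dictionary to contain your labeling logic. Per the example you've given in the comments, I've done:

    # Numbering follows the comments on the question: glass is 0, cardboard 1, trash 2;
    # metal and paper are assumed to continue in the order of the category list
    labels = {
        'glass': 0,
        'cardboard': 1,
        'trash': 2,
        'metal': 3,
        'paper': 4
    }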
This should get you your desired layout if you want a one-line solution without writing your own function.

    df['bbox_unpacked'] = df['bbox'].map(lambda x: (list(x[0].values())[:4], labels.get(list(x[0].values())[-1])))
A more readable solution would be to define your own function and pass it to the .apply() method. Edit: Since it looks like your JSON object is being stored as a str inside your DataFrame rows, I added json.loads(row) to process the string first before retrieving the values. You'll need to import json to run it.

    import json

    def unpack_bbox(row, labels):
        # Load the string into a JSON object (in this case a list of
        # length one containing the dictionary), index the list to its
        # first item [0], and use the .values() dictionary method to
        # access the values only
        values = list(json.loads(row)[0].values())
        bbox_values = values[:4]
        bbox_label = values[-1]
        label_value = labels.get(bbox_label)
        return bbox_values, label_value

    df['bbox_unpacked'] = df['bbox'].apply(unpack_bbox, args=(labels,))
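One caveat with the approaches above is that they rely on the order in which the keys appear in each dictionary, since .values() follows insertion order. A hedged alternative sketch reads the four values by key and accepts either a JSON string or an already-parsed list. The names unpack_cell and BBOX_KEYS are hypothetical, and the labels mapping here assumes the numbering follows the category order given in the question.

```python
import json

BBOX_KEYS = ("left", "top", "width", "height")
# Assumed numbering: position in the question's category list
labels = {"glass": 0, "cardboard": 1, "trash": 2, "metal": 3, "paper": 4}


def unpack_cell(cell, labels):
    """Return ([left, top, width, height], label_value) for the first annotation."""
    # Cells read from a CSV are strings; cells built in Python may already be lists.
    annotations = json.loads(cell) if isinstance(cell, str) else cell
    first = annotations[0]
    return [first[key] for key in BBOX_KEYS], labels.get(first["label"])


# Key order differs from the question's example on purpose.
shuffled = '[{"label":"trash","height":311,"width":183,"top":70,"left":191}]'
print(unpack_cell(shuffled, labels))
print(unpack_cell([{"left": 1, "top": 2, "width": 3, "height": 4, "label": "paper"}], labels))
```

With a DataFrame this would be used the same way as the function above, for example df['bbox'].apply(unpack_cell, args=(labels,)).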
edited Nov 13 at 18:01
answered Nov 12 at 2:26
dmitriys
For some reason I get multiple rows when I do BB_CSV['bbox'][0] - added an image to the post.
– cleme001
Nov 12 at 3:10
@cleme001 The df['bbox'][0] snippet was just to show how I assigned your sample list to a row of a sample DataFrame to replicate your issue. Have you tried BB_CSV['bbox'].map(lambda x: x[0].values()) to unpack your rows?
– dmitriys
Nov 12 at 3:30
The issue seems to be that the row itself is not a string but rather a NumPy ndarray, so I'm getting this error:

    BB_CSV['bbox'][0].values()
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-29-57c5ca36959f> in <module>()
    ----> 1 BB_CSV['bbox'][0].values()
    TypeError: 'numpy.ndarray' object is not callable
– cleme001
Nov 13 at 16:04
Also, if you look at the image I attached to the original post, you'll see that even when I try to look at a single row I get back 5 different things, not just the value of that row.
– cleme001
Nov 13 at 16:07
@cleme001 Does your DataFrame have duplicate values in its index? Can you try df.reset_index(drop=True, inplace=False) and try it again?
– dmitriys
Nov 13 at 17:02
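To illustrate the duplicate-index suggestion above, here is a small sketch with made-up data. When several rows share the index label 0, selecting label 0 returns a Series rather than a single cell, and Series.values is an ndarray attribute rather than a method, which would be consistent with the "'numpy.ndarray' object is not callable" error. Note also that reset_index with inplace=False returns a new DataFrame, so the result has to be assigned to a name.

```python
import pandas as pd

# Made-up data: the index label 0 appears twice, as can happen, for
# example, after concatenating DataFrames without resetting the index.
df = pd.DataFrame(
    {"bbox": ['[{"label":"glass"}]', '[{"label":"trash"}]', '[{"label":"paper"}]']},
    index=[0, 0, 1],
)

# Label-based selection returns every row labelled 0, i.e. a Series
selected = df["bbox"].loc[0]
print(type(selected).__name__, len(selected))

# reset_index returns a new DataFrame unless inplace=True, so keep the result.
clean = df.reset_index(drop=True)
single = clean["bbox"].loc[0]
print(type(single).__name__)
```

If the index does need to stay as it is, positional access such as df["bbox"].iloc[0] is another way to get exactly one cell.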
|
show 4 more comments
For some reason I get multiple rows when I do BB_CSV['bbox'][0] - added an image to the post.
– cleme001
Nov 12 at 3:10
@cleme001 Thedf['bbox'][0]snippet was just to show how I assigned your sample list to a row of a sampleDataFrameto replicate your issue. Have you triedBB_CSV['bbox'].map(lambda x: x[0].values())to unpack your rows?
– dmitriys
Nov 12 at 3:30
The issue seems to be that the row it self is not a string but rather a numpyndarray so I'm getting this error. BB_CSV['bbox'][0].values() --------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-29-57c5ca36959f> in <module>() ----> 1 BB_CSV['bbox'][0].values() TypeError: 'numpy.ndarray' object is not callable
– cleme001
Nov 13 at 16:04
Also if you look at the image I attached to the original post you'll see that when I even try to look at a single row I get back 5 different things not just the value of the row.
– cleme001
Nov 13 at 16:07
@cleme001 Does your DataFrame have duplicate values in it's index? Can you trydf.reset_index(drop=True, inplace=False)and try it again?
– dmitriys
Nov 13 at 17:02
For some reason I get multiple rows when I do BB_CSV['bbox'][0] - added an image to the post.
– cleme001
Nov 12 at 3:10