Convert JSON column in dataframe to simple array of values











I am trying to convert the JSON in the bbox (bounding box) column of a DataFrame into a simple array of values for a deep learning (DL) project in Python, working in a Jupyter notebook.



The possible labels are the following categories: [glass, cardboard, trash, metal, paper].



For example, I want to turn

[{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]

into

([191 70 183 311], 0)


I need to do this for the bbox column of a single CSV file that contains all the image file names and their related bounding boxes.



[Screenshot: FN and BBOX columns]



UPDATE



The column is a pandas Series, so I keep getting "TypeError: the JSON object must be str, bytes or bytearray, not 'Series'" whenever I apply JSON operations to it. So far I have tried converting the column into a JSON object and then pulling the values out by key.
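Here is a minimal sketch of what is going on. It assumes the bbox cells hold JSON text, as they would after pd.read_csv. The FN values are hypothetical, and the bbox strings are the examples quoted in this thread. json.loads decodes one string at a time, so passing it the whole Series raises this TypeError. Series.map avoids the problem by applying json.loads to each cell separately.

```python
import json

import pandas as pd

# Hypothetical stand-in for BB_CSV: after pd.read_csv every bbox cell is a JSON string.
bb_csv = pd.DataFrame({
    "FN": ["image_1.jpg", "image_2.jpg"],  # hypothetical file names
    "bbox": [
        '[{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]',
        '[{"left":191,"top":70,"width":183,"height":311,"label":"glass"}, '
        '{"left":200,"top":60,"width":132,"height":318,"label":"glass"}]',
    ],
})

# Passing the whole column reproduces the error from the question.
try:
    json.loads(bb_csv["bbox"])
except TypeError as exc:
    print("TypeError:", exc)

# Decoding cell by cell works: each cell becomes a list of dictionaries.
bb_csv["bbox_parsed"] = bb_csv["bbox"].map(json.loads)
print(bb_csv["bbox_parsed"].iloc[0][0]["label"])
```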



[Screenshot: BB_CSV]

[Screenshot]







python json pandas jupyter-notebook














asked Nov 12 at 2:07 by cleme001 (edited Nov 12 at 3:11)









  • What is the 0 for?
    – W-B
    Nov 12 at 2:11










  • @W-B 0 is the number that represents the label (glass). 1 would be cardboard, 2 would be trash, and so on.
    – cleme001
    Nov 12 at 2:15












  • @cleme001, have you tried anything so far to achieve it? That would also give a clue about what is really needed.
    – pygo
    Nov 12 at 2:20














2 Answers




























You'll want to use a JSON decoder: https://docs.python.org/3/library/json.html





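import json

# Decode the JSON text into a Python list of dictionaries
li = json.loads('''[{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]''')
d = dictionary = li[0]  # the first (here the only) annotation
# Collect the four coordinates in a fixed order and pair them with label index 0
result = ([d[key] for key in "left top width height".split()], 0)
print(result)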




Edit:



If you want to map the operation of extracting the values from the dictionary to every element of the list, you can do:



extracted = []
for element in li:
    result = ([element[key] for key in "left top width height".split()], 0)
    extracted.append(result)

# print(extracted)
print(extracted[:10])
# `[:10]` limits the number of items displayed to 10


Similarly, as per my comment, if you do not want commas between the extracted numbers in the list, you can use:



without_comma = []
for element, zero in extracted:
    result_string = "([{}], 0)".format(" ".join([str(value) for value in element]))
    without_comma.append(result_string)
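If the pair is meant as model input rather than display text, one option is to keep the coordinates as a NumPy integer array. This is an assumption about the goal and not part of this answer. The label index can then be looked up in the question's category list instead of being hard-coded as 0. Printing a NumPy array shows its values separated by spaces, which resembles the ([191 70 183 311], 0) layout in the question without building strings. The CATEGORIES name is illustrative.

```python
import json

import numpy as np

# Category order from the question: glass -> 0, cardboard -> 1, trash -> 2, ...
CATEGORIES = ["glass", "cardboard", "trash", "metal", "paper"]

li = json.loads('[{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]')
d = li[0]

# Select the four coordinates by key, then map the label name to its index.
box = np.array([d[key] for key in ("left", "top", "width", "height")])
label = CATEGORIES.index(d["label"])

print(box, label)  # the array is printed without commas
```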














answered Nov 12 at 2:20 by Mathieu CAROFF (edited Nov 16 at 20:02)













  • You can add assert dictionary["label"] in "glass cardboard trash metal paper".split() if you want to check that the label value is correct and get an error if it isn't.
    – Mathieu CAROFF
    Nov 12 at 2:21










  • If you do not want commas between the values in the bbox, use print("([{}], 0)".format(" ".join(str(d[key]) for key in "left top width height".split())))
    – Mathieu CAROFF
    Nov 12 at 2:25










  • So I managed to get this to work for my use case by tweaking your code a bit, but since it only handles the first dictionary in the JSON list, pictures with multiple annotations don't get properly converted, e.g. [{"left":191,"top":70,"width":183,"height":311,"label":"glass"}, {"left":200,"top":60,"width":132,"height":318,"label":"glass"}]
    – cleme001
    Nov 15 at 18:15
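Some images have several annotations, like the two-box example in the last comment. One approach, sketched here assuming the category order from the question, is to loop over every dictionary in the decoded list rather than reading only li[0]. This produces one (bbox, label_index) pair per box. convert_annotations is a hypothetical helper name.

```python
import json

CATEGORIES = ["glass", "cardboard", "trash", "metal", "paper"]
BBOX_KEYS = ("left", "top", "width", "height")


def convert_annotations(bbox_json):
    """Turn one bbox cell (a JSON string) into a list of ([l, t, w, h], label_index)."""
    converted = []
    for annotation in json.loads(bbox_json):
        label = annotation["label"]
        # Fail loudly on unexpected labels, in the spirit of the assert suggested above.
        if label not in CATEGORIES:
            raise ValueError(f"unknown label: {label!r}")
        box = [annotation[key] for key in BBOX_KEYS]
        converted.append((box, CATEGORIES.index(label)))
    return converted


two_boxes = ('[{"left":191,"top":70,"width":183,"height":311,"label":"glass"}, '
             '{"left":200,"top":60,"width":132,"height":318,"label":"glass"}]')
print(convert_annotations(two_boxes))
```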































It looks like each row of your bbox column contains a dictionary inside a list. I've tried to replicate your problem as follows. Edit: to clarify, the solution below assumes that what you're calling a "JSON object" is a list containing a single dictionary, which is what it appears to be from your example and screenshot.



import pandas as pd

# Create a sample DataFrame with one empty (None) row
df = pd.DataFrame([None], columns=['bbox'])

# Assign your sample item to the first row (.at avoids chained assignment)
df.at[0, 'bbox'] = [{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]


Now, to simply unpack the row, you can do:



df['bbox_unpacked'] = df['bbox'].map(lambda x: tuple(x[0].values()))


This gives you a new column holding a tuple of the 5 values (left, top, width, height, label).



If you want to go further and apply your labels, you'll likely want to create a dictionary to hold your labeling logic. Following the mapping you gave in the comments (glass is 0, cardboard 1, trash 2, and so on), I've done:



labels = {
    'glass': 0,
    'cardboard': 1,
    'trash': 2,
    'metal': 3,
    'paper': 4
}


This should give you your desired layout if you want a one-line solution without writing your own function:



df['bbox_unpacked'] = df['bbox'].map(lambda x: (list(x[0].values())[:4],labels.get(list(x[0].values())[-1])))


A more readable solution is to define your own function and pass it to the .apply() method. Edit: since it looks like your JSON object is stored as a str inside your DataFrame rows, I added json.loads(row) to parse the string before retrieving the values. You'll need to import json to run it.



import json


def unpack_bbox(row, labels):
    # Parse the string into Python objects (in this case a list of
    # length one containing the dictionary), index the list to its
    # first item [0], and use the .values() dictionary method to
    # access the values only
    values = list(json.loads(row)[0].values())

    bbox_values = values[:4]
    bbox_label = values[-1]

    label_value = labels.get(bbox_label)

    return bbox_values, label_value


df['bbox_unpacked'] = df['bbox'].apply(unpack_bbox, args=(labels,))
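The one-liner and unpack_bbox above depend on the dictionary values arriving in left, top, width, height, label order, and they read only the first box. The sketch below instead selects values by key, accepts either a JSON string or an already-parsed list, and keeps every box. to_boxes is a hypothetical name. Using Series.explode to get one row per box is a choice made for this example, not part of the original answer.

```python
import json

import pandas as pd

LABELS = {"glass": 0, "cardboard": 1, "trash": 2, "metal": 3, "paper": 4}
BBOX_KEYS = ["left", "top", "width", "height"]


def to_boxes(cell):
    """Return a list of ([left, top, width, height], label_index) for one bbox cell."""
    # Cells read from a CSV are strings; cells built in Python may already be lists.
    annotations = json.loads(cell) if isinstance(cell, str) else cell
    return [([a[k] for k in BBOX_KEYS], LABELS[a["label"]]) for a in annotations]


df = pd.DataFrame({
    "bbox": [
        '[{"left":191,"top":70,"width":183,"height":311,"label":"glass"}]',
        [{"left": 200, "top": 60, "width": 132, "height": 318, "label": "glass"},
         {"left": 191, "top": 70, "width": 183, "height": 311, "label": "glass"}],
    ]
})

df["boxes"] = df["bbox"].map(to_boxes)
# One row per bounding box; the index still points back to the original image row.
one_per_box = df["boxes"].explode()
print(one_per_box)
```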




























  • For some reason I get multiple rows when I do BB_CSV['bbox'][0]; I added an image to the post.
    – cleme001
    Nov 12 at 3:10










  • @cleme001 The df['bbox'][0] snippet was just to show how I assigned your sample list to a row of a sample DataFrame to replicate your issue. Have you tried BB_CSV['bbox'].map(lambda x: x[0].values()) to unpack your rows?
    – dmitriys
    Nov 12 at 3:30










  • The issue seems to be that the row itself is not a string but rather a numpy.ndarray, so I'm getting this error:
    BB_CSV['bbox'][0].values()
    TypeError  Traceback (most recent call last)
    <ipython-input-29-57c5ca36959f> in <module>()
    ----> 1 BB_CSV['bbox'][0].values()
    TypeError: 'numpy.ndarray' object is not callable
    – cleme001
    Nov 13 at 16:04










  • Also, if you look at the image I attached to the original post, you'll see that even when I try to look at a single row I get back 5 different things, not just the value of the row.
    – cleme001
    Nov 13 at 16:07










  • @cleme001 Does your DataFrame have duplicate values in its index? Can you run df.reset_index(drop=True, inplace=False) and try again?
    – dmitriys
    Nov 13 at 17:02





















edited Nov 13 at 18:01

























answered Nov 12 at 2:26









dmitriys

14619
















  • For some reason I get multiple rows when I do BB_CSV['bbox'][0] - added an image to the post.
    – cleme001
    Nov 12 at 3:10










  • @cleme001 The df['bbox'][0] snippet was just to show how I assigned your sample list to a row of a sample DataFrame to replicate your issue. Have you tried BB_CSV['bbox'].map(lambda x: x[0].values()) to unpack your rows?
    – dmitriys
    Nov 12 at 3:30










  • The issue seems to be that the row itself is not a string but rather a numpy.ndarray, so I'm getting this error when I run BB_CSV['bbox'][0].values(): TypeError: 'numpy.ndarray' object is not callable
    – cleme001
    Nov 13 at 16:04










  • Also, if you look at the image I attached to the original post, you'll see that even when I try to look at a single row, I get back 5 different things, not just the value of the row.
    – cleme001
    Nov 13 at 16:07










  • @cleme001 Does your DataFrame have duplicate values in its index? Can you try df = df.reset_index(drop=True) and run it again?
    – dmitriys
    Nov 13 at 17:02
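The symptoms in this thread, several rows coming back for BB_CSV['bbox'][0] and 'numpy.ndarray' object is not callable, are consistent with a duplicate index label, which can happen for example after concatenating several DataFrames. A small reproduction with made-up data follows; note that reset_index with inplace=False returns a new DataFrame, so the result has to be assigned:

```python
import pandas as pd

# Two made-up frames, each with its own 0-based index
part_a = pd.DataFrame({'bbox': ['a0', 'a1']})
part_b = pd.DataFrame({'bbox': ['b0', 'b1']})

# Without ignore_index=True the label 0 now appears twice
combined = pd.concat([part_a, part_b])

by_label = combined['bbox'][0]  # a Series holding both rows labelled 0
print(by_label)

try:
    by_label.values()  # .values is an ndarray attribute, not a method
except TypeError as err:
    print(err)  # 'numpy.ndarray' object is not callable

# reset_index returns a new frame, so assign it back
combined = combined.reset_index(drop=True)
print(combined['bbox'][0])  # a0
```

Using .iloc[0] instead of [0] also sidesteps the problem, since it selects by position rather than by index label.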

















