How can I iterate through a column of a Spark dataframe and access the values in it one by one?
I have a Spark dataframe; here it is.

I would like to fetch the values of a column one by one and assign each value to a variable. How can this be done in PySpark? Sorry, I am a newbie to Spark as well as Stack Overflow; please forgive the lack of clarity in the question.

pyspark apache-spark-sql

asked Nov 13 '18 at 14:24
RAM SHANKER G
  • For which column do you want to do this?

    – karma4917
    Nov 13 '18 at 15:36

  • There are some fundamental misunderstandings here about how Spark dataframes work. Don't think about iterating through the values one by one; instead, think about operating on all of the values at the same time (after all, it's a parallel, distributed architecture); a sketch of that column-wise style follows these comments. This seems like an XY problem. Please explain, in detail, what you are trying to do, and edit your question to provide a reproducible example.

    – pault
    Nov 13 '18 at 15:38

  • Also, don't post pictures of or links to code/data.

    – pault
    Nov 13 '18 at 15:41
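
To illustrate pault's point, here is a minimal sketch of the column-wise style he describes, operating on all values at once instead of looping. The dataframe df and the column name some_column are hypothetical stand-ins:

from pyspark.sql import functions as F

# Express the work as a column expression rather than a Python loop;
# Spark applies it to every row in parallel across the cluster.
df2 = df.withColumn('some_column_doubled', F.col('some_column') * 2)

# Filters follow the same pattern: one expression over the whole column.
df3 = df2.filter(F.col('some_column') > 0)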
















2 Answers
I don't understand exactly what you are asking, but if you want to store the values in a variable outside of the dataframes that Spark offers, the best option is to select the column you want and store it as a pandas Series (only if there are not a lot of values, because your memory is limited).

from pyspark.sql import functions as F

var = df.select(F.col('column_you_want')).toPandas()

Then you can iterate over it like a normal pandas Series.
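
For example, a minimal usage sketch: note that toPandas() on a one-column select actually returns a one-column pandas DataFrame, so pick the column out before iterating ('column_you_want' is the placeholder name from the answer):

for value in var['column_you_want']:
    # value is now a plain Python object; assign or process it as needed
    print(value)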






answered Nov 13 '18 at 15:14
Manrique
  • No, I need to access one value in each iteration and store it in a variable. I don't want to use toPandas as it consumes more memory!

    – RAM SHANKER G
    Nov 13 '18 at 15:16
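
An editorial aside: if the goal really is one value per loop iteration without toPandas() or a full collect(), PySpark's DataFrame.toLocalIterator() returns an iterator over the rows, and the driver only needs about as much memory as the largest partition. A minimal sketch, with placeholder names as above:

for row in df.select('column_you_want').toLocalIterator():
    value = row[0]  # one value per iteration, assigned to a variable
    # ... use value here ...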
You can collect the column to the driver and build a plain Python list from its values:

col1 = df.select(df.column_of_df).collect()   # a list of Row objects
list1 = [str(i[0]) for i in col1]             # extract each value as a string
# after this we can iterate through the list (list1 in this case)
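
A usage sketch for the list built above, one value per loop iteration as the question asked:

for value in list1:
    # each value is one entry of the column, as a string
    print(value)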





answered Nov 13 '18 at 16:25
Avinash
  • What if there are a large number of rows? The collect() operation will be costly, right?

    – karma4917
    Nov 13 '18 at 16:40

  • Repartition the dataframe to the same number of nodes the instance is running on before using collect(), to reduce time and memory costs.

    – Avinash
    Nov 13 '18 at 17:58
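
A minimal sketch of that repartition-then-collect pattern, as an editorial aside. Here defaultParallelism stands in for the node count, an existing SparkSession named spark is assumed, and note that collect() still brings every value to the driver:

n = spark.sparkContext.defaultParallelism
col1 = df.repartition(n).select(df.column_of_df).collect()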










