Is there a way to convert a Spark DataFrame generated from a SQL statement into an RDD?











If I use this Spark SQL statement:



df = spark.sql('SELECT col_name FROM table_name')


it returns a Spark DataFrame object. How can I convert this to an RDD? Is there a way to read a table directly using SQL but get an RDD instead of a DataFrame?



Thanks in advance










  • df.rdd should give you the RDD
    – sramalingam24
    Nov 11 at 16:28

  • I tried that, but instead I get the following error: PicklingError: Could not serialize object: Py4JError: An error occurred while calling o60.__getstate__. Trace: py4j.Py4JException: Method __getstate__() does not exist at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318) at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326) at py4j.Gateway.invoke(Gateway.java:274) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79)
    – Miguel 2488
    Nov 11 at 16:32

  • I have visited a good number of posts here discussing more or less the same thing, but I still get this error.
    – Miguel 2488
    Nov 11 at 16:33

  • stackoverflow.com/questions/29000514/… Try one of the alternatives suggested here
    – sramalingam24
    Nov 11 at 16:40

  • Possible duplicate of Pyspark - Why i can't convert a sql dataframe to an rdd?
    – user10465355
    Nov 11 at 16:40















python python-3.x apache-spark dataframe pyspark






asked Nov 11 at 16:26
Miguel 2488









1 Answer



accepted










df = spark.sql('SELECT col_name FROM table_name')


df.rdd # you can save it, perform transformations etc.



df.rdd returns the content as a pyspark.RDD of Row objects.



You can then map over that RDD of Row objects, transforming each Row into, for example, a NumPy vector. I can't be more specific about the transformation, since the information given doesn't say what your vector represents.
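As a rough illustration of the two steps above, here is a minimal sketch. It assumes an active SparkSession is passed in as spark and that the selected columns are numeric. The helper names row_to_vector and sql_to_vector_rdd are made up for this example and are not part of PySpark.

```python
import numpy as np


def row_to_vector(row):
    """Convert one Row (or any sequence of numbers) into a 1-D float array."""
    # pyspark.sql.Row is a tuple subclass, so it can be iterated like a tuple.
    return np.asarray(list(row), dtype=float)


def sql_to_vector_rdd(spark, query):
    """Run a SQL query and return an RDD holding one NumPy vector per row.

    `spark` is assumed to be an existing SparkSession; the query is assumed
    to select only numeric columns.
    """
    df = spark.sql(query)   # DataFrame
    rows = df.rdd           # RDD of Row objects
    # The mapped function only touches its own argument. It does not refer
    # to `spark` or `df`, which live on the driver and cannot be pickled.
    return rows.map(row_to_vector)
```

With a session and a registered table in place, sql_to_vector_rdd(spark, 'SELECT col_name FROM table_name').take(5) would return the first few vectors. Functions passed to map are serialized and shipped to the executors, so it is safest to keep them free of references to the session, the context, or any DataFrame.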



Note 1: df is the variable that holds our DataFrame.



Note 2: the rdd property has been available since Spark 1.3.






  • Did this solve the PicklingError: Could not serialize object: Py4JError: An error occurred while calling o60.__getstate__. Trace: py4j.Py4JException: Method __getstate__() does not exist at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318) at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326) at py4j.Gateway.invoke(Gateway.java:274) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) error?
    – karma4917
    Nov 12 at 16:13











edited Nov 12 at 7:38
Ali AzG

answered Nov 11 at 20:29
Nagilla Venkatesh











