Is there a way to convert a Spark DataFrame generated from a SQL statement into an RDD?
If I use this Spark SQL statement:
df = spark.sql('SELECT col_name FROM table_name')
it returns a Spark DataFrame object. How can I convert this to an RDD? Is there a way to read a table directly using SQL but get an RDD instead of a DataFrame?
Thanks in advance
python python-3.x apache-spark dataframe pyspark
df.rdd should give you the RDD
– sramalingam24
Nov 11 at 16:28
I tried that, but instead I get the following error: PicklingError: Could not serialize object: Py4JError: An error occurred while calling o60.__getstate__. Trace: py4j.Py4JException: Method __getstate__() does not exist at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318) at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326) at py4j.Gateway.invoke(Gateway.java:274) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79)
– Miguel 2488
Nov 11 at 16:32
I have read a good number of posts here about more or less the same thing, but I still get this error.
– Miguel 2488
Nov 11 at 16:33
stackoverflow.com/questions/29000514/… Try one of the alternatives suggested here
– sramalingam24
Nov 11 at 16:40
Possible duplicate of Pyspark - Why i can't convert a sql dataframe to an rdd?
– user10465355
Nov 11 at 16:40
asked Nov 11 at 16:26
Miguel 2488
1 Answer
accepted
df = spark.sql('SELECT col_name FROM table_name')
rdd = df.rdd  # an RDD of Row objects; you can save it, apply transformations, etc.
df.rdd returns the content as a pyspark.RDD of Row.
You can then map over that RDD, transforming every Row into a numpy vector. I can't be more specific about the transformation since, with the information given, I don't know what your vector represents.
Note 1: df is the variable that holds our DataFrame.
Note 2: this property has been available since Spark 1.3.
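As a rough illustration of the map step described above, here is a minimal sketch. It assumes every selected column is numeric; the names row_to_vector, dataframe_to_vector_rdd and run_demo, as well as the sample columns x and y, are made up for this example and do not come from the question.

```python
import numpy as np


def row_to_vector(row):
    """Turn one Row into a 1-D float numpy array.

    A pyspark Row is a tuple subclass, so tuple(row) yields its values.
    Assumption: every column is numeric.
    """
    return np.asarray(tuple(row), dtype=float)


def dataframe_to_vector_rdd(df):
    """Return an RDD of numpy arrays, one per row of the DataFrame."""
    # df.rdd is an RDD of Row objects; map applies the function per Row.
    return df.rdd.map(row_to_vector)


def run_demo():
    """Hypothetical local demo; requires pyspark to be installed."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("df-to-rdd-sketch")
        .getOrCreate()
    )
    # Stand-in for spark.sql('SELECT col_name FROM table_name').
    df = spark.createDataFrame([(1.0, 2.0), (3.0, 4.0)], ["x", "y"])
    vectors = dataframe_to_vector_rdd(df).collect()
    print(vectors)
    spark.stop()
```

Calling run_demo() in an environment where pyspark is available should print one array per row. Note that row_to_vector only touches plain Python values, so it can be tried on an ordinary tuple without starting Spark.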
Did this solve the PicklingError: Could not serialize object: Py4JError: An error occurred while calling o60.__getstate__. Trace: py4j.Py4JException: Method __getstate__() does not exist at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318) at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326) at py4j.Gateway.invoke(Gateway.java:274) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) error?
– karma4917
Nov 12 at 16:13
edited Nov 12 at 7:38
Ali AzG
answered Nov 11 at 20:29
Nagilla Venkatesh