Scala: How can I replace values in DataFrames using Scala?
For example, I want to replace all numbers equal to 0.2 in a column with 0. How can I do that in Scala? Thanks.
Edit:
+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|
+----+-----+-----+--------------------+-----+
This is my DataFrame. I'm trying to change "Tesla" in the make column to "S".
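For the record, the numeric replacement asked about here is the same per-value conditional the answers use for strings. A minimal pure-Scala sketch of the rule, with made-up data (in Spark it would typically sit inside withColumn/when, as the answers show):

```scala
// Rule from the question: map values equal to 0.2 to 0, keep everything else.
// Exact equality on Double is fragile, so compare within a small tolerance.
def zeroOutPointTwo(x: Double, eps: Double = 1e-9): Double =
  if (math.abs(x - 0.2) < eps) 0.0 else x

val column = Seq(0.2, 1.5, 0.2, 3.0)
val replaced = column.map(x => zeroOutPointTwo(x))
// replaced: Seq(0.0, 1.5, 0.0, 3.0)
```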
scala apache-spark dataframe
By converting to RDD with .rdd and using map to change to 0 if 0.2?
– ccheneson, Sep 2 '15 at 15:58
What is the map command to change to 0 if 0.2?
– Tong, Sep 2 '15 at 16:30
And how can I focus on a specific column?
– Tong, Sep 2 '15 at 16:36
Give us an example of your data and what you have tried so far.
– ccheneson, Sep 2 '15 at 17:17
This is my DataFrame (the same table as in the question edit above); I'm trying to change Tesla in the make column to S. I have just started learning Scala. Really appreciate your help!
– Tong, Sep 2 '15 at 17:37
edited Aug 25 '17 at 13:27 by Javier Montón
asked Sep 2 '15 at 15:55 by Tong
4 Answers
Note: As mentioned by Olivier Girardot, this answer is not optimized and the withColumn solution is the one to use (see Azeroth2b's answer). I cannot delete this answer, as it has been accepted.
Here is my take on this one:
import org.apache.spark.sql.{Row, SQLContext}

val rdd = sc.parallelize(
  List((2012, "Tesla", "S"), (1997, "Ford", "E350"), (2015, "Chevy", "Volt"))
)
val sqlContext = new SQLContext(sc)
// this is used to implicitly convert an RDD to a DataFrame
import sqlContext.implicits._
val dataframe = rdd.toDF()
dataframe.foreach(println)
dataframe.map(row => {
val row1 = row.getAs[String](1)
val make = if (row1.toLowerCase == "tesla") "S" else row1
Row(row(0),make,row(2))
}).collect().foreach(println)
//[2012,S,S]
//[1997,Ford,E350]
//[2015,Chevy,Volt]
You can actually use map directly on the DataFrame.
So you basically check column 1 for the String "tesla". If it's "tesla", use the value "S" for make; otherwise keep the current value of column 1.
Then build a Row with all the data from the original row, using the zero-based indexes (Row(row(0), make, row(2)) in my example).
There is probably a better way to do it; I am not that familiar yet with the Spark ecosystem.
answered Sep 2 '15 at 18:54, edited Mar 10 '17 at 13:33 – ccheneson
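The per-row logic of this answer can be exercised without a Spark context; a sketch with plain tuples standing in for Row (same sample data as above):

```scala
// Same rewrite as the map above: field 1 is "make"; turn "tesla" (any case) into "S".
val rows = List((2012, "Tesla", "S"), (1997, "Ford", "E350"), (2015, "Chevy", "Volt"))
val rewritten = rows.map { case (year, make, model) =>
  val newMake = if (make.toLowerCase == "tesla") "S" else make
  (year, newMake, model)
}
// rewritten: List((2012,"S","S"), (1997,"Ford","E350"), (2015,"Chevy","Volt"))
```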
Thanks soooo much!!
– Tong
Sep 2 '15 at 19:15
Thanks for your help. I have one more question: your solution can print out the strings I want, but what if I want to change the value within the DataFrame itself? When I do dataframe.show() the value is still Tesla.
– Tong, Sep 2 '15 at 20:38
DataFrames are based on RDDs, which are immutable. Try
val newDF = dataframe.map(row => { val row1 = row.getAs[String](1); val make = if (row1.toLowerCase == "tesla") "S" else row1; Row(row(0), make, row(2)) })
That should construct a new DataFrame.
– ccheneson, Sep 2 '15 at 21:04
Thanks! It works! Feels so good! I set a new data frame and added a new column.
– Tong, Sep 3 '15 at 1:07
This will break Spark's Catalyst optimisations and is therefore not best practice; the withColumn approach is best suited for this.
– Olivier Girardot, Jan 5 '17 at 19:29
Spark 1.6.2, Java code (sorry): this will change every instance of Tesla to S for the entire DataFrame without passing through an RDD:

dataframe.withColumn("make", when(col("make").equalTo("Tesla"), "S")
    .otherwise(col("make")));

Edited to add @marshall245's otherwise to ensure non-Tesla values aren't converted to NULL.
answered Oct 25 '16 at 16:16, edited Nov 16 '18 at 13:32 – Azeroth2b
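Several replacements can be stacked by chaining when calls before a single otherwise; per value, that chain amounts to a lookup with a fallback. A plain-Scala sketch under a hypothetical replacement table (the Chevy entry is made up for illustration):

```scala
// Per-value effect of when(...).when(...).otherwise(col("make")):
// anything not in the (hypothetical) replacement table keeps its value.
val replacements = Map("Tesla" -> "S", "Chevy" -> "Bolt")
val makes = Seq("Tesla", "Ford", "Chevy")
val result = makes.map(m => replacements.getOrElse(m, m))
// result: Seq("S", "Ford", "Bolt")
```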
Hey man, what if I want to change a column with a value from another dataframe column (both dataframes have an id column)? I can't seem to make it work in Java Spark.
– Vasile Surdu, Mar 9 '17 at 16:05
This is probably better served with a select ... join on id; given that, it sounds like a new question. Hope that gets you started.
– Azeroth2b, Mar 11 '17 at 18:02
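The select-and-join suggestion can be sketched in miniature without Spark: treat the second dataframe as a lookup keyed by id, and fall back to the existing value where there is no match (ids and values below are made up):

```scala
// base: id -> current value; updates: id -> replacement from the other dataframe.
val base = Map(1 -> "old1", 2 -> "old2", 3 -> "old3")
val updates = Map(2 -> "new2")
// Left-join semantics: take the update where present, otherwise keep the original.
val merged = base.map { case (id, v) => id -> updates.getOrElse(id, v) }
// merged: Map(1 -> "old1", 2 -> "new2", 3 -> "old3")
```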
Yeah, I just used a 'case' in the select :) It worked.
– Vasile Surdu, Mar 14 '17 at 12:03
Why edit this one and make it the same answer as @marshall245's?
– Eduardo Reis, Nov 18 '18 at 5:45
Where can I find the docs for the withColumn function? I actually have more conditions and more columns to change the values of. I found docs.azuredatabricks.net/spark/1.6/sparkr/functions/… but it is not helping. Can anyone help?
– GadaaDhaariGeek, Dec 27 '18 at 6:22
Building off of the solution from @Azeroth2b: if you want to replace only a couple of items and leave the rest unchanged, do the following. Without using the otherwise(...) method, the remainder of the column becomes null.
import org.apache.spark.sql.functions._

val newsdf = sdf.withColumn("make", when(col("make") === "Tesla", "S")
  .otherwise(col("make")))
Old DataFrame
+-----+-----+
| make|model|
+-----+-----+
|Tesla| S|
| Ford| E350|
|Chevy| Volt|
+-----+-----+
New DataFrame
+-----+-----+
| make|model|
+-----+-----+
| S| S|
| Ford| E350|
|Chevy| Volt|
+-----+-----+
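The null behaviour this answer warns about can be modelled in plain Scala with Option: when with no otherwise is a partial rule, and unmatched values fall through to null (None below):

```scala
val makes = Seq("Tesla", "Ford", "Chevy")
// Without otherwise: values that do not match the condition become null (None here).
val noOtherwise = makes.map(m => if (m == "Tesla") Some("S") else None)
// With otherwise(col("make")): unmatched values keep their original value.
val withOtherwise = makes.map(m => if (m == "Tesla") "S" else m)
```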
answered Apr 7 '17 at 14:56, edited Nov 16 '18 at 14:46 – marshall245
This can be achieved in DataFrames with user-defined functions (UDFs).
import org.apache.spark.sql.functions._
val sqlcont = new org.apache.spark.sql.SQLContext(sc)
val df1 = sqlcont.jsonRDD(sc.parallelize(Array(
"""{"year":2012, "make": "Tesla", "model": "S", "comment": "No Comment", "blank": ""}""",
"""{"year":1997, "make": "Ford", "model": "E350", "comment": "Get one", "blank": ""}""",
"""{"year":2015, "make": "Chevy", "model": "Volt", "comment": "", "blank": ""}"""
)))
val makeSIfTesla = udf {(make: String) =>
if(make == "Tesla") "S" else make
}
df1.withColumn("make", makeSIfTesla(df1("make"))).show
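The function wrapped by udf is ordinary Scala, so its behaviour can be checked on its own before registering it (sample values assumed):

```scala
// Same body as the udf above, minus the Spark wrapper.
val makeSIfTesla: String => String = make => if (make == "Tesla") "S" else make
val out = Seq("Tesla", "Ford", "Chevy").map(makeSIfTesla)
// out: Seq("S", "Ford", "Chevy")
```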
answered Sep 17 '15 at 15:15 – Al M
I guess this will improve performance, because you are not converting the df to an rdd and adding a new column.
– Nandakishore, Oct 31 '16 at 19:41
Does this not result in duplicate make columns?
– javadba, Jul 24 '18 at 0:01
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f32357774%2fscala-how-can-i-replace-value-in-dataframes-using-scala%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
Note:
As mentionned by Olivier Girardot, this answer is not optimized and the withColumn
solution is the one to use (Azeroth2b answer)
Can not delete this answer as it has been accepted
Here is my take on this one:
val rdd = sc.parallelize(
List( (2012,"Tesla","S"), (1997,"Ford","E350"), (2015,"Chevy","Volt"))
)
val sqlContext = new SQLContext(sc)
// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
val dataframe = rdd.toDF()
dataframe.foreach(println)
dataframe.map(row => {
val row1 = row.getAs[String](1)
val make = if (row1.toLowerCase == "tesla") "S" else row1
Row(row(0),make,row(2))
}).collect().foreach(println)
//[2012,S,S]
//[1997,Ford,E350]
//[2015,Chevy,Volt]
You can actually use directly map
on the DataFrame
.
So you basically check the column 1 for the String tesla
.
If it's tesla
, use the value S
for make
else you the current value of column 1
Then build a tuple with all data from the row using the indexes (zero based) (Row(row(0),make,row(2))
) in my example)
There is probably a better way to do it. I am not that familiar yet with the Spark umbrella
Thanks soooo much!!
– Tong
Sep 2 '15 at 19:15
Thanks for your help. I have one more question. Your solution can printout the strings I want. However what if I want to change the value within the dataframe itself? When I do dataframe.show() the the value is still tesla
– Tong
Sep 2 '15 at 20:38
2
Dataframe are based on RDDs which are immutable. Tryval newDF = dataframe.map(row => { val row1 = row.getAs[String](1) val make = if (row1.toLowerCase == "tesla") "S" else row1 Row(row(0),make,row(2)) })
that should construct new DataFrame.
– ccheneson
Sep 2 '15 at 21:04
Thanks! It works! Feels so good! I set a new data frame and add a new column.
– Tong
Sep 3 '15 at 1:07
4
this will break spark's catalyst optimisations, and therefore is not the best practice, the withColumn approach is best suited for this.
– Olivier Girardot
Jan 5 '17 at 19:29
|
show 8 more comments
Note:
As mentionned by Olivier Girardot, this answer is not optimized and the withColumn
solution is the one to use (Azeroth2b answer)
Can not delete this answer as it has been accepted
Here is my take on this one:
val rdd = sc.parallelize(
List( (2012,"Tesla","S"), (1997,"Ford","E350"), (2015,"Chevy","Volt"))
)
val sqlContext = new SQLContext(sc)
// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
val dataframe = rdd.toDF()
dataframe.foreach(println)
dataframe.map(row => {
val row1 = row.getAs[String](1)
val make = if (row1.toLowerCase == "tesla") "S" else row1
Row(row(0),make,row(2))
}).collect().foreach(println)
//[2012,S,S]
//[1997,Ford,E350]
//[2015,Chevy,Volt]
You can actually use directly map
on the DataFrame
.
So you basically check the column 1 for the String tesla
.
If it's tesla
, use the value S
for make
else you the current value of column 1
Then build a tuple with all data from the row using the indexes (zero based) (Row(row(0),make,row(2))
) in my example)
There is probably a better way to do it. I am not that familiar yet with the Spark umbrella
Thanks soooo much!!
– Tong
Sep 2 '15 at 19:15
Thanks for your help. I have one more question. Your solution can printout the strings I want. However what if I want to change the value within the dataframe itself? When I do dataframe.show() the the value is still tesla
– Tong
Sep 2 '15 at 20:38
2
Dataframe are based on RDDs which are immutable. Tryval newDF = dataframe.map(row => { val row1 = row.getAs[String](1) val make = if (row1.toLowerCase == "tesla") "S" else row1 Row(row(0),make,row(2)) })
that should construct new DataFrame.
– ccheneson
Sep 2 '15 at 21:04
Thanks! It works! Feels so good! I set a new data frame and add a new column.
– Tong
Sep 3 '15 at 1:07
4
this will break spark's catalyst optimisations, and therefore is not the best practice, the withColumn approach is best suited for this.
– Olivier Girardot
Jan 5 '17 at 19:29
|
show 8 more comments
Note:
As mentionned by Olivier Girardot, this answer is not optimized and the withColumn
solution is the one to use (Azeroth2b answer)
Can not delete this answer as it has been accepted
Here is my take on this one:
val rdd = sc.parallelize(
List( (2012,"Tesla","S"), (1997,"Ford","E350"), (2015,"Chevy","Volt"))
)
val sqlContext = new SQLContext(sc)
// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
val dataframe = rdd.toDF()
dataframe.foreach(println)
dataframe.map(row => {
val row1 = row.getAs[String](1)
val make = if (row1.toLowerCase == "tesla") "S" else row1
Row(row(0),make,row(2))
}).collect().foreach(println)
//[2012,S,S]
//[1997,Ford,E350]
//[2015,Chevy,Volt]
You can actually use directly map
on the DataFrame
.
So you basically check the column 1 for the String tesla
.
If it's tesla
, use the value S
for make
else you the current value of column 1
Then build a tuple with all data from the row using the indexes (zero based) (Row(row(0),make,row(2))
) in my example)
There is probably a better way to do it. I am not that familiar yet with the Spark umbrella
Note:
As mentionned by Olivier Girardot, this answer is not optimized and the withColumn
solution is the one to use (Azeroth2b answer)
Can not delete this answer as it has been accepted
Here is my take on this one:
val rdd = sc.parallelize(
List( (2012,"Tesla","S"), (1997,"Ford","E350"), (2015,"Chevy","Volt"))
)
val sqlContext = new SQLContext(sc)
// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._
val dataframe = rdd.toDF()
dataframe.foreach(println)
dataframe.map(row => {
val row1 = row.getAs[String](1)
val make = if (row1.toLowerCase == "tesla") "S" else row1
Row(row(0),make,row(2))
}).collect().foreach(println)
//[2012,S,S]
//[1997,Ford,E350]
//[2015,Chevy,Volt]
You can actually use directly map
on the DataFrame
.
So you basically check the column 1 for the String tesla
.
If it's tesla
, use the value S
for make
else you the current value of column 1
Then build a tuple with all data from the row using the indexes (zero based) (Row(row(0),make,row(2))
) in my example)
There is probably a better way to do it. I am not that familiar yet with the Spark umbrella
edited Mar 10 '17 at 13:33
answered Sep 2 '15 at 18:54
cchenesonccheneson
42.5k85365
42.5k85365
Thanks soooo much!!
– Tong
Sep 2 '15 at 19:15
Thanks for your help. I have one more question. Your solution can printout the strings I want. However what if I want to change the value within the dataframe itself? When I do dataframe.show() the the value is still tesla
– Tong
Sep 2 '15 at 20:38
2
Dataframe are based on RDDs which are immutable. Tryval newDF = dataframe.map(row => { val row1 = row.getAs[String](1) val make = if (row1.toLowerCase == "tesla") "S" else row1 Row(row(0),make,row(2)) })
that should construct new DataFrame.
– ccheneson
Sep 2 '15 at 21:04
Thanks! It works! Feels so good! I set a new data frame and add a new column.
– Tong
Sep 3 '15 at 1:07
4
this will break spark's catalyst optimisations, and therefore is not the best practice, the withColumn approach is best suited for this.
– Olivier Girardot
Jan 5 '17 at 19:29
|
show 8 more comments
Thanks soooo much!!
– Tong
Sep 2 '15 at 19:15
Thanks for your help. I have one more question. Your solution can printout the strings I want. However what if I want to change the value within the dataframe itself? When I do dataframe.show() the the value is still tesla
– Tong
Sep 2 '15 at 20:38
2
Dataframe are based on RDDs which are immutable. Tryval newDF = dataframe.map(row => { val row1 = row.getAs[String](1) val make = if (row1.toLowerCase == "tesla") "S" else row1 Row(row(0),make,row(2)) })
that should construct new DataFrame.
– ccheneson
Sep 2 '15 at 21:04
Thanks! It works! Feels so good! I set a new data frame and add a new column.
– Tong
Sep 3 '15 at 1:07
4
this will break spark's catalyst optimisations, and therefore is not the best practice, the withColumn approach is best suited for this.
– Olivier Girardot
Jan 5 '17 at 19:29
Thanks soooo much!!
– Tong
Sep 2 '15 at 19:15
Thanks soooo much!!
– Tong
Sep 2 '15 at 19:15
Thanks for your help. I have one more question. Your solution can printout the strings I want. However what if I want to change the value within the dataframe itself? When I do dataframe.show() the the value is still tesla
– Tong
Sep 2 '15 at 20:38
Thanks for your help. I have one more question. Your solution can printout the strings I want. However what if I want to change the value within the dataframe itself? When I do dataframe.show() the the value is still tesla
– Tong
Sep 2 '15 at 20:38
2
2
Dataframe are based on RDDs which are immutable. Try
val newDF = dataframe.map(row => { val row1 = row.getAs[String](1) val make = if (row1.toLowerCase == "tesla") "S" else row1 Row(row(0),make,row(2)) })
that should construct new DataFrame.– ccheneson
Sep 2 '15 at 21:04
Dataframe are based on RDDs which are immutable. Try
val newDF = dataframe.map(row => { val row1 = row.getAs[String](1) val make = if (row1.toLowerCase == "tesla") "S" else row1 Row(row(0),make,row(2)) })
that should construct new DataFrame.– ccheneson
Sep 2 '15 at 21:04
Thanks! It works! Feels so good! I set a new data frame and add a new column.
– Tong
Sep 3 '15 at 1:07
Thanks! It works! Feels so good! I set a new data frame and add a new column.
– Tong
Sep 3 '15 at 1:07
4
4
this will break spark's catalyst optimisations, and therefore is not the best practice, the withColumn approach is best suited for this.
– Olivier Girardot
Jan 5 '17 at 19:29
this will break spark's catalyst optimisations, and therefore is not the best practice, the withColumn approach is best suited for this.
– Olivier Girardot
Jan 5 '17 at 19:29
|
show 8 more comments
Spark 1.6.2, Java code (sorry), this will change every instance of Tesla to S for the entire dataframe without passing through an RDD:
dataframe.withColumn("make", when(col("make").equalTo("Tesla"), "S")
.otherwise(col("make")
);
Edited to add @marshall245 "otherwise" to ensure non-Tesla columns aren't converted to NULL.
hey man, what if i want to change a column with a value from another dataframe column (both dataframes have an id column) i can't seem to make it in java spark.
– Vasile Surdu
Mar 9 '17 at 16:05
This is probably better served with a select .. join on id, given that, sounds like a new question. Hope that gets you started.
– Azeroth2b
Mar 11 '17 at 18:02
yeah i just used the 'case' in select :) worked
– Vasile Surdu
Mar 14 '17 at 12:03
Why to edit this one and make it the same answer as @marshall245?
– Eduardo Reis
Nov 18 '18 at 5:45
Where can I find the doc for withColumn function? I actually have more conditions and more columns to change the values of. I got this docs.azuredatabricks.net/spark/1.6/sparkr/functions/… but this is not helping. Can anyone help?
– GadaaDhaariGeek
Dec 27 '18 at 6:22
|
show 1 more comment
Spark 1.6.2, Java code (sorry), this will change every instance of Tesla to S for the entire dataframe without passing through an RDD:
dataframe.withColumn("make", when(col("make").equalTo("Tesla"), "S")
.otherwise(col("make")
);
Edited to add @marshall245 "otherwise" to ensure non-Tesla columns aren't converted to NULL.
hey man, what if i want to change a column with a value from another dataframe column (both dataframes have an id column) i can't seem to make it in java spark.
– Vasile Surdu
Mar 9 '17 at 16:05
This is probably better served with a select .. join on id, given that, sounds like a new question. Hope that gets you started.
– Azeroth2b
Mar 11 '17 at 18:02
yeah i just used the 'case' in select :) worked
– Vasile Surdu
Mar 14 '17 at 12:03
Why to edit this one and make it the same answer as @marshall245?
– Eduardo Reis
Nov 18 '18 at 5:45
Where can I find the doc for withColumn function? I actually have more conditions and more columns to change the values of. I got this docs.azuredatabricks.net/spark/1.6/sparkr/functions/… but this is not helping. Can anyone help?
– GadaaDhaariGeek
Dec 27 '18 at 6:22
|
show 1 more comment
Spark 1.6.2, Java code (sorry), this will change every instance of Tesla to S for the entire dataframe without passing through an RDD:
dataframe.withColumn("make", when(col("make").equalTo("Tesla"), "S")
.otherwise(col("make")
);
Edited to add @marshall245 "otherwise" to ensure non-Tesla columns aren't converted to NULL.
Spark 1.6.2, Java code (sorry), this will change every instance of Tesla to S for the entire dataframe without passing through an RDD:
dataframe.withColumn("make", when(col("make").equalTo("Tesla"), "S")
.otherwise(col("make")
);
Edited to add @marshall245 "otherwise" to ensure non-Tesla columns aren't converted to NULL.
edited Nov 16 '18 at 13:32
ChrisOdney
1,79182538
1,79182538
answered Oct 25 '16 at 16:16
Azeroth2bAzeroth2b
437146
437146
hey man, what if i want to change a column with a value from another dataframe column (both dataframes have an id column) i can't seem to make it in java spark.
– Vasile Surdu
Mar 9 '17 at 16:05
This is probably better served with a select .. join on id, given that, sounds like a new question. Hope that gets you started.
– Azeroth2b
Mar 11 '17 at 18:02
yeah i just used the 'case' in select :) worked
– Vasile Surdu
Mar 14 '17 at 12:03
Why to edit this one and make it the same answer as @marshall245?
– Eduardo Reis
Nov 18 '18 at 5:45
Where can I find the doc for withColumn function? I actually have more conditions and more columns to change the values of. I got this docs.azuredatabricks.net/spark/1.6/sparkr/functions/… but this is not helping. Can anyone help?
– GadaaDhaariGeek
Dec 27 '18 at 6:22
|
show 1 more comment
hey man, what if i want to change a column with a value from another dataframe column (both dataframes have an id column) i can't seem to make it in java spark.
– Vasile Surdu
Mar 9 '17 at 16:05
This is probably better served with a select .. join on id, given that, sounds like a new question. Hope that gets you started.
– Azeroth2b
Mar 11 '17 at 18:02
yeah i just used the 'case' in select :) worked
– Vasile Surdu
Mar 14 '17 at 12:03
Why to edit this one and make it the same answer as @marshall245?
– Eduardo Reis
Nov 18 '18 at 5:45
Where can I find the doc for withColumn function? I actually have more conditions and more columns to change the values of. I got this docs.azuredatabricks.net/spark/1.6/sparkr/functions/… but this is not helping. Can anyone help?
– GadaaDhaariGeek
Dec 27 '18 at 6:22
hey man, what if i want to change a column with a value from another dataframe column (both dataframes have an id column) i can't seem to make it in java spark.
– Vasile Surdu
Mar 9 '17 at 16:05
hey man, what if i want to change a column with a value from another dataframe column (both dataframes have an id column) i can't seem to make it in java spark.
– Vasile Surdu
Mar 9 '17 at 16:05
This is probably better served with a select .. join on id, given that, sounds like a new question. Hope that gets you started.
– Azeroth2b
Mar 11 '17 at 18:02
This is probably better served with a select .. join on id, given that, sounds like a new question. Hope that gets you started.
– Azeroth2b
Mar 11 '17 at 18:02
yeah i just used the 'case' in select :) worked
– Vasile Surdu
Mar 14 '17 at 12:03
yeah i just used the 'case' in select :) worked
– Vasile Surdu
Mar 14 '17 at 12:03
Why to edit this one and make it the same answer as @marshall245?
– Eduardo Reis
Nov 18 '18 at 5:45
Why to edit this one and make it the same answer as @marshall245?
– Eduardo Reis
Nov 18 '18 at 5:45
Where can I find the doc for withColumn function? I actually have more conditions and more columns to change the values of. I got this docs.azuredatabricks.net/spark/1.6/sparkr/functions/… but this is not helping. Can anyone help?
– GadaaDhaariGeek
Dec 27 '18 at 6:22
Where can I find the doc for withColumn function? I actually have more conditions and more columns to change the values of. I got this docs.azuredatabricks.net/spark/1.6/sparkr/functions/… but this is not helping. Can anyone help?
– GadaaDhaariGeek
Dec 27 '18 at 6:22
|
show 1 more comment
Building off of the solution from @Azeroth2b. If you want to replace only a couple of items and leave the rest unchanged. Do the following. Without using the otherwise(...) method, the remainder of the column becomes null.
import org.apache.spark.sql.functions._
val newsdf = sdf.withColumn("make", when(col("make") === "Tesla", "S")
.otherwise(col("make"))
);
Old DataFrame
+-----+-----+
| make|model|
+-----+-----+
|Tesla| S|
| Ford| E350|
|Chevy| Volt|
+-----+-----+
New Datarame
+-----+-----+
| make|model|
+-----+-----+
| S| S|
| Ford| E350|
|Chevy| Volt|
+-----+-----+
add a comment |
Building off of the solution from @Azeroth2b. If you want to replace only a couple of items and leave the rest unchanged. Do the following. Without using the otherwise(...) method, the remainder of the column becomes null.
import org.apache.spark.sql.functions._
val newsdf = sdf.withColumn("make", when(col("make") === "Tesla", "S")
.otherwise(col("make"))
);
Old DataFrame
+-----+-----+
| make|model|
+-----+-----+
|Tesla| S|
| Ford| E350|
|Chevy| Volt|
+-----+-----+
New Datarame
+-----+-----+
| make|model|
+-----+-----+
| S| S|
| Ford| E350|
|Chevy| Volt|
+-----+-----+
add a comment |
Building off of the solution from @Azeroth2b. If you want to replace only a couple of items and leave the rest unchanged. Do the following. Without using the otherwise(...) method, the remainder of the column becomes null.
import org.apache.spark.sql.functions._
val newsdf = sdf.withColumn("make", when(col("make") === "Tesla", "S")
.otherwise(col("make"))
);
Old DataFrame
+-----+-----+
| make|model|
+-----+-----+
|Tesla| S|
| Ford| E350|
|Chevy| Volt|
+-----+-----+
New Datarame
+-----+-----+
| make|model|
+-----+-----+
| S| S|
| Ford| E350|
|Chevy| Volt|
+-----+-----+
Building off of the solution from @Azeroth2b. If you want to replace only a couple of items and leave the rest unchanged. Do the following. Without using the otherwise(...) method, the remainder of the column becomes null.
import org.apache.spark.sql.functions._
val newsdf = sdf.withColumn("make", when(col("make") === "Tesla", "S")
.otherwise(col("make"))
);
Old DataFrame
+-----+-----+
| make|model|
+-----+-----+
|Tesla| S|
| Ford| E350|
|Chevy| Volt|
+-----+-----+
New Datarame
+-----+-----+
| make|model|
+-----+-----+
| S| S|
| Ford| E350|
|Chevy| Volt|
+-----+-----+
edited Nov 16 '18 at 14:46
ChrisOdney
1,79182538
1,79182538
answered Apr 7 '17 at 14:56
marshall245marshall245
21123
21123
add a comment |
add a comment |
This can be achieved in dataframes with user defined functions (udf).
import org.apache.spark.sql.functions._
val sqlcont = new org.apache.spark.sql.SQLContext(sc)
val df1 = sqlcont.jsonRDD(sc.parallelize(Array(
"""{"year":2012, "make": "Tesla", "model": "S", "comment": "No Comment", "blank": ""}""",
"""{"year":1997, "make": "Ford", "model": "E350", "comment": "Get one", "blank": ""}""",
"""{"year":2015, "make": "Chevy", "model": "Volt", "comment": "", "blank": ""}"""
)))
val makeSIfTesla = udf {(make: String) =>
if(make == "Tesla") "S" else make
}
df1.withColumn("make", makeSIfTesla(df1("make"))).show
1
I guess this will improve the performance because you are not converting df to rdd and adding a new column.
– Nandakishore
Oct 31 '16 at 19:41
This does not result in duplicatemake
columns?
– javadba
Jul 24 '18 at 0:01
add a comment |
answered Sep 17 '15 at 15:15 by Al M
by converting to RDD with .rdd and using map to change to 0 if 0.2?
– ccheneson
Sep 2 '15 at 15:58
What is the map command for change to 0 if 0.2?
– Tong
Sep 2 '15 at 16:30
And how can i focus on a specific column?
– Tong
Sep 2 '15 at 16:36
Give us an example of your data, what you have tried so far.
– ccheneson
Sep 2 '15 at 17:17
+----+-----+-----+--------------------+-----+ |year| make|model| comment|blank| +----+-----+-----+--------------------+-----+ |2012|Tesla| S| No comment| | |1997| Ford| E350|Go get one now th...| | |2015|Chevy| Volt| null| null| This is my Dataframe I'm trying to change Tesla in make column to S. I have just start learning Scala. Really appreciate your help!
– Tong
Sep 2 '15 at 17:37