Scala: How can I replace a value in DataFrames using Scala?





For example, I want to replace all numbers equal to 0.2 in a column with 0. How can I do that in Scala? Thanks.



Edit:



+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|
+----+-----+-----+--------------------+-----+


This is my DataFrame. I'm trying to change "Tesla" in the make column to "S".



































  • by converting to RDD with .rdd and using map to change to 0 if 0.2 ?

    – ccheneson
    Sep 2 '15 at 15:58











  • What is the map command for changing 0.2 to 0?

    – Tong
    Sep 2 '15 at 16:30











  • And how can I focus on a specific column?

    – Tong
    Sep 2 '15 at 16:36











  • Give us an example of your data, what you have tried so far.

    – ccheneson
    Sep 2 '15 at 17:17











  • +----+-----+-----+--------------------+-----+ |year| make|model| comment|blank| +----+-----+-----+--------------------+-----+ |2012|Tesla| S| No comment| | |1997| Ford| E350|Go get one now th...| | |2015|Chevy| Volt| null| null| This is my DataFrame; I'm trying to change Tesla in the make column to S. I have just started learning Scala. Really appreciate your help!

    – Tong
    Sep 2 '15 at 17:37


















scala apache-spark dataframe






edited Aug 25 '17 at 13:27
Javier Montón

asked Sep 2 '15 at 15:55
Tong


4 Answers
































Note:
As mentioned by Olivier Girardot, this answer is not optimized and the withColumn solution is the one to use (see Azeroth2b's answer).



Cannot delete this answer as it has been accepted.





Here is my take on this one:



import org.apache.spark.sql.{Row, SQLContext}

val rdd = sc.parallelize(
  List((2012, "Tesla", "S"), (1997, "Ford", "E350"), (2015, "Chevy", "Volt"))
)
val sqlContext = new SQLContext(sc)

// this is used to implicitly convert an RDD to a DataFrame
import sqlContext.implicits._

val dataframe = rdd.toDF()

dataframe.foreach(println)

dataframe.map(row => {
  val row1 = row.getAs[String](1)
  val make = if (row1.toLowerCase == "tesla") "S" else row1
  Row(row(0), make, row(2))
}).collect().foreach(println)

// [2012,S,S]
// [1997,Ford,E350]
// [2015,Chevy,Volt]


You can actually use map directly on the DataFrame.



So you basically check column 1 for the String "tesla".
If it is "tesla", use the value "S" for make; otherwise, keep the current value of column 1.



Then build a Row with all the data from the original row, using the zero-based indexes (Row(row(0), make, row(2)) in my example).



There is probably a better way to do it; I am not that familiar yet with the Spark ecosystem.
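For the asker's original numeric case (replacing 0.2 with 0), the same per-row idea applies. Here is a minimal plain-Scala sketch of just the replacement logic, outside Spark; the names and sample values are illustrative, not from the question:

```scala
// The per-value replacement that the map would apply to each row's
// column value before rebuilding the Row.
def replaceValue(v: Double): Double =
  if (v == 0.2) 0.0 else v

// Mapping over a plain sequence stands in for mapping over the rows.
val column = Seq(0.2, 1.5, 0.2, 3.0)
val replaced = column.map(replaceValue)
// replaced: Seq(0.0, 1.5, 0.0, 3.0)
```

Inside a DataFrame map you would apply this function to row.getAs[Double](i) for the relevant column index and rebuild each Row as shown above.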






answered Sep 2 '15 at 18:54 by ccheneson (edited Mar 10 '17 at 13:33)
  • Thanks soooo much!!

    – Tong
    Sep 2 '15 at 19:15











  • Thanks for your help. I have one more question. Your solution can print out the strings I want, but what if I want to change the value within the DataFrame itself? When I do dataframe.show(), the value is still Tesla.

    – Tong
    Sep 2 '15 at 20:38






  • DataFrames are based on RDDs, which are immutable. Try val newDF = dataframe.map(row => { val row1 = row.getAs[String](1); val make = if (row1.toLowerCase == "tesla") "S" else row1; Row(row(0), make, row(2)) }). That should construct a new DataFrame.

    – ccheneson
    Sep 2 '15 at 21:04













  • Thanks! It works! Feels so good! I set up a new data frame and added a new column.

    – Tong
    Sep 3 '15 at 1:07








  • This will break Spark's Catalyst optimisations and is therefore not best practice; the withColumn approach is best suited for this.

    – Olivier Girardot
    Jan 5 '17 at 19:29

































Spark 1.6.2, Java code (sorry): this will change every instance of Tesla to S across the entire DataFrame without passing through an RDD:



dataframe.withColumn("make", when(col("make").equalTo("Tesla"), "S")
    .otherwise(col("make"))
);


Edited to add @marshall245's otherwise(...) to ensure non-Tesla rows aren't converted to NULL.






answered Oct 25 '16 at 16:16 by Azeroth2b (edited Nov 16 '18 at 13:32 by ChrisOdney)


























  • Hey, what if I want to change a column with a value from another DataFrame's column (both DataFrames have an id column)? I can't seem to manage it in Java Spark.

    – Vasile Surdu
    Mar 9 '17 at 16:05











  • This is probably better served with a select .. join on id; given that, it sounds like a new question. Hope that gets you started.

    – Azeroth2b
    Mar 11 '17 at 18:02











  • Yeah, I just used 'case' in the select :) It worked.

    – Vasile Surdu
    Mar 14 '17 at 12:03











  • Why edit this one and make it the same answer as @marshall245's?

    – Eduardo Reis
    Nov 18 '18 at 5:45













  • Where can I find the doc for the withColumn function? I actually have more conditions and more columns whose values I need to change. I found docs.azuredatabricks.net/spark/1.6/sparkr/functions/… but it is not helping. Can anyone help?

    – GadaaDhaariGeek
    Dec 27 '18 at 6:22



































Building off of the solution from @Azeroth2b: if you want to replace only a couple of items and leave the rest unchanged, do the following. Without the otherwise(...) method, the remainder of the column becomes null.



import org.apache.spark.sql.functions._
val newsdf = sdf.withColumn("make", when(col("make") === "Tesla", "S")
  .otherwise(col("make")))


Old DataFrame



+-----+-----+
| make|model|
+-----+-----+
|Tesla|    S|
| Ford| E350|
|Chevy| Volt|
+-----+-----+


New DataFrame



+-----+-----+
| make|model|
+-----+-----+
|    S|    S|
| Ford| E350|
|Chevy| Volt|
+-----+-----+
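To see why dropping otherwise(...) nulls the rest of the column: when(cond, value) on its own is a partial mapping, and rows that don't match fall through to null. A plain-Scala model of the two behaviours (illustrative sketch, not the Spark API itself):

```scala
// when(col("make") === "Tesla", "S") with no otherwise: non-matching
// values have no result, which Spark renders as null (None here).
def whenOnly(make: String): Option[String] =
  if (make == "Tesla") Some("S") else None

// ...with .otherwise(col("make")): non-matching values keep their original.
def whenWithOtherwise(make: String): String =
  if (make == "Tesla") "S" else make

val makes = Seq("Tesla", "Ford", "Chevy")
val partial = makes.map(whenOnly)          // Seq(Some(S), None, None)
val total   = makes.map(whenWithOtherwise) // Seq(S, Ford, Chevy)
```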




















































This can be achieved in DataFrames with user-defined functions (UDFs).



import org.apache.spark.sql.functions._
val sqlcont = new org.apache.spark.sql.SQLContext(sc)
val df1 = sqlcont.jsonRDD(sc.parallelize(Array(
  """{"year":2012, "make": "Tesla", "model": "S", "comment": "No Comment", "blank": ""}""",
  """{"year":1997, "make": "Ford", "model": "E350", "comment": "Get one", "blank": ""}""",
  """{"year":2015, "make": "Chevy", "model": "Volt", "comment": "", "blank": ""}"""
)))

val makeSIfTesla = udf { (make: String) =>
  if (make == "Tesla") "S" else make
}
df1.withColumn("make", makeSIfTesla(df1("make"))).show
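One nice property of the UDF approach: the function inside udf { ... } is ordinary Scala, so it can be exercised on its own before it is lifted to operate on a column. A small sketch (the sample values below are illustrative):

```scala
// The plain function behind the UDF; udf(...) only wraps it so Spark can
// apply it to every value of a column.
val makeSIfTesla: String => String =
  make => if (make == "Tesla") "S" else make

val checks = Seq("Tesla", "Ford", "Chevy").map(makeSIfTesla)
// checks: Seq(S, Ford, Chevy)
```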
























    • I guess this will improve performance because you are not converting the df to an RDD and adding a new column.

      – Nandakishore
      Oct 31 '16 at 19:41











    • Doesn't this result in duplicate make columns?

      – javadba
      Jul 24 '18 at 0:01












    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f32357774%2fscala-how-can-i-replace-value-in-dataframes-using-scala%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    11














    Note:
    As mentionned by Olivier Girardot, this answer is not optimized and the withColumn solution is the one to use (Azeroth2b answer)



    Can not delete this answer as it has been accepted





    Here is my take on this one:



     val rdd = sc.parallelize(
    List( (2012,"Tesla","S"), (1997,"Ford","E350"), (2015,"Chevy","Volt"))
    )
    val sqlContext = new SQLContext(sc)

    // this is used to implicitly convert an RDD to a DataFrame.
    import sqlContext.implicits._

    val dataframe = rdd.toDF()

    dataframe.foreach(println)

    dataframe.map(row => {
    val row1 = row.getAs[String](1)
    val make = if (row1.toLowerCase == "tesla") "S" else row1
    Row(row(0),make,row(2))
    }).collect().foreach(println)

    //[2012,S,S]
    //[1997,Ford,E350]
    //[2015,Chevy,Volt]


    You can actually use directly map on the DataFrame.



    So you basically check the column 1 for the String tesla.
    If it's tesla, use the value S for make else you the current value of column 1



    Then build a tuple with all data from the row using the indexes (zero based) (Row(row(0),make,row(2))) in my example)



    There is probably a better way to do it. I am not that familiar yet with the Spark umbrella






    share|improve this answer


























    • Thanks soooo much!!

      – Tong
      Sep 2 '15 at 19:15











    • Thanks for your help. I have one more question. Your solution can printout the strings I want. However what if I want to change the value within the dataframe itself? When I do dataframe.show() the the value is still tesla

      – Tong
      Sep 2 '15 at 20:38






    • 2





      Dataframe are based on RDDs which are immutable. Try val newDF = dataframe.map(row => { val row1 = row.getAs[String](1) val make = if (row1.toLowerCase == "tesla") "S" else row1 Row(row(0),make,row(2)) }) that should construct new DataFrame.

      – ccheneson
      Sep 2 '15 at 21:04













    • Thanks! It works! Feels so good! I set a new data frame and add a new column.

      – Tong
      Sep 3 '15 at 1:07








    • 4





      this will break spark's catalyst optimisations, and therefore is not the best practice, the withColumn approach is best suited for this.

      – Olivier Girardot
      Jan 5 '17 at 19:29
















    11














    Note:
    As mentionned by Olivier Girardot, this answer is not optimized and the withColumn solution is the one to use (Azeroth2b answer)



    Can not delete this answer as it has been accepted





    Here is my take on this one:



     val rdd = sc.parallelize(
    List( (2012,"Tesla","S"), (1997,"Ford","E350"), (2015,"Chevy","Volt"))
    )
    val sqlContext = new SQLContext(sc)

    // this is used to implicitly convert an RDD to a DataFrame.
    import sqlContext.implicits._

    val dataframe = rdd.toDF()

    dataframe.foreach(println)

    dataframe.map(row => {
    val row1 = row.getAs[String](1)
    val make = if (row1.toLowerCase == "tesla") "S" else row1
    Row(row(0),make,row(2))
    }).collect().foreach(println)

    //[2012,S,S]
    //[1997,Ford,E350]
    //[2015,Chevy,Volt]


    You can actually use directly map on the DataFrame.



    So you basically check the column 1 for the String tesla.
    If it's tesla, use the value S for make else you the current value of column 1



    Then build a tuple with all data from the row using the indexes (zero based) (Row(row(0),make,row(2))) in my example)



    There is probably a better way to do it. I am not that familiar yet with the Spark umbrella






    share|improve this answer


























    • Thanks soooo much!!

      – Tong
      Sep 2 '15 at 19:15











    • Thanks for your help. I have one more question. Your solution can printout the strings I want. However what if I want to change the value within the dataframe itself? When I do dataframe.show() the the value is still tesla

      – Tong
      Sep 2 '15 at 20:38






    • 2





      Dataframe are based on RDDs which are immutable. Try val newDF = dataframe.map(row => { val row1 = row.getAs[String](1) val make = if (row1.toLowerCase == "tesla") "S" else row1 Row(row(0),make,row(2)) }) that should construct new DataFrame.

      – ccheneson
      Sep 2 '15 at 21:04













    • Thanks! It works! Feels so good! I set a new data frame and add a new column.

      – Tong
      Sep 3 '15 at 1:07








    • 4





      this will break spark's catalyst optimisations, and therefore is not the best practice, the withColumn approach is best suited for this.

      – Olivier Girardot
      Jan 5 '17 at 19:29














    11












    11








    11







    Note:
    As mentionned by Olivier Girardot, this answer is not optimized and the withColumn solution is the one to use (Azeroth2b answer)



    Can not delete this answer as it has been accepted





    Here is my take on this one:



     val rdd = sc.parallelize(
    List( (2012,"Tesla","S"), (1997,"Ford","E350"), (2015,"Chevy","Volt"))
    )
    val sqlContext = new SQLContext(sc)

    // this is used to implicitly convert an RDD to a DataFrame.
    import sqlContext.implicits._

    val dataframe = rdd.toDF()

    dataframe.foreach(println)

    dataframe.map(row => {
    val row1 = row.getAs[String](1)
    val make = if (row1.toLowerCase == "tesla") "S" else row1
    Row(row(0),make,row(2))
    }).collect().foreach(println)

    //[2012,S,S]
    //[1997,Ford,E350]
    //[2015,Chevy,Volt]


    You can actually use directly map on the DataFrame.



    So you basically check the column 1 for the String tesla.
    If it's tesla, use the value S for make else you the current value of column 1



    Then build a tuple with all data from the row using the indexes (zero based) (Row(row(0),make,row(2))) in my example)



    There is probably a better way to do it. I am not that familiar yet with the Spark umbrella






    share|improve this answer















    Note:
    As mentionned by Olivier Girardot, this answer is not optimized and the withColumn solution is the one to use (Azeroth2b answer)



    Can not delete this answer as it has been accepted





    Here is my take on this one:



     val rdd = sc.parallelize(
    List( (2012,"Tesla","S"), (1997,"Ford","E350"), (2015,"Chevy","Volt"))
    )
    val sqlContext = new SQLContext(sc)

    // this is used to implicitly convert an RDD to a DataFrame.
    import sqlContext.implicits._

    val dataframe = rdd.toDF()

    dataframe.foreach(println)

    dataframe.map(row => {
    val row1 = row.getAs[String](1)
    val make = if (row1.toLowerCase == "tesla") "S" else row1
    Row(row(0),make,row(2))
    }).collect().foreach(println)

    //[2012,S,S]
    //[1997,Ford,E350]
    //[2015,Chevy,Volt]


    You can actually use directly map on the DataFrame.



    So you basically check the column 1 for the String tesla.
    If it's tesla, use the value S for make else you the current value of column 1



    Then build a tuple with all data from the row using the indexes (zero based) (Row(row(0),make,row(2))) in my example)



    There is probably a better way to do it. I am not that familiar yet with the Spark umbrella







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Mar 10 '17 at 13:33

























    answered Sep 2 '15 at 18:54









    cchenesonccheneson

    42.5k85365




    42.5k85365













    • Thanks soooo much!!

      – Tong
      Sep 2 '15 at 19:15











    • Thanks for your help. I have one more question. Your solution can printout the strings I want. However what if I want to change the value within the dataframe itself? When I do dataframe.show() the the value is still tesla

      – Tong
      Sep 2 '15 at 20:38






    • 2





      Dataframe are based on RDDs which are immutable. Try val newDF = dataframe.map(row => { val row1 = row.getAs[String](1) val make = if (row1.toLowerCase == "tesla") "S" else row1 Row(row(0),make,row(2)) }) that should construct new DataFrame.

      – ccheneson
      Sep 2 '15 at 21:04













    • Thanks! It works! Feels so good! I set a new data frame and add a new column.

      – Tong
      Sep 3 '15 at 1:07








    • 4





      this will break spark's catalyst optimisations, and therefore is not the best practice, the withColumn approach is best suited for this.

      – Olivier Girardot
      Jan 5 '17 at 19:29



















    • Thanks soooo much!!

      – Tong
      Sep 2 '15 at 19:15











    • Thanks for your help. I have one more question. Your solution can printout the strings I want. However what if I want to change the value within the dataframe itself? When I do dataframe.show() the the value is still tesla

      – Tong
      Sep 2 '15 at 20:38






    • 2





      Dataframe are based on RDDs which are immutable. Try val newDF = dataframe.map(row => { val row1 = row.getAs[String](1) val make = if (row1.toLowerCase == "tesla") "S" else row1 Row(row(0),make,row(2)) }) that should construct new DataFrame.

      – ccheneson
      Sep 2 '15 at 21:04













    • Thanks! It works! Feels so good! I set a new data frame and add a new column.

      – Tong
      Sep 3 '15 at 1:07








    • 4





      this will break spark's catalyst optimisations, and therefore is not the best practice, the withColumn approach is best suited for this.

      – Olivier Girardot
      Jan 5 '17 at 19:29

















    Thanks soooo much!!

    – Tong
    Sep 2 '15 at 19:15





    Thanks soooo much!!

    – Tong
    Sep 2 '15 at 19:15













    Thanks for your help. I have one more question. Your solution can printout the strings I want. However what if I want to change the value within the dataframe itself? When I do dataframe.show() the the value is still tesla

    – Tong
    Sep 2 '15 at 20:38





    Thanks for your help. I have one more question. Your solution can printout the strings I want. However what if I want to change the value within the dataframe itself? When I do dataframe.show() the the value is still tesla

    – Tong
    Sep 2 '15 at 20:38




    2




    2





    Dataframe are based on RDDs which are immutable. Try val newDF = dataframe.map(row => { val row1 = row.getAs[String](1) val make = if (row1.toLowerCase == "tesla") "S" else row1 Row(row(0),make,row(2)) }) that should construct new DataFrame.

    – ccheneson
    Sep 2 '15 at 21:04







    Dataframe are based on RDDs which are immutable. Try val newDF = dataframe.map(row => { val row1 = row.getAs[String](1) val make = if (row1.toLowerCase == "tesla") "S" else row1 Row(row(0),make,row(2)) }) that should construct new DataFrame.

    – ccheneson
    Sep 2 '15 at 21:04















    Thanks! It works! Feels so good! I set a new data frame and add a new column.

    – Tong
    Sep 3 '15 at 1:07







    Thanks! It works! Feels so good! I set a new data frame and add a new column.

    – Tong
    Sep 3 '15 at 1:07






    4




    4





    this will break spark's catalyst optimisations, and therefore is not the best practice, the withColumn approach is best suited for this.

    – Olivier Girardot
    Jan 5 '17 at 19:29





    this will break spark's catalyst optimisations, and therefore is not the best practice, the withColumn approach is best suited for this.

    – Olivier Girardot
    Jan 5 '17 at 19:29













    33














    Spark 1.6.2, Java code (sorry), this will change every instance of Tesla to S for the entire dataframe without passing through an RDD:



    dataframe.withColumn("make", when(col("make").equalTo("Tesla"), "S")
    .otherwise(col("make")
    );


    Edited to add @marshall245 "otherwise" to ensure non-Tesla columns aren't converted to NULL.






    share|improve this answer


























    • hey man, what if i want to change a column with a value from another dataframe column (both dataframes have an id column) i can't seem to make it in java spark.

      – Vasile Surdu
      Mar 9 '17 at 16:05











    • This is probably better served with a select .. join on id, given that, sounds like a new question. Hope that gets you started.

      – Azeroth2b
      Mar 11 '17 at 18:02











    • yeah i just used the 'case' in select :) worked

      – Vasile Surdu
      Mar 14 '17 at 12:03











    • Why to edit this one and make it the same answer as @marshall245?

      – Eduardo Reis
      Nov 18 '18 at 5:45













    • Where can I find the doc for withColumn function? I actually have more conditions and more columns to change the values of. I got this docs.azuredatabricks.net/spark/1.6/sparkr/functions/… but this is not helping. Can anyone help?

      – GadaaDhaariGeek
      Dec 27 '18 at 6:22


















    33














    Spark 1.6.2, Java code (sorry), this will change every instance of Tesla to S for the entire dataframe without passing through an RDD:



    dataframe.withColumn("make", when(col("make").equalTo("Tesla"), "S")
    .otherwise(col("make")
    );


    Edited to add @marshall245 "otherwise" to ensure non-Tesla columns aren't converted to NULL.






    share|improve this answer


























    • hey man, what if i want to change a column with a value from another dataframe column (both dataframes have an id column) i can't seem to make it in java spark.

      – Vasile Surdu
      Mar 9 '17 at 16:05











    • This is probably better served with a select .. join on id, given that, sounds like a new question. Hope that gets you started.

      – Azeroth2b
      Mar 11 '17 at 18:02











    • yeah i just used the 'case' in select :) worked

      – Vasile Surdu
      Mar 14 '17 at 12:03











    • Why to edit this one and make it the same answer as @marshall245?

      – Eduardo Reis
      Nov 18 '18 at 5:45













    • Where can I find the doc for withColumn function? I actually have more conditions and more columns to change the values of. I got this docs.azuredatabricks.net/spark/1.6/sparkr/functions/… but this is not helping. Can anyone help?

      – GadaaDhaariGeek
      Dec 27 '18 at 6:22
















    33












    33








    33







    Spark 1.6.2, Java code (sorry), this will change every instance of Tesla to S for the entire dataframe without passing through an RDD:



    dataframe.withColumn("make", when(col("make").equalTo("Tesla"), "S")
    .otherwise(col("make")
    );


    Edited to add @marshall245 "otherwise" to ensure non-Tesla columns aren't converted to NULL.






    share|improve this answer















    Spark 1.6.2, Java code (sorry), this will change every instance of Tesla to S for the entire dataframe without passing through an RDD:



    dataframe.withColumn("make", when(col("make").equalTo("Tesla"), "S")
    .otherwise(col("make")
    );


    Edited to add @marshall245 "otherwise" to ensure non-Tesla columns aren't converted to NULL.







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Nov 16 '18 at 13:32









    ChrisOdney

    1,79182538




    1,79182538










    answered Oct 25 '16 at 16:16









    Azeroth2bAzeroth2b

    437146




    437146













    • hey man, what if i want to change a column with a value from another dataframe column (both dataframes have an id column) i can't seem to make it in java spark.

      – Vasile Surdu
      Mar 9 '17 at 16:05











    • This is probably better served with a select .. join on id, given that, sounds like a new question. Hope that gets you started.

      – Azeroth2b
      Mar 11 '17 at 18:02











    • yeah i just used the 'case' in select :) worked

      – Vasile Surdu
      Mar 14 '17 at 12:03











    • Why to edit this one and make it the same answer as @marshall245?

      – Eduardo Reis
      Nov 18 '18 at 5:45













    • Where can I find the doc for withColumn function? I actually have more conditions and more columns to change the values of. I got this docs.azuredatabricks.net/spark/1.6/sparkr/functions/… but this is not helping. Can anyone help?

      – GadaaDhaariGeek
      Dec 27 '18 at 6:22





















    21














    Building off of @Azeroth2b's solution: if you want to replace only a couple of values and leave the rest unchanged, do the following. Without the otherwise(...) call, every non-matching row in the column becomes null.



    import org.apache.spark.sql.functions._

    val newsdf = sdf.withColumn(
      "make",
      when(col("make") === "Tesla", "S").otherwise(col("make"))
    )


    Old DataFrame



    +-----+-----+
    | make|model|
    +-----+-----+
    |Tesla|    S|
    | Ford| E350|
    |Chevy| Volt|
    +-----+-----+


    New DataFrame



    +-----+-----+
    | make|model|
    +-----+-----+
    |    S|    S|
    | Ford| E350|
    |Chevy| Volt|
    +-----+-----+
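    The effect of dropping otherwise(...) can be modeled in plain Scala (no Spark needed); this is an illustrative analogy, not Spark API code:

    ```scala
    // when(cond, v).otherwise(orig) keeps non-matching values;
    // when(cond, v) alone leaves every non-matching row null.
    def withOtherwise(make: String): String =
      if (make == "Tesla") "S" else make

    def withoutOtherwise(make: String): String =
      if (make == "Tesla") "S" else null

    val makes = Seq("Tesla", "Ford", "Chevy")
    println(makes.map(withOtherwise))    // List(S, Ford, Chevy)
    println(makes.map(withoutOtherwise)) // List(S, null, null)
    ```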





        edited Nov 16 '18 at 14:46 by ChrisOdney
        answered Apr 7 '17 at 14:56 – marshall245























            12














            This can be achieved in DataFrames with a user-defined function (udf).



            import org.apache.spark.sql.functions._

            val sqlcont = new org.apache.spark.sql.SQLContext(sc)
            val df1 = sqlcont.jsonRDD(sc.parallelize(Array(
              """{"year":2012, "make": "Tesla", "model": "S", "comment": "No Comment", "blank": ""}""",
              """{"year":1997, "make": "Ford", "model": "E350", "comment": "Get one", "blank": ""}""",
              """{"year":2015, "make": "Chevy", "model": "Volt", "comment": "", "blank": ""}"""
            )))

            val makeSIfTesla = udf { (make: String) =>
              if (make == "Tesla") "S" else make
            }
            df1.withColumn("make", makeSIfTesla(df1("make"))).show()
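            For simple value-for-value swaps like this, Spark also offers a built-in, `df.na.replace("make", Map("Tesla" -> "S"))` on DataFrameNaFunctions, which avoids writing a udf at all. Its per-value behavior can be mirrored in plain Scala with a lookup map (names below are illustrative):

            ```scala
            // Mirror of df.na.replace("make", Map("Tesla" -> "S")):
            // look each value up in the replacement map, falling back to itself.
            val replacements = Map("Tesla" -> "S")
            val makes = Seq("Tesla", "Ford", "Chevy")
            val replaced = makes.map(m => replacements.getOrElse(m, m))
            println(replaced) // List(S, Ford, Chevy)
            ```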





            share|improve this answer



















            • 1

              I guess this will improve performance, because you are not converting the df to an rdd and adding a new column.

              – Nandakishore
              Oct 31 '16 at 19:41

            • Does this not result in duplicate make columns?

              – javadba
              Jul 24 '18 at 0:01

















            answered Sep 17 '15 at 15:15 – Al M

























