Scala: How can I replace a value in DataFrames using Scala?












26















For example, I want to replace all numbers equal to 0.2 in a column with 0. How can I do that in Scala? Thanks.

Edit:

+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|
+----+-----+-----+--------------------+-----+

This is my DataFrame. I'm trying to change Tesla in the make column to S.



































  • By converting to an RDD with .rdd and using map to change the value to 0 if it is 0.2?

    – ccheneson
    Sep 2 '15 at 15:58











  • What is the map command to change 0.2 to 0?

    – Tong
    Sep 2 '15 at 16:30











  • And how can I focus on a specific column?

    – Tong
    Sep 2 '15 at 16:36











  • Give us an example of your data and what you have tried so far.

    – ccheneson
    Sep 2 '15 at 17:17











  • +----+-----+-----+--------------------+-----+
    |year| make|model|             comment|blank|
    +----+-----+-----+--------------------+-----+
    |2012|Tesla|    S|          No comment|     |
    |1997| Ford| E350|Go get one now th...|     |
    |2015|Chevy| Volt|                null| null|
    This is my DataFrame. I'm trying to change Tesla in the make column to S. I have just started learning Scala. I really appreciate your help!

    – Tong
    Sep 2 '15 at 17:37







































scala apache-spark dataframe














asked Sep 2 '15 at 15:55 by Tong
edited Aug 25 '17 at 13:27 by Javier Montón

























4 Answers


















11














Note: As mentioned by Olivier Girardot, this answer is not optimized; the withColumn solution (Azeroth2b's answer) is the one to use.

I cannot delete this answer because it has been accepted.

Here is my take on this one:

import org.apache.spark.sql.{Row, SQLContext}

val rdd = sc.parallelize(
  List((2012, "Tesla", "S"), (1997, "Ford", "E350"), (2015, "Chevy", "Volt"))
)
val sqlContext = new SQLContext(sc)

// this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.implicits._

val dataframe = rdd.toDF()

dataframe.foreach(println)

// Rebuild each row, replacing the make (column 1) when it is "tesla" in any letter case.
dataframe.map(row => {
  val row1 = row.getAs[String](1)
  val make = if (row1.toLowerCase == "tesla") "S" else row1
  Row(row(0), make, row(2))
}).collect().foreach(println)

//[2012,S,S]
//[1997,Ford,E350]
//[2015,Chevy,Volt]

You can actually use map directly on the DataFrame.

So you basically check column 1 for the string tesla. If it is tesla, use the value S for make; otherwise use the current value of column 1.

Then build a new Row with all the data from the original row using the zero-based indexes (Row(row(0), make, row(2)) in my example).

There is probably a better way to do it. I am not that familiar with the Spark umbrella yet.
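For readers on Spark 2.x or later, where calling map on a DataFrame to produce Row objects typically needs an explicit encoder, the same row-by-row idea can be sketched by going through the underlying RDD and rebuilding a DataFrame with the original schema. This is only a sketch under assumptions, not the original answer's code: the object name MapRowsSketch, the helper replaceMake, the local SparkSession, and the column names year/make/model are all made up for illustration, and the null check is an addition.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Hypothetical helper object, named for this illustration only.
object MapRowsSketch {

  // Rebuilds every Row, replacing "tesla" (any letter case) in column 1 with "S".
  def replaceMake(spark: SparkSession, df: DataFrame): DataFrame = {
    val mapped: RDD[Row] = df.rdd.map { row =>
      val current = row.getAs[String](1)
      val make =
        if (current != null && current.toLowerCase == "tesla") "S" else current
      Row(row(0), make, row(2))
    }
    // Reuse the original schema so column names and types are preserved.
    spark.createDataFrame(mapped, df.schema)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-rows-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((2012, "Tesla", "S"), (1997, "Ford", "E350"), (2015, "Chevy", "Volt"))
      .toDF("year", "make", "model")

    replaceMake(spark, df).show()
    spark.stop()
  }
}
```

The result is a new DataFrame; the input is left untouched. As noted at the top of this answer, the withColumn approach is still the better choice because it stays inside the DataFrame API.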
































  • Thanks soooo much!!

    – Tong
    Sep 2 '15 at 19:15











  • Thanks for your help. I have one more question. Your solution prints out the strings I want. However, what if I want to change the value within the dataframe itself? When I do dataframe.show() the value is still tesla.

    – Tong
    Sep 2 '15 at 20:38






  • 2





    DataFrames are based on RDDs, which are immutable. Try val newDF = dataframe.map(row => { val row1 = row.getAs[String](1); val make = if (row1.toLowerCase == "tesla") "S" else row1; Row(row(0), make, row(2)) }), which should construct a new DataFrame.

    – ccheneson
    Sep 2 '15 at 21:04













  • Thanks! It works! Feels so good! I set a new data frame and add a new column.

    – Tong
    Sep 3 '15 at 1:07








  • 4





    This will break Spark's Catalyst optimisations and is therefore not best practice; the withColumn approach is better suited for this.

    – Olivier Girardot
    Jan 5 '17 at 19:29



















33














Spark 1.6.2, Java code (sorry): this changes every instance of Tesla in the make column to S across the entire dataframe without passing through an RDD:

// requires: import static org.apache.spark.sql.functions.*;
// withColumn returns a new DataFrame; assign the result to use it.
dataframe.withColumn("make", when(col("make").equalTo("Tesla"), "S")
    .otherwise(col("make")));

Edited to add @marshall245's "otherwise" so that non-Tesla values are not converted to NULL.
































  • Hey man, what if I want to change a column using a value from another dataframe's column (both dataframes have an id column)? I can't seem to make it work in Java Spark.

    – Vasile Surdu
    Mar 9 '17 at 16:05











  • This is probably better served by a select with a join on id. Given that, it sounds like a new question. Hope that gets you started.

    – Azeroth2b
    Mar 11 '17 at 18:02











  • yeah i just used the 'case' in select :) worked

    – Vasile Surdu
    Mar 14 '17 at 12:03











  • Why edit this one and make it the same answer as @marshall245's?

    – Eduardo Reis
    Nov 18 '18 at 5:45













  • Where can I find the doc for withColumn function? I actually have more conditions and more columns to change the values of. I got this docs.azuredatabricks.net/spark/1.6/sparkr/functions/… but this is not helping. Can anyone help?

    – GadaaDhaariGeek
    Dec 27 '18 at 6:22
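On the last comment: when and otherwise are described in the Spark Scala API docs under org.apache.spark.sql.functions and Column, and withColumn under Dataset (DataFrame in 1.x). For several conditions and several columns, one possible shape is to chain when calls and to call withColumn once per column. The following is a minimal sketch in Scala; the object name MultiConditionSketch and the replacement values "F" and "Volt (2015+)" are invented purely for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical helper object, named for this illustration only.
object MultiConditionSketch {

  def recode(df: DataFrame): DataFrame =
    df
      // Several conditions on one column: chained when calls are tried in order.
      .withColumn("make",
        when(col("make") === "Tesla", "S")
          .when(col("make") === "Ford", "F")
          .otherwise(col("make")))
      // A second column, with a condition that combines two columns.
      .withColumn("model",
        when(col("model") === "Volt" && col("year") >= 2015, "Volt (2015+)")
          .otherwise(col("model")))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("multi-condition-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((2012, "Tesla", "S"), (1997, "Ford", "E350"), (2015, "Chevy", "Volt"))
      .toDF("year", "make", "model")

    recode(df).show()
    spark.stop()
  }
}
```

Each otherwise keeps the existing value, so rows that match none of the conditions are left as they were instead of becoming null.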





















21














Building on the solution from @Azeroth2b: if you want to replace only a couple of items and leave the rest unchanged, do the following. Without the otherwise(...) method, the remainder of the column becomes null.

import org.apache.spark.sql.functions._
val newsdf = sdf.withColumn("make",
  when(col("make") === "Tesla", "S").otherwise(col("make")))

Old DataFrame

+-----+-----+
| make|model|
+-----+-----+
|Tesla|    S|
| Ford| E350|
|Chevy| Volt|
+-----+-----+

New DataFrame

+-----+-----+
| make|model|
+-----+-----+
|    S|    S|
| Ford| E350|
|Chevy| Volt|
+-----+-----+
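The original question also asked about replacing 0.2 with 0 in a numeric column. The same when/otherwise pattern should carry over, and DataFrameNaFunctions.replace (df.na.replace) is an alternative for plain value-to-value substitutions. A minimal sketch follows; the object name ReplaceNumericSketch and the column name score are made up for illustration, and it assumes a Double column that holds exactly the literal 0.2 (values produced by arithmetic may differ slightly and would not match).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical helper object, named for this illustration only.
object ReplaceNumericSketch {

  // when/otherwise: rows equal to 0.2 become 0.0, all others keep their value.
  def zeroOut(df: DataFrame, column: String): DataFrame =
    df.withColumn(column, when(col(column) === 0.2, 0.0).otherwise(col(column)))

  // Alternative: value-to-value replacement through the na functions.
  def zeroOutWithNa(df: DataFrame, column: String): DataFrame =
    df.na.replace(column, Map(0.2 -> 0.0))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("replace-numeric-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(0.2, 0.5, 0.2).toDF("score")

    zeroOut(df, "score").show()
    zeroOutWithNa(df, "score").show()
    spark.stop()
  }
}
```

Both helpers return a new DataFrame. The na.replace form is shorter when there is a fixed mapping of old to new values, while when/otherwise is more flexible because the condition can be any column expression.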






































    12














This can be achieved in DataFrames with user-defined functions (UDFs).

import org.apache.spark.sql.functions._

val sqlcont = new org.apache.spark.sql.SQLContext(sc)
val df1 = sqlcont.jsonRDD(sc.parallelize(Array(
  """{"year":2012, "make": "Tesla", "model": "S", "comment": "No Comment", "blank": ""}""",
  """{"year":1997, "make": "Ford", "model": "E350", "comment": "Get one", "blank": ""}""",
  """{"year":2015, "make": "Chevy", "model": "Volt", "comment": "", "blank": ""}"""
)))

// UDF that maps "Tesla" to "S" and leaves every other make unchanged.
val makeSIfTesla = udf { (make: String) =>
  if (make == "Tesla") "S" else make
}
df1.withColumn("make", makeSIfTesla(df1("make"))).show
























    • 1





      I guess this will improve the performance because you are not converting df to rdd and adding a new column.

      – Nandakishore
      Oct 31 '16 at 19:41











    • This does not result in duplicate make columns?

      – javadba
      Jul 24 '18 at 0:01
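On the duplicate-column question: withColumn with the name of an existing column replaces that column instead of adding a second one, and it returns a new DataFrame while leaving the original as it was. A minimal sketch to check this, shown with when/otherwise rather than a UDF; the object name WithColumnReplaceSketch and the local SparkSession setup are assumptions for illustration only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical helper object, named for this illustration only.
object WithColumnReplaceSketch {

  // Reuses the existing column name "make", so the column is replaced in the result.
  def replaceMake(df: DataFrame): DataFrame =
    df.withColumn("make", when(col("make") === "Tesla", "S").otherwise(col("make")))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("with-column-replace-sketch")
      .getOrCreate()
    import spark.implicits._

    val original = Seq(("Tesla", "S"), ("Ford", "E350"), ("Chevy", "Volt"))
      .toDF("make", "model")
    val updated = replaceMake(original)

    // Both DataFrames list the same two columns.
    println(original.columns.mkString(", "))
    println(updated.columns.mkString(", "))

    original.show() // still contains Tesla
    updated.show()  // Tesla replaced by S
    spark.stop()
  }
}
```

This is also why dataframe.show() keeps printing the old values after a transformation: the result has to be assigned to a new val and shown from there.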






















































Answer by ccheneson (score 11): answered Sep 2 '15 at 18:54, edited Mar 10 '17 at 13:33


































Answer by Azeroth2b (score 33): answered Oct 25 '16 at 16:16, edited Nov 16 '18 at 13:32 by ChrisOdney













    • hey man, what if i want to change a column with a value from another dataframe column (both dataframes have an id column) i can't seem to make it in java spark.

      – Vasile Surdu
      Mar 9 '17 at 16:05











    • This is probably better served with a select .. join on id, given that, sounds like a new question. Hope that gets you started.

      – Azeroth2b
      Mar 11 '17 at 18:02











    • yeah i just used the 'case' in select :) worked

      – Vasile Surdu
      Mar 14 '17 at 12:03











    • Why to edit this one and make it the same answer as @marshall245?

      – Eduardo Reis
      Nov 18 '18 at 5:45













    • Where can I find the doc for withColumn function? I actually have more conditions and more columns to change the values of. I got this docs.azuredatabricks.net/spark/1.6/sparkr/functions/… but this is not helping. Can anyone help?

      – GadaaDhaariGeek
      Dec 27 '18 at 6:22





















    21














    Building on the solution from @Azeroth2b: if you want to replace only a couple of items and leave the rest unchanged, do the following. Without the otherwise(...) call, the remainder of the column becomes null.



    import org.apache.spark.sql.functions._

    // Replace "Tesla" with "S" in the make column; otherwise(...) keeps all other values.
    val newsdf = sdf.withColumn("make",
      when(col("make") === "Tesla", "S").otherwise(col("make")))


    Old DataFrame



    +-----+-----+
    | make|model|
    +-----+-----+
    |Tesla|    S|
    | Ford| E350|
    |Chevy| Volt|
    +-----+-----+


    New DataFrame



    +-----+-----+
    | make|model|
    +-----+-----+
    |    S|    S|
    | Ford| E350|
    |Chevy| Volt|
    +-----+-----+
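
    The same when/otherwise pattern can be chained for several conditions and repeated with withColumn for several columns, which also covers the numeric case from the question (replacing 0.2 with 0). What follows is a minimal sketch, assuming Spark 2.x or later. The score column, the "Chevy" to "C" rule, and the sample rows are invented for illustration and are not part of the original data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

val spark = SparkSession.builder().master("local[*]").appName("when-otherwise").getOrCreate()
import spark.implicits._

// Hypothetical sample data; `score` exists only to show a numeric replacement.
val sdf = Seq(
  ("Tesla", "S", 0.2),
  ("Ford", "E350", 0.5),
  ("Chevy", "Volt", 0.2)
).toDF("make", "model", "score")

val replaced = sdf
  // Several conditions on one column: chain .when(...) before .otherwise(...).
  .withColumn("make",
    when(col("make") === "Tesla", "S")
      .when(col("make") === "Chevy", "C")
      .otherwise(col("make")))
  // A second column: replace 0.2 with 0.0 and keep every other value.
  .withColumn("score",
    when(col("score") === 0.2, 0.0).otherwise(col("score")))

replaced.show()
```

    Conditions are checked in the order they are chained, and the first one that matches wins, so more specific rules should come first.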





        edited Nov 16 '18 at 14:46









        ChrisOdney











        answered Apr 7 '17 at 14:56









        marshall245























            12














            This can be achieved on DataFrames with user-defined functions (UDFs).



            import org.apache.spark.sql.functions._

            // Build a small test DataFrame from JSON strings (Spark 1.x API; sc is the shell's SparkContext)
            val sqlcont = new org.apache.spark.sql.SQLContext(sc)
            val df1 = sqlcont.jsonRDD(sc.parallelize(Array(
              """{"year":2012, "make": "Tesla", "model": "S", "comment": "No Comment", "blank": ""}""",
              """{"year":1997, "make": "Ford", "model": "E350", "comment": "Get one", "blank": ""}""",
              """{"year":2015, "make": "Chevy", "model": "Volt", "comment": "", "blank": ""}"""
            )))

            // UDF that maps "Tesla" to "S" and passes every other value through unchanged
            val makeSIfTesla = udf { (make: String) =>
              if (make == "Tesla") "S" else make
            }
            df1.withColumn("make", makeSIfTesla(df1("make"))).show
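
            On newer Spark versions the same UDF approach can be written against a SparkSession, since jsonRDD has been deprecated since Spark 1.4 in favour of read.json. The following is a minimal sketch, assuming Spark 2.x or later. Building the test data with toDF instead of JSON strings is a substitution made here for brevity, and the names spark and result are illustrative. withColumn is documented as replacing an existing column of the same name, so the result still has a single make column.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

val spark = SparkSession.builder().master("local[*]").appName("udf-replace").getOrCreate()
import spark.implicits._

// Same sample rows as above, built directly from tuples instead of JSON.
val df1 = Seq(
  (2012, "Tesla", "S", "No Comment", ""),
  (1997, "Ford", "E350", "Get one", ""),
  (2015, "Chevy", "Volt", "", "")
).toDF("year", "make", "model", "comment", "blank")

// UDF that maps "Tesla" to "S" and returns any other value (including null) as-is.
val makeSIfTesla = udf { (make: String) => if (make == "Tesla") "S" else make }

// Reusing the name "make" replaces the column in place rather than adding a second one.
val result = df1.withColumn("make", makeSIfTesla(col("make")))
result.show()
```

            For a simple equality test like this one, the built-in when/otherwise functions are usually preferable, because Spark cannot optimise through a UDF.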





            answered Sep 17 '15 at 15:15









            Al M








            • I guess this will improve performance, because you are not converting the DataFrame to an RDD and adding a new column.

              – Nandakishore
              Oct 31 '16 at 19:41











            • This does not result in duplicate make columns?

              – javadba
              Jul 24 '18 at 0:01













