How to create a new column using withColumn to concatenate two numeric columns as a String? [duplicate]

This question already has an answer here:

Concatenate columns in Apache Spark DataFrame

10 answers

I have the following DataFrame:

import spark.implicits._  // provides toDF and the $"col" syntax (auto-imported in spark-shell)

val employees = sc.parallelize(Array[(String, Int, BigInt)](
  ("Rafferty", 31, 222222222), ("Jones", 33, 111111111), ("Heisenberg", 33, 222222222),
  ("Robinson", 34, 111111111), ("Smith", 34, 333333333), ("Williams", 15, 222222222)
)).toDF("LastName", "DepartmentID", "Code")

employees.show()

+----------+------------+---------+
|  LastName|DepartmentID|     Code|
+----------+------------+---------+
|  Rafferty|          31|222222222|
|     Jones|          33|111111111|
|Heisenberg|          33|222222222|
|  Robinson|          34|111111111|
|     Smith|          34|333333333|
|  Williams|          15|222222222|
+----------+------------+---------+

I want to create another column, personal_id, by concatenating DepartmentID and Code as strings. For example, Rafferty => 31222222222.

So I wrote the following code:

val anotherdf = employees.withColumn("personal_id", $"DepartmentID".cast("String") + $"Code".cast("String"))

+----------+------------+---------+------------+
|  LastName|DepartmentID|     Code| personal_id|
+----------+------------+---------+------------+
|  Rafferty|          31|222222222|2.22222253E8|
|     Jones|          33|111111111|1.11111144E8|
|Heisenberg|          33|222222222|2.22222255E8|
|  Robinson|          34|111111111|1.11111145E8|
|     Smith|          34|333333333|3.33333367E8|
|  Williams|          15|222222222|2.22222237E8|
+----------+------------+---------+------------+

But personal_id came out as a double instead of a string:

anotherdf.printSchema

root
 |-- LastName: string (nullable = true)
 |-- DepartmentID: integer (nullable = false)
 |-- Code: decimal(38,0) (nullable = true)
 |-- personal_id: double (nullable = true)
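Checking one row suggests the doubles are sums, not concatenations: for Rafferty, 31 + 222222222 = 222222253, which prints as 2.22222253E8. The plain-Scala sketch below (no Spark) reproduces that arithmetic next to the intended string concatenation. It assumes `+` parsed both string operands as doubles before adding; the object and method names are hypothetical.

```scala
// Sketch (no Spark): reproduces the arithmetic that `+` appears to perform on the
// two string-cast columns, next to the string concatenation that was intended.
// The object and method names are hypothetical.
object PlusVsConcat {
  // Assumed behaviour of `+`: both operands parsed as Double, then added.
  def addedAsDoubles(departmentId: Int, code: BigInt): Double =
    departmentId.toString.toDouble + code.toString.toDouble

  // Intended behaviour: the decimal strings joined end to end.
  def concatenated(departmentId: Int, code: BigInt): String =
    departmentId.toString + code.toString

  def main(args: Array[String]): Unit = {
    val rows = Seq(("Rafferty", 31, BigInt(222222222)), ("Jones", 33, BigInt(111111111)))
    for ((name, dept, code) <- rows)
      println(s"$name: ${addedAsDoubles(dept, code)} vs ${concatenated(dept, code)}")
  }
}
```

Running `main` prints `Rafferty: 2.22222253E8 vs 31222222222` and `Jones: 1.11111144E8 vs 33111111111`. The first value in each line matches the personal_id column above.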

asked Nov 16 '18 at 4:44 by Haha TTpro

marked as duplicate by Shaido, user6910411

Nov 16 '18 at 10:28

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

scala apache-spark apache-spark-sql

1 Answer

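I should use concat instead of +. On a Column, + is numeric addition, so Spark implicitly casts both string operands to double before adding them. concat joins them as strings:

import org.apache.spark.sql.functions.concat

val anotherdf2 = employees.withColumn("personal_id", concat($"DepartmentID".cast("String"), $"Code".cast("String")))
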
+----------+------------+---------+-----------+
|  LastName|DepartmentID|     Code|personal_id|
+----------+------------+---------+-----------+
|  Rafferty|          31|222222222|31222222222|
|     Jones|          33|111111111|33111111111|
|Heisenberg|          33|222222222|33222222222|
|  Robinson|          34|111111111|34111111111|
|     Smith|          34|333333333|34333333333|
|  Williams|          15|222222222|15222222222|
+----------+------------+---------+-----------+
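The schema marks Code as nullable, and that affects which function to use. `concat` returns null if any input is null, while `concat_ws` skips null inputs. The sketch below compares the two, assuming a local SparkSession is available. The object name, the helper `withPersonalIds`, and the sample row with a missing Code are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, concat_ws}

object ConcatNullSketch {
  // Hypothetical helper: builds the ID both ways from DepartmentID and Code.
  def withPersonalIds(df: DataFrame): DataFrame = {
    val dept = col("DepartmentID").cast("string")
    val code = col("Code").cast("string")
    df.withColumn("via_concat", concat(dept, code))          // null if any input is null
      .withColumn("via_concat_ws", concat_ws("", dept, code)) // null inputs are skipped
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("concat-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq[(String, Int, Option[BigInt])](
      ("Rafferty", 31, Some(BigInt(222222222))),
      ("Nobody", 40, None) // hypothetical row with a missing Code
    ).toDF("LastName", "DepartmentID", "Code")

    withPersonalIds(df).show()
    spark.stop()
  }
}
```

Both columns should hold 31222222222 for the first row. For the second row, via_concat should be null, while via_concat_ws should be just "40". Which behaviour is preferable depends on whether a partial ID is acceptable.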

answered Nov 16 '18 at 4:51 by Haha TTpro
