How to create a new column using withColumn to concatenate two numeric columns as a String? [duplicate]
This question already has an answer here:
Concatenate columns in Apache Spark DataFrame
10 answers
I have the following DataFrame:
val employees = sc.parallelize(Array[(String, Int, BigInt)](
("Rafferty", 31, 222222222), ("Jones", 33, 111111111), ("Heisenberg", 33, 222222222), ("Robinson", 34, 111111111), ("Smith", 34, 333333333), ("Williams", 15, 222222222)
)).toDF("LastName", "DepartmentID", "Code")
employees.show()
+----------+------------+---------+
| LastName|DepartmentID| Code|
+----------+------------+---------+
| Rafferty| 31|222222222|
| Jones| 33|111111111|
|Heisenberg| 33|222222222|
| Robinson| 34|111111111|
| Smith| 34|333333333|
| Williams| 15|222222222|
+----------+------------+---------+
I want to create another column, personal_id, by concatenating DepartmentID and Code. Example: Rafferty => 31222222222
So I wrote the following code:
val anotherdf = employees.withColumn("personal_id", $"DepartmentID".cast("String") + $"Code".cast("String"))
+----------+------------+---------+------------+
| LastName|DepartmentID| Code| personal_id|
+----------+------------+---------+------------+
| Rafferty| 31|222222222|2.22222253E8|
| Jones| 33|111111111|1.11111144E8|
|Heisenberg| 33|222222222|2.22222255E8|
| Robinson| 34|111111111|1.11111145E8|
| Smith| 34|333333333|3.33333367E8|
| Williams| 15|222222222|2.22222237E8|
+----------+------------+---------+------------+
But personal_id comes out as a double (the two values were added, not concatenated):
anotherdf.printSchema
root
|-- LastName: string (nullable = true)
|-- DepartmentID: integer (nullable = false)
|-- Code: decimal(38,0) (nullable = true)
|-- personal_id: double (nullable = true)
scala apache-spark apache-spark-sql
marked as duplicate by Shaido, user6910411
Nov 16 '18 at 10:28
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
asked Nov 16 '18 at 4:44
Haha TTpro
1 Answer
I should have used concat instead of +:
import org.apache.spark.sql.functions.concat
val anotherdf2 = employees.withColumn("personal_id", concat($"DepartmentID".cast("String"), $"Code".cast("String")))
+----------+------------+---------+-----------+
| LastName|DepartmentID| Code|personal_id|
+----------+------------+---------+-----------+
| Rafferty| 31|222222222|31222222222|
| Jones| 33|111111111|33111111111|
|Heisenberg| 33|222222222|33222222222|
| Robinson| 34|111111111|34111111111|
| Smith| 34|333333333|34333333333|
| Williams| 15|222222222|15222222222|
+----------+------------+---------+-----------+
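For readers outside spark-shell, the same fix can be sketched as a small standalone program. This is only an illustration under stated assumptions: the object name PersonalId, the helper names added and concatenated, and the local[*] master are hypothetical and not part of the original post. It contrasts the arithmetic + (which Spark evaluates numerically, giving a double) with concat (which gives a string).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat}

// Hypothetical helper object, named here only for illustration.
object PersonalId {

  // `+` on Columns is arithmetic addition: Spark coerces the string
  // operands back to numbers, so personal_id ends up as a double.
  def added(df: DataFrame): DataFrame =
    df.withColumn("personal_id",
      col("DepartmentID").cast("string") + col("Code").cast("string"))

  // `concat` joins the string values end to end, so personal_id is a string.
  def concatenated(df: DataFrame): DataFrame =
    df.withColumn("personal_id",
      concat(col("DepartmentID").cast("string"), col("Code").cast("string")))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("concat-example")
      .getOrCreate()
    import spark.implicits._

    val employees = Seq(
      ("Rafferty", 31, BigInt(222222222)),
      ("Jones", 33, BigInt(111111111))
    ).toDF("LastName", "DepartmentID", "Code")

    added(employees).printSchema()        // personal_id: double
    concatenated(employees).printSchema() // personal_id: string
    concatenated(employees).show()

    spark.stop()
  }
}
```

One thing to keep in mind: concat returns null if any of its inputs is null, so rows with a missing DepartmentID or Code would get a null personal_id.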
answered Nov 16 '18 at 4:51
Haha TTpro