Spark - Scala Mapping the JSON file to dataset of case class without using all JSON attributes

I am trying to define case classes so that I can map each line of my JSON file to an object, and then create an RDD from the JSON file.
I only need some of the fields in each JSON object to build the case class, but I get the following error:

cannot resolve '`result`' due to data type mismatch: cannot cast ArrayType(StructType(StructField(hop,LongType,true), StructField(result,ArrayType(StructType(StructField(from,StringType,true), StructField(rtt,DoubleType,true), StructField(ttl,LongType,true)),true),true)),true) to ArrayType(StructType(StructField(hop,DecimalType(38,0),true), StructField(result,ArrayType(StructType(StructField(rtt,DoubleType,true)),true),true)),true);

A line of the JSON file looks like this:

{"lts": 165, "size": 40, "from": "89.105.202.4", "dst_name": "192.5.5.241", "fw": 4790, "proto": "UDP", "af": 4, "msm_name": "Traceroute", "stored_timestamp": 1514768539, "prb_id": 4247, "result": [{"result": [{"rtt": 1.955, "ttl": 255, "from": "89.105.200.50", "size": 28}, {"rtt": 1.7, "ttl": 255, "from": "10.10.0.5", "size": 28}, {"rtt": 1.709, "ttl": 255, "from": "89.105.200.57", "size": 28}], "hop": 1}, {"result": [{"rtt": 7.543, "ttl": 254, "from": "185.147.12.31", "size": 28}, {"rtt": 3.103, "ttl": 254, "from": "185.147.12.31", "size": 28}, {"rtt": 3.172, "ttl": 254, "from": "185.147.12.0", "size": 28}], "hop": 2}, {"result": [{"rtt": 4.347, "ttl": 253, "from": "185.147.12.19", "size": 28}, {"rtt": 2.876, "ttl": 253, "from": "185.147.12.19", "size": 28}, {"rtt": 3.143, "ttl": 253, "from": "185.147.12.19", "size": 28}], "hop": 3}, {"result": [{"rtt": 3.655, "ttl": 61, "from": "160.242.100.88", "size": 28}, {"rtt": 3.678, "ttl": 61, "from": "160.242.100.88", "size": 28}, {"rtt": 15.568, "ttl": 61, "from": "160.242.100.88", "size": 28}], "hop": 4}, {"result": [{"rtt": 4.263, "ttl": 60, "from": "196.216.48.144", "size": 28}, {"rtt": 6.082, "ttl": 60, "from": "196.216.48.144", "size": 28}, {"rtt": 11.834, "ttl": 60, "from": "196.216.48.144", "size": 28}], "hop": 5}, {"result": [{"rtt": 7.802, "ttl": 249, "from": "193.239.116.112", "size": 28}, {"rtt": 7.691, "ttl": 249, "from": "193.239.116.112", "size": 28}, {"rtt": 7.711, "ttl": 249, "from": "193.239.116.112", "size": 28}], "hop": 6}, {"result": [{"rtt": 8.228, "ttl": 57, "from": "192.5.5.241", "size": 28}, {"rtt": 8.026, "ttl": 57, "from": "192.5.5.241", "size": 28}, {"rtt": 8.254, "ttl": 57, "from": "192.5.5.241", "size": 28}], "hop": 7}], "timestamp": 1514768409, "src_addr": "89.105.202.4", "paris_id": 9, "endtime": 1514768403, "type": "traceroute", "dst_addr": "192.5.5.241", "msm_id": 5004}

My code is as follows:

package tests

// imports
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object shell {

  case class Hop(
    hop:    BigInt,
    result: Seq[Signal]
  )

  case class Signal(
    rtt: Double
  )

  case class Row(
    af:     String,
    from:   String,
    size:   String,
    result: Seq[Hop]
  )

  def main(args: Array[String]): Unit = {

    // create configuration
    val conf = new SparkConf().setAppName("my first rdd app").setMaster("local")

    // create spark context
    val sc = new SparkContext(conf)

    // find absolute path of json file
    val pathToTraceroutesExamples = getClass.getResource("/test/rttAnalysis_sample_0.json")

    // create spark session
    val spark = SparkSession
      .builder()
      .config(conf)
      .getOrCreate()
    import spark.implicits._

    // read json file
    val logData = spark.read.json(pathToTraceroutesExamples.getPath)

    // create a dataset of Row
    val datasetLogdata = logData.select("af", "from", "size", "result").as[Row]

    // count dataset elements
    val count = datasetLogdata.rdd.count()
    println(count)
  }
}

Question: How can I create an RDD containing a list of Row case class instances while keeping only the data I need (each JSON object contains many fields that are unused in my case)?
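One possible direction, offered as a minimal sketch rather than a confirmed fix: the error message suggests that Spark cannot cast the inferred nested struct (with `from`, `rtt`, `ttl`) to a struct holding only `rtt`. A common way around this is to derive the schema from the case class with `Encoders.product` and pass it to the reader, so that only the declared fields are parsed. The names `Traceroute` and `TracerouteLoader` are hypothetical, and declaring `af`, `size` and `hop` as `Long` is an assumption based on the sample line.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// Hypothetical case classes for this sketch: only the fields to keep are declared.
// Long is assumed for af, size and hop because the sample JSON holds plain integers.
case class Signal(rtt: Double)
case class Hop(hop: Long, result: Seq[Signal])
case class Traceroute(af: Long, from: String, size: Long, result: Seq[Hop])

object TracerouteLoader {

  // Reads the JSON file with a schema derived from the case class,
  // so fields that are not declared above are never loaded.
  def load(spark: SparkSession, path: String): Dataset[Traceroute] = {
    import spark.implicits._
    val schema = Encoders.product[Traceroute].schema
    spark.read.schema(schema).json(path).as[Traceroute]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("traceroute sketch")
      .master("local[*]")
      .getOrCreate()

    // args(0) is assumed to be the path to the JSON file
    val traceroutes = load(spark, args(0))

    // The typed Dataset can be turned into an RDD of case class instances
    val rdd = traceroutes.rdd
    println(rdd.count())

    spark.stop()
  }
}
```

Because the schema is supplied up front, Spark skips schema inference, and the nested structs already have the shape the case classes expect, so no struct-to-struct cast should be needed.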

asked Nov 14 '18 at 13:43

samara

12812

json scala apache-spark
