Spark - Scala: Mapping a JSON file to a Dataset of a case class without using all JSON attributes



























I am trying to create a case class that maps each line of my JSON file, so that I can build an RDD from that file.
I only need some of the fields in the JSON to populate the case class, but I get this error:



cannot resolve '`result`' due to data type mismatch: cannot cast ArrayType(StructType(StructField(hop,LongType,true), StructField(result,ArrayType(StructType(StructField(from,StringType,true), StructField(rtt,DoubleType,true), StructField(ttl,LongType,true)),true),true)),true) to ArrayType(StructType(StructField(hop,DecimalType(38,0),true), StructField(result,ArrayType(StructType(StructField(rtt,DoubleType,true)),true),true)),true);


A JSON line looks like this:



{"lts": 165, "size": 40, "from": "89.105.202.4", "dst_name": "192.5.5.241", "fw": 4790, "proto": "UDP", "af": 4, "msm_name": "Traceroute", "stored_timestamp": 1514768539, "prb_id": 4247, "result": [{"result": [{"rtt": 1.955, "ttl": 255, "from": "89.105.200.50", "size": 28}, {"rtt": 1.7, "ttl": 255, "from": "10.10.0.5", "size": 28}, {"rtt": 1.709, "ttl": 255, "from": "89.105.200.57", "size": 28}], "hop": 1}, {"result": [{"rtt": 7.543, "ttl": 254, "from": "185.147.12.31", "size": 28}, {"rtt": 3.103, "ttl": 254, "from": "185.147.12.31", "size": 28}, {"rtt": 3.172, "ttl": 254, "from": "185.147.12.0", "size": 28}], "hop": 2}, {"result": [{"rtt": 4.347, "ttl": 253, "from": "185.147.12.19", "size": 28}, {"rtt": 2.876, "ttl": 253, "from": "185.147.12.19", "size": 28}, {"rtt": 3.143, "ttl": 253, "from": "185.147.12.19", "size": 28}], "hop": 3}, {"result": [{"rtt": 3.655, "ttl": 61, "from": "160.242.100.88", "size": 28}, {"rtt": 3.678, "ttl": 61, "from": "160.242.100.88", "size": 28}, {"rtt": 15.568, "ttl": 61, "from": "160.242.100.88", "size": 28}], "hop": 4}, {"result": [{"rtt": 4.263, "ttl": 60, "from": "196.216.48.144", "size": 28}, {"rtt": 6.082, "ttl": 60, "from": "196.216.48.144", "size": 28}, {"rtt": 11.834, "ttl": 60, "from": "196.216.48.144", "size": 28}], "hop": 5}, {"result": [{"rtt": 7.802, "ttl": 249, "from": "193.239.116.112", "size": 28}, {"rtt": 7.691, "ttl": 249, "from": "193.239.116.112", "size": 28}, {"rtt": 7.711, "ttl": 249, "from": "193.239.116.112", "size": 28}], "hop": 6}, {"result": [{"rtt": 8.228, "ttl": 57, "from": "192.5.5.241", "size": 28}, {"rtt": 8.026, "ttl": 57, "from": "192.5.5.241", "size": 28}, {"rtt": 8.254, "ttl": 57, "from": "192.5.5.241", "size": 28}], "hop": 7}], "timestamp": 1514768409, "src_addr": "89.105.202.4", "paris_id": 9, "endtime": 1514768403, "type": "traceroute", "dst_addr": "192.5.5.241", "msm_id": 5004}


My code is as below:



package tests

// imports (the duplicated SparkConf import has been removed)
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object shell {

  case class Hop(
    hop: BigInt,
    result: Seq[Signal])

  case class Signal(
    rtt: Double
  )

  case class Row(
    af: String,
    from: String,
    size: String,
    result: Seq[Hop]
  )

  def main(args: Array[String]): Unit = {

    // create configuration
    val conf = new SparkConf().setAppName("my first rdd app").setMaster("local")

    // create spark context
    val sc = new SparkContext(conf)

    // find the absolute path of the JSON file
    val pathToTraceroutesExamples = getClass.getResource("/test/rttAnalysis_sample_0.json")

    // create spark session
    val spark = SparkSession
      .builder()
      .config(conf)
      .getOrCreate()
    import spark.implicits._

    // read the JSON file
    val logData = spark.read.json(pathToTraceroutesExamples.getPath)

    // create a Dataset of Row
    val datasetLogdata = logData.select("af", "from", "size", "result").as[Row]

    // count the dataset elements
    val count = datasetLogdata.rdd.count()
    println(count)
  }
}


Question: how can I create an RDD containing a list of Row case class instances while keeping only the important data (a JSON object contains many fields that are unused in my case)?
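For what it's worth, the error message itself points at the likely culprit: Spark's encoder maps Scala's BigInt to DecimalType(38,0), while spark.read.json infers the hop field as LongType, and that cast is refused (the nested extra fields like ttl, from, and size are pruned fine, as the error's target type shows). A minimal sketch of the case classes with hop declared as Long instead, which is my assumed fix rather than code from the question:

```scala
// Hypothetical variant of the question's case classes: `hop` is a Long so it
// matches the LongType that spark.read.json infers, avoiding the failing
// LongType -> DecimalType(38,0) cast. The unused JSON fields (ttl, from,
// size inside each signal) are simply omitted and get pruned by name.
case class Signal(rtt: Double)
case class Hop(hop: Long, result: Seq[Signal])
case class Row(af: String, from: String, size: String, result: Seq[Hop])

// Usage, assuming `spark` and `logData` are built as in the question:
// import spark.implicits._
// val datasetLogdata = logData.select("af", "from", "size", "result").as[Row]
```

This sketch cannot be verified without a running Spark session, so treat it as a direction to test, not a confirmed solution.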










Tags: json scala apache-spark






      asked Nov 14 '18 at 13:43









samara

      12812




      12812
























          0






          active

          oldest

          votes











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53301655%2fspark-scala-mapping-the-json-file-to-dataset-of-case-class-without-using-all-j%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          0






          active

          oldest

          votes








          0






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes
















          draft saved

          draft discarded




















































          Thanks for contributing an answer to Stack Overflow!


          • Please be sure to answer the question. Provide details and share your research!

          But avoid



          • Asking for help, clarification, or responding to other answers.

          • Making statements based on opinion; back them up with references or personal experience.


          To learn more, see our tips on writing great answers.




          draft saved


          draft discarded














          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53301655%2fspark-scala-mapping-the-json-file-to-dataset-of-case-class-without-using-all-j%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown





















































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown

































          Required, but never shown














          Required, but never shown












          Required, but never shown







          Required, but never shown







          Popular posts from this blog

          Xamarin.iOS Cant Deploy on Iphone

          Glorious Revolution

          Dulmage-Mendelsohn matrix decomposition in Python