Read local files into a Spark DataFrame in Zeppelin running in a Docker container












I'm writing Spark code in Zeppelin, using the Apache Zeppelin Docker image on my laptop. Everything works as expected except reading files from my local disk. For example, when I try to read a CSV file into a Spark DataFrame:



val df = spark.read.csv("/User/myname/documents/data/xyz.csv")



I get the following error:



org.apache.spark.sql.AnalysisException: Path does not exist: file:/User/myname/documents/data/xyz.csv;
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:382)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:415)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:352)
... 47 elided
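Why this fails: Zeppelin's Spark interpreter runs inside the container, so a path such as /User/myname/documents/data/xyz.csv is resolved against the container's filesystem. That host directory does not exist there unless it has been mounted into the container. One way to confirm what the interpreter can actually see is to check the path from the same JVM before calling spark.read. The sketch below assumes Spark runs in local mode inside the container, so the driver and executors share that filesystem. PathCheck and describe are hypothetical names, not part of Zeppelin or Spark.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical diagnostic helper (not a Zeppelin or Spark API).
// It reports whether a path exists for the JVM running this code. In a Zeppelin
// paragraph that JVM is the Spark interpreter inside the container, so host-only
// paths that were never mounted are reported as missing.
object PathCheck {
  def describe(raw: String): String = {
    val p: Path = Paths.get(raw)
    if (!Files.exists(p)) s"$raw: does not exist for this JVM (not mounted into the container?)"
    else if (Files.isDirectory(p)) s"$raw: directory"
    else s"$raw: file, ${Files.size(p)} bytes"
  }
}

// Example use in a Zeppelin paragraph, with the path from the question:
// println(PathCheck.describe("/User/myname/documents/data/xyz.csv"))
```

If the path is reported as missing, mount the host folder with docker run -v and read the file through the container-side path instead.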









Tags: docker, apache-spark, apache-zeppelin






asked Nov 16 '18 at 0:21 by HaMi (426), edited Nov 16 '18 at 17:57 by HaMi
























1 Answer

I think I found the answer. I pulled the following Docker image (this is the one I used, but you can substitute another):

docker pull skymindops/zeppelin-dl4j



Then I started the container, mounting my local logs, notebook, and data folders into it:



docker run -it --rm -p 7077:7077 -p 8080:8080 --privileged=true \
  -v $PWD/logs:/logs -v $PWD/notebook:/notebook -v $PWD/data:/data \
  -e ZEPPELIN_NOTEBOOK_DIR='/notebook' \
  -e ZEPPELIN_LOG_DIR='/logs' \
  skymindops/zeppelin-dl4j:latest


Now files in the local data folder ($PWD/data) are visible inside the container under /data, so reading them works:



val df = spark.read.option("header", "true").csv("/data/xyz.csv")
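The -v HOST:CONTAINER flags are what make this work: a file under $PWD/data on the host appears under /data in the container, and the container-side path is the one spark.read needs. The sketch below expresses that mapping as a small Scala function. It assumes absolute host paths with no symlinks or "..". MountMap and toContainerPath are hypothetical names, and /home/me/proj in the example stands in for whatever $PWD was when docker run was executed.

```scala
// Hypothetical helper: translates a host path into the path the container sees,
// given the bind mounts passed to `docker run -v HOST:CONTAINER`.
// Assumes absolute, already-resolved host paths (no symlinks, no "..").
object MountMap {
  // Each entry is (host directory, container directory).
  type Mounts = Seq[(String, String)]

  def toContainerPath(hostPath: String, mounts: Mounts): Option[String] =
    mounts
      // Try the longest host prefix first so nested mounts resolve to the most specific one.
      .sortBy { case (host, _) => -host.length }
      .collectFirst {
        // The path is the mounted directory itself.
        case (host, container) if hostPath == host =>
          container
        // The path lies below the mounted directory: keep the relative tail.
        case (host, container) if hostPath.startsWith(host.stripSuffix("/") + "/") =>
          container.stripSuffix("/") + hostPath.substring(host.stripSuffix("/").length)
      }
}

// Example (hypothetical $PWD = /home/me/proj):
// val mounts = Seq("/home/me/proj/data" -> "/data")
// MountMap.toContainerPath("/home/me/proj/data/xyz.csv", mounts)  // Some("/data/xyz.csv")
// The container path can then be passed to spark.read.option("header", "true").csv(...)
```

A None result means the file lies outside every mounted folder, so Spark inside the container cannot see it. In that case, copy the file into the data folder or add another -v mount and restart the container.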



Note that with this setup Zeppelin uses my local notebook folder, so the notebooks bundled with that image are not shown; I didn't need them anyway.






answered Nov 16 '18 at 18:37 by HaMi (426)