Spark-Scala Unable to infer schema (Defer input path validation into DataSource)


SPARK-26039

This happens while loading an empty ORC folder. Is there any way to bypass it?

val df = spark.read.format("orc").load(orcFolderPath)



org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC. It must be specified manually.;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:185)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:185)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:184)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
  ... 49 elided

I think the ORC reader fails while trying to infer the schema. I want to bypass this special case, when an empty folder somehow shows up in the repository, but the folder still has to be checked.
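The error message itself points at one way out: if the schema is specified manually, Spark does not have to infer it from the (nonexistent) files. The sketch below assumes Spark 2.x/3.x on the classpath; the object name `ExplicitSchemaRead` and the `id`/`name` columns are hypothetical placeholders for the real ORC layout. With a user-supplied schema, an existing but empty folder should load as an empty DataFrame; a path that does not exist at all still raises an `AnalysisException`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

object ExplicitSchemaRead {
  // Hypothetical schema: replace these columns with the real layout of your ORC files.
  val orcSchema: StructType = StructType(Seq(
    StructField("id", LongType, nullable = true),
    StructField("name", StringType, nullable = true)
  ))

  // With a user-supplied schema Spark skips schema inference, so an existing
  // but empty folder yields an empty DataFrame instead of an AnalysisException.
  def readOrc(spark: SparkSession, path: String): DataFrame =
    spark.read.schema(orcSchema).format("orc").load(path)
}
```

The trade-off is that the schema must be kept in sync with the files by hand; if the real files contain columns missing from `orcSchema`, those columns are simply not read.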

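import org.apache.spark.sql.{AnalysisException, DataFrame}

// Returns null when Spark cannot infer the ORC schema (e.g. the folder is empty)
val df: DataFrame = try {
  spark.read.format("orc").load(path)
} catch {
  case _: AnalysisException => null
}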

I tried catching the exception this way, but any other approach would be helpful.
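If catching the exception is acceptable after all, returning an `Option` instead of `null` keeps the "no data" case explicit for callers. This is a minimal sketch assuming Spark on the classpath; `SafeOrcRead` and `readOrcOption` are hypothetical names, and only `AnalysisException` is caught so that unrelated failures still surface.

```scala
import org.apache.spark.sql.{AnalysisException, DataFrame, SparkSession}

object SafeOrcRead {
  // Hypothetical helper: returns None instead of null when Spark cannot resolve
  // the ORC source, e.g. an empty folder (schema inference fails) or a missing path.
  def readOrcOption(spark: SparkSession, path: String): Option[DataFrame] =
    try Some(spark.read.format("orc").load(path))
    catch {
      // Only AnalysisException is caught; other failures still propagate.
      case _: AnalysisException => None
    }
}
```

A caller can then write `SafeOrcRead.readOrcOption(spark, path).foreach(df => ...)` or fall back with `getOrElse`, without any null checks.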

edited Nov 14 at 10:06

asked Nov 11 at 12:23

xargus


Put this code in a separate function. If the function catches this exception, change a class level variable value or flag.
– Nikhil
Nov 12 at 12:56

@Nikhil I don't want to use try/catch. But if the folder is empty there will always be an exception, so I want another way of doing it.
– xargus
Nov 13 at 12:21

java scala apache-spark apache-spark-sql


1 Answer

I found one more solution, though it doesn't seem like the best one either:

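import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Returns true if the path (or glob pattern) matches at least one existing file or directory.
// Caveat: an existing but empty directory still matches, so this guards against missing
// paths rather than empty folders.
def pathStatus(path: String, config: Configuration = new Configuration()): Boolean = {
  val hadoopPath = new Path(path)
  // Resolve the file system from the path itself (HDFS, S3, local, ...) rather than the default FS
  val fs: FileSystem = hadoopPath.getFileSystem(config)
  // globStatus may return null (missing literal path) or an empty array (glob with no matches)
  val matches = fs.globStatus(hadoopPath)
  matches != null && matches.nonEmpty
}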
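Because `globStatus` on an existing, empty directory still returns that directory's own status, a check like this does not by itself detect an empty folder. The sketch below instead looks for at least one readable file before calling `spark.read`, and otherwise returns an empty DataFrame with a caller-supplied schema, so no exception is thrown. Assumptions, all hypothetical: the `OrcPreCheck` object and its method names, and the hidden-file rule, which only approximates how Spark skips names starting with `_` or `.` (such as `_SUCCESS`) when listing input.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.StructType

object OrcPreCheck {
  // Approximation of Spark's listing rule: names starting with "_" or "."
  // (e.g. _SUCCESS, .crc files, _temporary) are not treated as input data.
  private def isVisible(name: String): Boolean =
    !name.startsWith("_") && !name.startsWith(".")

  // Recursively looks for at least one visible file, skipping hidden files and directories.
  private def containsVisibleFile(fs: FileSystem, dir: Path): Boolean =
    fs.listStatus(dir).exists { status =>
      isVisible(status.getPath.getName) &&
        (status.isFile || (status.isDirectory && containsVisibleFile(fs, status.getPath)))
    }

  // True if the folder exists and holds at least one file Spark could infer a schema from.
  def hasDataFiles(spark: SparkSession, dir: String): Boolean = {
    val path = new Path(dir)
    // Use Spark's own Hadoop configuration so HDFS/S3 settings match spark.read.
    val fs = path.getFileSystem(spark.sparkContext.hadoopConfiguration)
    fs.exists(path) && containsVisibleFile(fs, path)
  }

  // Reads the folder only when it has data; otherwise returns an empty DataFrame
  // with the caller-supplied fallback schema.
  def readOrcOrEmpty(spark: SparkSession, dir: String, fallback: StructType): DataFrame =
    if (hasDataFiles(spark, dir)) spark.read.format("orc").load(dir)
    else spark.createDataFrame(spark.sparkContext.emptyRDD[Row], fallback)
}
```

The pre-check costs an extra file listing per read, and the listing and the read are not atomic, so a folder emptied between the two calls can still fail.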

answered Nov 14 at 10:05

xargus

