How to handle backpressure on databases when using Apache Spark?





We are using Apache Spark to perform ETL every 2 hours.

Sometimes Spark puts a lot of pressure on our databases when read/write operations are performed.

For Spark Streaming, I can see a backpressure configuration for Kafka.

Is there a way to handle this issue in batch processing?











apache-spark apache-spark-sql

edited Nov 16 '18 at 13:56
eliasah

asked Nov 16 '18 at 12:29
Gowthaman V

1 Answer

Backpressure is really just a fancy word for setting up a maximum receiving rate, so it doesn't work the way you think it does.

What should be done here is actually on the reading end.

In classical JDBC usage, JDBC connectors have a fetchSize property for PreparedStatements, so you can consider configuring that fetchSize based on what is said in the following answers (a short sketch follows the list):




• Spark JDBC fetchsize option
• What does Statement.setFetchSize(nSize) method really do in SQL Server JDBC driver?
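For concreteness, here is a minimal sketch of what tuning the fetch size looks like with Spark's JDBC reader. The connection URL, table, and credentials are hypothetical placeholders; "fetchsize" is the Spark JDBC option that maps to the driver's fetchSize.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("etl").getOrCreate()

    // "fetchsize" controls how many rows the JDBC driver pulls per round trip:
    // larger values mean fewer round trips but more executor memory.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/mydb") // hypothetical URL
      .option("dbtable", "public.events")                   // hypothetical table
      .option("user", "etl_user")                           // hypothetical user
      .option("password", sys.env("DB_PASSWORD"))
      .option("fetchsize", "10000")                         // rows per round trip
      .load()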



Unfortunately, this might not solve all of your performance issues with your RDBMS.

What you must know is that, compared to the basic JDBC reader, which runs on a single worker, partitioning the data using an integer column or a sequence of predicates loads it in a distributed mode, but introduces a couple of problems. In your case, a high number of concurrent reads can easily throttle the database.
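To make that trade-off concrete, here is a sketch of a partitioned read where numPartitions caps how many concurrent connections Spark opens against the database. The partition column and its bounds are assumptions for the example.

    // Partitioned read: Spark issues numPartitions parallel queries, so
    // numPartitions is effectively a cap on concurrent DB connections.
    val partitioned = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/mydb")
      .option("dbtable", "public.events")
      .option("partitionColumn", "id")  // must be numeric, date, or timestamp
      .option("lowerBound", "1")
      .option("upperBound", "10000000") // assumed range of the id column
      .option("numPartitions", "4")     // at most 4 parallel reads hit the DB
      .option("user", "etl_user")
      .option("password", sys.env("DB_PASSWORD"))
      .load()

Lowering numPartitions directly reduces the read pressure on the database, at the cost of a slower load.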



To deal with this, I suggest the following (the write side can be throttled in a similar way; see the sketch after this list):

• If available, consider using specialized data sources over JDBC connections.
• Consider using specialized or generic bulk import/export tools like Postgres COPY or Apache Sqoop.
• Be sure to understand the performance implications of the different JDBC data source variants, especially when working with a production database.
• Consider using a separate replica for Spark jobs.
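Since your job also writes to databases, the same idea applies on the write side. The following is a sketch under assumed names, not a definitive recipe: coalesce bounds the number of concurrent writer tasks (and therefore connections), and the "batchsize" option sets how many rows go into each JDBC batch insert.

    // Throttled write: coalesce(4) bounds concurrent write connections to 4,
    // and "batchsize" sets the rows per JDBC batch insert.
    df.coalesce(4)
      .write
      .format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/mydb")
      .option("dbtable", "public.events_out") // hypothetical target table
      .option("user", "etl_user")
      .option("password", sys.env("DB_PASSWORD"))
      .option("batchsize", "5000")            // rows per batch insert
      .mode("append")
      .save()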


If you wish to know more about reading data using the JDBC source, I suggest you read the following:

• Spark SQL and Dataset API.

Disclaimer: I'm the co-author of that repo.






answered Nov 16 '18 at 14:05
eliasah































