Scrapy - Use feed exporter for a particular spider (and not others) in a project
ENVIRONMENT:
Windows 7, Python 3.6.5, Scrapy 1.5.1
PROBLEM DESCRIPTION:
I have a Scrapy project called project_github, which contains 3 spiders: spider1, spider2, spider3. Each of these spiders scrapes data from a website specific to that spider.
I am trying to automatically export a JSON file when a particular spider is executed, named in the format NameOfSpider_TodaysDate.json, so that from the command line I can execute scrapy crawl spider1 and get spider1_181115.json.
Currently I am using item exporters, configured through the feed settings in settings.py, with the following code:
    import datetime
    FEED_URI = 'spider1_' + datetime.datetime.today().strftime('%y%m%d') + '.json'
    FEED_FORMAT = 'json'
    FEED_EXPORTERS = {'json': 'scrapy.exporters.JsonItemExporter'}
    FEED_EXPORT_ENCODING = 'utf-8'
Obviously this code always writes spider1_TodaysDate.json regardless of the spider used... Any suggestions?
python json scrapy
asked Nov 15 '18 at 11:52
johnnydoe
1 Answer
The way to do this is to define custom_settings as a class attribute on the specific spider we are writing the item exporter for. Spider settings override project settings. The keys of custom_settings must be the setting names as strings.
So, for spider1:
    import datetime
    import scrapy

    class spider1(scrapy.Spider):
        name = "spider1"
        allowed_domains = [...]  # value elided in the original post
        custom_settings = {
            # Keys are the setting names as strings. Importing these names from
            # scrapy.settings.default_settings would give their default values
            # (None, an unhashable dict, ...) rather than usable keys.
            'FEED_URI': 'spider1_' + datetime.datetime.today().strftime('%y%m%d') + '.json',
            'FEED_FORMAT': 'json',
            'FEED_EXPORTERS': {
                'json': 'scrapy.exporters.JsonItemExporter',
            },
            'FEED_EXPORT_ENCODING': 'utf-8',
        }
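If more than one spider needs a dated export, the per-spider dictionary could be built by a small helper instead of being copied into each class. The following is only a sketch: dated_feed_settings is a hypothetical name, not part of Scrapy, and it assumes the same file-name format as in the question. It uses plain string keys, which is what custom_settings expects.

```python
import datetime


def dated_feed_settings(spider_name, day=None):
    """Build per-spider feed settings (hypothetical helper, not a Scrapy API)."""
    # Default to today's date; a fixed date can be passed in for testing.
    day = day or datetime.date.today()
    return {
        'FEED_URI': '{}_{}.json'.format(spider_name, day.strftime('%y%m%d')),
        'FEED_FORMAT': 'json',
        'FEED_EXPORT_ENCODING': 'utf-8',
    }


# Inside a spider class this would be used as:
#     custom_settings = dated_feed_settings('spider1')
example = dated_feed_settings('spider1', datetime.date(2018, 11, 15))
print(example['FEED_URI'])  # spider1_181115.json
```

As with the class attribute above, the date is computed once when the spider class is defined, so a crawl that runs past midnight keeps the file name it started with.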
answered Nov 15 '18 at 15:41
johnnydoe