PHP crawler takes a long time to execute [closed]
I'm working on a small website crawler that collects links, but my script takes a long time to execute and sometimes returns without a result. Can you help, or suggest another algorithm, please?
    public function crawl($url = "http://www.example.com", $depth = 5){
        static $seen = array();
        if (isset($seen[$url]) || $depth === 0)
            return;
        $path = $query = '';
        $seen[$url] = true;
        $dom = new DOMDocument('1.0');
        @$dom->loadHTMLFile($url);
        $anchors = $dom->getElementsByTagName('a');
        foreach ($anchors as $element){
            $href = $element->getAttribute('href');
            if (0 !== strpos($href, 'http')){
                $href = $url;
            }
            $this->crawl($href, $depth - 1);
        }
        $parse = parse_url($url);
        if(preg_replace('#^www.(.+.)#i', '$1', $parse['host']) == $this->domain_name){
            if(array_key_exists('query', $parse)){
                $this->crud->insert('dynamic_urls', array('url_link' => $url));
            }
        }
    }
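Reading the code above, a few things could plausibly explain the run time. The domain check happens only after the recursive calls, so every absolute link, including links to other sites, is fetched up to five levels deep. Relative links are replaced with the current URL, which is already in `$seen`, so they are silently dropped. The `@` operator hides fetch errors, and no timeout is set on the request. The following is only a sketch of one alternative, not a drop-in fix: a breadth-first crawl with an explicit queue that stays on the starting host. The names `resolveUrl`, `crawlLinks`, and `$httpFetch` are hypothetical, the relative-URL handling is deliberately simplified (no `../` support), and the 5-second timeout is an arbitrary assumption.

```php
<?php
// Hypothetical helper: turn $href into an absolute URL relative to $base.
// Simplified on purpose; it does not normalise "../" or "./" segments.
function resolveUrl(string $base, string $href): ?string
{
    $href = explode('#', trim($href), 2)[0];          // drop any fragment
    if ($href === '' || preg_match('#^(mailto|javascript|tel):#i', $href)) {
        return null;                                  // nothing to crawl
    }
    if (preg_match('#^https?://#i', $href)) {
        return $href;                                 // already absolute
    }
    $p = parse_url($base);
    if (empty($p['scheme']) || empty($p['host'])) {
        return null;
    }
    if (strpos($href, '//') === 0) {
        return $p['scheme'] . ':' . $href;            // protocol-relative
    }
    $root = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    if ($href[0] === '/') {
        return $root . $href;                         // root-relative
    }
    $path = $p['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1); // directory of the base page
    return $root . $dir . $href;
}

// Hypothetical crawler: breadth-first, same host only, each URL fetched at most once.
// Returns the URLs that carry a query string (the "dynamic" URLs of the question).
function crawlLinks(string $startUrl, int $maxDepth, callable $fetchHtml): array
{
    // Compare hosts without a leading "www.", as the original code intends.
    $bareHost = fn(string $u): string =>
        (string) preg_replace('#^www\.#i', '', (string) parse_url($u, PHP_URL_HOST));

    $host    = $bareHost($startUrl);
    $seen    = [$startUrl => true];
    $dynamic = [];
    $queue   = new SplQueue();
    $queue->enqueue([$startUrl, 0]);

    while (!$queue->isEmpty()) {
        [$url, $depth] = $queue->dequeue();

        $query = parse_url($url, PHP_URL_QUERY);
        if (is_string($query) && $query !== '') {
            $dynamic[] = $url;
        }
        if ($depth >= $maxDepth) {
            continue;                                 // do not fetch beyond the depth limit
        }
        $html = $fetchHtml($url);
        if (!is_string($html) || $html === '') {
            continue;                                 // fetch failed or empty page
        }
        $dom = new DOMDocument();
        libxml_use_internal_errors(true);             // collect parse warnings quietly
        $dom->loadHTML($html);
        libxml_clear_errors();

        foreach ($dom->getElementsByTagName('a') as $a) {
            $next = resolveUrl($url, $a->getAttribute('href'));
            if ($next === null || isset($seen[$next]) || $bareHost($next) !== $host) {
                continue;                             // skip unusable, visited, or external links
            }
            $seen[$next] = true;
            $queue->enqueue([$next, $depth + 1]);
        }
    }
    return $dynamic;
}

// Assumed fetcher with a timeout, so one slow page cannot stall the whole crawl.
$httpFetch = function (string $url) {
    $ctx = stream_context_create(['http' => ['timeout' => 5]]);
    return @file_get_contents($url, false, $ctx);
};
```

Under these assumptions, a call such as `crawlLinks('http://www.example.com', 3, $httpFetch)` would return the list of same-site URLs with query strings, which could then be inserted into `dynamic_urls` in one pass. Passing the fetcher in as a callable also makes it possible to try the crawl logic against canned HTML without touching the network.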
php codeigniter web-crawler
closed as off-topic by John Conde, Vickel, DFriend, sideshowbarker, AdrianHHH Nov 10 at 22:37
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – Vickel, DFriend, sideshowbarker, AdrianHHH
If this question can be reworded to fit the rules in the help center, please edit the question.
asked Nov 10 at 21:33
Amine Bouhaddi