PHP crawler take a lot of time in the execution [closed]











up vote
-2
down vote

favorite












I'm working on an small websites crawler to get links, but my script take a lot of time in the execution and sometime return without a result. can you help or suggest me another algorithm please.



public function crawl($url = "http://www.example.com", $depth = 5){
static $seen = array();
if (isset($seen[$url]) || $depth === 0)
return;

$path = $query = '';
$seen[$url] = true;

$dom = new DOMDocument('1.0');
@$dom->loadHTMLFile($url);
$anchors = $dom->getElementsByTagName('a');

foreach ($anchors as $element){
$href = $element->getAttribute('href');
if (0 !== strpos($href, 'http')){
$href = $url;
}
$this->crawl($href, $depth - 1);
}

$parse = parse_url($url);
if(preg_replace('#^www.(.+.)#i', '$1', $parse['host']) == $this->domain_name){
if(array_key_exists('query', $parse)){
$this->crud->insert('dynamic_urls', array('url_link' => $url));
}
}
}









share|improve this question













closed as off-topic by John Conde, Vickel, DFriend, sideshowbarker, AdrianHHH Nov 10 at 22:37


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – Vickel, DFriend, sideshowbarker, AdrianHHH

If this question can be reworded to fit the rules in the help center, please edit the question.

















    up vote
    -2
    down vote

    favorite












    I'm working on an small websites crawler to get links, but my script take a lot of time in the execution and sometime return without a result. can you help or suggest me another algorithm please.



    public function crawl($url = "http://www.example.com", $depth = 5){
    static $seen = array();
    if (isset($seen[$url]) || $depth === 0)
    return;

    $path = $query = '';
    $seen[$url] = true;

    $dom = new DOMDocument('1.0');
    @$dom->loadHTMLFile($url);
    $anchors = $dom->getElementsByTagName('a');

    foreach ($anchors as $element){
    $href = $element->getAttribute('href');
    if (0 !== strpos($href, 'http')){
    $href = $url;
    }
    $this->crawl($href, $depth - 1);
    }

    $parse = parse_url($url);
    if(preg_replace('#^www.(.+.)#i', '$1', $parse['host']) == $this->domain_name){
    if(array_key_exists('query', $parse)){
    $this->crud->insert('dynamic_urls', array('url_link' => $url));
    }
    }
    }









    share|improve this question













    closed as off-topic by John Conde, Vickel, DFriend, sideshowbarker, AdrianHHH Nov 10 at 22:37


    This question appears to be off-topic. The users who voted to close gave this specific reason:


    • "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – Vickel, DFriend, sideshowbarker, AdrianHHH

    If this question can be reworded to fit the rules in the help center, please edit the question.















      up vote
      -2
      down vote

      favorite









      up vote
      -2
      down vote

      favorite











      I'm working on an small websites crawler to get links, but my script take a lot of time in the execution and sometime return without a result. can you help or suggest me another algorithm please.



      public function crawl($url = "http://www.example.com", $depth = 5){
      static $seen = array();
      if (isset($seen[$url]) || $depth === 0)
      return;

      $path = $query = '';
      $seen[$url] = true;

      $dom = new DOMDocument('1.0');
      @$dom->loadHTMLFile($url);
      $anchors = $dom->getElementsByTagName('a');

      foreach ($anchors as $element){
      $href = $element->getAttribute('href');
      if (0 !== strpos($href, 'http')){
      $href = $url;
      }
      $this->crawl($href, $depth - 1);
      }

      $parse = parse_url($url);
      if(preg_replace('#^www.(.+.)#i', '$1', $parse['host']) == $this->domain_name){
      if(array_key_exists('query', $parse)){
      $this->crud->insert('dynamic_urls', array('url_link' => $url));
      }
      }
      }









      share|improve this question













      I'm working on an small websites crawler to get links, but my script take a lot of time in the execution and sometime return without a result. can you help or suggest me another algorithm please.



      public function crawl($url = "http://www.example.com", $depth = 5){
      static $seen = array();
      if (isset($seen[$url]) || $depth === 0)
      return;

      $path = $query = '';
      $seen[$url] = true;

      $dom = new DOMDocument('1.0');
      @$dom->loadHTMLFile($url);
      $anchors = $dom->getElementsByTagName('a');

      foreach ($anchors as $element){
      $href = $element->getAttribute('href');
      if (0 !== strpos($href, 'http')){
      $href = $url;
      }
      $this->crawl($href, $depth - 1);
      }

      $parse = parse_url($url);
      if(preg_replace('#^www.(.+.)#i', '$1', $parse['host']) == $this->domain_name){
      if(array_key_exists('query', $parse)){
      $this->crud->insert('dynamic_urls', array('url_link' => $url));
      }
      }
      }






      php codeigniter web-crawler






      share|improve this question













      share|improve this question











      share|improve this question




      share|improve this question










      asked Nov 10 at 21:33









      Amine Bouhaddi

      12




      12




      closed as off-topic by John Conde, Vickel, DFriend, sideshowbarker, AdrianHHH Nov 10 at 22:37


      This question appears to be off-topic. The users who voted to close gave this specific reason:


      • "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – Vickel, DFriend, sideshowbarker, AdrianHHH

      If this question can be reworded to fit the rules in the help center, please edit the question.




      closed as off-topic by John Conde, Vickel, DFriend, sideshowbarker, AdrianHHH Nov 10 at 22:37


      This question appears to be off-topic. The users who voted to close gave this specific reason:


      • "Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – Vickel, DFriend, sideshowbarker, AdrianHHH

      If this question can be reworded to fit the rules in the help center, please edit the question.





























          active

          oldest

          votes






















          active

          oldest

          votes













          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes

          Popular posts from this blog

          Xamarin.iOS Cant Deploy on Iphone

          Glorious Revolution

          Dulmage-Mendelsohn matrix decomposition in Python