An asynchronous PHP scraper built on curl_multi and ReactPHP, inspired by Python's Grab framework.
To install grab-spider, run the following command:
composer require grab/spider "dev-master"
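Equivalently, you can declare the dependency in your project's composer.json yourself and run composer update (a standard Composer workflow, not specific to this library):

{
    "require": {
        "grab/spider": "dev-master"
    }
}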
A minimal crawler that walks the first four pages of Hacker News and prints the title of every linked story:

<?php
require __DIR__ . '/../vendor/autoload.php';

class HackerNewCrawler extends \Grab\Spider
{
    public function taskGenerator()
    {
        // Queue the first four Hacker News index pages as 'page' tasks.
        $urls = array_map(function ($page) {
            return sprintf('https://news.ycombinator.com/news?p=%d', $page);
        }, range(1, 4));

        foreach ($urls as $url) {
            $this->task('page', [
                'url' => $url,
                'max_request' => 10,
            ]);
        }
    }

    public function taskPage($parser, $task)
    {
        // Queue a 'topic' task for every story link found on the index page.
        $links = $parser->find('.storylink');
        foreach ($links as $link) {
            $this->task('topic', [
                'url' => $link->getAttribute('href'),
                'curl_config' => [
                    CURLOPT_TIMEOUT => 60,
                ],
                'max_request' => 10,
            ]);
        }
    }

    public function taskTopic($parser, $task)
    {
        // Print the <title> of the linked story page.
        $titles = $parser->find('title');
        echo trim($titles[0]->text()) . PHP_EOL;
    }
}

$bot = new HackerNewCrawler();
$bot->debug = true;
$bot->setCurlSetting([
    CURLOPT_USERAGENT => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
]);
//$bot->loadProxy(__DIR__ . '/proxy_list.txt');
$bot->run();
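The commented-out loadProxy() call reads a list of proxies from a text file. The exact file format is not shown in this example; a plausible proxy_list.txt, assuming the common one host:port entry per line convention (an assumption, check the library's documentation), might look like:

127.0.0.1:3128
10.0.0.5:8080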
The constructor accepts a callable that turns each raw response body into a parser object; whatever the callable returns is what your task handlers receive as $parser. With DiDom you can pass the document's load method directly, or wrap it in a closure:

$parser = new \DiDom\Document();
$bot = new HackerNewCrawler([$parser, 'load']);

// A closure form that creates a fresh document for each response:
$bot = new HackerNewCrawler(function ($content) {
    $parser = new \DiDom\Document();
    return $parser->load($content);
});
To treat responses as XML instead, return a SimpleXMLElement:

$bot = new HackerNewCrawler(function ($content) {
    return simplexml_load_string($content);
});
Handlers simply receive whatever the callable produces, so any object will do:

$bot = new HackerNewCrawler(function ($content) {
    // Note: \SoapClient expects a WSDL URL, not a response body; this snippet
    // only illustrates that the callable may return any object.
    return new \SoapClient($content);
});
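As a consequence, task handlers must match whatever parser the constructor callable returns. A minimal sketch, assuming the simplexml_load_string() variant above and a well-formed XHTML response (both assumptions, not part of the original example):

public function taskTopic($parser, $task)
{
    // $parser is now a SimpleXMLElement rather than a DiDom document.
    echo trim((string) $parser->head->title) . PHP_EOL;
}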