NAME
    Scrappy - All Powerful Web Spidering, Scraping, Crawling Framework

VERSION
    version 0.9111110

SYNOPSIS
        #!/usr/bin/perl
        use Scrappy;

        my $scraper = Scrappy->new;

        $scraper->crawl('search.cpan.org',
            '/recent' => {
                '#cpansearch li a' => sub {
                    print $_[1]->{href}, "\n";
                }
            }
        );

DESCRIPTION
    Scrappy is an easy (and hopefully fun) way of scraping, spidering,
    and/or harvesting information from web pages, web services, and more.
    Scrappy is a feature-rich, flexible, intelligent web automation tool.

    The name Scrappy (pronounced Scrap+Pee) stands for 'Scraper Happy' or
    'Happy Scraper'. If you like, you may call it Scrapy (pronounced
    Scrape+Pee), although Python has a web scraping framework by that name
    and this module is not a port of it.

METHODS
  crawl
    The crawl method is useful for crawling an entire website, or at least
    part of one. It automates the tasks of creating a queue, fetching and
    parsing HTML pages, and establishing simple flow control. See the
    SYNOPSIS for a simplified example; the following is a more complex one.

        my $scrappy = Scrappy->new;

        $scrappy->crawl('http://search.cpan.org/recent',
            '/recent' => {
                '#cpansearch li a' => sub {
                    my ($self, $item) = @_;

                    # follow all recent modules from search.cpan.org
                    $self->queue->add($item->{href});
                }
            },
            '/~:author/:name-:version/' => {
                'body' => sub {
                    my ($self, $item, $args) = @_;

                    # $args carries the :author, :name and :version parts
                    # of the matched URL
                    my $reviews = $self
                        ->select('.box table tr')->focus(3)
                        ->select('td.cell small a')
                        ->data->[0]->{text};

                    # fall back to a default when no review count is shown
                    $reviews = $reviews =~ /\d+ Reviews/
                        ? $reviews
                        : '0 reviews';

                    print "found $args->{name} version $args->{version} ".
                        "[$reviews] by $args->{author}\n";
                }
            }
        );

AUTHOR
    Al Newkirk <awncorp@cpan.org>

COPYRIGHT AND LICENSE
    This software is copyright (c) 2010 by awncorp.

    This is free software; you can redistribute it and/or modify it under
    the same terms as the Perl 5 programming language system itself.
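
The crawl example above uses the URL pattern '/~:author/:name-:version/' and reads the matched parts from $args. The following is a minimal sketch of how such a pattern could be turned into a regular expression with named captures. It is not Scrappy's own implementation: compile_route and match_route are hypothetical helper names introduced only for illustration, and the rule that a placeholder matches any run of non-slash characters is an assumption.

```perl
#!/usr/bin/perl
use 5.010;    # named captures and %+ need Perl 5.10 or later
use strict;
use warnings;

# Hypothetical helper (not part of Scrappy): turn a route pattern such as
# '/~:author/:name-:version/' into an anchored regex with one named
# capture per ':placeholder'. Literal text between placeholders is escaped.
sub compile_route {
    my ($pattern) = @_;
    my $regex = '';

    # A capturing group in the split separator keeps the placeholders
    # in the returned list, interleaved with the literal pieces.
    for my $part (split /(:[A-Za-z_]\w*)/, $pattern) {
        if ($part =~ /^:(\w+)$/) {
            # Assumption: a placeholder matches one or more non-slash chars
            $regex .= "(?<$1>[^/]+)";
        }
        else {
            $regex .= quotemeta $part;
        }
    }

    return qr/^$regex$/;
}

# Hypothetical helper: return a hashref of captured values, or nothing
# (undef in scalar context) when the path does not match the pattern.
sub match_route {
    my ($pattern, $path) = @_;
    my $re = compile_route($pattern);

    return unless $path =~ $re;

    my %captures = %+;    # copy the named captures before they go away
    return \%captures;
}

my $args = match_route('/~:author/:name-:version/',
                       '/~awncorp/Scrappy-0.9111110/');

print "found $args->{name} version $args->{version} by $args->{author}\n"
    if $args;
# prints: found Scrappy version 0.9111110 by awncorp
```

Because each placeholder is greedy, a distribution name that itself contains hyphens (for example 'Foo-Bar-1.0') is split at the last hyphen in this sketch, giving name 'Foo-Bar' and version '1.0'.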