StormCrawler

  • Libre
  • Mac
  • Windows
  • Linux
Description

StormCrawler is an open source collection of reusable resources for building distributed web crawlers with low-latency, near real-time crawling and indexing. It provides components such as fetchers, parsers, and indexers that can be assembled into robust, scalable crawlers. Built on Apache Storm, a StormCrawler topology processes URLs in parallel, scales up and down, and is resilient to failure, and the project ships modules for indexing content into search backends such as Apache Solr or Elasticsearch. Individual components can be extended or replaced through its API to fit specific needs, which makes StormCrawler suitable for large-scale web crawling and indexing projects. A sketch of such a topology follows below.
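As an illustration of how these components are wired together, here is a minimal sketch modeled on StormCrawler's demo topology. It assumes the com.digitalpebble.stormcrawler package names used by pre-Apache releases (newer Apache StormCrawler releases use org.apache.stormcrawler packages), uses a placeholder seed URL, and prints indexed documents to stdout instead of a real Solr or Elasticsearch backend.

```java
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

import com.digitalpebble.stormcrawler.ConfigurableTopology;
import com.digitalpebble.stormcrawler.bolt.FetcherBolt;
import com.digitalpebble.stormcrawler.bolt.JSoupParserBolt;
import com.digitalpebble.stormcrawler.bolt.URLPartitionerBolt;
import com.digitalpebble.stormcrawler.indexing.StdOutIndexer;
import com.digitalpebble.stormcrawler.spout.MemorySpout;

// Minimal crawl topology: a spout feeds seed URLs, and bolts fetch, parse, and index them.
public class CrawlTopology extends ConfigurableTopology {

    public static void main(String[] args) throws Exception {
        ConfigurableTopology.start(new CrawlTopology(), args);
    }

    @Override
    protected int run(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();

        // In-memory spout with a placeholder seed URL; a production crawl
        // would read from a persistent URL frontier instead.
        builder.setSpout("spout", new MemorySpout(new String[] { "https://example.com/" }));

        // Group URLs by host so each host is fetched by a single task (politeness).
        builder.setBolt("partitioner", new URLPartitionerBolt())
               .shuffleGrouping("spout");

        builder.setBolt("fetch", new FetcherBolt())
               .fieldsGrouping("partitioner", new Fields("key"));

        // Extract text and outlinks from fetched pages.
        builder.setBolt("parse", new JSoupParserBolt())
               .localOrShuffleGrouping("fetch");

        // Dummy indexer that prints documents to stdout instead of Solr/Elasticsearch.
        builder.setBolt("index", new StdOutIndexer())
               .localOrShuffleGrouping("parse");

        return submit("crawl", conf, builder);
    }
}
```

On a cluster this would be packaged as an uber jar and submitted with storm jar; ConfigurableTopology also accepts flags such as -conf for a YAML configuration file and -local for running in local mode during testing.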
