Apache Nutch

Website

  • Libre
  • Mac
  • Windows
  • Linux
Description

Apache Nutch is an open source web crawler written in Java, designed to be highly extensible and scalable. It began as a subproject of Apache Lucene and has been a top-level Apache project since 2010. Nutch fetches pages from websites, parses them, and stores and indexes their content, which makes it a common base for web search engines, search portals, and custom search solutions. It integrates with Apache Solr for indexing and search and runs on Apache Hadoop for distributed crawling. Features include customizable crawlers, support for multiple languages, and automatic page categorization.
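
The retrieve, parse, index, and store cycle mentioned in the description can be illustrated with a small sketch. This is not Nutch code and does not use any Nutch API; it is a toy Python crawler built on the standard library only. The `FAKE_WEB` dictionary, the `example.test` URLs, and the names `PageParser` and `crawl` are all hypothetical, chosen so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> welcome page',
    "http://example.test/a": '<a href="/b">B</a> apache crawler notes',
    "http://example.test/b": '<a href="/">home</a> search index notes',
}


class PageParser(HTMLParser):
    """Collects outgoing links and lowercase words from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Resolve each <a href="..."> against the page URL.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl(seeds, fetch, max_pages=10):
    """Breadth-first crawl: fetch, parse, store, and index each page once."""
    frontier = deque(seeds)   # URLs waiting to be fetched
    seen = set(seeds)         # URLs already queued, to avoid refetching
    store = {}                # url -> raw content
    index = {}                # word -> set of urls containing it
    while frontier and len(store) < max_pages:
        url = frontier.popleft()
        html = fetch(url)                 # retrieve
        if html is None:
            continue
        parser = PageParser(url)          # parse
        parser.feed(html)
        parser.close()
        store[url] = html                 # store
        for word in parser.words:         # index
            index.setdefault(word, set()).add(url)
        for link in parser.links:         # extend the frontier
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return store, index


store, index = crawl(["http://example.test/"], FAKE_WEB.get)
print(sorted(index["notes"]))
```

A real crawler such as Nutch adds much more on top of this loop, for example politeness rules, persistent storage, and distributed execution; the sketch only shows the basic shape of the cycle.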

Categories

Alternatives