Populus:DezInV/Notes

archive

crawling

index

scraping

search

from org-mode/thk

Decentralized search engine

Internet Archive

Crawling, Crawler

  1. ArchiveBox

    Python, active

    Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more…

  2. grab-site

    Python, active. https://github.com/ArchiveTeam/grab-site

  3. WASP

  4. StormCrawler

  5. HTTrack

  6. Grub.org

    Decentralized crawler of the Wikia project, written in C# and Python.

  7. HCE – Hierarchical Cluster Engine

  8. Heritrix

  9. Haskell

Search

  1. openwebsearch.eu

  2. lemurproject.org

    1. lucindri

      https://lemurproject.org/lucindri.php

      Lucindri is an open-source implementation of Indri search logic and structured query language using the Lucene Search Engine. Lucindri consists of two components: the indexer and the searcher.

    2. Galago

      toolkit for experimenting with text search
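
The indexer/searcher split mentioned for Lucindri above can be illustrated with a toy inverted index. This is only a minimal sketch in plain Python and does not use Lucindri, Indri, or Lucene. The function names (`tokenize`, `build_index`, `search`) and the sample documents are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def build_index(docs):
    """Indexer: map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Searcher: return the ids of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, for illustration only.
docs = {
    "a": "open source web archiving",
    "b": "toolkit for experimenting with text search",
    "c": "open source text search",
}
index = build_index(docs)
hits = search(index, "text search")
```

Real engines add ranking, structured queries, and on-disk index formats on top of this basic term-to-documents mapping.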

Wikia

Research 2023-11-26

Scraping