• Programmer Belch@lemmy.dbzer0.com · 10 days ago

    I use a tool that fetches a website once a day to check for new chapters of the series I follow, then generates an RSS feed from the contents. Would that count as a harmful scraper?

    The problem with AI scrapers and bots is their scale: thousands of requests to pages that the origin server cannot handle, slowing traffic down for everyone.
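
    For what it's worth, a once-a-day fetch can be made even cheaper for the server with conditional requests, so an unchanged page costs only a 304 and no body. A rough sketch, assuming the `requests` library (the User-Agent string is a placeholder):

    ```python
    # Polite daily check: identify yourself and send conditional headers so
    # the server can answer 304 Not Modified instead of the full page.
    import requests

    USER_AGENT = "chapter-watcher/1.0 (contact: me@example.com)"  # placeholder

    def fetch_if_changed(url, etag=None, last_modified=None):
        headers = {"User-Agent": USER_AGENT}
        if etag:
            headers["If-None-Match"] = etag
        if last_modified:
            headers["If-Modified-Since"] = last_modified
        resp = requests.get(url, headers=headers, timeout=30)
        if resp.status_code == 304:
            return None, etag, last_modified  # unchanged, almost no server cost
        resp.raise_for_status()
        return resp.text, resp.headers.get("ETag"), resp.headers.get("Last-Modified")
    ```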

      • who@feddit.org · edited · 10 days ago

        Unfortunately, robots.txt cannot express rate limits, so it would be an overly blunt instrument for a use case like the one GP describes. HTTP 429 (Too Many Requests) would be a better fit.
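
        A client only has to honor the 429 and back off, e.g. by sleeping for whatever Retry-After says. A minimal sketch, again assuming `requests` (the fallback delays are arbitrary):

        ```python
        # Cooperate with a 429: sleep for Retry-After seconds if given,
        # otherwise back off exponentially, then retry.
        import time
        import requests

        def get_with_backoff(url, max_retries=3):
            resp = requests.get(url, timeout=30)
            for attempt in range(max_retries):
                if resp.status_code != 429:
                    break
                retry_after = resp.headers.get("Retry-After")
                delay = int(retry_after) if retry_after and retry_after.isdigit() else 2 ** attempt
                time.sleep(delay)
                resp = requests.get(url, timeout=30)
            resp.raise_for_status()
            return resp
        ```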

        • redjard@lemmy.dbzer0.com · 10 days ago

          Crawl-delay is just that: a simple directive to add to robots.txt that sets the minimum delay between requests, i.e. a cap on crawl frequency. It used to be widely followed by all but the worst crawlers …
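
          For illustration, here's the directive as it appears in robots.txt, read back with Python's standard-library parser (the 10-second value is just an example):

          ```python
          # Crawl-delay in robots.txt, parsed with urllib.robotparser
          # (crawl_delay() is available since Python 3.6).
          from urllib.robotparser import RobotFileParser

          robots_txt = """\
          User-agent: *
          Crawl-delay: 10
          """

          rp = RobotFileParser()
          rp.parse(robots_txt.splitlines())
          print(rp.crawl_delay("*"))  # -> 10: wait at least 10 s between requests
          ```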