• BodilessGaze@sh.itjust.works · 13 days ago

        No, the reason no action will be taken is because Huawei is a Chinese company. I work for a major US company that’s dealing with the same problem, and the problematic scrapers are usually from China. US companies like OpenAI rarely cause serious problems because they know we can sue them if they do. There’s nothing we can do legally about Chinese scrapers.

          • BodilessGaze@sh.itjust.works · 12 days ago

            We do, somewhat. We haven’t gone as far as a blanket ban of Chinese CIDR ranges because there’s a lot of risks and bureaucracy associated with a move like that. But it probably makes sense for a small company like Codeberg, since they have higher risk tolerance and can move faster.

    • Programmer Belch@lemmy.dbzer0.com · 13 days ago

      I use a tool that downloads a website every day to check for new chapters of the series I follow, then creates an RSS feed from the contents. Would this be considered a harmful scraper?

      The problem with AI scrapers and bots is their scale: thousands of requests to webpages that the server behind them cannot handle, resulting in slow traffic for everyone.

        • who@feddit.org · 13 days ago

          Unfortunately, robots.txt cannot express rate limits, so it would be an overly blunt instrument for things like GP describes. HTTP 429 would be a better fit.
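
          The 429 approach can be sketched roughly like this: a minimal, hypothetical fixed-window limiter per client IP (all names here are made up for illustration; a real deployment would usually do this in the reverse proxy, e.g. nginx's `limit_req`):

          ```python
          import time

          # Hypothetical per-client rate limiter: allow up to `limit` requests
          # per `window` seconds; beyond that, answer 429 with a Retry-After hint.
          class FixedWindowLimiter:
              def __init__(self, limit=60, window=60.0):
                  self.limit = limit
                  self.window = window
                  self.counts = {}  # client -> (window_start, count)

              def check(self, client, now=None):
                  """Return (status, retry_after_seconds)."""
                  now = time.monotonic() if now is None else now
                  start, count = self.counts.get(client, (now, 0))
                  if now - start >= self.window:  # window expired, reset
                      start, count = now, 0
                  if count >= self.limit:
                      # Throttled: tell the client when the window reopens.
                      return 429, self.window - (now - start)
                  self.counts[client] = (start, count + 1)
                  return 200, 0.0
          ```

          A server using this would put the second value into a `Retry-After` response header, which well-behaved crawlers respect; the abusive ones GP is complaining about typically don't, which is why people end up reaching for IP blocks anyway.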

          • redjard@lemmy.dbzer0.com · 12 days ago

            Crawl-delay is just that: a simple robots.txt directive that caps crawl frequency by setting a minimum number of seconds between requests. It used to be widely followed by all but the worst crawlers …
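
            For reference, the directive looks like this (note that Crawl-delay is a de facto extension, not part of the original robots.txt standard, and some major crawlers ignore it):

            ```
            # robots.txt — ask well-behaved crawlers to wait 10 seconds between requests
            User-agent: *
            Crawl-delay: 10
            ```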