• OpenStars@piefed.social
    6 days ago

    Except iirc, they aren’t scraping “properly” (read: efficiently at least, setting aside morality for the sake of discussing this component in isolation), and are causing traffic troubles. If only they took the time to install an actual instance themselves then nobody would care in the slightest (again, ignoring the morality part, for now).

    TLDR: they are being dicks about it, bc offering everything we have for free is not enough for them.

    • Chloé 🥕@lemmy.blahaj.zone
      6 days ago

      i mean, that’s exactly what they did with threads, and many instances defederated from it because they didn’t want to have their data scraped by meta

    • MrKaplan@lemmy.world
      6 days ago

      of all the scrapers we see, the requests identified as originating from Meta seem to be well behaved overall. they appear to (mostly) be respecting robots.txt where present and their request volume to Lemmy.World is only averaging slightly above 5 requests per minute over the last 2 weeks. they also don’t spoof their user agents to pretend to be web browsers, or at least I have not seen credible accusations of this happening.
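For anyone wondering what "respecting robots.txt" and "5 requests per minute" mean in practice, here is a rough sketch using only Python's standard library. The robots.txt content, the `ExampleBot` user agent, and the `example.social` URLs are all made up for illustration; none of this is Lemmy.World's actual configuration or tooling.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; not taken from any real instance.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /api/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def average_requests_per_minute(request_count: int, days: float) -> float:
    """Average request rate over a window of the given length in days."""
    return request_count / (days * 24 * 60)


if __name__ == "__main__":
    # A well-behaved crawler runs a check like this before every fetch.
    print(is_allowed(ROBOTS_TXT, "ExampleBot/1.0", "https://example.social/post/1"))
    print(is_allowed(ROBOTS_TXT, "OtherBot/1.0", "https://example.social/post/1"))
    # 100,800 requests over 14 days works out to 5 per minute.
    print(average_requests_per_minute(100_800, 14))
```

The rate helper also shows the scale involved: an average of 5 requests per minute over 2 weeks is roughly 100,000 requests in total.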

    • scytale@piefed.zip
      6 days ago

      But if they do it the “proper” way, they won’t be able to grab the data if instances defederate from them, right? And that’s what the majority of instances will do.

      • FaceDeer@fedia.io
        6 days ago

        Assuming you know which instances are the ones they’re collecting data from. It could be any instance.

        • OpenStars@piefed.social
          6 days ago

          You are absolutely correct there, in that hypothetical scenario if they were to attempt to hide their traffic among normal instance activities.

          To add a bit more detail to my previous answer: there were some prior discussions about this topic, citing some of the most popular instances in the entire Threadiverse having been targeted by their usual DDoS-like approach:

          img