Dropsitenews published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.
Hexbear is on there too. Also, Facebook is very interested in people uploading their massive dongs to lemmynsfw.
Full article here.
Link to the full leaked list download: Meta leaked list pdf
Run your access logs through something that will report the ASN (autonomous system number) for the client IPs. GoAccess would be my recommendation. It will need a GeoIP database, which you can get from MaxMind by signing up for a free account and license key, or download directly from P3TERX/GeoLite.mmdb on GitHub. We have identified a number of bot networks this way. Happy to help further if you'd like a hand 👍
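If you want to see what a tool like GoAccess is doing under the hood, here is a rough sketch in Python. It assumes your logs are in the common/combined log format (client IP as the first field), pulls the IP out of each line, maps it to an ASN with a longest-prefix match, and counts requests per network. The `ASN_TABLE` below is made up for illustration and uses documentation-only IP ranges and ASNs. Real data would come from a GeoLite2-ASN `.mmdb` file, for example via MaxMind's `geoip2` package (`geoip2.database.Reader(path).asn(ip)`), which I haven't wired in here.

```python
import ipaddress
import re
from collections import Counter

# Common/combined log format prefix: client IP, ident, user, [time], "request", status, size.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+')

# HYPOTHETICAL prefix -> (ASN, org) table for illustration only (RFC 5737/3849 ranges,
# RFC 5398 ASNs). In practice you'd read this from a GeoLite2-ASN database.
ASN_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): (64500, "Example Crawler Net"),
    ipaddress.ip_network("192.0.2.128/25"): (64503, "Example Crawler Subnet"),
    ipaddress.ip_network("198.51.100.0/24"): (64501, "Example Hosting"),
    ipaddress.ip_network("2001:db8::/32"): (64502, "Example IPv6 Net"),
}

UNKNOWN = (None, "unknown")


def lookup_asn(ip_str):
    """Return (asn, org) for the most specific matching prefix, or UNKNOWN."""
    ip = ipaddress.ip_address(ip_str)  # raises ValueError if not an IP
    best_net, best_asn = None, UNKNOWN
    for net, asn in ASN_TABLE.items():
        # IPv4 addresses never match IPv6 networks (and vice versa).
        if ip in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_asn = net, asn
    return best_asn


def requests_per_asn(lines):
    """Count log lines per (asn, org), skipping lines that don't parse."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue
        try:
            counts[lookup_asn(m.group(1))] += 1
        except ValueError:
            continue  # first field wasn't an IP (e.g. a hostname)
    return counts


def report(path, top=20):
    """Print the busiest networks in an access log file."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        for (asn, org), n in requests_per_asn(fh).most_common(top):
            label = f"AS{asn}" if asn is not None else "AS?"
            print(f"{n:8d}  {label:10s}  {org}")
```

Calling `report("/var/log/nginx/access.log")` (adjust the path for your setup) prints one line per network, busiest first. Grouping by ASN rather than by IP is the useful part: a scraper rotating through hundreds of addresses in one hosting provider shows up as a single large bucket.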