Many sites have suffered from aggressive web crawlers built to feed LLMs.
I've been relatively spared, but ever since the phenomenon started, I've been looking for a countermeasure to implement.
Today, I present a gzip and brotli zip bomb that is also valid HTML.
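To give an idea of the principle, here is a minimal Python sketch covering gzip only (brotli is not in the Python standard library). It is an illustration under stated assumptions, not necessarily the construction used in this post: the function name `build_html_bomb`, the filler byte, and the 10 MiB payload size are all hypothetical and chosen so the example runs quickly. The filler sits inside an HTML comment so the decompressed page remains a valid document.

```python
import gzip
import io


def build_html_bomb(payload_bytes: int, chunk_size: int = 1 << 16) -> bytes:
    """Return a gzip stream that decompresses to a valid HTML page
    padded with `payload_bytes` of filler inside a comment.

    Hypothetical helper: name, filler byte, and sizes are assumptions.
    """
    head = (
        b"<!DOCTYPE html><html lang=en><head><meta charset=utf-8>"
        b"<title>Hello</title></head><body><!--"
    )
    tail = b"--><p>Nothing to see here.</p></body></html>"

    buf = io.BytesIO()
    # Stream the filler in chunks so the uncompressed page never has to
    # sit in memory on the server side.
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9, mtime=0) as gz:
        gz.write(head)
        chunk = b"H" * chunk_size  # a single repeated byte compresses very well
        remaining = payload_bytes
        while remaining > 0:
            n = min(chunk_size, remaining)
            gz.write(chunk[:n])
            remaining -= n
        gz.write(tail)
    return buf.getvalue()


# Small size for demonstration; a real deterrent would use a far larger payload.
bomb = build_html_bomb(10 * 1024 * 1024)
print(f"compressed size: {len(bomb)} bytes")
```

Such a file would be served with the `Content-Encoding: gzip` header so that the client, not the server, pays for the decompression.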
Another interesting deterrent is to poison unruly scrapers with infinite nonsense generated by Markov chains: https://algorithmic-sabotage.github.io/asrg/trapping-ai/
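As a rough illustration of the Markov chain idea, here is a minimal word-level sketch in Python. It is not the implementation from the linked page: the names `build_chain` and `babble` and the tiny sample corpus are assumptions made for this example. The chain records which words follow which, and the generator then walks it forever, producing text that looks locally plausible but means nothing.

```python
import random
from collections import defaultdict
from itertools import islice

# Hypothetical sample corpus; a real trap would be seeded with much more text.
CORPUS = (
    "the crawler reads the page and the page links to another page "
    "and the crawler follows the link and reads another page again"
)


def build_chain(text: str) -> dict[str, list[str]]:
    """Map each word to the list of words observed right after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def babble(chain: dict[str, list[str]], rng: random.Random, start: str):
    """Yield an endless stream of words by walking the chain at random."""
    word = start
    while True:
        yield word
        successors = chain.get(word)
        # On a dead end (a word with no recorded successor), restart from
        # a random known word so the stream never terminates.
        word = rng.choice(successors) if successors else rng.choice(list(chain))


chain = build_chain(CORPUS)
# Take only the first 30 words of the infinite stream for display.
print(" ".join(islice(babble(chain, random.Random(), "the"), 30)))
```

Because `babble` is a generator, a server could stream its output into an endless response without ever holding the whole text in memory.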