Gitea has a builtin defense against this, `REQUIRE_SIGNIN_VIEW=expensive`, that completely stopped AI traffic issues for me and cut my VPS's bandwidth usage by 95%.
This is the most assured best way to make sure your remain the only user of your stuff.
I highly encourage folks to put stuff out there! Put your stuff on the internet! Even if you don't need it even if you don't think you'll necessarily benefit: leave the door open to possibility!
Crawlers will find everything on the internet eventually regardless of subdomain (e.g. from crt.sh logs, or Google finds them from 8.8.8.8 queries).
REQUIRE_SIGNIN_VIEW=true means signin is required for all pages - that's great and definitely stops AI bots. The signin page is very cheap for Gitea to render. However, it is a barrier for the regular human visitors to your site.
'expensive' is a middle-ground that lets normal visitors browse and explore repos, view README, and download release binaries. Signin is only required for "expensive" pageloads, such as viewing file content at specific commits git history.
From Gitea's doc I was under the impression that it was going further than "true" so I didn't understood why because "true" was enough for me to not be bothered by bots.
But in your case you want a middle-ground, which is provided by "expensive"!
> Enable this to force users to log in to view any page or to use API. It could be set to "expensive" to block anonymous users accessing some pages which consume a lot of resources, for example: block anonymous AI crawlers from accessing repo code pages. The "expensive" mode is experimental and subject to change.
Forgejo doesn't seem to have copied that feature yet