The password may not be known to HIBP at the time it is registered. This will very likely be a large enough fraction (>0.2%) that you can still get into sufficient accounts to access plenty of relatives' data.
> could probably check if large numbers of people have all tried to sign in from common IP addresses within the space of a few hours
> The password may not be known to HIBP at the time it is registered. This will very likely be a large enough fraction (>0.2%) that you can still get into sufficient accounts to access plenty of relatives' data.
To be clear, the password can be checked at login-time, rather than registration-time, at which point the service should send the user through an account recovery process. There's still scope for passwords not appearing in the HIBP dataset, but it's massively reduced.
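For what it's worth, here is a rough sketch of what a login-time check could look like, assuming the service uses the Pwned Passwords range endpoint (k-anonymity: only the first five hex characters of the SHA-1 hash leave the server). The function names `sha1_prefix_suffix`, `breach_count` and `fetch_range_from_hibp` are made up for illustration, and what the service does on a hit (here, nothing beyond returning a count) is left open.

```python
import hashlib
import urllib.request
from typing import Callable


def sha1_prefix_suffix(password: str) -> tuple[str, str]:
    """Split the upper-case SHA-1 hex digest into a 5-char prefix and 35-char suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def breach_count(password: str, fetch_range: Callable[[str], str]) -> int:
    """Return how often the password appears in the range response, or 0 if absent.

    fetch_range is injected so the lookup can be swapped out or faked;
    it takes the 5-char prefix and returns lines of the form 'SUFFIX:COUNT'.
    """
    prefix, suffix = sha1_prefix_suffix(password)
    for line in fetch_range(prefix).splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0


def fetch_range_from_hibp(prefix: str) -> str:
    """Query the Pwned Passwords range endpoint (network call, not used in tests)."""
    request = urllib.request.Request(
        f"https://api.pwnedpasswords.com/range/{prefix}",
        headers={"User-Agent": "login-check-sketch"},  # hypothetical client name
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read().decode("utf-8")
```

At login, a non-zero `breach_count(password, fetch_range_from_hibp)` would be the signal to push the user into account recovery instead of completing the sign-in.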
Do you mean this link: https://news.ycombinator.com/item?id=39116531 ? I'm not sure I agree with your conclusion. You should be able to successfully highlight a unique IP address making ~4,400 discrete login attempts across a month as suspicious - and further highlight that there are 1,000 other IP addresses behaving in the same way. Most users log in from a handful of predictable IP addresses, and most IP addresses log in to only a couple of predictable accounts.
These types of login analytics aren't beyond the ken of man, and a service like 23andMe should definitely not allow 4.4m attempts and 14k successes from a small set of IP addresses without it raising some internal alerts.
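To make that concrete, a minimal sketch of the kind of analytics I mean, using the figures from this thread (4.4m attempts, 14k successes, roughly 1,000 IP addresses). The function name `suspicious_ips` and the threshold of 20 accounts per IP are my own assumptions, not anything the service is known to use.

```python
from collections import defaultdict
from typing import Iterable

# Figures quoted in this thread.
TOTAL_ATTEMPTS = 4_400_000
TOTAL_SUCCESSES = 14_000
ATTACKING_IPS = 1_000

attempts_per_ip = TOTAL_ATTEMPTS / ATTACKING_IPS  # 4400.0 attempts per IP over the month
success_rate = TOTAL_SUCCESSES / TOTAL_ATTEMPTS   # about 0.32% of attempts succeed


def suspicious_ips(
    attempts: Iterable[tuple[str, str]],
    max_accounts_per_ip: int = 20,  # assumed threshold, tune per customer base
) -> dict[str, int]:
    """Map each IP that tried more distinct accounts than the threshold to that count.

    attempts is an iterable of (ip_address, account_id) pairs from the login log.
    """
    accounts_by_ip: dict[str, set[str]] = defaultdict(set)
    for ip, account in attempts:
        accounts_by_ip[ip].add(account)
    return {
        ip: len(accounts)
        for ip, accounts in accounts_by_ip.items()
        if len(accounts) > max_accounts_per_ip
    }
```

Counting distinct accounts per IP rather than raw attempts means a household retrying one password a few times does not look like an attacker cycling through thousands of accounts.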
With CGNAT as in Italy or daily forced IP changes as in Germany, I'm not sure that it's true that most people log in from a predictable set of IP addresses. Perhaps one could indeed establish such a pattern in some countries like the Netherlands, and establish a set of ISPs per account for customers in other countries.
I must agree, though, with your point that 4k different logins from the same address in one month would be rather high for their customer base, so the limit could be lower if you allow enough bursting. What do you do after that, though, block them outright if you suspect a bot? That's going to block real users as well. Give it captchas? Besides people also hating those, one can have someone in Bangladesh solve them if modern neural nets don't get the desired solve rate.
I guess the overall solution will have to be 2FA and, indeed, some long-term rate limit beyond which they'll have to give users captchas (to at least increase the cost of an attack), and some upper bound beyond which it gets outright blocked.
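Roughly what I have in mind for the tiers, as a sketch only. The names `Action` and `action_for` and both thresholds are hypothetical; the real numbers would have to come from what normal traffic (including CGNAT) looks like for the service.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CAPTCHA = "captcha"
    BLOCK = "block"


def action_for(
    distinct_accounts: int,
    captcha_after: int = 20,   # assumed long-term limit before captchas start
    block_after: int = 500,    # assumed upper bound before an outright block
) -> Action:
    """Pick a response from the number of distinct accounts one IP tried in the window."""
    if distinct_accounts > block_after:
        return Action.BLOCK
    if distinct_accounts > captcha_after:
        return Action.CAPTCHA
    return Action.ALLOW
```

With these example thresholds, an IP at the ~4,400 logins per month discussed above would be blocked long before it got that far, while a shared CGNAT address with a few dozen accounts would only see captchas.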
> could probably check if large numbers of people have all tried to sign in from common IP addresses within the space of a few hours
That doesn't work; see my sibling comment, where I did the math on the authentication rate you'd need for that to trigger: https://news.ycombinator.com/item?id=39116531