
Oh, of course, double-check every single email just before it is sent; that will not impact the server very much. Except it will. By a lot.


I mean I agree but it's also possible to store a list of recently unsubscribed email addresses on the servers sending the mail out. Is it worth the trouble? Probably not.


You don't need to double check, you just have a way to remove emails from the queue. And doing simple removal from the queue only when someone unsubs is much less work than checking every e-mail in the queue against some central subscriber list, before sending.


Well, actually no.

The vast majority of bulk email is sent via third-party providers like Constant Contact or Mailchimp because delivery is important.

You can't just check an address; you have to check against the particular account, mailing list, and email address, which means you'd basically have to have the database. Otherwise, if I unsubscribe from "Megous Monthly Mailer" I'll stop getting the announcements from my kid's school with useful info like when the first day of school is.

So you need to know the account, the mailing list that was unsubscribed from, and the email address. Of course, the system is architected for high-volume, scheduled email. Delivery volumes can be in the billions per minute. Anything too crazy and your service level is destroyed.
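
To make the "account, list, address" point concrete, here is a minimal sketch of a suppression check keyed on all three. Every name in it (`SuppressionKey`, `unsubscribe`, `should_send`, the sample account and list IDs) is made up for illustration, and lowercasing the address is an assumption about how a provider might normalize it, not something any particular provider is known to do.

```python
from typing import NamedTuple


class SuppressionKey(NamedTuple):
    """Hypothetical key: an unsubscribe applies to one list of one account."""
    account_id: str
    list_id: str
    email: str


def make_key(account_id: str, list_id: str, email: str) -> SuppressionKey:
    # Assumption: addresses are compared case-insensitively after trimming.
    return SuppressionKey(account_id, list_id, email.strip().lower())


# In-memory stand-in for whatever store the provider really uses.
suppressed: set[SuppressionKey] = set()


def unsubscribe(account_id: str, list_id: str, email: str) -> None:
    suppressed.add(make_key(account_id, list_id, email))


def should_send(account_id: str, list_id: str, email: str) -> bool:
    return make_key(account_id, list_id, email) not in suppressed


unsubscribe("megous", "monthly-mailer", "Parent@Example.com")
print(should_send("megous", "monthly-mailer", "parent@example.com"))  # False
print(should_send("school", "announcements", "parent@example.com"))   # True
```

Unsubscribing from one sender's list leaves the school announcements alone, which is why a bare list of addresses is not enough.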

Queues are designed for high-volume queuing, not search or retrieval. There's no "query API" for RabbitMQ, for example.

All of this is easy if we hand-wave away the requirements for high volume email delivery.


Presumably people are unsubbing all day.

If n is the number of emails and k is the number of unsubs, it adds up to O(n) to check the table of unsubs before sending vs. O(k*n) to scan the queue after every unsub.
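
A rough way to see the O(n) vs. O(k*n) difference is to count the work in both approaches. This is only a sketch: the queue is a plain Python list of addresses, the unsub table is a hash set, and the function names are invented for the example.

```python
def filter_at_send(queue, unsubs):
    """Check each queued address once against a set of unsubs: n lookups."""
    unsub_set = set(unsubs)
    lookups = 0
    kept = []
    for addr in queue:
        lookups += 1
        if addr not in unsub_set:
            kept.append(addr)
    return kept, lookups


def scan_per_unsub(queue, unsubs):
    """Rescan the whole remaining queue for every unsub: up to k*n comparisons."""
    comparisons = 0
    remaining = list(queue)
    for unsub in unsubs:
        comparisons += len(remaining)
        remaining = [addr for addr in remaining if addr != unsub]
    return remaining, comparisons


queue = [f"user{i}@example.com" for i in range(1000)]   # n = 1000
unsubs = [f"user{i}@example.com" for i in range(10)]    # k = 10

kept_a, lookups = filter_at_send(queue, unsubs)
kept_b, comparisons = scan_per_unsub(queue, unsubs)
print(lookups, comparisons)
```

Both produce the same surviving queue; the second just does roughly k times the work unless the queue is indexed.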

Or, to be less computer-sciency, a task that scans the queue of emails every time someone unsubs would be a pain to keep running. Elsewhere in this discussion the queue was described as a file. Who wants to scan the same file over and over all day?


Yes, but all that is easy to optimize. You can scan the mail file on entry to the queue and store the file path and recipients in an indexed database. Then removal is just one SELECT query and an unlink() call.
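
Something like the following, as a sketch only. It assumes one spool file per recipient and uses SQLite as the indexed database; the table name `queued` and the helpers `open_index`, `enqueue`, and `remove_for` are hypothetical, and a real mail server would also need locking against the process that is busy sending.

```python
import os
import sqlite3
import tempfile


def open_index(db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queued (recipient TEXT NOT NULL, path TEXT NOT NULL)"
    )
    # The index is what turns removal into a lookup instead of a queue scan.
    conn.execute("CREATE INDEX IF NOT EXISTS queued_recipient ON queued (recipient)")
    return conn


def enqueue(conn, spool_dir, recipient, body):
    """Write the message to a spool file and record recipient -> path."""
    fd, path = tempfile.mkstemp(dir=spool_dir, suffix=".eml")
    with os.fdopen(fd, "w") as f:
        f.write(body)
    conn.execute("INSERT INTO queued (recipient, path) VALUES (?, ?)", (recipient, path))
    conn.commit()
    return path


def remove_for(conn, recipient):
    """On unsubscribe: one SELECT, then unlink() each queued file."""
    rows = conn.execute(
        "SELECT path FROM queued WHERE recipient = ?", (recipient,)
    ).fetchall()
    for (path,) in rows:
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass  # already sent or removed by someone else
    conn.execute("DELETE FROM queued WHERE recipient = ?", (recipient,))
    conn.commit()
    return len(rows)


spool = tempfile.mkdtemp()
conn = open_index()
path_a = enqueue(conn, spool, "a@example.com", "hello a")
path_b = enqueue(conn, spool, "b@example.com", "hello b")
removed = remove_for(conn, "a@example.com")
```

After the call, the file queued for `a@example.com` is gone and the one for `b@example.com` is untouched.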


Thank you. You said what I was trying to, concisely.


Nah, it doesn't. Cache the list of addresses that have unsubscribed, and have your dispatching process filter against that list.

Let's say 10 million have unsubscribed, at 128 bytes per address. That's about 1.28 GB, so under 1.5 gigs of RAM to hold all of them in memory.

Sending 2 million emails daily comes out to an average of 23 a second. You can search that list 23 times a second pretty easily with one extra CPU core.
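
The back-of-the-envelope numbers check out. A quick sketch of the arithmetic, using only the figures from this comment (the constant names are mine, and the 128 bytes is raw storage per address, not what a hash set in any given language would actually cost):

```python
UNSUBSCRIBED = 10_000_000          # addresses in the unsub cache
BYTES_PER_ADDRESS = 128            # raw bytes per address, as assumed above
EMAILS_PER_DAY = 2_000_000
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400

raw_bytes = UNSUBSCRIBED * BYTES_PER_ADDRESS
raw_gb = raw_bytes / 1e9           # decimal gigabytes
raw_gib = raw_bytes / 2**30        # binary gibibytes
sends_per_second = EMAILS_PER_DAY / SECONDS_PER_DAY

print(f"{raw_gb:.2f} GB ({raw_gib:.2f} GiB)")   # 1.28 GB (1.19 GiB)
print(f"{sends_per_second:.1f} sends/second")   # 23.1 sends/second
```

Real in-memory structures carry overhead on top of the raw bytes, so treat 1.28 GB as a floor rather than an exact footprint.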


Pretty sure you could check an email address against a database more than 23 times per second these days.


If it were a sane company, you could. When I was doing support for some less sane companies, their "servers" were sometimes less powerful than my phone (hey, it still works after 10 years, why change it). When my current company built its product, everyone at those less sane companies was amazed that we could produce monthly route reports for fleet cars in seconds instead of minutes (and as devs we thought it was still not optimized enough).



