Hacker News | 542458's comments

Okay, so I know back in the day you could choke scanning software (i.e., email attachment scanners) by throwing a zip bomb at them. I believe the software has gotten smarter these days, so it won't simply crash when that happens - but how is this done? How does one detect a zip bomb?

I don't understand the code itself, but here's Debian's patch to detect overlapping zip bombs in `unzip`:

https://sources.debian.org/patches/unzip/6.0-29/23-cve-2019-...

    The detection maintains a list of covered spans of the zip files
    so far, where the central directory to the end of the file and any
    bytes preceding the first entry at zip file offset zero are
    considered covered initially. Then as each entry is decompressed
    or tested, it is considered covered. When a new entry is about to
    be processed, its initial offset is checked to see if it is
    contained by a covered span. If so, the zip file is rejected as
    invalid.
So effectively it seems as though it just keeps track of which parts of the zip file have already been 'used', and if a new entry in the zip file starts in a 'used' section then it fails.
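As an illustrative sketch (not the actual unzip patch, which is C), the covered-span bookkeeping it describes could look something like this - track merged [start, end) intervals of the zip file already consumed, and reject any entry whose offset falls inside one:

```python
class SpanTracker:
    """Track covered byte ranges of a zip file as [start, end) intervals."""

    def __init__(self):
        self.spans = []  # sorted, merged [start, end) intervals

    def is_covered(self, offset):
        """True if `offset` falls inside an already-covered span."""
        return any(s <= offset < e for s, e in self.spans)

    def cover(self, start, end):
        """Mark [start, end) as covered, merging adjacent/overlapping spans."""
        self.spans.append([start, end])
        self.spans.sort()
        merged = [self.spans[0]]
        for s, e in self.spans[1:]:
            if s <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], e)
            else:
                merged.append([s, e])
        self.spans = merged
```

Usage mirrors the quoted description: cover the central directory and any leading bytes up front, cover each entry as it's processed, and call `is_covered` on each new entry's offset before touching it.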

I wonder if this has actually been used for backing up in real use cases (think how LVM or ZFS do snapshotting)?

I.e. an advanced compressor could abuse the zip file format to share base data for files which only incrementally change (get appended to, for instance).

And then this patch would disallow such practice.


For any compression algorithm in general, you keep track of A = {uncompressed bytes processed} and B = {compressed bytes processed} while decompressing, and bail out when either of the following occurs:

1. A exceeds some unreasonable threshold

2. A/B exceeds some unreasonable threshold
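A minimal sketch of that accounting in Python, using zlib's streaming API (the thresholds are illustrative, not recommendations):

```python
import zlib

MAX_OUTPUT = 100 * 1024 * 1024  # cap on A: total decompressed bytes
MAX_RATIO = 200                 # cap on A/B: expansion ratio

def safe_decompress(data, chunk_size=64 * 1024):
    """Inflate `data` incrementally, bailing out if either limit trips."""
    d = zlib.decompressobj()
    out = bytearray()            # A = len(out)
    consumed = 0                 # B = compressed bytes fed in
    for i in range(0, len(data), chunk_size):
        buf = data[i:i + chunk_size]
        consumed += len(buf)
        while buf:
            # Limit how much output a single call may produce; input that
            # couldn't be processed yet is parked in d.unconsumed_tail.
            out += d.decompress(buf, chunk_size)
            if len(out) > MAX_OUTPUT:
                raise ValueError("decompressed size limit exceeded")
            if len(out) / consumed > MAX_RATIO:
                raise ValueError("suspicious compression ratio")
            buf = d.unconsumed_tail
    out += d.flush()
    return bytes(out)
```

The key point is capping the output of each `decompress` call, so a bomb can't blow past the limits inside a single call.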


In practice one of the things that happens very often is that you compress a file filled with null bytes. Such files compress extremely well, and would trigger your A/B threshold.
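For instance (sizes are approximate and vary by zlib version and level):

```python
import zlib

# A perfectly innocent file of null bytes compresses at roughly 1000:1.
data = b"\x00" * 1_000_000
compressed = zlib.compress(data, level=9)
print(len(compressed), len(data) / len(compressed))
```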

On the other hand, the zip bomb described in this blog post relies on decompressing the same data multiple times - so it wouldn't necessarily trigger your A/B heuristic.

Finally, A just means "you can't compress more than X bytes with my file format", right? Not a desirable property to have. If deflate authors had this idea when they designed the algorithm, I bet files larger than "unreasonable" 16MB would be forbidden.


> In practice one of the things that happens very often is that you compress a file filled with null bytes. Such files compress extremely well, and would trigger your A/B threshold.

Sure, if you expect to decompress files with high compression ratios, then you'll want to adjust your knobs accordingly.

> On the other hand, the zip bomb described in this blog post relies on decompressing the same data multiple times - so it wouldn't necessarily trigger your A/B heuristic.

If you decompress the same data multiple times, then you increment A multiple times. The accounting still works regardless of whether the data is same or different. Perhaps a better description of A and B in my post would be {number of decompressed bytes written} and {number of compressed bytes read}, respectively.

> Finally, A just means "you can't compress more than X bytes with my file format", right? Not a desirable property to have. If deflate authors had this idea when they designed the algorithm, I bet files larger than "unreasonable" 16MB would be forbidden.

The limitation is imposed by the application, not by the codec itself. The application doing the decompression is supposed to process the input incrementally (in the case of DEFLATE, reading one block at a time and inflating it), updating A and B on each iteration, and aborting if a threshold is violated.
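For concreteness, here's what that application-side loop could look like over a gzip stream (limits are illustrative; note gzip's internal read-ahead slightly overstates B, which only makes the ratio check more lenient, never stricter):

```python
import gzip
import io

def bounded_gunzip(stream, max_out=64 * 1024 * 1024, max_ratio=1000):
    """Read a gzip stream incrementally, enforcing caps on A and A/B."""
    gz = gzip.GzipFile(fileobj=stream, mode="rb")
    out = io.BytesIO()
    while True:
        chunk = gz.read(64 * 1024)  # pull at most 64 KiB of output per step
        if not chunk:
            break
        out.write(chunk)
        a = out.tell()      # A: decompressed bytes written
        b = stream.tell()   # B: compressed bytes read (includes read-ahead)
        if a > max_out:
            raise ValueError("output cap exceeded")
        if b and a / b > max_ratio:
            raise ValueError("expansion ratio cap exceeded")
    return out.getvalue()
```

The codec never sees the limits; the caller simply stops pulling output once a threshold is violated.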


Embarrassingly simple for a scanner too, as you just mark the file as suspicious when this happens. You can be wrong sometimes, and that's expected.

There's a short story by Qntm called "Valuable Humans in Transit" that I like quite a bit which hinges on this subject: https://qntm.org/transi

One of my favorite pieces of short fiction.

I really doubt the user data for a smart TV user is all that valuable. Meta has far richer data, an entire tightly optimized ad system, and a platform where people commonly make large purchases - and it makes around $10 per user per year.


> I really doubt the user data for a smart tv user is all that valuable.

According to a 2021 article about Vizio's user-hostile advert-display devices, they boasted an average revenue of $13/yr, up from $7.30/yr - though consider this was 2020, when more people were at home watching TV instead of going outside, meeting people, touching grass, the usual.

https://deadline.com/2021/03/vizio-smart-tv-streaming-ipo-12...

> A range of advertising opportunities drive revenue, including revenue sharing with programmers and distribution partners as well as activations on the device home screen. In the fourth quarter of 2020, the company said average revenue per user on SmartCast was $12.99, up from $7.31 in the same period of 2019.

-------------

If you'll allow me the arbitrary assumption that a new TV set bought today lasts about 10 years, then $13/yr works out to roughly $130 of advertising revenue per unit over its lifetime - implying Vizio can price its TVs about $130 lower than before no-opt-out advertising on our own property became a condition of the privilege of using said device.


Surprisingly, apparently Americans average 279 eggs per year per person, or about 23 per month.

https://www.washingtonpost.com/business/2019/02/28/why-ameri...

(This is not a comment making any judgements about cost or the state of the economy, I was just surprised to find it that high)


Because eggs are in breakfast sandwiches, are ingredients in pastries, act as binders in things like meatloaf or fried chicken, etc.


That sounded high to me as well (probably because I rarely eat eggs), but then I remembered my parents, who each eat two per day - which isn't that uncommon, I guess.


Maybe if you include all the eggs in processed food like cookies or cakes, and in restaurants or other catering operations, you reach that number? And eggs consumed at home could still be around 12 per person per month?


That site seems to date from before Claude Code had real usage limits; note that none of the submissions are recent. As such, I think it's basically irrelevant - Claude Code will rate-limit you long, long before you can pull off the usage depicted, so it's unlikely you can be massively net-profit-negative on Claude Code today.


I don't see how server-side-only anticheat could prevent cheats that simulate perfect input (i.e., aimbots on known targets). Yes, you could attempt to heuristically identify cheat-y-looking patterns of input, but I suspect that's much easier said than done for anything other than very simple aimbots.


You don't need to split into multiple files to make large documents manageable; multiple pages work just fine (pages you're not using aren't loaded). But even so, I have absolutely massive pages with ~100 screens on them that work just fine on this base-tier M2 MBA.

Honestly given the complexity of the screens involved I feel Figma's performance is pretty reasonable. (Now, library publish and update - that's still unreasonably slow IMO)


Honestly, this is huge for people like me who tend to over-research and over-think the hell out of product choices. "Find me a top-fill warm-mist humidifier that looks nice, is competitively priced against similar products, and is available from a retailer in $city_name. Now watch for it to go on sale and lmk."

If they can figure out how to get the right kickbacks/referrals without compromising user trust and really nail the search and aggregation of data this could be a real money-maker.


Trusting AI with your shopping is very short sighted.

Lol what a terrible idea. Why not just hand every decision you'll ever make to AI?

Nobody needs critical thinking or anything. Just have AI do it so you save $3 and 4 minutes.


Why would I want to spend 1-2h researching humidifiers if I can spend that time in any other way, and still end up with a humidifier that fits my needs first try?

This kind of task is perfect for AI in a way that doesn't take away too much from the human experience. I'll keep my art, but shopping can die off.


Because you end up with a $1-value piece of crap whose maker spent considerable time optimizing it and faking reviews for LLMs instead of working on the product. Basically, in the medium term this strategy will get you Temu-grade stuff.


How often do you buy the first result on an Amazon search? Because that's delegating your labour, isn't it? Surely the best products are getting to the top, right? Well, no - they're paying to get to the top. An LLM with in-app shopping is gonna be the same thing.


> This kind of task is perfect for AI in a way that doesn't take away too much from the human experience.

Not the current form of AI. I regularly use Project Farm to find the best "insert tool". In an ideal world a robot runs all of these tests in perpetuity covering every physical appliance possible (with every variation, etc.). However, current AI cannot do this. Obviously LLMs can't do this because they don't operate in the physical world.


Well, you can always do the same thing that an LLM would: open SEO spam ranking sites "best humidifiers 2025", filled with referral links to Amazon or other sellers, which basically copy product descriptions and assign rankings that aren't based on any tests or real data.


For the same reasons as with Amazon: you're relying on a computer to tell you what to buy, which is very shortly going to be infested with promoted products and adverts instead of genuine advice. The AI implementers will poison the responses in the name of advertising; of this I have zero doubt in my mind.


10+ years ago you could in fact just pick the best-reviewed product on Amazon at a certain price point and have a great experience! God help you if you tried that today.


Fundamentally, is it really that different from being persuaded by an advertisement or trusting what the marketing says on the box?


> Just have AI do it so you save $3 and 4 minutes.

Maybe I am deeply suboptimal, but typically this kind of decision takes me far more than 4 minutes.


Compared to what, reading Amazon reviews? Google site:reddit.com?


For this to be useful they need up to date information, so it just Googles shit and reads Reddit comments. I just don't see how that is likely to be any better than Googling shit and reading Reddit comments yourself.

If they had some direct feed of quality product information it could be interesting. But who would trust that to be impartial?


It’s better in that I don’t have to waste my time reading Google and Reddit myself, but can let a robot do it.


Do you buy the first item that pops up on Amazon for a search that you've made? Because that's letting the robot do it for you.

If the answer is "no because that's an ad", well, how do you know that the output from ChatGPT isn't all just products that have bought their rank in the results?


You get the sources, you click through to them to see what they are.

EDIT: Like, have you actually tried this? If you ask it to summarise what Reddit is saying with sources, that’s pretty much exactly what you get.


Project Farm solves the trust problem with methodology + video documentation and the monetization problem with affiliate links for every product tested.


I can Google and read, but it takes a lot of time.


> If they can figure out how to get the right kickbacks/referrals without compromising user trust

This is a complete contradiction. Once there's money involved in the recommendation you can no longer trust the recommendation. At a minimum any kind of referral means that there's strong incentive to get you to buy something instead of telling you "there are no good options that meet your criteria". But the logical next step for this kind of system is companies paying money to tilt the recommendation in their favour. Would OpenAI leave that money on the table? I can't imagine they would.


> If they can figure out how to get the right kickbacks/referrals without compromising user trust

I'm trying to envision a situation in which the former doesn't cancel out the latter, but I'm having a pretty hard time doing that. It seems inevitable that these LLM services will just become another way to deliver advertised content to users.


> If they can figure out how to get the right kickbacks/referrals without compromising user trust and really nail the search and aggregation of data this could be a real money-maker.

As another commenter points out, "not compromising user trust" seems at odds with "money-maker" in the long-term. Surely Google and other large tech companies have demonstrated that to you at this point? I don't understand why so many people think OpenAI or any of them will be any different?


I still approximately trust (yes, I know it's imperfect, but so is every other source) NYT's Wirecutter, and they do affiliate links.


> Run FLUX.2 [dev] on GeForce RTX GPUs for local experimentation with an optimized fp8 reference implementation of FLUX.2 [dev], created in collaboration with NVIDIA and ComfyUI.

Glad to see that they're sticking with open weights.

That said, Flux 1.x was 12B params, right? So this is about 3x as large, plus a 24B text encoder (unless I'm misunderstanding), so it might be a significant challenge for local use. I'll be looking forward to the distilled version.


Looking at the file sizes on the open-weights version (https://huggingface.co/black-forest-labs/FLUX.2-dev/tree/mai...), the 24B text encoder is 48GB and the generation model itself is 64GB, which tracks with the 32B parameters mentioned (at two bytes per parameter: 24B × 2 ≈ 48GB, 32B × 2 ≈ 64GB).

Downloading over 100GB of model weights is a tough sell for the local-only hobbyists.


100 GB is less than a game download; it's actually running it that's the tough sell. That said, the linked blog post seems to say the optimized model is both smaller and benefits from a greatly improved approach to streaming weights from system RAM, so maybe it is actually reasonably usable on a single 4090/5090-type setup (I'm not at home to test).


Never mind the download size. Who has the VRAM to run it?


I do, 2x Strix Halo machines ready to go.


(Fellow Strix Halo owner.) I don't really like calling it VRAM, any more than when a dGPU dynamically maps a portion of system RAM. It's really just a system with quad-channel RAM attached to a GPU without VRAM - roughly 2x the performance of using the system RAM on my 2-channel desktop, versus something like 20x for actual VRAM on the dGPU in that system.

That's great, and I love the little laptop for the amount of x86 perf it packs into so little cooling. But my used Epyc box of roughly the same price is usually faster for AI (despite having no video card at all) and can load models 3x the size (well, before RAM prices doubled this past month), because it has modular 12-channel RAM, and memory speeds this low don't really need a GPU to keep up with the matrix math. Meanwhile, Flux is already slow even when it's sitting in actual high-bandwidth dedicated GPU VRAM.


The download is a trivial one-time cost, and so is storing it on a direct-attached NVMe SSD. The expensive part is getting a GPU with 64GB of memory.


Not even a 5090 can handle that - you'd have to use multiple GPUs.

So the only single-GPU option will be [klein]... maybe? Since we don't have much information.


As far as I know, no open-weights image-gen tech supports multi-GPU workflows, except in the trivial sense that you can generate two images in parallel. The model either fits into the VRAM of a single card or it doesn't. A 5-ish-bit quantization of a 32B-parameter model would be usable by owners of 24GB cards, and very likely someone will create one.
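Back-of-envelope (the 5.2 bits/weight average is an assumption, roughly what a 5-ish-bit mixed quantization lands at):

```python
params = 32e9            # 32B-parameter model
bits_per_weight = 5.2    # assumed average for a 5-ish-bit quantization
gib = params * bits_per_weight / 8 / 2**30
print(f"{gib:.1f} GiB")  # comfortably under 24 GB, before activations/overhead
```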


> Even a 5090 can handle that. You have to use multiple GPUs.

It takes about 40GB with the fp8 version fully loaded, but with enough system RAM available, ComfyUI can (at reduced speed) partially load models into VRAM during inference and swap as needed, letting it run on systems with too little VRAM to hold the whole model. (The NVIDIA page linked in the BFL announcement specifically highlights NVIDIA working with ComfyUI to improve this existing capability to enable Flux.2.)


Per https://www.abc.net.au/news/2025-11-24/bom-website-approved-... updated modelling, including the supercomputer to run the model, was included in the bill.

