Okay, so I know back in the day you could choke scanning software (i.e. email attachment scanners) by throwing a zip bomb at them. I believe the software has gotten smarter these days, so it won’t simply crash when that happens - but how is this done? How does one detect a zip bomb?
The detection maintains a list of covered spans of the zip file so far, where the central directory to the end of the file and any bytes preceding the first entry at zip file offset zero are considered covered initially. Then as each entry is decompressed or tested, it is considered covered. When a new entry is about to be processed, its initial offset is checked to see if it is contained by a covered span. If so, the zip file is rejected as invalid.
So effectively it seems as though it just keeps track of which parts of the zip file have already been 'used', and if a new entry in the zip file starts in a 'used' section then it fails.
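For anyone who wants to play with the idea, here's a minimal sketch of that covered-span bookkeeping in Python, using the standard zipfile module. This is not the actual unzip implementation described above - the function name, the fixed-header constant, and the choice to ignore the local header's extra-field length are simplifications of mine:

```python
import zipfile

LOCAL_HEADER_FIXED = 30  # fixed-size part of a local file header, per the zip spec

def looks_overlapped(path):
    """Rough covered-span check: flag zips where an entry's data begins
    inside a span already claimed by an earlier entry."""
    covered = []  # (start, end) spans already used by earlier entries
    with zipfile.ZipFile(path) as zf:
        for info in sorted(zf.infolist(), key=lambda i: i.header_offset):
            start = info.header_offset
            # Approximate span: fixed header + filename + compressed payload.
            # (A real check would parse the local header for the exact
            # extra-field length; this sketch ignores it.)
            end = start + LOCAL_HEADER_FIXED + len(info.filename.encode()) + info.compress_size
            if any(s <= start < e for s, e in covered):
                return True  # this entry starts inside an already-covered span
            covered.append((start, end))
    return False
```

A real detector, as the quoted description says, also treats the span from the central directory to the end of the file as covered from the start; this sketch only checks the entries against one another.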
I wonder if this has actually been used for backups in real use cases (think of how LVM or ZFS do snapshotting)?
I.e. an advanced compressor could abuse the zip file format to share base data for files which only incrementally change (get appended to, for instance).
For any compression algorithm in general, you keep track of A = {uncompressed bytes processed} and B = {compressed bytes processed} while decompressing, and bail out when either of the following occurs: A exceeds some maximum allowed output size, or the ratio A/B exceeds some maximum allowed compression ratio.
In practice one of the things that happens very often is that you compress a file filled with null bytes. Such files compress extremely well, and would trigger your A/B threshold.
On the other hand, the zip bomb described in this blog post relies on decompressing the same data multiple times - so it wouldn't necessarily trigger your A/B heuristics.
Finally, A just means "you can't compress more than X bytes with my file format", right? Not a desirable property to have. If the deflate authors had had this idea when they designed the algorithm, I bet files larger than an "unreasonable" 16MB would be forbidden.
> In practice one of the things that happens very often is that you compress a file filled with null bytes. Such files compress extremely well, and would trigger your A/B threshold.
Sure, if you expect to decompress files with high compression ratios, then you'll want to adjust your knobs accordingly.
> On the other hand, the zip bomb described in this blog post relies on decompressing the same data multiple times - so it wouldn't necessarily trigger your A/B heuristics.
If you decompress the same data multiple times, then you increment A multiple times. The accounting still works regardless of whether the data is same or different. Perhaps a better description of A and B in my post would be {number of decompressed bytes written} and {number of compressed bytes read}, respectively.
> Finally, A just means "you can't compress more than X bytes with my file format", right? Not a desirable property to have. If the deflate authors had had this idea when they designed the algorithm, I bet files larger than an "unreasonable" 16MB would be forbidden.
The limitation is imposed by the application, not by the codec itself. The application doing the decompression is supposed to process the input incrementally (in the case of DEFLATE, reading one block at a time and inflating it), updating A and B on each iteration, and aborting if a threshold is violated.
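To make that concrete, here's a rough sketch of the incremental accounting in Python, using zlib's streaming interface. The threshold values, the chunk size, and the function name are placeholders of mine, not anything mandated by DEFLATE or by the comments above:

```python
import zlib

MAX_OUTPUT = 100 * 1024 * 1024   # cap on A, total decompressed bytes (tune to taste)
MAX_RATIO = 200                  # cap on A/B, the expansion ratio (tune to taste)
CHUNK = 64 * 1024

def safe_inflate(stream, wbits=-zlib.MAX_WBITS):
    """Inflate a raw DEFLATE stream incrementally, tracking A (bytes written)
    and B (bytes read) and aborting as soon as a threshold is violated.
    Use wbits=zlib.MAX_WBITS for zlib-wrapped data, 16+MAX_WBITS for gzip."""
    d = zlib.decompressobj(wbits)
    a = b = 0
    out = []
    while not d.eof:
        data = stream.read(CHUNK)
        if not data:
            break
        b += len(data)
        piece = d.decompress(data, CHUNK)  # limit output produced per call
        while piece:
            a += len(piece)
            if a > MAX_OUTPUT or (b and a / b > MAX_RATIO):
                raise ValueError(f"possible decompression bomb: A={a}, B={b}")
            out.append(piece)
            piece = d.decompress(d.unconsumed_tail, CHUNK)
    return b"".join(out)
```

Against a bomb, A blows past the output cap (or the ratio cap) within the first few compressed blocks while B stays tiny, so the decompression is aborted long before it eats your disk or memory.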
I really doubt the user data for a smart TV user is all that valuable. Meta has infinitely richer data and an entire tightly optimized ad system, is on a platform where people commonly make large purchases, and makes around $10 per user per year.
> I really doubt the user data for a smart TV user is all that valuable.
According to a 2021 article about Vizio's user-hostile advert display devices, they boast of an average revenue of $13/yr - up from $7.31/yr, though consider this was 2020, when more people were at home watching TV instead of going outside, meeting people, touching grass, the usual.
> A range of advertising opportunities drive revenue, including revenue sharing with programmers and distribution partners as well as activations on the device home screen. In the fourth quarter of 2020, the company said average revenue per user on SmartCast was $12.99, up from $7.31 in the same period of 2019.
-------------
If you'll allow me the arbitrary assumption that a new TV set bought today will last about 10 years, then $13/yr of advertising revenue implies Vizio has knocked roughly $130 off the sale price of each TV, in exchange for no-opt-out advertising displayed on our own property as a condition for the privilege of using said device.
That sounded high to me as well (probably because I rarely eat eggs), but then I remembered my parents, who each eat two per day, which isn't that uncommon I guess.
Maybe if you include all the eggs in processed food like cookies or cakes and in restaurants or other catering operations you reach that number? And eggs consumed at home could still be around 12 per person?
That site seems to date from the days before there were real usage limits on Claude Code. Note that none of the submissions are recent. As such, I think it's basically irrelevant - the general observation is that Claude Code will rate limit you long, long before you can pull off the usage depicted, so it's unlikely you can be massively net-profit-negative on Claude Code.
I don't see how server-side-only anticheat could prevent cheats that simulate perfect input, i.e. aimbots on known targets. Yes, you could attempt to heuristically identify cheat-y looking patterns of input, but I suspect that's much, much easier said than done for anything other than very simple aimbots.
You don't need to split into multiple files to make large documents manageable; multiple pages work just fine (pages you're not using aren't loaded). But even still, I have absolutely massive pages with ~100 screens on them that work just fine on this base-tier M2 MBA.
Honestly, given the complexity of the screens involved, I feel Figma's performance is pretty reasonable. (Now, library publish and update - that's still unreasonably slow IMO.)
Honestly, this is huge for people like me who tend to over-research and over-think the hell out of product choices. "Find me a top-fill warm-mist humidifier that looks nice, is competitively priced against similar products, and is available from a retailer in $city_name. Now watch for it to go on sale and lmk."
If they can figure out how to get the right kickbacks/referrals without compromising user trust and really nail the search and aggregation of data this could be a real money-maker.
Why would I want to spend 1-2h researching humidifiers if I can spend that time in any other way, and still end up with a humidifier that fits my needs first try?
This kind of task is perfect for AI in a way that doesn't take away too much from the human experience. I'll keep my art, but shopping can die off.
Because you end up with a $1-value piece of crap whose maker spent considerable time optimizing for LLMs and faking reviews instead of working on the product. Basically, in the medium term this strategy will get you Temu stuff.
How often do you buy the first result on an Amazon search? Because that's delegating your labour, isn't it? Surely the best products are getting to the top, right? Well, no - they're paying to get to the top. An LLM that has in-app shopping is gonna be the same thing.
> This kind of task is perfect for AI in a way that doesn't take away too much from the human experience.
Not the current form of AI. I regularly use Project Farm to find the best "insert tool". In an ideal world a robot runs all of these tests in perpetuity covering every physical appliance possible (with every variation, etc.). However, current AI cannot do this. Obviously LLMs can't do this because they don't operate in the physical world.
Well, you can always do the same thing that an LLM would: open SEO-spam ranking sites ("best humidifiers 2025") filled with referral links to Amazon or other sellers, which basically copy product descriptions and assign rankings that aren't based on any tests or real data.
For the same reasons as the problems with Amazon: you're relying on a computer to tell you what to buy, and it is very shortly going to be infested with promoted products and adverts instead of genuine advice. The AI implementers will poison the responses in the name of advertising; of this I have zero doubt in my mind.
10+ years ago you could in fact just pick the best-reviewed product on Amazon at a certain price point and have a great experience! God help you if you tried that today.
For this to be useful it needs up-to-date information, so it just Googles shit and reads Reddit comments. I just don't see how that is likely to be any better than Googling shit and reading Reddit comments yourself.
If they had some direct feed of quality product information it could be interesting. But who would trust that to be impartial?
Do you buy the first item that pops up on Amazon for a search that you've made? Because that's letting the robot do it for you.
If the answer is "no because that's an ad", well, how do you know that the output from ChatGPT isn't all just products that have bought their rank in the results?
Project Farm solves the trust problem with methodology + video documentation and the monetization problem with affiliate links for every product tested.
> If they can figure out how to get the right kickbacks/referrals without compromising user trust
This is a complete contradiction. Once there's money involved in the recommendation, you can no longer trust the recommendation. At a minimum, any kind of referral means there's a strong incentive to get you to buy something instead of telling you "there are no good options that meet your criteria". But the logical next step for this kind of system is companies paying money to tilt the recommendation in their favour. Would OpenAI leave that money on the table? I can't imagine they would.
> If they can figure out how to get the right kickbacks/referrals without compromising user trust
I'm trying to envision a situation in which the former doesn't cancel out the latter, but I'm having a pretty hard time doing that. It seems inevitable that these LLM services will just become another way to deliver advertised content to users.
> If they can figure out how to get the right kickbacks/referrals without compromising user trust and really nail the search and aggregation of data this could be a real money-maker.
As another commenter points out, "not compromising user trust" seems at odds with "money-maker" in the long-term. Surely Google and other large tech companies have demonstrated that to you at this point? I don't understand why so many people think OpenAI or any of them will be any different?
> Run FLUX.2 [dev] on GeForce RTX GPUs for local experimentation with an optimized fp8 reference implementation of FLUX.2 [dev], created in collaboration with NVIDIA and ComfyUI.
Glad to see that they're sticking with open weights.
That said, Flux 1.x was 12B params, right? So this is about 3x as large, plus a 24B text encoder (unless I'm misunderstanding), so it might be a significant challenge for local use. I'll be looking forward to the distilled version.
Looking at the file sizes on the open weights version (https://huggingface.co/black-forest-labs/FLUX.2-dev/tree/mai...), the 24B text encoder is 48GB and the generation model itself is 64GB, which roughly tracks with the 32B parameters mentioned (about 2 bytes per parameter, i.e. 16-bit weights).
Downloading over 100GB of model weights is a tough sell for the local-only hobbyists.
100 GB is less than a game download; it's actually running it that's a tough sell. That said, the linked blog post seems to say the optimized model is both smaller and comes with a greatly improved approach to streaming weights from system RAM, so maybe it is actually reasonably usable on a single 4090/5090-type setup (I'm not at home to test).
(Fellow Strix Halo owner): I don't really like calling it VRAM any more than when a dGPU dynamically maps a portion of system RAM. It's really just a system with quad-channel RAM speeds attached to a GPU without VRAM - nearly 2x the performance of using the system RAM on my 2-channel desktop, versus actual VRAM on that system's dGPU, which is something like 20x.
That's great, and I love the little laptop for the amount of x86 perf it can pack into so little cooling, but my used Epyc box of ~the same price is usually faster for AI (despite the complete lack of a video card) and able to load models 3x the size (well, before RAM prices doubled this last month), because it has modular 12-channel RAM and memory speeds this low don't really need a GPU to keep up with the matrix math. Meanwhile, Flux is already slow even when it's in actual high-bandwidth dedicated VRAM.
As far as I know, no open-weights image gen tech supports multi-GPU workflows except in the trivial sense that you can generate two images in parallel. The model either fits into the VRAM of a single card or it doesn’t. A 5ish-bit quantization of a 32Gw model would be usable by owners of 24GB cards, and very likely someone will create one.
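For what it's worth, the arithmetic a few of these comments are doing implicitly is easy to make explicit. The parameter counts below are the ones from the thread (24B text encoder, 32B generation model); the helper name and print layout are just illustrative:

```python
# Back-of-envelope weight sizes at different precisions. Parameter counts
# come from the thread; everything else here is illustrative.
def weight_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate size of the weights alone, ignoring activations/overhead."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for name, params in [("text encoder", 24), ("generation model", 32)]:
    for bits in (16, 8, 5):
        print(f"{name:>16} @ {bits:>2}-bit: ~{weight_gb(params, bits):.0f} GB")

# text encoder:     ~48 GB at 16-bit, ~24 GB at 8-bit, ~15 GB at 5-bit
# generation model: ~64 GB at 16-bit, ~32 GB at 8-bit, ~20 GB at 5-bit
```

That matches the 48GB/64GB file sizes mentioned above, and shows why a "5ish-bit" quantization of the 32B generation model (roughly 20 GB of weights) plausibly fits on a 24GB card, with a few GB left over for activations and the rest of the pipeline.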
> Even a 5090 can't handle that. You have to use multiple GPUs.
It takes about 40GB with the fp8 version fully loaded, but with enough system RAM available, ComfyUI can partially load models into VRAM during inference and swap as needed (at reduced speed). The NVidia page linked in the BFL announcement specifically highlights NVidia working with ComfyUI to improve this existing capability precisely so that Flux.2 can run on systems with too little VRAM to fully load the model.