I really doubt it's that, as opposed to the maintenance cost of an extra flow for delivering a boarding pass. Or perhaps just a perceived complexity/annoyance cost when something breaks in the desktop flow here and there.
I'd think it's only maybe 5-10% of customers at most who both use desktop over mobile to get their boarding pass and run an ad-blocker on desktop. And honestly, I don't remember ever seeing an ad (even on Ryanair) when getting my boarding pass on mobile. OTOH I distinctly remember seeing many giant ads on printed boarding passes, most often ones brandished by other customers (usually printed in full color!). I'd think that's hugely more valuable as advertising real estate than the iota of additional data they get to collect on the few adblock users who have been forced onto mobile.
"99.9% of our passengers don't break the rules, they don't get penalised. The 0.1% of the guys who delay the boarding process, the guys who are there delaying the departure of the aircraft because their bag doesn't fit in the overhead (cabin), they are going to pay and we're going to eliminate them."
Somehow I doubt the compliance rate is anywhere near 99.9% -- on a full 737 that's roughly 1 passenger breaking the rules every 5 flights. Would they really be investing so much (including having the CEO go on air to rant about it) in catching the delinquents if that were the scale of the problem?
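Back of the envelope, assuming a full 737-800 with 189 seats (my assumption; O'Leary didn't give a denominator):

    seats = 189                          # assumed: a full Ryanair 737-800
    offenders = (1 - 0.999) * seats      # the claimed 0.1% non-compliance
    print(offenders)                     # ~0.19 rule-breakers per flight
    print(1 / offenders)                 # ~5.3, i.e. about 1 every 5 flights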
As someone who mostly follows the rules, the thing that really bothers me about Ryanair and the like is this: I'll get to the airport at least 2 hours early, as I'm supposed to, and then there'll be a massive hour+ long check-in queue (which I have to wait in just to show my passport, even if I'm not checking bags, since online check in never seems to work when I need to enter passport info), and all the while they'll have staff shouting "Anyone going to <destination of flight that closes boarding in 20-30 minutes>?" and shepherding those passengers to the front of the queue. It irritates me to no end -- why the hell should I bother arriving early if I'm just going to be punished with a longer wait for it?
> I'm supposed to, and then there'll be a massive hour+ long check-in queue (which I have to wait in just to show my passport, even if I'm not checking bags, since online check in never seems to work when I need to enter passport info
I've flown predominantly with Ryanair, 4-5 times a year, for over a decade, and I have not once had to do this. Anecdotally, I've always found their bag drop queues way quicker than other airlines' too, even when it's busy.
I guess it may be because I've only had a US passport but have primarily taken Ryanair to fly between the UK (where I live) and elsewhere in Europe, which is potentially an edge case they don't care to handle well. I recently gained UK citizenship / a passport so perhaps this will get easier for me going forward.
Either way, I stand by my main complaint about Ryanair letting late arrivers jump the queue even if I only have to "suffer" from it when I'm checking bags.
Every time I've been on a Ryanair flight where they checked luggage sizes, 5-10 people have had to pay. That is nowhere near one in a thousand passengers.
> But obviously in hindsight, it was the most expensive mistake I've made in my life.
Maybe a tiresome pedantic response, but this is only a "mistake" to the same extent that it was a mistake for [everyone in the world who had the required funds] not to buy 100 BTC in 2015. If you can overcome the cognitive biases (endowment effect, loss aversion, etc.), the fact that you previously owned the 100 BTC has no real bearing on the situation, beyond transaction fees and a couple hours max saved by holding (doing nothing) vs. buying.
It was also a mistake for me to not buy a lottery ticket with the numbers 4 11 17 25 41 51.
There's a slow drip feed of stories about people who did buy BTC and then lost it, whether to human error, lack of backups, crime, or failure of institutions. Remember the MtGox collapse in 2014? The whole fad was obviously over by then, wasn't it?
We're going to have another 2008-style bubble collapse eventually. But it's not clear what the radius of effect will be.
As someone who used to write academic ML papers, I find it funny that people are treating this academic-style paper written by a few Apple researchers as Apple's official company-wide stance, especially given that the first author was an intern.
I suppose it's "fair" since it's published on the Apple website with the authors' Apple affiliations, but historically speaking, at least in ML where publication is relatively fast-paced and low-overhead, academic papers by small teams of individual researchers have in no way reflected the opinions of e.g. the executives of a large company. I would not be particularly surprised to see another team of Apple researchers publishing a paper in the coming weeks with the opposite take, for example.
That's kind of expected for a research intern -- internships are most commonly done within 1-2 years before graduation. But in any case, the fact that the first author is an intern is just the cherry on top for me -- my comment would be the same modulo the "especially" remark if all the authors were full time research staff.
"There's nothing to prevent any state from funding the universities in their states."
I would've thought one major issue is that a much larger chunk of tax revenue is collected by the IRS than by any state. From googling, CA has the highest state income tax rate, yet its state tax collections amount to <5% of US federal tax revenue, despite CA having >10% of the US population. Roughly 2.5x'ing state taxes to reach similar per-capita revenue would probably lead to a fair number of people leaving the state, or at least get the party that passed the tax hike (presumably Democrats) voted out in the next state election.
OTOH the NSF's annual budget is ~$10B, in theory "easily" fundable by CA alone with its ~$220B/year in tax revenue -- in the worst case via a ~5% increase in collections. The NSF isn't the only federal agency that funds research (it seems to provide around 25% of federal research funding), but it is probably enough for one state, even the most productive one. So maybe it really is doable.
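Back-of-envelope, using the rough figures above (the federal revenue and population numbers are my own approximations, like the $10B and $220B):

    federal_revenue, us_pop = 4.5e12, 335e6  # rough US federal tax take and population
    ca_revenue, ca_pop = 220e9, 39e6         # rough CA tax take and population
    nsf_budget = 10e9                        # approximate NSF annual budget

    print((federal_revenue / us_pop) / (ca_revenue / ca_pop))  # ~2.4x per-capita gap
    print(nsf_budget / ca_revenue)           # ~0.045, i.e. roughly a 5% increase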
Or just have the last data point include everything from the full 12-month period before it (as the title "year on year" would suggest), and maybe even put it in the correct place on the x-axis (e.g. for today, Feb 12, 2025, about 11.8% of the full-year gap width past the (Dec 31) 2024 point).
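The x-axis placement is just days elapsed over days in the year, e.g.:

    from datetime import date

    # fraction of the gap between a Dec 31 2024 point and a Dec 31 2025 point
    elapsed = (date(2025, 2, 12) - date(2024, 12, 31)).days  # 43 days
    print(elapsed / 365)  # ~0.118, i.e. ~11.8% of the full-year gap width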
IME Netflix is a close second after Apple, whose streams I don't think I can distinguish from a 4K Blu-ray. I've found that the quality depends a little on the platform -- for Netflix, the native LG app seems to look best on my LG TV, while Apple looks best in the Apple TV app (perhaps unsurprisingly).
Amazon Prime 4K HDR on the other hand looks like garbage on every platform I've used -- the compression is unbearable in any dark scene.
I would put Disney+ after Apple. Both Apple TV+ and Disney+ consistently look great to me. Netflix is strange: it generally looks good, but whatever compression they use does something funny to the picture that makes it look fuzzy and sharp at the same time to me.
Netflix: 15-18 Mbps
Disney+: 25-30 Mbps
Amazon Prime Video: 15-18 Mbps
Apple TV+: 25-40 Mbps
HBO Max: 15-20 Mbps
This is from an LLM, but it tallies with what I remember reading. Apple TV+ is by far the best, followed by Disney+.
Netflix unfortunately seem to use any improvement in encoding efficiency to reduce bitrates, rather than to improve PQ at the same bitrate. It's definitely got worse over time. I also remember reading that they use a lower bitrate for content they deem more compressible.
I can sort of get that on the lower plans, but it's frustrating that they won't improve PQ (or at least keep it the same) for the (expensive) 4K plan.
I like JAX but I'm not sure how an ML framework debate like "JAX vs PyTorch" is relevant to DeepSeek/PTX. The JAX API is at a similar level of abstraction to PyTorch [0]. Both are Python libraries and sit a few layers of abstraction above PTX/CUDA and their TPU equivalents.
[0] Although PyTorch arguably encompasses 2 levels, with both a pure functional library like the JAX API, as well as a "neural network" framework on top of it. Whereas JAX doesn't have the latter and leaves that to separate libraries like Flax.
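To illustrate the "similar level of abstraction" point, here's the same toy computation in both (a sketch purely for comparison; nothing to do with DeepSeek's actual code):

    import jax.numpy as jnp
    import torch

    # the same matmul + nonlinearity, expressed almost identically in each API
    y_jax = jnp.tanh(jnp.ones((4, 8)) @ jnp.ones((8, 2)))
    y_torch = torch.tanh(torch.ones(4, 8) @ torch.ones(8, 2))

    # in both cases the framework, not the user, decides how this lowers to
    # PTX/CUDA kernels (or XLA ops on TPU); neither API exposes that layer

Hand-writing PTX, as DeepSeek reportedly did, happens several layers below either of these.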
Erm, why not? A 0.56 result with n=1000 ratings is statistically significantly better than 0.5, with a one-sided p-value on the order of 10^-4 -- far below any standard statistical significance threshold I've ever heard of. I don't know how many ratings they collected, but 1000 doesn't seem crazy at all. Assuming of course that raters are blind to which model is which and the order of the 2 responses is randomized with every rating -- or is that what you meant by "poorly designed"? If so, where do they indicate they failed to randomize/blind the raters?
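For anyone who wants to check the arithmetic, a quick sketch (the 560/1000 split is my hypothetical, not OpenAI's actual numbers):

    from scipy.stats import binomtest

    # 560 wins out of 1000 blind pairwise ratings, against a 50/50 null
    result = binomtest(560, n=1000, p=0.5, alternative="greater")
    print(result.pvalue)  # on the order of 1e-4, far below the usual 0.05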
> If so, where do they indicate they failed to randomize/blind the raters?
"Win rate if user is under time constraint"
This is hard to read, tbh. Is it STEM? Non-STEM? If it's STEM, then this shows there is a bias; if it's Non-STEM, then this shows a bias. If it's a mix, well, we can't know anything without understanding the split.
Note that Non-STEM is still within the error bars, and STEM is less than 2 sigma out, so our confidence still shouldn't be that high.
Because you're not testing "will a user click the left or right button" (for which asking a thousand users to click a button would be a pretty good estimation), you're testing "which response is preferred".
If 10% of people just click based on how fast the response was, because they don't want to read both outputs, your p-value for the latter hypothesis will be meaningless, no matter how large the sample is.
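To make that concrete with toy numbers (the 10% is the made-up figure from above):

    # suppose the true informed preference is exactly 50/50, but 10% of
    # raters just pick whichever response appeared faster
    informed, speed_clickers = 0.90, 0.10
    observed = informed * 0.5 + speed_clickers * 1.0
    print(observed)  # 0.55: "significant" at large n, yet it measures latency, not quality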
Yes, I am assuming they evaluated the models in good faith, understand how to design a basic user study, and therefore when they ran a study intended to compare the response quality between two different models, they showed the raters both fully-formed responses at the same time, regardless of the actual latency of each model.
I did read that comment. I don't think that person is saying they were part of the study that OpenAI used to evaluate the models. They would probably know if they had gotten paid to evaluate LLM responses.
But I'm glad you pointed that out. I now suspect that a large part of the disagreement between the "huh? a statistically significant blind evaluation is a statistically significant blind evaluation" repliers and the "oh, this was obviously a terrible study" repliers is due to different interpretations of that post. Thanks -- I genuinely didn't consider the alternative interpretation before.
Sure, it could be -- you can define "preference" as basically anything -- but it just loses its meaning if you do that. I think most people would take "56% prefer this product" to mean "when well-informed, 56% of users would rather have this product than the other".