Create an image that displays two seven-pointed stars, two eight-pointed stars, and two nine-pointed stars. All stars are connected to each other, except for the ones with the same number of points. The lines connecting the stars must NOT intersect.
I think it's not at all a marshmallow test; quite the opposite - docs used to be written way, way in advance of their consumption. The problem that implies is twofold. Firstly, and less significantly, it's just not a great return on investment to spend tons of effort now to maybe help slightly in the far future.
But the real problem with docs is that for MOST use cases, the audience and context of the readers matter HUGELY. Most docs are bad because we can't predict those. People waste ridiculous amounts of time writing docs that nobody reads or nobody needs, based on hypotheses about the future that turn out to be false.
And _that_ is completely different when you're writing context-window documents. These aren't really documents describing any codebase or context within which the codebase exists in some timeless fashion, they're better understood as part of a _current_ plan for action on an acute, real concern. They're battle-tested the way docs only rarely are. And as a bonus, sure, they're retainable and might help for the next problem too, but that's not why they work; they work because they're useful in an almost testable way right away.
The exceptions to this pattern kind of prove the rule - for years people have done better at documenting isolatable dependencies, i.e. libraries - precisely because those happen to sit at boundaries where it's easier to make decent predictions about future usage, and often because those docs might have a far larger readership, so it's more worth taking the risk that an incorrect hypothesis about the future wastes effort - the cost/benefit is skewed towards the benefit by sheer numbers and the kind of code it is.
Having said that, the dust hasn't settled on the best way to distill context like this. It'd be a mistake to overanalyze the current situation and conclude that documentation is certain to be the long-term answer - it's definitely helpful now, but it's certainly conceivable that more automated and structured representations might emerge, perhaps in forms better suited for machine consumption that look a little more alien to us than conventional docs.
It's not trash - it's quite nice for its niche. It's just not very scalable with cores, so it's best interpreted as a benchmark of lightly threaded workloads - which lots of typical consumer workloads are (gaming, web browsing, light office work). Then again, it's not hard to find workloads that scale much better, and Geekbench 6 doesn't really have a benchmark for those.
For the first 8 threads or so, it's fine. Once you hit 20 or so it's questionable, or at least that's my impression.
I get how even for multithreaded workloads, having a few fast cores is often better than the equivalent many slow cores. Or NUMA. There can be value in a test like 8 threads full load regardless of how many cores there are. But Geekbench 6 isn't that either, at least according to the chart showing sharply diminishing returns after 2 cores.
Yep. Still, I think it's a pretty decent benchmark in the sense that it's fairly short, quite repeatable, does have quite a few subtests, and it's not horribly different from the nebulous concept that is "typical workloads". It's suspiciously memory-latency bound, perhaps more than most workloads, but that's a quibble. If they'd simply labelled it "lightly threaded" instead of "multithreaded", it would have been fine.
As it is, it's just clearly misleading to people that haven't somehow figured out that it's not really a great test of multithreaded throughput.
I mean, reliably tracking ownership and therefore knowing that e.g. an aliased write must complete before a read is surely helpful?
It won't prevent all races, but it might help avoid mistakes in a few of 'em. And concurrency is such a pain; any such machine-checked guarantees are probably nice to have for those dealing with 'em - caveat being that I'm not such a person.
Put it this way: if I were in charge of a major OS, and one of the major app frameworks used on my OS hadn't been tested for my annual upgrade, I'd feel pretty embarrassed, even if there's a fig-leaf excuse why it's not my fault.
This doesn't exactly instill confidence in Apple's competence.
Apple doesn't care, they know their users will eat anything they throw at them.
Electron used non-public pieces to work around an issue in Apple's code which Apple knew about and wasn't interested in addressing; now it's broken after that code changed. Nothing new.
> if I were in charge of a major OS, and one of the major app frameworks used on my OS hadn't been tested for my annual upgrade
Each program contains its own version of Electron. How is Apple going to know whether the version of Electron inside the particular version of the particular app you installed on one particular date, perhaps years in the past, still works?
n apps in the world × n versions of Electron
The problem isn't Apple. It's that the developer of your program is using an outdated version of Electron, or it's put out an update and you haven't updated.
They only needed to test with the latest Electron at the time of release (or indeed, any chance version - they're all affected - but latest is a reasonable baseline). If they had, they would've seen this.
There are patches out now, but only after Apple released the OS to the entire world and people reported the issue to the Electron team.
Imo, Electron is sufficiently popular that somebody should test at least one Electron app on a major new OS version sometime before releasing it as done! Any app would've worked, and there are plenty of popular ones, as this post shows.
There's just no way for Apple to maintain and run comprehensive test suites for all the different software platforms out there, even "popular" ones.
That's why they release betas early -- that gives each project an opportunity to run their own test suites, however comprehensive they may be.
It's a little hard to hold Apple responsible when there are a lot of app teams in a better position to catch this than Apple, and apparently none of them did.
(Maybe it was a late change in Tahoe? Still, no one found it in the RC either it seems.)
No, this is possible - but again, just drain the bathwater of "all, everything, comprehensive 100%" while keeping the baby of catching the glaringly obvious, system-wide-visible bugs that any good 9x% testing would've caught.
There's also the alternative of announcing this breakage publicly to Electron beforehand; and the alternative of having a hack and publicly announcing it will be removed in a year. There's even the alternative of just announcing the caveat at all, so your users aren't unwitting guinea pigs. If they don't want to support a million workarounds forever, they don't have to - it's not all or nothing.
So on the one hand we have a product which isn't even remotely designed for the use case (hamsters), and which during normal use shows obvious behaviour (cooking) that should imply risk to said hamsters. On the other side, we have a product designed to be installed in an electrical system, which shows no signs during normal use that it's installed unsafely, and whose advertised specs are not actually safe for normal usage.
Whether or not the company in this case shares some or most of the blame with novice users - the analogy is not a great one.
The author's examples of rough edges are, however, no better when hosted on Vercel. The architecture seems... overly clever, leading to all kinds of issues.
I'm sure commercial incentives mean that issues affecting paying (hosted) customers get better resolutions than those affecting self-hosters, but that's not enough to explain this level of pain, especially not for issues that would affect paying customers just as much.
I get the feeling that the real problem here is the IEEE specs themselves. They include a huge bunch of restrictions, each of which individually isn't relevant to something like 99.9% of floating point code, and probably not a single one of which, even in aggregate, is relevant to a large majority of code segments out in the wild. That doesn't mean they're not important - but some of these features should have been locally opt-in, not opt-out. And at the very least, standards need to evolve to support the hardware realities of today.
Not being able to auto-vectorize seems like a pretty critical bug given hardware trends that have been going on for decades now; on the other hand sacrificing platform-independent determinism isn't a trivial cost to pay either.
I'm not familiar with the details of OpenCL and CUDA on this front - do they have some way to guarantee a specific order of operations such that code always has a predictable result on all platforms and nevertheless parallelizes well on a GPU?
Not being able to auto-vectorize is not the fault of the IEEE standard, but the fault of those programming languages which do not have ways to express that the order of some operations is irrelevant, so they may be executed concurrently.
Most popular programming languages have the defect that they impose a sequential semantics even where it is not needed. There have been programming languages without this defect, e.g. Occam, but they have not become widespread.
Because nowadays only a relatively small number of users care about computational applications, this defect has not been corrected in any mainline programming language, though for some programming languages there are extensions that can achieve this effect, e.g. OpenMP for C/C++ and Fortran. CUDA is similar to OpenMP, even if it has a very different syntax.
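For concreteness, a minimal C++ sketch of that OpenMP escape hatch (function names are just for illustration): the reduction clause tells the compiler the additions may be reassociated and run concurrently, which is exactly the "order is irrelevant" information a plain loop can't express.

```cpp
// Minimal sketch: expressing that summation order doesn't matter via OpenMP.
// Build with your compiler's OpenMP flag (e.g. -fopenmp).
#include <cstddef>
#include <vector>

double strict_sum(const std::vector<double>& v) {
    double sum = 0.0;
    for (double x : v) sum += x;           // sequential semantics: one fixed order
    return sum;
}

double relaxed_sum(const std::vector<double>& v) {
    double sum = 0.0;
    #pragma omp simd reduction(+ : sum)    // order of the partial sums is explicitly "don't care"
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += v[i];
    }
    return sum;
}
```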
The IEEE standard for floating-point arithmetic has been one of the most useful standards in all of history. The reason is that both hardware designers and naive programmers have always had an incentive to cheat in order to obtain better results in speed benchmarks, i.e. to introduce errors in the results with the hope that this will not matter for users, who will be more impressed by the great benchmark results.
There are always users who need correct results more than anything else, and it can even be a matter of life and death. For the very limited-in-scope uses where correctness does not matter, i.e. mainly graphics and ML/AI, it is better to use dedicated accelerators, GPUs and NPUs, which are designed by prioritizing speed over correctness. For general-purpose CPUs, not being fully compliant with the IEEE standard is a serious mistake, because in most cases the consequences of such a choice are impossible to predict, especially by the people without experience in floating-point computation who are the most likely to attempt to bypass the standard.
Regarding CUDA, OpenMP and the like, by definition if some operations are parallelizable, then the order of their execution does not matter. If the order matters, then it is impossible to provide guarantees about the results, on any platform. If the order matters, it is the responsibility of the programmer to enforce it, by synchronization of the parallel threads, wherever necessary.
Whoever wants vectorized code should not rely on plain programming languages like C/C++, but should always use one of the programming-language extensions that have been developed for this purpose, e.g. OpenMP, CUDA, OpenCL, where vectorization is not left to chance.
If you care about absolute accuracy, I'm skeptical you want floats at all. I'm sure it depends on the use case.
Whether it's the standard's fault or the language's fault (for following the standard) that auto-vectorization is prevented is splitting hairs; the whole point of the standard is to have predictable and usually fairly low-error ways of performing these operations, which only works when the order of operations is defined. That very aim is the problem; to the extent the standard is harmless when ordering guarantees don't exist, you're essentially already applying some of those tricky -ffast-math sub-optimizations.
But to be clear in any case: there are obviously cases where order of operations is relevant enough that accuracy-altering reorderings are not valid. It's just that those are rare enough that for many of these features I'd much prefer that to be the opt-in behavior, not opt-out. There's absolutely nothing wrong with having a classic IEEE 754 mode, and I expect it's an essential feature in some niche corner cases.
However, given the obviously huge application of massively parallel processors and of algorithms that accept rounding errors (or sometimes, conversely, overly precise results!), clearly most software is willing to accept rounding errors in general to be able to run efficiently on modern chips. It just so happens that none of the computer languages that rely on mapping floats to IEEE 754 floats in a straightforward fashion are any good at that, which seems like a bad trade-off.
There could be multiple types of floats instead; or code-local flags that delineate special sections that need precise ordering; or perhaps even expressions that clarify how much error the user is willing to accept and then just let the compiler do some but not all transformations; and perhaps even other solutions.
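One existing shape of such a "code-local flag", as a hedged sketch: Clang has a compiler-specific (not portable) pragma that permits reassociation only inside a marked block, leaving strict IEEE semantics everywhere else; other compilers spell this differently.

```cpp
// Sketch of a "code-local flag" (Clang-specific): reassociation is allowed only
// within this compound statement; the rest of the program keeps strict IEEE behavior.
#include <cstddef>

double dot_relaxed(const double* a, const double* b, std::size_t n) {
    #pragma clang fp reassociate(on)   // applies to this block only
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```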
> Most popular programming languages have the defect that they impose a sequential semantics even where it is not needed. There have been programming languages without this defect, e.g. Occam, but they have not become widespread.
We have memory ordering functions to let compilers know the atomic operation preference of the programmer… couldn’t we do the same for maths and in general a set of expressions?
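For reference, this is the existing atomics analogy being drawn on: the programmer states how much ordering they need, and the implementation may reorder anything not constrained. A floating-point analogue that states how much reassociation is acceptable would be hypothetical; the sketch below only shows the atomics side that already exists.

```cpp
// Existing precedent: C++ memory orderings let the programmer relax ordering
// constraints explicitly, so the compiler/CPU may reorder around the operation.
#include <atomic>

std::atomic<int> counter{0};

void record_event() {
    // "relaxed": atomicity is required, but no ordering relative to other memory
    // operations - reordering around this call is allowed.
    counter.fetch_add(1, std::memory_order_relaxed);
}
```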
An example of programming language syntax that avoids specifying sequential execution where it is not needed would be to specify that a sequence of expressions separated by semicolons must be executed sequentially, while a sequence of expressions separated by commas may be executed in any order or concurrently.
This is just a minor change from the syntax of the most popular programming languages, because they typically already specify that the order of evaluation of the expressions used for the arguments of a function, which are separated by commas, can be arbitrary.
Early in its history, the C language came close to specifying this behavior for its comma operator, but unfortunately its designers changed their minds and made the comma operator behave like a semicolon, in order to be able to use it inside for-statement headers, where the semicolons have a different meaning. A much better solution for C, instead of making comma and semicolon have the same behavior, would have been to allow a block to appear in any place where an expression is expected, giving it the value of the last expression evaluated in the block.
The precise requirements of IEEE-754 may not be important for any given program, but as long as you want your numbers to have any form of well-defined semantics beyond "numbers exist, and here's a list of functions that do Something™ that may or may not be related to their name", any number format that's capable of (approximately) storing both 10^20 and 10^-20 in 64 bits is gonna have those drawbacks.
AFAIK GPU code is basically always written as scalar code acting on each "thing" separately, which is then, as a whole, semantically looped over by the hardware the same way multithreading would be (i.e. no order guaranteed at all), so you physically cannot write code that would need operation reordering to vectorize. You just can't write an equivalent to "for (each element in list) accumulator += element;" (or, well, you can, by writing exactly that and running just one thread of it, but that's going to be slower than even the non-vectorized CPU equivalent, assuming the driver respects IEEE-754).
A CUDA "kernel" is the same thing as what has been called "parallel DO" or "parallel FOR" since 1963, or perhaps even earlier.
This is slightly obfuscated by not using a keyword like "for" or "do", by the fact that the body of the loop (the "kernel") is written in one place and the header of the loop (which gives the ranges for the loop indices) is written in another place, and by the fact that the loop indices have standard names.
A "parallel for" may have as well a syntax identical with a sequential "for". The difference is that for the "parallel for" the compiler knows that the iterations are independent, so they may be scheduled to be executed concurrently.
NVIDIA has always been greatly annoying in inventing a huge number of new terms that are just new words for old terms that have been used for decades in the computing literature, with no apparent purpose except obfuscating how their GPUs really work. Worse, AMD has imitated NVIDIA by inventing its own terms that correspond to those used by NVIDIA, but are once again different.
The spec doesn’t prevent auto-vectorization, it only says the language should avoid it when it wants to opt in to producing “reproducible floating-point results” (section 11 of IEEE 754-2019). Vectorizing can be implemented in different ways, so whether a language avoids vectorizing in order to opt in to reproducible results is implementation dependent. It also depends on whether there is an option to not vectorize. If a language only had auto-vectorization, and the vectorization result was deterministic and reproducible, and if the language offered no serial mode, this could adhere to the IEEE spec. But since C++ (for example) offers serial reductions in debug & non-optimized code, and it wants to offer reproducible results, then it has to be careful about vectorizing without the user’s explicit consent.
If you write a loop `for x in array { sum += x }`, then your program is a specification that you want to add the elements in exactly that order, one by one. Vectorization would change the order.
The bigger problem there is the language not offering a way to signal the author’s intent. If an author doesn’t care about the order of operations in a sum, they will still write the exact same code as the author who does care. This is a failure of the language to be expressive enough, and doesn’t reflect on the IEEE spec. (The spec even does suggest that languages should offer and define these sorts of semantics.) Whether the program is specifying an order of operations is lost when the language offers no way for a coder to distinguish between caring about order and not caring. This is especially difficult since the vast majority of people don’t care and don’t consider their own code to be a specification on order of operations. Worse, most people would even be surprised and/or annoyed if the compiler didn’t do certain simplifications and constant folding, which change the results. The few cases where people do care about order can be extremely important, but they are rare nonetheless.
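A small concrete illustration of both points: summation order changes the result, and C++ does already offer one way to signal intent here - std::accumulate is specified as a strict left-to-right fold, while C++17's std::reduce is allowed to reassociate (and thus to vectorize or parallelize).

```cpp
// Demonstrates that float addition is not associative, and shows the
// accumulate (fixed order) vs. reduce (unspecified grouping) distinction.
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    float a = 1e8f, b = -1e8f, c = 1.0f;
    std::printf("%g\n", (a + b) + c);   // prints 1: the large terms cancel first
    std::printf("%g\n", a + (b + c));   // prints 0: b + c rounds back to -1e8f

    std::vector<double> v{1e16, 1.0, -1e16};
    double ordered = std::accumulate(v.begin(), v.end(), 0.0); // fixed left-to-right order: 0
    double relaxed = std::reduce(v.begin(), v.end(), 0.0);     // grouping unspecified: may differ
    std::printf("%g %g\n", ordered, relaxed);
}
```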
They are; just check anything fixed-point on a 486SX vs anything floating-point on a 486DX. It's faster to scale, sum, and print at the desired precision than to operate on floats.
I wonder... couldn't there just be some library type for this, e.g. `associative::float` and `associative::double` and such (in C++ terms), so that compilers can ignore non-associativity for actions on values of these types? Or attributes one can place on variables to force the assumption of associativity?
While it is technically correct to say this, it also gets the wrong point across, because it leaves out the fact that ordering changes create only a small difference. Other examples where arithmetic is not commutative, e.g. matrix multiplication, can create much larger differences.
Floating-point arithmetic is non-associative, but it is commutative for the operations that are algebraically commutative: x + y == y + x and x*y == y*x. And x - y = -(y - x) so subtraction is properly anti-commutative.
The only very marginal exception to this is that when both arguments are NaN, the return value will be NaN, but which NaN payload is returned can depend on argument order. But no one ever uses this because it's not specified, so it can't be used reliably for anything useful. The behavior I wish IEEE 754 had specified for this is to define a standard NaN value (or two), and when the return value of an op is NaN, and some of the arguments are non-standard NaNs, then one of those non-standard NaN values must be returned. This doesn't depend on argument order and allows NaN payloads to be reliably propagated, which would let you encode useful debugging information in NaN payloads and know that it will flow through the program.
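As a hedged sketch of what a NaN payload looks like in practice: the payload is just the low mantissa bits of the NaN encoding, visible via a bit cast (C++20), and which operand's payload survives an operation on two NaNs is exactly the part IEEE 754 leaves unspecified.

```cpp
// Inspecting NaN payload bits. Which payload the addition returns is not
// specified, which is why payloads can't be relied on for propagation today.
#include <bit>
#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
    float n1 = std::nanf("1");   // quiet NaN carrying payload 1 (where supported)
    float n2 = std::nanf("2");   // quiet NaN carrying payload 2
    float r  = n1 + n2;          // still NaN, but *which* payload comes back is unspecified
    std::printf("0x%08x\n", std::bit_cast<std::uint32_t>(r));
}
```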
For mathematical use, NaN payloads shouldn’t matter, and behave identically (aside from quiet vs. signaling NaNs). It also doesn’t matter for equality comparison, because NaNs always compare unequal.
from the user perspective it's not too bad, but from the compiler perspective it is. The result of this is that LLVM has decided that trying to figure out which NaN you got (e.g. by casting to an int and comparing) is UB, which means pretty much every floating point operation becomes non-deterministic.
This also adds extra complexity to the CPU: you need special hardware for == rather than just using the perfectly good integer unit, and every FPU operation needs to devote a bunch of transistors to handling this nonsense, which buys the user absolutely nothing.
there are definitely things to criticize about the design of Posits, but the thing they 100% get right is having a single NaN and sane ordering semantics
> I get the feeling that the real problem here are the IEEE specs themselves.
Well, all standards are bad when you really get into them, sure.
But no, the problem here is that floating point code is often sensitive to precision errors. Relying on rigorous adherence to a specification doesn't fix precision errors, but it does guarantee that software behavior in the face of them is deterministic. Which 90%+ of the time is enough to let you ignore the problem as a "tuning" thing.
But no, precision errors are bugs. And the proper treatment for bugs is to fix the bugs and not ignore them via tricks with determinism. But that's hard, as it often involves design decisions and complicated math (consider gimbal lock: "fixing" that requires understanding quaternions or some other orthogonal orientation space, and that's hard!).
So we just deal with it. But IMHO -ffast-math is more good than bad, and projects should absolutely enable it, because the "problems" it discovers are bugs you want to fix anyway.
> (consider gimbal lock: "fixing" that requires understanding quaternions or some other orthogonal orientation space, and that's hard!)
Or just avoiding gimbal lock by other means. We went to the moon using Euler angles, but I don't suppose there's much of a choice when you're using real mechanical gimbals.
That is the "tuning" solution. And mostly it works by limiting scope of execution ("just don't do that") and if that doesn't work by having some kind of recovery method ("push this button to reset", probably along with "use this backup to recalibrate"). And it... works. But the bug is still a bug. In software we prefer more robust techniques.
FWIW, my memory is that this was exactly what happened with Apollo 13. It lost its gyro calibration after the accident (it did the thing that was the "just don't do that") and they had to do a bunch of iterative contortions to recover it from things like the sun position (because they couldn't see stars out the iced-over windows).
NASA would have strongly preferred IEEE doubles and quaternions, in hindsight.
While I'm most familiar with C#, and haven't used Ruby professionally for almost a decade now, I think we'd be better off looking at typescript, for at least 3 reasons, probably more.
1. Flow sensitivity: It's a sure thing that in a dynamic language people use coding conventions that fit naturally with the runtime-checked nature of those types. That makes flow-sensitive typing really important.
2. Duck typing: dynamic languages, and certainly the Ruby codebases I knew, often use duck typing. That works really well in something like typescript, including really simple features such as type intersections and unions, but those features aren't present in C#.
3. Proof by survival: typescript is empirically a huge success. They're doing something right when it comes to retrospectively bolting static types onto a dynamic language. Almost certainly there are more things than I can think of off the top of my head.
Even though I prefer C# to typescript or ruby _personally_ for most tasks, I don't think it's perfect, nor is it likely a good crib-sheet for historically dynamic languages looking to add a bit of static typing - at least, IMHO.
Bit of a tangent, but there was a talk by Anders Hejlsberg on why they're porting the TS compiler to Go (and implicitly not C#) - https://www.youtube.com/watch?v=10qowKUW82U - I think it's worth recognizing the kind of stuff that goes into these choices that's inevitably not obvious at first glance. It's not about the "best" language in a vacuum, it's about the best tool for _your_ job and _your_ team.
wow, I certainly appreciate your perspective and insight as a regular C# developer! My experience was limited to building a Unity project for 6 years and learning the differences from Ruby.
Another commenter suggested another language like Crystal, and that might actually be what it really needs: a Ruby-like alternative.
I love building libraries, so having the chance to talk about the gotchas with things like this is a fun chance to reflect on what is and is not possible with the tools we have. I guess my favorite "feature" in C# is how willing they are to improve; and that many of the improvements really matter, especially when accumulated over the years. A C# 13 codebase can be so much nicer than a C# 3 codebase... and faster and more portable too. But nothing's perfect!
I think it's pretty usable now, but there is scarring. The solution would have been much nicer had it been around from day one, especially surrounding generics and constraints.
It's not _entirely_ sound, nor can it warn about most mistakes when those are in the "here-be-dragons" annotations in generic code.
The flow sensitive bit is quite nice, but not as powerful as in e.g. typescript, and sometimes the differences hurt.
It's got weird gotcha interactions with value types - for instance (but likely not limited to) the interaction with generics that aren't constrained to struct but _do_ allow nullable usage for ref types.
Support in reflection is present, but it's not a "real" type, so everything works differently; hence you'll see that code leveraging reflection that needs to deal with this kind of stuff tends to have special considerations for ref-type vs. value-type nullability, and it often leaks out into API consumers too - not sure if that's just a practical limitation or a fundamental one, but it's very common anyhow.
Last I looked, there was no runtime checking for incorrect nulls in fields marked non-nullable, which is particularly annoying if there's even an iota of not-yet-annotated or incorrectly annotated code, including e.g. stuff like deserialization.
Related features like TS Partial<> are missing, and that means that expressing concepts like POCOs that are in the process of being initialized but aren't yet is a real pain; most code that does that in the wild is not typesafe.
Still, if you engage constructively and are willing to massage your patterns and habits you can surely get like 99% type-checkable code, and that's still a really good help.
> Related features like TS Partial<> are missing, and that means that expressing concepts like POCOs that are in the process of being initialized but aren't yet is a real pain; most code that does that in the wild is not typesafe.
If it's an object, it's as simple as having a static method on a type, like FromA(A value) and then have that static method call the constructor internally after it has assembled the needed state. That's how you'd do it in Rust anyway. There will be a warning (or an error if you elevate those) if a constructor exits not having initialized all fields or properties. Without constructor, you can mark properties as 'required' to prohibit object construction without assignment to them with object initializer syntax too.
Yeah, before required properties/fields, C#'s nullability story was quite weak, it's a pretty critical part of making the annotations cover enough of a codebase to really matter. (technically constructors could have done what required does, but that implies _tons_ of duplication and boilerplate if you have a non-trivial amount of such classes, records, structs and properties/fields within them; not really viable).
Typescript's Partial<> can however do more than that - required means you can practically express a type that cannot be instantiated partially (without absurd amounts of boilerplate anyhow), but if you do, you can't _also_ express that same type but partially initialized. There are lots of really boring everyday cases where partial initialization is very practical. Any code that collects various bits of required input, but needs to set aside and express the intermediate state of that data while it's being collected (or in the event that you fail to complete), wants something like Partial.
E.g. if you're using the most common C# web platform, ASP.NET Core, to map inputs into a typed object, you're now forced either to express "semantically required but not type-system required" via some other path, or, if you use C# required, to choose between unsafe code that nevertheless allows access to objects that never had those properties initialized, and safe code where you can't access any of the rest of the input either, which is annoying for error handling.
typescript's type system could, on the other hand, express the notion that all or even just some of those properties are missing; it's even pretty easy to express the notion of a mapped type wherein all of the _values_ are replaced by strings - or, say, by a result type. And flow-sensitive type analysis means that sometimes you don't even need any kind of extra type checks to "convert" from such a partial type into the fully initialized flavor; that's implicitly deduced simply because once all properties are statically known to be non-null, well, at that point in the code the object _is_ of the fully initialized type.
So yeah, C#'s nullability story is pretty decent really, but that doesn't mean it's perfect either. I think it's important to mention stuff like Partial because sometimes features like this are looked at without considering the context. Most of these features sound neat in isolation, but are also quite useless in isolation. The real value is in how it allows you to express and change programs whilst simultaneously avoiding programmer error. Having a bit of unsafe code here and there isn't the end of the world, nor is a bit of boilerplate. But if your language requires tons of it all over the place, well, then you're more likely to make stupid mistakes and less likely to have the compiler catch them. So how we deal with the intentional inflexibility of non-nullable reference types matters, at least, IMHO.
Also, this isn't intended to imply that typescript is "better". That has even more common holes that are also unfixable given where it came from and the essential nature of so much interop with type-unsafe JS, and a bunch of other challenges. But in order to mitigate those challenges TS implemented various features, and then we're able to talk about what those feature bring to the table and conversely how their absence affects other languages. Nor is "MOAR FEATURE" a free lunch; I'm sure anybody that's played with almost any language with heavy generics has experienced how complicated it can get. IIRC didn't somebody implement DOOM in the TS type system? I mean, when your error messages are literally demonic, understanding the code may take a while ;-).
Of course they had a choice: they could have stuck with Google Maps for longer, and they probably also could have invested more in data and UI beforehand. They could have launched a submarine, non-Apple-branded product to test the waters. They could likely have done other things we haven't thought of here, in this thread.
Quite plausibly they just didn't realize how rocky the start would be, or perhaps they valued that immediate strategic autonomy more in the short term than we think, and willingly chose to take the hit to their reputation rather than wait.