The "in the wild" part of the title is misleading. These chips are being presented in a very controlled environment.
An interesting aspect of Intel's design is that they use Ethernet for connectivity. If they can get the performance on par with NVLink, that by itself could be a win, because everybody knows how to manage Ethernet. Very few people know how to manage an NVLink network.
To be clear, this is data center hardware. The lower-power versions of these cards consume around 600 W, and the article makes no mention of pricing.
Very few people actually know how to provision and manage a lossless Ethernet fabric, and I'd wager someone who had literally never touched InfiniBand would have an easier time getting there from zero with IB than with Ethernet on real vendor gear.
Ethernet has so, so many gotchas. Maybe if it were a layer-3-only network it would work. Maybe.
Ironically enough, since NVIDIA bought Mellanox, the best-documented route to getting RoCE v2 going is likely with switches purchased from NVIDIA...
That’s not how NVIDIA’s solutions work. If you connect a DGX system to an NVLink switch, it will talk NVLink; if you connect it to another via InfiniBand, it will speak InfiniBand; if you do PCIe direct attach, it will talk that too; and if you connect it via Ethernet, it will, you guessed it, talk Ethernet, with RDMA and all the bells and whistles to boot.
In my understanding, one of the big advantages of the protocol (v2, that is) is that it is routed over IP and can work with existing switches ($$) instead of needing specialized ones ($$$$).
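For anyone wondering why that's possible: RoCE v2 wraps the InfiniBand transport headers in an ordinary UDP/IP packet (UDP destination port 4791), so any L3 switch that can forward UDP can forward it. A minimal sketch with scapy; the address and DSCP marking are made-up examples, and the payload is just a placeholder for the IB Base Transport Header:

    # A rough sketch (using scapy; address and DSCP are made up) of why RoCE v2
    # is routable: the InfiniBand transport headers ride inside a plain UDP/IP
    # packet, so any L3 switch that forwards UDP can forward it.
    from scapy.all import Ether, IP, UDP, Raw

    ROCE_V2_UDP_PORT = 4791  # IANA-assigned UDP destination port for RoCE v2

    pkt = (
        Ether()
        / IP(dst="10.0.0.2", tos=0x68)      # DSCP 26, just an example "lossless" traffic class
        / UDP(dport=ROCE_V2_UDP_PORT)
        / Raw(b"\x00" * 12)                 # placeholder for the 12-byte IB Base Transport Header
    )
    pkt.show()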
Many years ago I met Naveen Rao and tried to demo the Nervana-derived Intel card, which at the time Facebook and a couple of others were sampling. During more formal talks, Intel sent him in literally surrounded by a Xeon sales team that sidetracked the whole meeting.
When these Intel GPUs are “in the wild” it actually means Xeon salespeople are out on the hunt.
> An interesting aspect of Intel's design is they use Ethernet for connectivity.
Interestingly, Tenstorrent is doing something similar with their Wormhole cards.
I'm not convinced yet that it is the right way to go. If the switching fabric on the card fails, you lose the whole card. Keeping it separated out is a bit less risky, at the cost of some speed.
I'm more partial to composable fabrics, but they aren't ready yet for PCIe5 and we have PCIe6 just around the corner next year.
They are 24x200G, which is already outdated. Everything we are doing is currently 400G (via 8x CX7 cards running in Ethernet mode), with 800G at the spine. 800G NICs will come with PCIe6 next year and will cut the number of connections down (rough numbers below).
What I'd prefer is the connection is through the UBB/OAM baseboard, such that you have PCIe connections. Look into what GigaIO and Liqid are doing. There is a 3rd option that is even cooler than those two, but I don't want to mention it here. ;-)
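To put those link counts in perspective, a quick back-of-the-envelope comparison; the "4 x 800G" row is my assumption about what a PCIe6-era NIC layout might look like, not a product spec:

    # Back-of-the-envelope aggregate bandwidth vs. number of connections,
    # using the link speeds quoted above (the 4 x 800G row is an assumption).
    configs = {
        "Intel card, 24 x 200G": (24, 200),
        "8 x CX7 @ 400G":        (8, 400),
        "4 x 800G (PCIe6-era?)": (4, 800),
    }
    for name, (links, gbps) in configs.items():
        print(f"{name:>24}: {links * gbps:,} Gb/s over {links} connections")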
AMD is doing several billion in AI processor sales already, and the new chips are selling as fast as they can make them. At least with AMD, a customer can actually get them now, as opposed to the nearly 1-year lead time from Nvidia.
This is a big part of why they are generally available. Building on top of AMD gear is currently more difficult than building on Nvidia. It's unfortunate, but the ecosystem that evolved around CUDA while AMD had little in this space gave Nvidia a massive jump-start. As a dev, it's way easier to pick up the various stacks that work with Nvidia gear and not have to worry anywhere near as much about what's underneath.
The CEO of Nvidia was pretty clear: Nvidia's focus is now on selling Blackwell AI factories. According to some napkin math (sketched below), each of these Blackwell AI factories will have an annual power bill of roughly $8,000,000 USD and, if you do a little apples-to-oranges comparison, will outperform the computer that's currently #1 on the supercomputing Top 500 (Frontier at Oak Ridge) by one to two orders of magnitude.
For comparison, over the past fifteen years we've seen 3 orders of magnitude of improvement in the supercomputing Top 500 (from Roadrunner in 2008 to Frontier in 2024). Nvidia is going to do 1 to 2 orders of magnitude of improvement instantaneously.
Nvidia's core market has shifted dramatically. There is no competition.
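Here's roughly what that $8M/year figure implies; the facility power and electricity price below are my assumptions, not numbers from the keynote:

    # Napkin math behind an ~$8M/year power bill. The inputs are assumptions,
    # not figures from Nvidia.
    facility_power_mw = 10      # assume a ~10 MW "AI factory"
    price_per_kwh = 0.09        # assume ~$0.09/kWh industrial electricity
    hours_per_year = 24 * 365

    annual_bill_usd = facility_power_mw * 1_000 * price_per_kwh * hours_per_year
    print(f"~${annual_bill_usd:,.0f} per year")   # about $7.9M with these inputs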
Not to rain on the parade here, but Nvidia is not going to deliver a 1 to 2 order-of-magnitude gain in a single jump. The Top 500 measures FP64; Nvidia likes to quote FP8. Blackwell is actually a regression in FP64 performance, trading die space for AI compute. This isn't to say the silicon isn't good, just that different goals are being hit.
In the keynote, Nvidia is quoting FP4 performance for Blackwell specifically. Frontier and every computer that has ever hit the supercomputing Top 500 is measured in FP64. This is why I qualified my statement above by saying you have to be willing to compare apples and oranges.
But I think this is where things open up to debate (and/or personal interpretation). My view is, if all you care about is AI workloads (specifically LLMs), then you really are seeing a full two orders of magnitude of improvement. And if we are trying to get a feel for the space, and what that even means, then there really is nothing else to compare Blackwell to other than Frontier, in terms of scale alone.
Above I say “one or two orders of magnitude”; the “one order of magnitude” figure comes from taking Blackwell’s FP4 numbers and dividing by 16, to conceptually convert back to something that can be compared to FP64.
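Spelled out, with a placeholder FP4 figure (the FP4 number below is an assumption purely to illustrate the arithmetic; Frontier's HPL result is roughly 1.2 EF of FP64):

    # Illustrating the divide-by-16 conversion described above. The FP4 figure
    # is an assumed placeholder, not a quoted spec.
    assumed_factory_fp4_exaflops = 200.0   # assumed FP4 throughput of a whole "AI factory"
    frontier_fp64_exaflops = 1.2           # Frontier's HPL score, roughly

    raw_ratio = assumed_factory_fp4_exaflops / frontier_fp64_exaflops   # apples vs. oranges
    fp64_equivalent = assumed_factory_fp4_exaflops / 16                 # FP4 is 16x narrower than FP64
    converted_ratio = fp64_equivalent / frontier_fp64_exaflops

    print(f"raw FP4 vs FP64:      ~{raw_ratio:.0f}x  (the 'two orders of magnitude' reading)")
    print(f"after dividing by 16: ~{converted_ratio:.0f}x   (the 'one order of magnitude' reading)")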
—
I’m perfectly happy to admit that all I’m really capable of doing here is using estimates to compare a completely new thing (the new compute architecture of Blackwell) to what came before it (the majority of the history of the supercomputing 500). But that’s okay, because that’s my only goal.
—
Believe me, I get it. Blackwell will never post a linpack score. Blackwell will never run big scientific computing jobs like… global weather simulations.
But as computer science nerds, how could we not be excited to see such a big move: changing the computer architecture to better suit a specific compute workload (LLMs)?
I know Bitcoin has its ASICs, but to me Blackwell’s mission statement of “computing intelligence” is a lot more interesting.
The only meaningful hardware competition, meaning lower prices, will come from Chinese designed, Chinese manufactured parts. This is still a long ways out.
Is it inevitable? I think so. Before 2019 there wasn't an opportunity, now there is.
For software, Chinese universities, Alibaba, Tencent, and ByteDance are already releasing models, training code, and in rare cases datasets that are competitive with private offerings. CogVLM/CogAgent is one that I use. It's very promising.
I don't think we'll be legally prohibited from buying it but there will be zero English docs (see Allwinner and such). Maybe if you're lucky you'll get an uncommented code dump with a forked years-old version of PyTorch.
Ironically, transformers are relatively simple architectures - all you really need is a high-performance matmul. So OpenCL could "do something" at this point, if it were alive.
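To make that concrete, here is single-head scaled dot-product attention in plain NumPy; the performance-critical work is two matrix multiplies, which is exactly the primitive any backend (CUDA, ROCm, OpenCL, ...) has to make fast:

    # Single-head scaled dot-product attention in NumPy. The hot path is two
    # matmuls; the softmax in between is cheap elementwise work.
    import numpy as np

    def attention(q, k, v):
        # q, k, v: (seq_len, d) arrays
        scores = q @ k.T / np.sqrt(q.shape[-1])            # matmul #1
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)     # softmax
        return weights @ v                                 # matmul #2

    x = np.random.randn(128, 64)
    print(attention(x, x, x).shape)   # (128, 64)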
From that exchange, clearly not. They do not have their stuff together. This is one area where AMD really should be employing a full team to ensure that major projects, including drivers, are able to support AMD cards without any friction. While it's still a PITA to work with AMD hardware in most AI frameworks, Nvidia will continue to dominate.
It's very unfortunate. One of our vendors has spent a hell of an effort making spatial analysis tools work via OpenCL instead of being just processor-bound, and it's made those industry-standard libraries 1-2 orders of magnitude faster. That is something the spatial industry desperately needs in order to improve iteratively.
The state of OpenCL is frustrating. So much to be had, but so little improvement and support.
In 50 years, I feel like GPGPU compute will be told as a Greek tragedy. Nvidia, pariah of Apple, would create a parallel compute ecosystem that was so powerful that even the combined effort of the industry couldn't topple it. At every corner where they could have thwarted them, Nvidia's competitors refused to up the ante or chided CUDA's efforts as silly and unnecessary. One crypto/AI craze later, everyone and their mother is berating their OEM for ignoring such basic functionality and refusing to collaborate on a competing alternative.
Frustrating is a good way to put it, but there's a certain causal satisfaction I feel from watching it all unfold. Of course everyone loses to CUDA when they refuse to sponsor an Open Source alternative. It's fascinating to me that hardware manufacturers would rather let CUDA dominate than establish a basic working relationship.
This is something I'd like to offer via my business (Hot Aisle) at some point in the near future. Right now, we are just getting started and focused on MI300x, but the long term goal is to offer any type of high end compute that people are willing to rent.
What USP are you aiming for, to differentiate from the many companies who have tried and failed to offer some form of HPCaaS over the last 10-15 years?
Great question! I'm going to answer it the only way I know how... with a bit of a story of the history of things. Sorry if this bores you.
The problem I realized over a year ago was that nobody had hourly rental access to high end AMD GPUs. In addition, access to high end Nvidia was equally difficult. I signed up for a CoreWeave account, put in my credit card and was told a few weeks later that my account was not approved.
In effect, the only way to get access to super-high-end compute was to be involved in HPC, and that requires connections. At the time, we also didn't even know whether AMD was going to seriously adopt AI as a strategy.
My view was that there were actually two problems: lack of general access, and everyone putting all their eggs into a single basket, mostly because of that lack of access and because AMD was lacking a great developer flywheel story.
I spent August to December building a business plan, closing funding, forming the business, hiring my co-founder full time, securing data center space, securing direct relationships with vendors, and designing the system we were going to deploy. There are a million other little details in there, but this is long enough as it is.
Oct/Nov of last year rolls around and suddenly AMD has changed their tune. Lisa Su doubles down. Dec 6th, MI300x rolls out. We made our first PoC order in January and received it in March. It just goes to show how cutting edge this is and how long all of it takes. 3 more small (not hyperscaler) businesses sprang up during that time, all offering effectively the same product. We went to the data center, deployed our PoC, and about 2 weeks later we had our first customer onboarded. I call all of that validation, and was able to secure further funding based on it.
To answer your question, I'm not sure that I need a specific USP. The demand for compute isn't going down. If I have a product that people want, and I can offer ethical, honest, truthful, great service around that product, all based on decades of experience, can't that be enough? My investors and I believe so.
NVDA customers are willing to pay too much for AI chips so they get the money back, and more, from the pumped-up NVDA stock price. This is inflationary, and we are still stuck with high interest rates because of those schmucks!