Intel Gaudi 3 the New 128GB HBM2e AI Chip in the Wild (servethehome.com)
141 points by rbanffy on April 22, 2024 | hide | past | favorite | 51 comments


The "in the wild" part of the title is misleading. These chips are being presented in a very controlled environment.

An interesting aspect of Intel's design is they use Ethernet for connectivity. If they can get the performance on par with NVLink, that by itself could be a win because everybody knows how to manage Ethernet. Very few people know how to manage an NVLink network.

To be clear, this is data center hardware. Even the lower-power versions of these cards consume around 600W, and the article makes no mention of pricing.


Very few people actually know how to provision and manage a lossless Ethernet fabric, and I’d wager someone who has literally never touched InfiniBand would have an easier time accomplishing it from zero with IB than with Ethernet on real vendor gear.

Ethernet has so, so many gotchas. Maybe if it were a layer-3-only network it would work. Maybe.


I'm guessing it's RDMA over ethernet too which often has a lot of gotchas depending on the exact hardware being used.


Ironically enough, since NVIDIA bought Mellanox, it's likely that the best documented route to get RoCE v2 going is with switches purchased from NVIDIA...

Edit: and yes, it's RDMA over ethernet https://docs.habana.ai/en/latest/Gaudi_Overview/Gaudi_Archit...


You don’t need to manage NVLink.

NVLink talks native NVLink to itself when you are using NVLink switches, either intra-server or intra-rack; or

it can talk PCIe over NVLink when talking to a PCIe endpoint.

Or you can run InfiniBand or Ethernet on top of it and talk to whatever is on the other side.

Gaudi isn’t that different; remember, Ethernet != TCP/IP.


So it works with Ethernet switches?


What does?


NVLink


That’s not how NVIDIA’s solutions work. If you connect a DGX system to an NVLink switch it will talk NVLink; if you connect it to another via InfiniBand it will speak InfiniBand; if you do PCIe direct attach it will talk that too; and if you connect it via Ethernet it will, you guessed it, talk Ethernet, with RDMA and all the bells and whistles to boot.


RoCE is an IETF standard for this: https://en.wikipedia.org/wiki/RDMA_over_Converged_Ethernet

In my understanding one of the big advantages of the protocol (v2, that is) is that it is routed over IP and can work with existing switches ($$) instead of needing specialized ones ($$$$)
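
Because v2 rides on a plain UDP/IP encapsulation (IANA-assigned destination port 4791), it shows up in ordinary L3 tooling. A quick sketch, with the interface name assumed:

```shell
# RoCE v2 frames are just UDP/IP on the wire, so any L3 switch can route them
# and you can observe them like any other IP traffic (eth0 is a placeholder):
tcpdump -ni eth0 udp dst port 4791
```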


RoCE never required expensive switches but getting the PFC configuration right can be tricky.
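
For a flavor of what that host-side PFC setup looks like, a sketch assuming NVIDIA/Mellanox NICs with the mlnx_qos utility installed (interface name is a placeholder):

```shell
# Enable PFC on priority 3 only, a common choice for RoCE traffic:
mlnx_qos -i eth0 --pfc 0,0,0,1,0,0,0,0
# Every switch port on the path must agree on the same lossless priority,
# and you usually want ECN/DCQCN configured too -- mismatches anywhere in
# the fabric are exactly where "tricky" comes in.
```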


Many years ago I met Naveen Rao and tried to demo the Nervana-derived Intel card, which at the time Facebook and a couple others were sampling. During more formal talks, Intel sent him literally surrounded by a Xeon sales team that sidetracked the whole meeting.

When these Intel GPUs are “in the wild” it actually means Xeon salespeople are out on the hunt.


> An interesting aspect of Intel's design is they use Ethernet for connectivity.

Interestingly, Tenstorrent is doing something similar with their Wormhole cards.

I'm not convinced yet that it is the right way to go. If the switching fabric on the card fails, you lose the whole card. Keeping it separated out is a bit less risky, at the cost of some speed.

I'm more partial to composable fabrics, but they aren't ready yet for PCIe5 and we have PCIe6 just around the corner next year.


You want each ASIC to have 24 external NICs (so 192 NICs for a server?) with all the cabling/backplanes that would require?


They are 24x200G, which is already outdated. Everything we are doing is currently 400G (via 8xCX7 cards running in Ethernet mode) and 800G at the spine. 800G NICs will come with PCIe 6 next year, which cuts the number of connections down.
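
The back-of-the-envelope cabling math here can be sketched out (counts derived from the figures in this thread, not a spec):

```python
# Gaudi 3 exposes 24 x 200GbE per ASIC; moving the same aggregate
# bandwidth to 800G links cuts the cable count by 4x.
per_asic_gbps = 24 * 200              # 4800 Gb/s per ASIC
links_at_200g = per_asic_gbps // 200  # 24 cables per ASIC today
links_at_800g = per_asic_gbps // 800  # 6 cables at 800G
asics_per_server = 8
print(asics_per_server * links_at_200g)  # 192 connections per server at 200G
print(asics_per_server * links_at_800g)  # 48 connections per server at 800G
```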

What I'd prefer is the connection is through the UBB/OAM baseboard, such that you have PCIe connections. Look into what GigaIO and Liqid are doing. There is a 3rd option that is even cooler than those two, but I don't want to mention it here. ;-)


I agree that it’s misleading to act like this is a product in market; we don’t even know yet whether the yields will be there.

But it’s a serious thing if it happens.


About time. NVIDIA needs some serious competition.


AMD is already doing several billion dollars in AI processor sales, and the new chip is selling as fast as they can make them. At least with AMD, a customer can actually get them now, as opposed to the nearly one-year lead time from Nvidia.


Now, if they could also do the performant, unified software and driver part...


This is a big part of why they are generally available. Building on top of AMD gear is currently more difficult than Nvidia. It's unfortunate, but the evolution of/from CUDA, at a time when AMD had little in this space, gave Nvidia a massive jump-start. As a dev, it's way easier to pick up the various stacks that work with Nvidia gear and not have to worry anywhere near as much about what's underneath.


Confirmed. Buying them up as fast as I can. =)


Did you watch the Nvidia 2024 GTC keynote[1]?

The CEO of Nvidia was pretty clear: Nvidia's focus is now on selling Blackwell AI factories. According to some napkin math, each of these Blackwell AI factories will have an annual power bill of roughly $8,000,000 USD, and, if you do a little apples-to-oranges comparison, will outperform the computer that's currently #1 on the supercomputing Top 500 (Frontier at Oak Ridge) by one to two orders of magnitude.

For comparison, over the past fifteen years or so we've seen three orders of magnitude of improvement at the top of the Top 500 (from Roadrunner in 2008 to Frontier today). Nvidia is going to do one to two orders of magnitude of improvement instantaneously.

Nvidia's core market has shifted dramatically. There is no competition.

[1]: https://www.nvidia.com/gtc/keynote/
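
The napkin math on that power bill can be made concrete; the electricity rate below is an assumption (industrial rates vary a lot by region):

```python
# Working backwards from an ~$8M/year power bill to average draw.
annual_bill_usd = 8_000_000
usd_per_kwh = 0.08                    # assumed industrial electricity rate
annual_kwh = annual_bill_usd / usd_per_kwh
avg_mw = annual_kwh / (365 * 24) / 1000
print(round(avg_mw, 1))               # roughly 11.4 MW of continuous draw
```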


Not to rain on the parade here, but Nvidia is not going to deliver a one-to-two-order-of-magnitude jump in one go. The Top 500 measures FP64; Nvidia likes to quote FP8. Blackwell is actually a regression in FP64 performance, trading die space for AI compute. This isn't to say the silicon isn't good, just that different goals are being hit.


I’m aware.

In the keynote, Nvidia is quoting FP4 performance for Blackwell specifically. Frontier, and every computer that has ever made the supercomputing Top 500, is measured at FP64. This is why I qualified my statement above by saying you have to be willing to compare apples and oranges.

But I think this is where things open up to debate (and/or personal interpretation). My view is, if all you care about is AI workload (Specifically LLMs), then you really are seeing a full two orders of magnitude improvement. And if we are trying to get a feel for the space, and what that even means, then there really is nothing else to compare Blackwell to other than frontier, in terms of scale alone.

Above I say “one or two orders of magnitude”, the “one order of magnitude” is a value I got by taking Blackwell’s FP4 values and dividing by 16, to try to conceptually convert back to a value that can be compared to fp64. — I’m perfectly happy to admit that all I’m really capable of doing here is using estimates to compare a completely new thing (the new compute architecture of Blackwell) to what came before it (the majority of the history of the supercomputing 500). But that’s okay, because that’s my only goal.
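
That divide-by-16 conversion can be sketched out; the FP4 throughput figure below is purely illustrative, and the divisor reflects FP4 being 1/16th the bit width of FP64:

```python
# Naive precision scaling: treat 16 FP4 ops as "worth" one FP64 op.
fp4_eflops = 16.0                      # assumed FP4 throughput, exaflops
fp64_equiv_eflops = fp4_eflops / 16    # crude FP64-equivalent estimate
frontier_fp64_eflops = 1.1             # Frontier's HPL result, roughly
print(fp64_equiv_eflops, fp64_equiv_eflops / frontier_fp64_eflops)
```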

—-

Believe me, I get it. Blackwell will never post a linpack score. Blackwell will never run big scientific computing jobs like… global weather simulations.

But as computer science nerds, how could we not be excited to see such a big move: changing the computer architecture to better suit a specific compute workload (LLMs)?

I know Bitcoin has its asic processors, but to me Blackwell’s mission statement of “computing intelligence” is a lot more interesting.


When moving into a market where there are no competitors one should be wary that this may be because there are no customers.


The topic of customer interest is addressed at timestamp 59 minutes and zero seconds of the keynote.

By my count, 49 corporate logos were shown as launch partners.

At one point in the talk the CEO claims that “Blackwell will be Nvidia’s most successful product ever”.

And again, the product he’s talking about is a data center packed with 56 racks…


That's not very convincing against Nvidia, since they single-handedly created the modern AI market through CUDA.


The only meaningful hardware competition, meaning lower prices, will come from Chinese designed, Chinese manufactured parts. This is still a long ways out.

Is it inevitable? I think so. Before 2019 there wasn't an opportunity, now there is.

For software, Chinese universities, Alibaba, Tencent, and ByteDance are already releasing models, training code, and in rare cases datasets that are competitive with private offerings. CogVLM/CogAgent is one that I use. It's very promising.


How long until that happens? I wouldn't expect anything in industrial volumes for the next few years; maybe 2028? Who knows?

But anyway, we will probably be prohibited from buying it. We still can't buy Cuban cigars.


I don't think we'll be legally prohibited from buying it but there will be zero English docs (see Allwinner and such). Maybe if you're lucky you'll get an uncommented code dump with a forked years-old version of PyTorch.


Can't an LLM translate?


Touche.


Or from Tesla's project Dojo.


Competition doesn't do much when all production everywhere is already taken in preorders. It'll only change when there is surplus production.


pokes OpenCL's corpse with a stick

C'mon, do something...


Ironically, Transformers are relatively simple architectures - all you really need is a high performance matmul. So OpenCL could "do something" at this point, if it were alive.
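
As a toy illustration, scaled dot-product attention, the heart of a transformer layer, really is just matmuls plus elementwise ops (NumPy sketch, shapes chosen arbitrarily):

```python
import numpy as np

def attention(Q, K, V):
    # Two matmuls plus a softmax, which is just exp/sum elementwise work.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                                 # matmul 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True)) # stable softmax
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                            # matmul 2

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```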


https://github.com/ROCm/ROCm/issues/2754

Wow and I thought that the latest generation of GPUs was better.


From that exchange, clearly not. They do not have their act together. This is one area where AMD really should be employing a full team to ensure that major projects, including drivers, can support AMD cards without any friction. While it's still a PITA to work with AMD hardware in most AI frameworks, Nvidia will continue to dominate.


It's very unfortunate. One of our vendors has spent a hell of an effort making spatial analysis tools work via OpenCL instead of being just processor-bound, and it's made those industry-standard libraries 1-2 orders of magnitude faster. That is something the spatial industry desperately needs in order to improve iteratively.

With the state of OpenCL it's frustrating. So much to be had, but so little improvement and support.


In 50 years, I feel like GPGPU compute will be told as a Greek tragedy. Nvidia, pariah of Apple, would create a parallel compute ecosystem that was so powerful that even the combined effort of the industry couldn't topple it. At every corner where they could have thwarted them, Nvidia's competitors refused to up the ante or chided CUDA's efforts as silly and unnecessary. One crypto/AI craze later, everyone and their mother is berating their OEM for ignoring such basic functionality and refusing to collaborate on a competing alternative.

Frustrating is a good way to put it, but there's a certain causal satisfaction I feel from watching it all unfold. Of course everyone loses to CUDA when they refuse to sponsor an Open Source alternative. It's fascinating to me that hardware manufacturers would rather let CUDA dominate than establish a basic working relationship.


So, where can I use instances of Gaudi 3 and what is the hourly price for these instances?


This is something I'd like to offer via my business (Hot Aisle) at some point in the near future. Right now, we are just getting started and focused on MI300x, but the long term goal is to offer any type of high end compute that people are willing to rent.


I’m interested and have questions, but https://www.hotaisle.xyz/ doesn’t exactly provide a lot of answers.


Sorry about that. We are just getting started, so the lowest priority right now is the website.

Additionally, due to the KYC requirements around these GPUs (due to US export controls), we really want to get to know our customers first.

Feel free to ping me on email and happy to get on a call and talk more.


What USP are you aiming for, to differentiate from the many companies who have tried and failed to offer some form of HPCaaS over the last 10-15 years?


Great question! I'm going to answer it the only way I know how... with a bit of a story of the history of things. Sorry if this bores you.

The problem I realized over a year ago was that nobody had hourly rental access to high end AMD GPUs. In addition, access to high end Nvidia was equally difficult. I signed up for a CoreWeave account, put in my credit card and was told a few weeks later that my account was not approved.

In effect, the only way to get access to super high end compute, was to be involved in HPC and that requires connections. At the time, we also didn't even know if AMD was going to seriously adopt AI as a strategy.

My view was that there were actually two problems, lack of general access and that everyone was putting all their eggs into a single basket. Mostly because of that lack of access, and because AMD was lacking a great developer flywheel story.

I spent August to December building a business plan, closing funding, forming the business, hiring my co-founder full time, securing data center space, securing direct relationships with vendors, and designing the system we were going to deploy. There are a million other little details in there, but this is long enough as it is.

Oct/Nov of last year rolls around and suddenly AMD has changed their tune. Lisa Su doubles down. Dec 6th, MI300x rolls out. We made our first PoC order in January, received it in March. It just goes to show how cutting edge and how long all of this takes. 3 more small (not hyperscaler) businesses sprung up during that time, all offering effectively the same product. We went to the data center, deployed our PoC and about 2 weeks later, we had our first customer onboarded. I call all of that validation, and was able to secure further funding based on it.

To answer your question, I'm not sure that I need a specific USP. The demand for compute isn't going down. If I have a product that people want, and I can offer them ethical, honest, great service around that product, all based on decades of experience, can't that be enough? Myself and my investors believe so.



NVDA customers are willing to pay too much for AI chips so they get the money back, and more, from the pumped-up NVDA stock price. This is inflationary, and we are still stuck with high interest rates because of those schmucks!


What are the row of green rectangles in the middle of the long edges?


Maybe VRMs (although I've never seen VRMs that look like that).


Judging by the pictures at the end of the article, it looks like the VRM are under the board o_O



