I'm curious what you did with the "active sorting range" after a push/pop event. Since it's a vector underneath, I don't see any option other than to sort the entire range after each event, O(N). This would surely destroy performance, right?
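For what it's worth, a full re-sort isn't the only option for a vector: after a single push you can restore order with a binary search plus a rotate, which costs O(log N) comparisons and O(N) element moves rather than an O(N log N) sort. A minimal sketch of that idea (assuming a plain std::vector; I have no idea what the library actually does):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// After push_back, move the new element into its sorted position with a
// rotate: O(log N) comparisons plus O(N) element moves, instead of
// re-sorting the whole range.
void push_sorted(std::vector<int>& v, int value) {
    v.push_back(value);
    auto pos = std::lower_bound(v.begin(), v.end() - 1, value);
    std::rotate(pos, v.end() - 1, v.end());
}
```

Whether the O(N) move is acceptable obviously depends on how hot the push/pop path is, which I think is the heart of your question.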
> To Koreans, they looked more like sauce bowls, leading them to conclude that the Japanese had starved themselves to stretch out the siege.
As a Bengali man, that's exactly how I felt when I came to the USA and first visited Japanese restaurants. Part of the reason we consume so much rice is that rice is the main dish (not a side): it takes up the central and largest part of your plate.
A typical Japanese person will work through their small rice bowl until not a single grain is left, since they're taught from a very young age not to waste food.
Most other Asian nationalities will not finish their rice completely. Even with their most delicious biryani there are always many rice grains left on the plate. I think the small bowl makes it much easier to finish the rice completely, unlike a big bowl or plate.
The Japanese mostly eat sticky rice, which is very easy to eat and "clean up" even with chopsticks.
The Indian subcontinent eats long-grain Basmati or similar rice, which fluffs up into individual grains on the plate. It doesn't make sense to individually pick out single leftover grains.
Nearly every culture has the idea of "Annapurna", or a god of food, and wasting food is generally frowned upon and considered bad table manners. I was scolded plenty of times as a child in Nepal for not cleaning my plate.
I wouldn't attribute it to small bowls at least. The Japanese instilling good virtues into their children almost institutionally perhaps plays some part in it, but also some of it is just physics.
Having had grandparents who lived through WWII (or any other war, to be fair) also helps instill this attitude. I can barely imagine the kind of famines they had to endure.
Sure. In detail, and abstracted slightly, the byte table problem:
Maybe you're remapping RGB values [0..255] with a tone curve in graphics, or doing a mapping lookup of IDs to indexes in a set, or a permutation table, or... well, there are a lot of use cases, right? This is essentially an arbitrary function lookup where the domain and range are both bytes.
It looks like this in scalar code:
void transform_lut(byte* dest, const byte* src, int size, const byte* lut) {
    for (int i = 0; i < size; i++) {
        dest[i] = lut[src[i]];
    }
}
The function above is basically load/store limited - it's doing negligible arithmetic, just loading a byte from the source, using that to index a load into the table, and then storing the result to the destination. So two loads and a store per element. Zen5 has 4 load pipes and 2 store pipes, so our CPU can do two elements per cycle in scalar code. (Zen4 has only 1 store pipe, so 1 per cycle there)
Here's a snippet of the AVX512 version.
You load the 256-byte lookup table into 4 registers (p0 through p3) outside the loop.
Then, for each SIMD vector of 64 elements, use each lane's value as an index into the lookup table, just like the scalar version. Since each permute can only index 128 bytes of table, we DO have to do it twice, once for the lower half and again for the upper half, and use a mask to choose between them appropriately on a per-element basis.
auto tLow = _mm512_permutex2var_epi8(p0, x, p1);
auto tHigh = _mm512_permutex2var_epi8(p2, x, p3);
You can use _mm512_movepi8_mask to load the mask register. That instruction sets a lane active if the high bit of its byte is set, which perfectly matches our table split. You could use the mask register directly on the second shuffle instruction or in a later blend instruction; it doesn't really matter.
For every 64 bytes, the AVX512 version does one load, one store, and two permutes, and Zen5 can execute two permutes per cycle. So 64 elements per cycle.
So our theoretical speedup here is ~32x over the scalar code! You could pull tricks like this with SSE and pshufb, but its 16-byte lookup table is too small to be really useful. Being able to do an arbitrary, super-fast byte-to-byte transform is incredibly useful.
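For completeness, here's the whole thing put together as I understand it. The vector path assumes AVX512-VBMI (for _mm512_permutex2var_epi8); compiled without it, this sketch just falls back to the scalar loop, and the choice of a blend versus using the mask on the second permute is the judgment call mentioned above:

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__AVX512VBMI__)
#include <immintrin.h>
#endif

using byte = std::uint8_t;

void transform_lut(byte* dest, const byte* src, std::size_t size, const byte* lut) {
    std::size_t i = 0;
#if defined(__AVX512VBMI__)
    // Load the 256-byte table into four 64-byte registers, once, outside the loop.
    auto p0 = _mm512_loadu_si512(lut);
    auto p1 = _mm512_loadu_si512(lut + 64);
    auto p2 = _mm512_loadu_si512(lut + 128);
    auto p3 = _mm512_loadu_si512(lut + 192);
    for (; i + 64 <= size; i += 64) {
        auto x = _mm512_loadu_si512(src + i);
        // Each permute indexes 128 bytes of the table (low 7 bits of each lane).
        auto tLow  = _mm512_permutex2var_epi8(p0, x, p1);
        auto tHigh = _mm512_permutex2var_epi8(p2, x, p3);
        // The high bit of each input byte selects which half of the table to keep.
        __mmask64 hi = _mm512_movepi8_mask(x);
        _mm512_storeu_si512(dest + i, _mm512_mask_blend_epi8(hi, tLow, tHigh));
    }
#endif
    for (; i < size; i++)  // scalar tail (or full scalar fallback)
        dest[i] = lut[src[i]];
}
```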
One piece of feedback about the examples, from someone interested in using this: I have looked at several, and they seem too high-level to give a sense of the actual API (i.e. the expected benefit of using this library versus the development complexity of using it).
For example, the cloth bending simulation is almost entirely: in __init__, call a function to add a cloth mesh to a model builder object, then pass the built model to the initializer of a solver class; and at each timestep, call a collide-model function, then another function called solver.step. That's really it.
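To make the point concrete, the whole usage pattern described above reduces to roughly this shape (the class and method names here are stand-ins I made up to show the call pattern, not the library's real API):

```python
# Stand-in stubs, NOT the library's real classes: they exist only to show
# how little of the API the example actually exercises.
class ModelBuilder:
    def __init__(self):
        self.meshes = []

    def add_cloth_mesh(self, vertices, faces):
        self.meshes.append((vertices, faces))

    def finalize(self):
        return {"meshes": self.meshes}


class Solver:
    def __init__(self, model):
        self.model = model
        self.time = 0.0

    def collide(self):
        pass  # collision handling happens inside the library, hidden from the user

    def step(self, dt):
        self.time += dt  # one integration step, also hidden


# __init__-time setup: build the model, hand it to the solver.
builder = ModelBuilder()
builder.add_cloth_mesh(vertices=[(0, 0, 0), (1, 0, 0)], faces=[(0, 1)])
solver = Solver(builder.finalize())

# Per-timestep loop: collide, then step. That really is the whole surface.
for _ in range(10):
    solver.collide()
    solver.step(dt=1 / 60)
```

Which is exactly the problem: the example shows that the calls are few, but not what using the API actually costs or buys you.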
"Earlier this year, World Liberty, the crypto firm run by the Trumps and Witkoffs, announced an agreement with an investment firm backed by the ruling family of the U.A.E. The Emirati firm would conduct a $2 billion transaction using World Liberty’s digital coins, a deal that would provide a windfall to the Trump and Witkoff families."
One of NYT's recent podcasts (The Daily) covered this: basically, the Biden administration was reluctant to give the UAE access to Nvidia chips because of its close dealings with China. Two weeks after this crypto investment, the White House agreed to give the UAE access to the chips.
I'm a big, big fan of the Acquired podcast; I've listened to every single one of their episodes.
But I remember their entire run of Microsoft episodes felt like a lengthy defense of Steve Ballmer. There were too many instances of "here's why this bad decision of Steve's made sense given the circumstances" or "here is how people underestimate Steve's contribution to this good decision." They were all well-argued points, of course, but so numerous that I found myself wondering whether the hosts have a relationship with Steve.
The existence of this interview does not help with that suspicion.
I think the big thing was that Steve did make a lot of great decisions, some of the best the company could have made at the time in those respective fields, but he completely missed on everything that Apple did: portable media players, smartphones, and tablets. Those are the three huge misses, and that is really where it counted.
The old three envelope joke.
You become CEO and there are three envelopes on your office desk, with a note that says "Every time there is an issue, open them in order and do what is inside." The first says "Blame your predecessor." The second says "Blame yourself." The third says "Prepare three envelopes."
This is the problem with podcasts, but also with modern media in general. You have to play softball or be ideologically homogeneous to get access. Anything else has a negative k for a variety of reasons.
I haven't listened to all their MSFT coverage but it's possible they genuinely feel Ballmer's gotten a worse rap than he deserves and they're trying to contextualize some of the decisions and circumstances.
Yes, but as sibling comment says there's that thing about softball.
I think Ballmer was better than how he was perceived, so I did expect some justification, etc. in their MS episodes. But these points seemed, to use your word, numerous. I think they must have done this because, in preparation for those MS episodes, they talked to Ballmer and expected him to listen to the episodes. Comparatively, their takes in episodes like Bernard Arnault/LVMH and Hermès seemed somewhat balanced.
People don't understand that other countries (the primary suppliers of STEM graduate students) do have lots of research positions; it's just that they don't usually get first-rate talent, because the USA is far more attractive to those people. Now they will.
In one view, the fact that it's a software problem is actually a weakness of (GPU) hardware design.
In the olden, serial-computing days, our algorithms were standard, and CPU designers did all sorts of behind-the-scenes tricks to improve performance without burdening software developers. It wasn't perfect abstraction, but they tried. Algorithms led the way; hardware had to follow.
CUDA threw all that away and exposed lots of ugly details of GPU hardware design that developers _had to_ take into account. This is why, for a long time, CUDA's primary customers (the HPC community and national labs) refused to adopt it.
It's interesting, now that CUDA has become a legitimate, widely adopted computing paradigm, how much our view on this has shifted.
I don't believe you really can in the GPU world. With a CPU, if you ignore something important like the cache hierarchy, the performance penalty is likely to be in the double-digit percentages, something people can and do often ignore. With a GPU, there are many, many things (memory coalescing, warps, SRAM) that can have a triple-digit percentage impact, hell, maybe even more than that.
> Iran is the principle destabilising element in the middle east
Says Israel, the nation that tore up every single international law, directly led a campaign against the UN and the ICC, and whose right wing (the ones in power now) has been dreaming about a Greater Israel that threatens the territorial integrity of some 10 different Middle Eastern countries.
I understand Iran is a headache for Israel, but did it have to become an enemy of the USA? Aren't Iran's ambitions, and its proxies, all regional in nature? Have they ever attempted to harm an American living in America?
Israel has led an amazingly successful campaign in presenting its problems (often arising out of its territorial ambitions) as a problem for the entire West.
I agree, which is why we need to get all these evangelical nuts, who are actively trying to destroy the world so that Jesus comes back, out of power. No more death cults!