justinpombrio's comments

Could you name one that seems likely to fail?


Off the top of my head: Assuming there's a specific point the data needs to get to. Assuming the size of the data sphere doesn't impact the speeds of anything inside it. Assuming we're using a classical computer. Assuming the support scaffolding of the computer stays a fixed percentage of the mass and doesn't eat into the data budget.

And I know some of those still fit into "at least", but if one of those would make it notably worse than sqrt(n), then I stand by my claim that it's a bad speed rating.


Time dilation for one. As you get closer to a black hole, an observer far away sees it as if time is slowing down for you. Not really sure how this changes the Big O analysis.


I was going to say that that's not a CRDT because it requires a centralized server (the conflict resolution is "order in which the server received the messages", and clients aren't allowed to share updates with each other, they can only get updates from the server). But now I'm looking at definitions of CRDTs and it's not clear to me whether this is supposed to count or not.

Still, every algorithm that's actually labeled a CRDT shares a magical property: if my replica has some changes, and your replica has some changes, our replicas can share their changes with each other and each converge closer to the final state of the document, even if other people have been editing at the same time, and different subsets of their changes have been shared with you or me. That is, you can apply people's changes in any order and still get the same result. I don't think it's useful to call anything without that property a CRDT.


The C in CRDT means the order doesn't matter. You can just put all the gossiped changes into a big bag of changes, and if everyone knows the same changes, they have the same final document, so a simple gossip protocol that just shares unshared data blobs will eventually synchronize the document. If order matters, it's not a CRDT. This one isn't a CRDT because the order matters if two clients insert text at the same position.
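
As a minimal sketch of that property (made-up code, not anyone's library): a grow-only set is about the simplest CRDT. A change is an inserted element and merging is just set union, so merges commute and replicas converge no matter the order:

    use std::collections::BTreeSet;

    // A grow-only set: each "change" is an inserted element, and merging two
    // replicas is set union, which is commutative, associative, and idempotent.
    #[derive(Clone, Debug, PartialEq)]
    struct GSet {
        elements: BTreeSet<String>,
    }

    impl GSet {
        fn new() -> Self {
            GSet { elements: BTreeSet::new() }
        }

        fn insert(&mut self, value: &str) {
            self.elements.insert(value.to_string());
        }

        fn merge(&mut self, other: &GSet) {
            self.elements.extend(other.elements.iter().cloned());
        }
    }

    fn main() {
        let mut alice = GSet::new();
        let mut bob = GSet::new();
        alice.insert("alice's change");
        bob.insert("bob's change");

        // Each replica learns the other's changes in a different order,
        // yet both end up with the same set.
        let (a_snapshot, b_snapshot) = (alice.clone(), bob.clone());
        alice.merge(&b_snapshot);
        bob.merge(&a_snapshot);
        assert_eq!(alice, bob);
    }

Text editing needs a cleverer data structure than a set, but the same merge-commutes property is what makes something a CRDT.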


Rust's traits _do_ solve the expression problem.

Each data type is a `struct`. Each operation is a trait. You `impl` each trait on each struct.

This works even if you're using a library that has declared `struct A` and `struct B` and `trait F` and `trait G`, and you want to add both a `struct C` and a `trait H`, and fill out the whole 3x3 grid without modifying the library.

The library says:

    struct A { ... }
    struct B { ... }

    trait F { ... }
    impl F for A { ... }
    impl F for B { ... }

    trait G { ... }
    impl G for A { ... }
    impl G for B { ... }

    fn some_function<T: F + G>(data: T) { ... }

Your code says:

    use library::{A, B, F, G};

    struct C { ... }
    impl F for C { ... }
    impl G for C { ... }

    trait H { ... }
    impl H for A { ... }
    impl H for B { ... }
    impl H for C { ... }

    fn another_function<T: F + G + H>(data: T) { ... }

Now `library::some_function()` can be called with an A, B, or C, even though C was defined by the user of the library. And `another_function()` can be called with A, B, or C, even though A was defined by the library.


I don't know why you're getting downvoted, but you are right. Rust's type system solves this in a very nice way. To clarify, we can show the exact same example that was shown with Clojure multimethods, but in Rust:

    struct Constant { value: i32 }
    struct BinaryPlus { lhs: i32, rhs: i32 }
    
    trait Evaluate {
        fn evaluate(&self) -> i32;
    }
    
    impl Evaluate for Constant {
        fn evaluate(&self) -> i32 { self.value }
    }
    
    impl Evaluate for BinaryPlus {
        fn evaluate(&self) -> i32 { self.lhs + self.rhs }
    }
    
    // Adding a new operation is easy. Let's add stringify:
    
    trait Stringify {
        fn stringify(&self) -> String;
    }
    
    impl Stringify for Constant {
        fn stringify(&self) -> String { format!("{}", self.value) }
    }
    
    impl Stringify for BinaryPlus {
        fn stringify(&self) -> String { format!("{} + {}", self.lhs, self.rhs) }
    }
    
    // How about adding new types? Suppose we want to add FunctionCall
    
    struct FunctionCall { name: String, arguments: Vec<i32> }
    
    impl Evaluate for FunctionCall {
        fn evaluate(&self) -> i32 { todo!() }
    }
    
    impl Stringify for FunctionCall {
        fn stringify(&self) -> String { todo!() }
    }


The only thing missing here is separation of files.

Assuming the whole Stringify section goes into a new file (likewise with FunctionCall), then I agree that this solves the expression problem.


While this has nothing to do with the expression problem, it's worth noting that in any case your solution does not work in general.

Rust does let you impl a trait when either the trait or the type is inside your crate, so your example strictly speaking works, but it does not let you impl a trait when both the type and the trait are outside your crate. This is known as the orphan rule:

https://doc.rust-lang.org/reference/items/implementations.ht...

As the article points out, the expression problem itself is pretty simple to solve for code that you control, it's when writing modular software that can vary independently that gives rise to issues.


Why would the orphan rule be a problem here?

The orphan rule only disallows impls if both the trait and the type are defined outside the crate.

But in this example, if you are adding a new type (struct) or a new operation (trait), that new item is in your crate, so all the impls that follow are allowed.


It's not a problem here as it has nothing to do with this to begin with. I am pointing out a limitation in a feature that the author has presented, but that feature does not resolve anything about the topic being discussed.

The goal isn't to allow a new type to work for an existing implementation of a function nor is it to take an existing type and write a new function that works with it. In the proposed solution you have `some_function` and the author claims that this solves the expression problem because you can take a new type C and pass it into some_function. Pretty much every language has a way to define new types and pass them into existing functions.

The goal is to allow for that new type, C, to have its own implementation of `some_function` that is particular to C, as well as the ability to write new functions that can be specialized for existing types. In particular we want calls to `some_function` through an interface to call C's specific implementation when the runtime type of an object resolves to C, and calls whatever other implementations exist when called through another interface.

The author's solution doesn't do that, it literally does nothing that you can't do in pretty much any other language.


That's not the expression problem. The expression problem is the question of what is required to add new methods to an existing trait. In your example, it requires modifying the existing impls for all types implementing the trait.

What you've done is different, you've simply added a new trait entirely. That's not unique to Rust, you can add new interfaces in any language...


> That's not the expression problem.

To me it looks like this is exactly the expression problem. The expression problem is not about adding methods to a trait, it's about adding an operation to a set of types. In this case every operation is defined by a single trait.

The idea behind the expression problem is to be able to define either a new operation or a new type in such a way that the new code lives nicely together. Rust's trait system accomplishes this beautifully.

> That's not unique to Rust, you can add new interfaces in any language...

Many languages have interfaces, but most of them don't allow you to implement them for an arbitrary type that you have not defined. For example, in Java, you can create an interface called `PrettyPrintable`, but you can't implement it for the `ArrayList` type from the standard library. In Rust you can do this kind of thing.
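
For concreteness, a minimal sketch (the trait and method names are made up for illustration): a trait defined in your crate can be implemented for `Vec<i32>` from the standard library, because the orphan rule only forbids impls where both the trait and the type are foreign.

    // A locally defined trait, analogous to the hypothetical `PrettyPrintable`.
    trait PrettyPrint {
        fn pretty(&self) -> String;
    }

    // Allowed: the trait is local, even though `Vec<i32>` comes from std.
    impl PrettyPrint for Vec<i32> {
        fn pretty(&self) -> String {
            self.iter()
                .map(|n| n.to_string())
                .collect::<Vec<_>>()
                .join(", ")
        }
    }

    fn main() {
        println!("{}", vec![1, 2, 3].pretty()); // prints "1, 2, 3"
    }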


There is something in Rust that can help, though. You can define multiple impls for different bounds.

Supposing we started off with a trait for expression nodes:

  pub trait ExprNode { fn eval(&self, ctx: &Context) -> Value; }

Now, this library has gone out into the wild and been implemented, so we can't add new methods to ExprNode (ignoring default implementations, which don't help solve the problem). However, we can define a new trait:

  pub trait CanValidate { fn validate(&self) -> Result<(), ValidationError>; }

And now we get to what is (somewhat) unique to Rust: you can define different method sets based upon different generic bounds. Suppose we have a parent Expr struct which encapsulates the root node and the context:

  pub struct Expr<N> { root: N, ctx: Context }

We would probably already have this impl:

  impl<N: ExprNode> Expr<N> {
    pub fn eval(&self) -> Value { self.root.eval(&self.ctx) }
  }

Now we just need to add this impl:

  impl<N: CanValidate + ExprNode> Expr<N> {
    pub fn validate(&self) -> Result<(), ValidationError> { self.root.validate() }
  }

Of course, this is a trivial example (and doesn't even require the intersection bound), but it does illustrate how the expression problem can be solved. The problem this strategy creates, though, is combinatorial explosion. When we just have two traits, it's not a big deal. When we have several traits, and useful operations start to require various combinations of them (other than just Original + New), the number of impls and test cases starts to grow rapidly.


> (not an actual dictionary) Urban Dictionary

<troll> The definition of dictionary is just "Words about words" (source: Urban Dictionary), so I'd say that Urban Dictionary qualifies. </troll>


Gödel's Incompleteness Theorem places a limit on what you can prove within a formal system. Neither humans nor LLMs are formal systems, so it says nothing about them.

Someone's going to think that, since you can formally model computer programs (and thus formally model how an LLM runs), that means that Gödel's Incompleteness Theorem applies to computer programs and to LLMs. But that's irrelevant; being model-able doesn't make something a formal system!

"Formal system" has a very technical meaning: it has a set of consistent axioms, and the ability to enumerate formal proofs using those axioms. For example, Zermelo-Fraenkel set theory is a formal system [1]. It has nine formal axioms, which you can see listed in its Wikipedia article. Utterly different kind of thing than humans or LLMs. Comparing an LLM to ZFC is like comparing a particular watermelon to the number 4.

[1] https://en.wikipedia.org/wiki/Zermelo%E2%80%93Fraenkel_set_t...


This is unnecessary nitpicking. You can easily derive a logical contradiction of your choice by assuming that a system that can be formally modeled can prove something that cannot be proven within a formal system. If a specific theorem doesn't prove that, a simple corollary will.

If you want to escape the limits of computability, you have to assume that the physical Church–Turing thesis is false. That reality allows mechanisms that cannot be modeled formally or simulated on any currently plausible computer.


> by assuming that a system that can be formally modeled can prove something that cannot be proven within a formal system

Something that cannot be proven within which formal system? Every true statement can be proven by some formal system. No formal system can prove all true statements.

(This is all about Gödel incompleteness. Turing incomputability does apply though; see the sibling comments! It's important to keep them separate; they're saying different things.)


I'm confused by this. Why are you talking about Gödel?

LLMs, like any model of computation, are subject to the limits of Turing Machines, so they cannot for example solve the halting problem.


Oof, I pattern matched OP to a different argument I've seen a lot. That's embarrassing.

Yes, that's correct, it's provably impossible to construct an LLM that, if you ask it "Will this program halt? <program>", answers correctly for all programs. Likewise humans, under the assumption that you can accurately simulate physics with a computer. (Note that QM isn't an obstacle, as you can emulate QM just fine with a classical computer with a mere exponential slowdown.)
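
(For reference, the standard diagonalization argument, sketched in Rust. `halts` is hypothetical, and the point is precisely that it cannot exist; this is a proof sketch rather than a runnable program.)

    // Hypothetical oracle: would return true iff `program`, run on `input`, halts.
    fn halts(_program: &str, _input: &str) -> bool {
        unimplemented!("this is exactly the function that cannot be written")
    }

    // Do the opposite of whatever `halts` predicts a program does
    // when fed its own source code.
    fn diag(source: &str) {
        if halts(source, source) {
            loop {} // predicted to halt, so loop forever instead
        }
        // predicted to loop forever, so return (halt) instead
    }

Feeding `diag` its own source contradicts whichever answer `halts` gives, so no program, an LLM included, can implement `halts` correctly for all inputs.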


While it's true that LLMs are not strictly a formal system, they do operate on Turing-complete hardware and software, and thus they are still subject to the broader limitations of computability theory, meaning they cannot escape fundamental constraints like undecidability or incompleteness when attempting formal reasoning.


LLMs don't do formal reasoning. Not in any sense. They don't do any kind of reasoning - they replay combinations of the reasoning that was encoded in their training data, by "finding" patterns in the relationships between tokens at different scales and then applying those patterns to generate output triggered by the input.


That's irrelevant. They have an "input tape" and an "output tape", and whatever goes on inside the LLM could in principle be implemented in a TM (of course that wouldn't be efficient, but that's beside the point).


Can you give a reference to your teaching materials? The way this sort works is bizarre enough that I'm curious what you said about it. (Also to verify that this is actually the algorithm you taught, given how many other people thought it was something else.)


The number of swaps is nearly the same between Bubble Sort and Can't Believe Sort. For a random list with 100 elements, it's 2505 swaps vs. 2513 swaps.

The number of comparisons is of course about twice as large, because Bubble Sort always does exactly n(n-1)/2 comparisons while Can't Believe sort always does exactly n*n comparisons.

https://play.rust-lang.org/?version=stable&mode=debug&editio...
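
(The playground link above is truncated; here's a rough sketch of what it measures, an assumed reconstruction rather than the exact linked code. Neither sort has an early exit, and only swaps are counted.)

    // "I Can't Believe It Can Sort": for all i, for all j, swap whenever a[i] < a[j].
    // Always does exactly n*n comparisons.
    fn cant_believe_sort(a: &mut [i32]) -> usize {
        let n = a.len();
        let mut swaps = 0;
        for i in 0..n {
            for j in 0..n {
                if a[i] < a[j] {
                    a.swap(i, j);
                    swaps += 1;
                }
            }
        }
        swaps
    }

    // Bubble sort with no early exit: always does exactly n*(n-1)/2 comparisons.
    fn bubble_sort(a: &mut [i32]) -> usize {
        let n = a.len();
        let mut swaps = 0;
        for i in 0..n {
            for j in 0..n - 1 - i {
                if a[j] > a[j + 1] {
                    a.swap(j, j + 1);
                    swaps += 1;
                }
            }
        }
        swaps
    }

    fn main() {
        // A fixed scrambled list stands in for random data, keeping the sketch dependency-free.
        let data: Vec<i32> = (0..100).map(|x| (x * 7919) % 997).collect();
        let (mut a, mut b) = (data.clone(), data);
        println!("can't-believe swaps: {}", cant_believe_sort(&mut a));
        println!("bubble swaps:        {}", bubble_sort(&mut b));
    }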


> The number of swaps is nearly the same between Bubble Sort and Can't Believe Sort. For a random list with 100 elements, it's 2505 swaps vs. 2513 swaps.

Why is there any difference? Does the entire difference come from the iteration where i = 1?

> Bubble Sort always does exactly n(n-1)/2 comparisons

Bubble sort can do less than this. When you're bubbling down into a sorted list, and you find a value lower than you are, you can assume that all values below that one are also lower than you are and terminate the inner loop early.
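
Sketched in Rust (assuming the backward-bubbling inner loop described above, with the early break):

    // Bubble each element backward into the sorted prefix, stopping as soon as
    // it's no longer smaller than its left neighbor: everything further left is
    // already in order, so no more swaps are needed for this element.
    fn bubble_sort_early_exit(a: &mut [i32]) -> usize {
        let mut comparisons = 0;
        for i in 1..a.len() {
            for j in (1..=i).rev() {
                comparisons += 1;
                if a[j] < a[j - 1] {
                    a.swap(j, j - 1);
                } else {
                    break;
                }
            }
        }
        comparisons
    }

On already-sorted or nearly-sorted input this does far fewer than n(n-1)/2 comparisons.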


> Why is there any difference? Does the entire difference come from the iteration where i = 1?

Sometimes it's more swaps, and sometimes it's less. The vague pattern looks like Can't Believe Sort does slightly fewer swaps on larger lists, but that could also have been noise.

Why do you expect them to have exactly the same number of swaps?

> Bubble sort can do less than this.

Oh sorry yeah, I implemented bubble sort without any sort of early return, for a more direct comparison.


I expect them to have exactly the same number of swaps in every iteration of the loop past the first one, because they are doing exactly the same thing. It's not just the same number of swaps. It's essentially the same set of swaps, in a different order. In the iteration where i = 1, the number of swaps may differ. Was I wrong about that?


Yeah, 2410 vs. 2509 swaps not including the first iteration. You should try measuring these things yourself, I'm not sure if it's doing what you think it's doing.


Well, I sort of agree with you and sort of disagree.

    from random import randrange

    def list_swap(lst, i, j):
        temp = lst[i]
        lst[i] = lst[j]
        lst[j] = temp


    def icbicsort(lst):
        swap_count = 0
        n = len(lst)
        for i in range(n):
            for j in range(n):
                if lst[i] < lst[j]:
                    swap_count += 1
                    list_swap(lst, i, j)
        return swap_count

    def bubble(lst):
        swap_count = 0
        n = len(lst)
        # preprocess
        # this is intentionally identical to icbicsort with i = 0
        for j in range(n):
            if lst[0] < lst[j]:
                swap_count += 1
                list_swap(lst, 0, j)
        # go forward, bubbling backward
        for i in range(1, n):
            for j in range(i, 0, -1):
                if lst[j] < lst[j-1]:
                    swap_count += 1
                    list_swap(lst, j-1, j)
        return swap_count

    def cmp(n=100):
        alst = []
        blst = []
        for t in range(n):
            value = randrange(65536)
            alst.append(value)
            blst.append(value)
        icb_swaps = icbicsort(alst)
        bubble_swaps = bubble(blst)
        print(f"icb: {icb_swaps} swaps; bubble: {bubble_swaps} swaps")
    
    # just for completeness
    def bubble_classic(lst):
        n = len(lst)
        sorted = False
        while not sorted:
            sorted = True
            for i in range(n-1):
                if lst[i] > lst[i+1]:
                    sorted = False
                    list_swap(lst, i, i+1)
        return

26 calls to cmp() produced identical swap counts in 24 cases and slight discrepancies (of 2 and 3 swaps, both favoring icbicsort) in two cases. (For statistical purposes, this was a manual inspection where I stopped after getting the second discrepancy.)

If you're regularly seeing discrepancies, something is weird. If you're seeing them occasionally, I still think something is weird, but I'm prepared to believe that it's a bug in the code.

I will note, in support of the paper's actual thesis, that I wrote icbicsort() correctly on the first try, and I had to work many bugs out of bubble(). Maybe it's still bugged.


I did search for "trivial sorting network", but the only networks that were called trivial were the ones for exactly two elements, while this algorithm sorts an arbitrary number of elements.

Could you link to what you're talking about? And what's its big-O runtime?


This is a much stronger result from the 80s: https://www.researchgate.net/publication/221590321_An_On_log...


There is discussion about Sorting Networks in the old post (find link in one of the top comments of this post)



Yup, that's all correct. I drew `squash` and `backout` in terms of files in order to avoid needing notation for the opposite of an edit and the composition of two edits.

