Indeed all the DockYard pieces announced are nice, but the posts are puff pieces...

bitwalker · on Sept 3, 2022

> Only AOT compiled native code, except targeting WASI.

More precisely, a number of targets are supported, WASI would be just one.

> Generally BEAM is understood to be slower of the big runtimes (compared against Java, CLR, Go and often V8/Javascript). FireFly claims to be faster, and smaller in compiled form than current interation compiled against BEAM.

The point is that by placing some restrictions on what is possible at runtime (specifically by removing the possibility of hot code loading), we can do whole program analysis and thereby do much more aggressive forms of optimization and dead code elimination across compiled applications _and_ the runtime they link to. It isn't guaranteed that such programs would be faster (though I suspect in some cases they would be), but they almost certainly should be smaller, which is important for Wasm, and other constrained targets.

> How that plays with actor model and preemptively switched lightweight processes is a mystery to me too.

It makes no difference, you can implement all of that with identical semantics from the perspective of the developer, the strategy used for compilation is orthogonal to those features, though naturally the implementation details are tightly integrated.

derefr · on Sept 4, 2022

> It isn't guaranteed that such programs would be faster (though I suspect in some cases they would be), but they almost certainly should be smaller, which is important for Wasm, and other constrained targets.

Smaller in code size, maybe (though that possibly doesn't matter given that code is pretty compressible), but possibly less cache-coherent! One thing that's interesting about bytecode interpreters like BEAM, the JVM, etc., is that since all the actual native-ISA code is just the same small set of instruction impls being jumped to over and over, the interpreter can stay entirely hot in L2 or even L1 cache at all times; with the program bytecode being executed through, while less hot, being more concise, and therefore also being "hotter per byte" since code that does more per op will take longer to fully run through before it needs to be evicted in favor of something else.

This has been a consideration for decades—it's why programs compiled to Pascal p-code tended to be faster than programs compiled natively for the low-level (mostly ALUless) host instruction sets of the time.

bitwalker · on Sept 4, 2022

Smaller code size was the goal; and BEAM bytecode adds up quick, even compressed. For example, the BEAM files for _just_ Ecto come out to about 1.2M in an uncompressed tarball, but in a gzipped tarball at maximum compression is still 799K. In my experience, about the smallest release for the average application in terms of BEAM bytecode is about 30M (uncompressed), across the standard library, dependencies, and your own code. Virtually none of that can be dead code eliminated, because the BEAM has to treat it all as potentially reachable. Shipping that much to a browser (even compressed) is just not viable.

Your point about instructions remaining hot in the cache might very well be true more often than not, but is highly sensitive to the application in question. The core interpreter opcode impls might all fit in cache at the same time (though I doubt even that with the BEAM due to how many there are), but any call to BIFs/NIFs is likely to cause evictions. It still might do better than natively-compiled code overall in that specific sense, but I would be hesitant to state any generalities about it when considered as a whole with all of the many other factors that play in to overall performance.

In any case, Firefly wasn't about building a faster BEAM, but about bringing BEAM languages to the browser (or really any Wasm host). While targeting standard server/desktop architectures was something we also wanted to support, particularly for writing CLI tools and such, we expect the BEAM will always be the first choice for people deploying to those systems. If we can build something that is faster than the BEAM in some cases due to the tradeoffs we make, that's great, but it isn't an explicit goal.

derefr · on Sept 4, 2022

> For example, the BEAM files for _just_ Ecto come out to about 1.2M in an uncompressed tarball

Did you try stripping out debugging chunks? IIRC every Elixir module embeds its own source code by default.

e3bc54b2 · on Sept 4, 2022

Thank you! That makes quite a few things clear.

As a naïve question, how does SBCL (Common Lisp compiler) generate such a wicked fast code? Other than actor model, it is also dynamically typed and hot code loading is one of the headline features there.

bitwalker · on Sept 4, 2022

When I say hot code loading, I’m really specifically referring to how that works in the BEAM, which is a more sophisticated mechanism than simply compiling/generating code on the fly. The biggest problem though is that it prevents most forms of dead code elimination. It also means that you can never assume anything about how a function will be called, because at any point new code could be loaded that calls it differently. You can still optimize such code with a tracing JIT, but in Wasm (at least currently) that’s not even an option.

Without knowing too much detail about SBCL specifically, I suspect that they use a combination of clever static analysis and specialization to unlock a lot of that speed. That way functions can be specialized/inlined where beneficial, but new code can safely call the unoptimized versions when loaded at runtime, but even hot loaded code could be specialized with a JIT on hand. The big reason we made the trade off with hot code loading though is due to the restrictions that Wasm imposes - there’s no particular reason we couldn’t support it otherwise. In general it is rarely used in production BEAM apps in my experience, so from my perspective it seemed like an opportunity to stop paying for a feature unused and gain something in return.

e3bc54b2 · on Sept 4, 2022

Thank you for detailed reply here and elsewhere in the thread. That probably took as much time as writing a new blog post, so it is really appreciated.

As mentioned before, as a newbie student of Elixir this is all very exciting. Please keep up the good work!

hosh · on Sept 3, 2022

If you are looking for more substantial technical information, they are found in the ElixirConf ‘19 keynote speech and demo when Lumen was announced.

https://youtu.be/uMgTIlgYB-U

freedomben · on Sept 3, 2022

Agreed, these posts from Dockyard have been really low quality. Dockyard makes some awesome stuff generally, so I'm surprised at how shallow and marketing-speak these have been.

That said I'm really excited for this and the other projects. Looking forward to some more technical dives.

bitwalker · on Sept 4, 2022

Speaking as the lead on the project, this is partially due to this weekend being ElixirConf, so things are hectic, but you can also blame me, as I probably should have written this post, but didn’t make the time as I was pretty heads down on the lead up to the conference.

I’ll make sure to do a follow up post in the near future that is more in depth

nunb · on Sept 4, 2022

Thank you for all the follow up posts on this thread... It strikes me they're probably taking up as much, if not more, time just in the clarifications!

hosh · on Sept 3, 2022

They make up for it in their ElixirConf keynotes and presentations.

e3bc54b2 · on Sept 4, 2022

I'd love to watch it, but the recordings are not available yet and 250$ virtual.attendance was way too much for personal interest language.

hosh · on Sept 4, 2022

They are published free after the conference. The 2019 keynote with substantial technical information on Lumen is available for free on youtube.