bcardarella's comments | Hacker News

I agree that borrowing against unrealized gains is crap; it's led to a major economic divide. However, just make borrowing against unrealized gains illegal. Taxing unrealized gains is the wrong solution to a real problem.


"written in Rust"

ok


[flagged]


I don't see any reference to `0xiDizEd` in the docs or on github. Are you sure that you're discussing the right project?


Just a small comparison, compiled for release:

Boa: 23M
Brimstone: 6.3M

I don't know if closing the gap on features with Boa and hardening for production use will also bloat the compilation size. Regardless, passing 97% of the spec at this size is pretty impressive.


It looks like Boa has Unicode tables compiled inside of itself: https://github.com/boa-dev/boa/tree/main/core/icu_provider

Brimstone does not appear to.

That covers the vast bulk of the difference. The ICU data is about 10.7MB in the source (boa/core/icu_provider) and may grow or shrink somewhat once compiled.

I'm not saying it's all the difference, just the bulk.

There are a few reasons why svelte little executables with small library backings aren't possible anymore, and it isn't just ambient, undefined "bloat". Unicode is a big one. Correct handling of Unicode involves megabytes of tables and data that have to live somewhere, whether it's a linked library, compiled in, tables on disk, whatever. If a program touches text and needs to handle it correctly rather than just passing it through, there's a minimum size for that now.


Brimstone does embed Unicode tables, but a smaller set than Boa embeds: https://github.com/Hans-Halverson/brimstone/tree/master/icu.

Brimstone does try to use the minimal set of Unicode data needed for the language itself. But I imagine much of the difference with Boa is because of Boa's support for the ECMA-402 Internationalization API (https://tc39.es/ecma402/).


Yeah, the majority of the difference is from the Unicode data for Intl along with probably the timezone data for Temporal.


Is it possible to build Boa without these APIs?


For the engine, the answer is yes: Intl and Temporal are feature-flagged due to the dependencies. What I suspect they're comparing above are the CLIs, which are completely different from the engine. I'd have to double-check for the CLI; if I recall correctly, we include all features in the CLI by default.


Unicode is everywhere though. You'd think there'd be much greater availability of those tables and data and that people wouldn't need to bundle it in their executables.


Unfortunately, operating systems don't make the raw Unicode data available (they only offer APIs to query it in various ways). Until they do, we all have to ship it separately.


For some OSes like Windows, some relevant APIs can indeed be used to reconstruct those tables. I found that this is in fact viable for character encoding tables, only requiring a small table for fixes in most cases.


Debian has a unicode-data package, so you can just depend on it.


I was curious to see what that data consisted of, and apparently a lot of it is translations, like the names of all possible calendar formats in all possible languages, etc. This seems useless in the vast majority of use cases, including that of a JS interpreter. Looks to me like the typical output of a committee that's looking too hard to extend its domain.

Disclaimer: I never liked unicode specs.


Unicode is an attempt to encode the world's languages: there is not much to like or dislike about it; it simply represents reality. Sure, it has a number of weird details, but if anything, those are due to the desire to simplify it (like Han unification or normal forms).

Any language runtime wanting to provide date/time and string parsing functions needs access to the Unicode database (or something of comparable complexity and size).

Saying "I don't like Unicode" is like saying "I don't like the linguistic diversity in the world": I mean sure, OK, but it's still there and it exists.

Though note that date/time, currency, number, street address, etc. formatting is not "Unicode" even if provided by ICU: this is similarly defined by POSIX as "locales", and GNU libc probably has the richest collection of locales outside of ICU.

There are also many non-Unicode collation tables (think phonebook ordering that's different for each country and language): so no good sort() without those either.


I am not questioning the goal of representing all the fine details of every possible language, currency, and calendar in use anywhere at any time in the universe; that's a respectable achievement. I'm discussing the process that led to a programming language interpreter needing, according to the comment I was replying to, to embed that trove of data.

Most of us are not using computers to represent subtle variants of those cultural artifacts and therefore they should be left in some specialized libraries.

Computers are symbolic machines, after all, and many times we would do just as well using only 16 symbols and typing our code on a keyboard with just that many keys. We can't have anything but 64-bit floats in JS, but somehow we absolutely need to be able to distinguish between the "peso lourd argentin (1970–1983)" and the "peso argentin (1881–1970)"? And to know that to display a chemical concentration in millimoles per liter in German one has to write "mmol/l"?

I get it, the symbolic machines need to communicate with humans, who use natural languages written in all kinds of ways, so it's very nice to have a good way to input and output text. We wanted that way to not favor any particular culture, and I can understand that. But how you get from there to the amount of arcane, specialized minute detail in the ICU dataset is questionable.


You bring up numbers, but you ignore strings, another fundamental data type in all programming languages.

Without this trove of data, you can't do something as simple as length(str) or uppercase(str) — even in a CLI if you want to line text up.

So yes, this database has a big chunk that represents rarely useful data like you mention. But the majority of it is still generally useful.
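To make that concrete, here's a rough Elixir illustration (any Unicode-aware language behaves similarly): grapheme counting and case mapping both depend on bundled Unicode tables and can't be derived from the raw bytes alone.

    iex> byte_size("noël")          # raw UTF-8 bytes, no tables needed
    5
    iex> String.length("noël")      # grapheme clusters, needs Unicode data
    4
    iex> String.upcase("straße")    # full case mapping, also table-driven
    "STRASSE"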


I may be wrong, but a cursory look at the data gave me the impression that the majority of it was actually not related to commonplace string manipulation. Other than that, we probably agree.


The big one that's often ignored is collation tables: while there's a default in ISO 10646 IIRC, each region-language combo might have its specific overrides (imagine "ss" being sorted as a separate letter in German, and not after "sr" and before "st", so it would be sa..., sb..., sr..., st..., ssa..., ssb..., etc.); and then Austrian German might have a different phonebook ordering.


> Saying "I don't like Unicode" is like saying "I don't like the linguistic diversity in the world": I mean sure, OK, but it's still there and it exists.

Respectfully disagree: linguistic diversity isn't, by definition, impossible to build a good abstraction on top of; I think this is more a failure of this particular attempt.


Care to point out a -- by your definition -- successful attempt to do it?


Does that include emojis?


Emojis are complicated from a font rendering perspective. But from a string processing perspective, they're generally going to be among the simplest characters: they don't have a lot of complex properties with a lot of variation between individual characters. Compare something like the basic Latin characters, where the mappings for precomposed characters are going to vary wildly from 'a' to 'b' to 'c', etc., whereas the list of precomposed characters for the emoji blocks amounts to "none."


Agreed!

FWIW, they are not even "complicated" from a font rendering perspective: they're simple non-combining characters and they are probably never used in ligatures either (though nothing really stops you; just like you can have locale-specific variants with locl tables). It's basically "draw whatever is in a font at this codepoint".

Yes, if you want to call them out based on Unicode names, you need to have them in the database, and there are many of them, so a font needs to have them all, but really, they are the simplest of characters Unicode could have.


> they're simple non-combining characters

Skin-tone emojis are combined characters: base emoji + tone modifier.


TIL, thanks for pointing it out.


To add to the skin-tone emoji example, country flag emojis are combined characters built from two regional indicator letters corresponding to the country code. The various "family" emojis are also combined characters made of individual person emojis, and so on.
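A quick Elixir illustration of the codepoint vs. grapheme distinction (assuming a reasonably recent Elixir/Unicode version; older releases may segment these differently):

    iex> String.codepoints("👍🏽")   # thumbs up + skin tone modifier
    ["👍", "🏽"]
    iex> String.graphemes("👍🏽")    # one user-perceived character
    ["👍🏽"]
    iex> String.codepoints("🇳🇱")    # two regional indicator letters
    ["🇳", "🇱"]
    iex> String.graphemes("🇳🇱")     # one flag
    ["🇳🇱"]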


"draw whatever is in a font at this codepoint" is doing quite a lot of work there. Some emoji fonts just embed a PNG which is easy. But COLRv1 fonts define an entire vector graphics imaging model which is similar what you need to render an SVG.


Yes, but at this point we're completely outside the scope of Unicode, which has nothing to do with how anything actually gets drawn to the screen.


Sorry, what? You mean, emoji composition rules are simpler than combining diacritics? https://blog.codepoints.net/emojis-under-the-hood.html


I was unaware of this: thanks for pointing it out!


I just wish we could use system tables for that, instead of bloating every executable with its own outdated copy.

I have no issue with my system using an extra 10 MB for Ancient Egyptian capitalization to work correctly. Every single program including those rules is a lot more wasteful.


If someone builds, say, a Korean website and needs sort(), does the ICU monolith handle 100% of the common cases?

(Or substitute for Korean the language that has the largest amount of "stuff" in the ICU monolith.)


Yes, though it's easy to not use the ICU library properly or run into issues wrt normalization etc


As well-defined as Unicode is, it's surprising that no one has tried to replace ICU with a better mousetrap.

Not to say ICU isn't a nice bit of engineering. The table builds in particular, I recall, had some great hacks.


POSIX systems actually have their own approach with "locales", and I believe it predates Unicode and ICU.

Unfortunately, for a long time, POSIX systems were uncommon on desktops, and most Unices do not provide a clean way to extend it from userland (though I believe GNU libc does).


I was gonna say the last few percent might increase the size disproportionately, as the last few percent tend to do [0], but it looks like Boa passes fewer tests (~91%).

This is something I notice in small few-person or one-person projects. They don't have the resources to build complex architectures so the code ends up smaller, cleaner and easier to maintain.

The other way to look at it is that cooperation has an overhead.

[0]: The famous 80:20 rule. Or another claiming that each additional 9 in reliability (and presumably other aspects) takes the same amount of work.


Is that with any other size optimizations? I think the defaults are tuned for performance, not binary size, so you might want to check whether the results differ if you change things like codegen-units=1, removing panic handling, etc.
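For reference, a size-focused release profile usually looks something like this (these are the standard Cargo profile settings; actual savings vary per project, so treat it as a starting point rather than a recipe):

    [profile.release]
    opt-level = "z"     # optimize for size instead of speed
    lto = true          # link-time optimization, often shrinks the binary
    codegen-units = 1   # better optimization at the cost of compile time
    panic = "abort"     # drop unwinding and panic formatting machinery
    strip = true        # strip symbols from the final binary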


Stripping can save a huge amount of binary size; there's lots of formatting code added for println! and family, stack-trace printing, etc. However, you lose those niceties if you strip at that level.


I only ran both with `cargo build --release`


One that was tossed in 0.15 was `usingnamespace` which was a bit rough to refactor away from.


You may be correct, but the rate of change in baseball is glacially slow compared to the other sports. One of baseball's intrinsic values is its legacy, tradition, and history. Some may scoff at that, and I think there are good arguments against legacy/tradition as a reason to withhold change, but there are a lot of people out there who believe this. The MLB Commissioners have largely been tasked with protecting that tradition and history.


This is how CSS should be written. I will never understand why class names need to repeat the semantic purpose for a given element.


I think divs sent a generation down the wrong track. The div is weakly semantic and omnipresent in every 101 tutorial; it makes HTML's semantics overall seem weak/insufficient.


It’s just the default block-level containing element, so it serves its place but is not well explained in these tutorials (just as spans are the default inline element).

In my 25 years of experience writing HTML and CSS, most engineers don't understand semantic HTML, nor do they take the time to learn it, largely because companies don't value it unless they're heavily SEO-focused.

I once worked at a company that would run an HTML5 validation test in our CI/CD pipeline. That was very helpful as it identified invalidly nested elements and taught proper semantic HTML.


I don’t know if it was the wrong track, it was a frame of reference with little before it.

Unless one likes using tables instead or spacer gifs.

It’s nice html has options beyond. divs now too


How far should I be able to progress? I can get out of the first room and open the cabinet in the second room, but cannot progress further.


You've finished it; as I mention in another comment, I exaggerated in calling it a game instead of a demo. I hope you liked it.


Ok, we've changed the title to say 'demo' above - hopefully that will help!


Epic paintball venues


I tried to do this a few weeks ago: building a NIF around an existing C lib. I was using Claude Opus and burned over $300 in tokens (I didn't have Pro) with no usable results.


Get Pro; Claude 4 is quite good at Elixir now, but you have to stay on it. 3.5 was not, so I imagine the next version of Claude will be able to handle the more esoteric things like NIFs, etc.


The issue in this case was Opus was pretty crap at C. It kept introducing segfaults.


Get Pro 5... it will work, I promise.


I've completely refactored my Elixir codebase with Claude 4, expanded the test suite by 1,000 more tests, and released a few more features to actual paying customers faster than I ever have. Tidewave MCP is helpful, as are some custom commands and a well-tuned CLAUDE.md. But you do you.


Would you be willing to share your CLAUDE.md file contents? I’m vibe coding in Elixir.


I somewhat followed this:

https://elixirforum.com/t/coding-with-llms-conventions-md-fo...

It's not perfect - you often have to remind it not to write imperative-style code and to lean on Elixir conventions like `with` statements, function head matching, not reassigning vars, etc.
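As a rough illustration of the kind of rewrite I mean (the function names below are made up, not from my codebase):

    # What it tends to produce: nested case pyramids
    def create_order(params) do
      case fetch_user(params) do
        {:ok, user} ->
          case build_order(user, params) do
            {:ok, order} -> {:ok, order}
            {:error, reason} -> {:error, reason}
          end

        {:error, reason} ->
          {:error, reason}
      end
    end

    # What I nudge it toward: a `with` pipeline that bails out early
    def create_order(params) do
      with {:ok, user} <- fetch_user(params),
           {:ok, order} <- build_order(user, params) do
        {:ok, order}
      end
    end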


So hard to tell if this is parody or not.


Here's one Claude-vibed project that makes me money and that I run in addition to my SaaS, which is Elixir. I'm not strong in TypeScript and this is an Astro static site, so Claude has been really helpful. The backend is Supabase (Postgres) plus a few background jobs via https://pgflow.dev (pgmq) that fetch and populate job openings and use some AI steps to filter and then classify them into the correct categories (among other things; there's also a job application flow and an automated email newsletter): https://jobsinappraisal.com

I also "vibed" up this: https://livefilter.fly.dev/todos (https://github.com/cpursley/livefilter) and this: https://star-support-demo.vercel.app/en/getting-started (https://github.com/agoodway/star-support-demo)

I hand wrote very little of this code, but can read most of it - the SQL and Elixir at least ;)


Is there a reason why you're using 'when is_struct/2' instead of pattern matching here?

https://github.com/cpursley/livefilter/blob/main/lib/live_fi...
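For context, the two styles I'm contrasting look roughly like this (the DateRange struct and function names are made up, not from the repo):

    defmodule FilterExample do
      defmodule DateRange do
        defstruct [:from, :to]
      end

      # Guard-based check, as in the linked code:
      def describe_with_guard(value) when is_struct(value, DateRange) do
        "from #{inspect(value.from)} to #{inspect(value.to)}"
      end

      # Pattern matching the struct in the function head, which also
      # destructures its fields directly:
      def describe_with_match(%DateRange{from: from, to: to}) do
        "from #{inspect(from)} to #{inspect(to)}"
      end
    end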


This is clearly low quality, non-idiomatic AI-generated Elixir code. So the likely answer is that "you" did not use this at all; AI did.

I review this kind of AI-generated Elixir code on a daily basis. And it makes me want to go back to ~2022, when code in pull requests actually made sense.

Apologies for the rant, this is just a burnt out developer tired of reviewing this kind of code.

PS: companies should definitely highlight "No low-quality AI code" in job listings as a valid perk.


Fwiw, the date range part of this is the lowest quality; I even have an issue open: https://github.com/cpursley/livefilter/issues/2

In production code I'd do a couple passes and tell it to lean into more function head and guard matching, etc.

But it does compile, and works: https://livefilter.fly.dev/todos?filters%5Bassigned_to%5D%5B...


It's not really hard to tell.


1,000 more tests!? That reads coherent to you?


This is another extraction from the project that I discussed in this HN thread: https://news.ycombinator.com/item?id=44651539#44652356

tl;dr: we're building a headless browser in Elixir that will embed on device, communicate with the native first-class rendering engine (i.e. SwiftUI, Jetpack Compose, WinUI3, etc.) via disterl, and allow for web-like ergonomics for building truly native SSR applications.

The README for Elixir Pack is a bit focused on the LiveView Native project but that documentation will soon be updated to remove mention of it.
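To make the disterl part a bit more concrete, plain distributed Erlang messaging from the Elixir side looks roughly like this; the node name, registered process, and message shape below are illustrative only, not our actual protocol:

    # Started as: iex --sname browser --cookie demo -S mix
    defmodule RendererClient do
      # Hypothetical peer node; in practice this is whatever speaks the
      # distribution protocol on the native rendering side.
      @renderer_node :"renderer@localhost"

      def push_patch(patch) when is_binary(patch) do
        true = Node.connect(@renderer_node)

        # Assumes the peer registered a process under the name :renderer.
        send({:renderer, @renderer_node}, {:render_patch, self(), patch})

        receive do
          {:rendered, ref} -> {:ok, ref}
        after
          5_000 -> {:error, :timeout}
        end
      end
    end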


I don't understand why this headless browser should be in Elixir and why it should communicate via disterl/BERT. Although disterl is native to Erlang/OTP/the BEAM VM, it would still have to be implemented in the native rendering engines.

Don't get me wrong, I prefer writing in Elixir to JS/TS or native (Swift/Kotlin etc.)

> to build truly native SSR applications

Why do you still call it SSR if it is rendered on the client device?

Is there a long-form article about this project, preferably with visuals/diagrams?


Can you explain a bit of the workflow you expect for offline support?

Like would I have one set of LiveViews that run on device and a database wrapper that handles online vs offline queries? Do you envision all view code running on device then?

Either way, super cool to see!

