
I think it's quite hard to talk about norms for reading quantity, because it varies so much between people. There are a lot of people (more than half the population) who read basically no books in a year, and a tiny slice who read a huge number: so your intuitive take on what's "normal" is going to depend a lot on whether your social circle happens to have voracious readers in it. I suppose you can statistically determine some point in between as the "norm" but I'm not sure that point would reflect many people's experience...

I suspect that a 1970s mower for UK garden use was typically not very beefy. Wikipedia thinks Larkin had some kind of Victa.

We grew up with an '80s Victa, but it was one of the super 600 slashers with a newer 5hp engine. He probably didn't have a slasher, but the rest of their lineup used similar engines and weren't especially underpowered.

If his model was anything like ours, a hedgehog could probably crawl between the blade disc (not the blades but the thing they're attached to) and the chassis and get itself wedged in there.


Probably a reel mower. A wet fart will clog a reel mower. They were real popular back in the day.

"suspiciously close" isn't close enough, unfortunately, though it might make the task less work:

If your uname output, compiler architecture ifdefs, etc. don't match the existing architecture, then basically every program that does per-architecture specialisation will need updating, even if fairly trivially so.
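
For illustration (mine, not from anything above), the kind of per-architecture specialisation that needs touching is often just a C preprocessor ladder like the one below; the existing macros are real GCC/Clang predefines, while __newarch__ stands in for a hypothetical macro the clone's compiler would define:

    /* Typical per-architecture #ifdef ladder: every site like this needs a
     * new branch, or the build falls through to the #error (or, in
     * friendlier codebases, to a slow generic fallback). */
    #if defined(__x86_64__)
    #  define CACHE_LINE_SIZE 64
    #elif defined(__aarch64__)
    #  define CACHE_LINE_SIZE 64
    #elif defined(__riscv)
    #  define CACHE_LINE_SIZE 64
    #elif defined(__newarch__)   /* hypothetical: the branch someone must add */
    #  define CACHE_LINE_SIZE 64
    #else
    #  error "unknown architecture"
    #endif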

If you're not building and running identical binaries, then anybody who distributes binaries needs to be persuaded that it's worthwhile to get hold of build machines and devote archive space and maintenance time to your new architecture.

There may be political issues where neither the owners of the old architecture nor the owners of the new clone want to come out and admit that it's very similar. This may result in projects treating it as "genuinely new architecture" rather than "variant of an existing one", which is more work. (There are also technical concerns about future divergence that might argue for "not just a variant".)

If you have to have the code structure of a complete new architecture, then this can also trigger more work where the old arch code got away with legacy practices and APIs but the newcomer is expected to meet the project's standards for new code, so "copy, paste, rename" is insufficient. Sometimes this imposes constraints that make more work elsewhere: for instance, a new architecture in Linux is expected to follow a modern syscall numbering scheme and set of syscalls, so it won't have a userspace ABI that's compatible with the arch it is cloning.

If the architecture you're cloning is a "declining" architecture now mostly found in legacy setups, then modern projects you care about for your new architecture might not have good support for it, or any at all. (If you cloned sh4 you wouldn't have easy Rust support, for example.)

Overall, getting a new architecture from "we have a spec" to comprehensive open source ecosystem support is a heavy lift, and clone-and-copy doesn't get you out of all of it. (Look back at how long it took 32-bit Arm and then 64-bit Arm, and at how RISC-V is now following a similar path. These have all been years-long efforts with a very long tail.)


I think there are two parts to this:

1) These historical source code releases really are largely of historical interest only. The original programs were written under constraints of memory and CPU speed that no modern use case has; the set of use cases for any particular task today is very different; what users expect and will tolerate in a UI has shifted; and the programming languages and tooling available today are much better than the pragmatic options of decades past. If you were trying to build a Unix clone today there is no way you would want to start with the historical release of Sixth Edition. Even xv6 is only "inspired by" it, and gets away with that because of its teaching focus. Similarly, if you wanted to build some kind of "streamlined lightweight Photoshop-alike", starting from scratch would be more sensible than starting with somebody else's legacy codebase.

2) In this specific case the licence agreement explicitly forbids basically any kind of "running with it" -- you cannot distribute any derivative work. So it's not surprising that nobody has done that.

I think Doom and similar old games are one of the few counterexamples, where people find value in being able to run the specific artefact on new platforms.


I'm curious about whether there are well-coded AI scrapers that have logic for "aha, this is a git forge, git clone it instead of scraping, and git fetch on a rescrape". Why are there apparently so many crawlers out there that are naive in this respect (even if they're anything but naive in being massively parallel and botnet-like)?
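
For the sake of concreteness, here's a minimal sketch of what such logic might look like -- my own illustration, not any real crawler's code. It assumes the git CLI is installed and simply shells out to it; whether "git ls-remote" succeeds is used as the "this is a git repo" test, and the URL and paths are made up:

    /* Detect a git repository and mirror it with clone/fetch instead of
     * crawling it page by page. Hypothetical example; relies only on the
     * git CLI being on PATH. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>

    /* "git ls-remote" succeeds only if the URL speaks the git protocol
     * (smart HTTP, ssh, ...), so its exit status is a cheap forge test. */
    static int looks_like_git_repo(const char *url)
    {
        char cmd[1024];
        snprintf(cmd, sizeof cmd, "git ls-remote %s >/dev/null 2>&1", url);
        return system(cmd) == 0;
    }

    /* Clone on first sight, fetch on a rescrape. */
    static void mirror(const char *url, const char *dest)
    {
        char cmd[1024];
        struct stat st;

        if (stat(dest, &st) == 0 && S_ISDIR(st.st_mode))
            snprintf(cmd, sizeof cmd, "git -C %s fetch --all --prune", dest);
        else
            snprintf(cmd, sizeof cmd, "git clone --mirror %s %s", url, dest);
        if (system(cmd) != 0)
            fprintf(stderr, "git command failed: %s\n", cmd);
    }

    int main(void)
    {
        const char *url = "https://example.org/some/project.git";

        if (looks_like_git_repo(url))
            mirror(url, "/tmp/mirrors/project.git");
        else
            puts("not a git repo: fall back to ordinary page crawling");
        return 0;
    }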

If they're handling it as “website, don't care” (because they're training on everything online) they won't know.

If they're treating it specifically as a "code forge" (because they're after coding use cases), there's lots of interesting information that you won't get by just cloning a repo.

It's not just the current state of the repo, or all the commits (and their messages). It's the initial issue (and discussion) that leads to a pull request (and review comments) that eventually gets squashed into a single commit.

The way you code with an agent is a lot more similar to the issue, comments, change, review, refinement sequence that you get by slurping the website.


I'm not an industry insider and not the source of this claim, but it's been stated previously that the traffic cost of re-fetching the current data for each training run is cheaper than caching it locally in any way -- whether it's a git repo, static sites or any other content available over HTTP.

This seems nuts and suggests maybe the people selling AI scrapers their bandwidth could get away with charging rather more than they do :)

I'd see this as coming down to incentives. If you can scrape naively and it's cheap, what's the benefit to you of doing something more efficient for a git forge? How many other edge cases are there where you could potentially save a little compute/bandwidth, but would need to implement a whole other set of logic?

Unfortunately, this kind of scraping seems to inconvenience the host way more than the scraper.

Another tangent: there probably are better behaved scrapers, we just don't notice them as much.


True, and it doesn't get mentioned enough. These supposedly world-changing advanced tech companies sure look sloppy as hell from here. There is no need for any of this scraping.

I guess they're vibe coded :D

I believe (per the stuff at the bottom of https://www.kernel.org/doc/Documentation/vm/overcommit-accou... ) that the kernel does the accounting of how much memory the new child process needs and will fail the fork() if there isn't enough. All the COW pages should be in the "shared anonymous" category, so they get counted once per user (i.e. once for the parent process, once for the child), ensuring that the COW copy can't fail if the fork succeeded.

Because the point of forbidding overcommit is to ensure that the only time you can discover you're out of memory is when you make a syscall that tries (explicitly or implicitly) to allocate more memory. If you don't account the COW pages to both the parent and the child process, you have a situation where you can discover the out of memory condition when the process tries to dirty the RAM and there's no page available to do that with...
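
To make that concrete, here's a minimal sketch (my own illustration, assuming Linux with strict accounting enabled, i.e. vm.overcommit_memory=2): the private mapping is charged when it's created, fork() charges it a second time for the child and is therefore the call that can fail with ENOMEM, and the child's later copy-on-write write is guaranteed a page:

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 1UL << 30;   /* 1 GiB; pick something near your CommitLimit */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");       /* strict accounting can refuse it right here */
            return 1;
        }
        memset(p, 1, len);        /* dirty the pages so they're really ours */

        pid_t pid = fork();       /* the COW pages get charged to the child too */
        if (pid < 0) {
            perror("fork");       /* with strict overcommit, ENOMEM shows up
                                     here, not at some later page fault */
            return 1;
        }
        if (pid == 0) {
            p[0] = 2;             /* child dirties a page: the accounting done
                                     at fork() time means this cannot run out */
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        return 0;
    }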

The described scenario (and, consequently, the concern) is mostly a philosophical question, or a real concern only for very specific workloads.

Memory allocation is a highly non-deterministic process that depends heavily on the code path taken, and it is generally impossible to predict how the child will handle its own memory space – it can be little or it can be a lot (relative to the parent), and it is usually somewhere in the middle. Most daemons, for example, consume next to zero extra memory after forking.

The Ruby «mark-and-sweep» garbage collector (in old versions of Ruby – 1.8 and 1.9) and Python reference-counting (the Instagram case) bugs are prime examples of pathological cases where a child would walk over its data pages, dirtying each one and causing a system collapse, but those bugs have since been fixed or worked around. An honourable mention goes to Redis when THP (transparent huge pages) is enabled.

No heuristics exist out there that would turn memory allocation into a deterministic process.


You do relatively commonly see wastewater piping on the outside of a house in the UK, especially on older stock (a soil stack from the toilet, with waste pipes from the sink or bath running into it). This is fine in the UK climate, where a normally empty pipe doesn't need insulation. I hear it won't work in places that get extremely low winter temperatures, but the UK doesn't have winters that cold.

You don't see them on new builds, I think, probably because the pipe going from inside to outside would reduce insulation effectiveness.


Yeah it makes sense for buildings where plumbing was retrofitted.

Otherwise people try to retrofit narrow drain pipes inside the walls, which are prone to clogging or give you poor flushing performance. Or they run big enough pipes outside the interior walls, where you get to hear every flush/shower unless you build a box around them. Easier to just run it outside if you can configure your bathrooms that way.


The OED agrees about the Dutch idea, giving the etymology as:

"early modern Dutch maelstrom (now maalstroom) whirlpool < malen to grind, to whirl round (compare meal n.1) + stroom stream n"

and also thinks Dutch is the origin, with Swedish/Danish etc taking it from Dutch too:

"The use of maelstrom as a proper name (also in French) seems to come from Dutch maps, e.g. that in Mercator's Atlas (1595). There is little doubt that the word is native to Dutch (compare synonymous German regional (Low German) Maling). It is true that it is found in all the modern Scandinavian languages as a common noun, but in them it is purely literary, and likely to have been adopted from Dutch."


And I would guess that most of the kernel devs who are "working for free" are doing the stuff they personally enjoy and find satisfaction in working on, because it's a hobby -- so many of them are probably not interested in fixing random bugs for cash either.

