Unless you reserve on fork you're still over committing because after the fork w...

inkyoto · 2025-12-20T03:48:27 1766202507

Also,

> Thread stacks come up because reserving them completely ahead of time would incur large amounts of memory usage. Typically they start small and grow when you touch the guards. This is a form of overcommit.

Ahead of the time memory reservation entails a page entry being allocated in the process’s page catalogue («logical» allocation), and the page «sits» dormant until it is accessed and causes a memory access fault – that is the moment when the physical allocation takes place. So copying reserved but not accessed yet pages has zero effect on the physical memory consumption of the process.

What actually happens to the thread stacks depends on the actual number of active threads. In modern designs, threads are consumed from thread pools that implement some sort of a run queue where the threads sit idle until they get assigned a unit of work. So if a thread is idle, it does not use its own stack thread and, consequently, there is no side effect on the child's COW address space.

Granted, if the child was copied with a large number of active threads, the impact will be very different.

> Even windows dynamically grows stacks like this

Windows employs a distinct process/thread design, making the UNIX concept of a process foreign. Threads are the primary construct in Windows and the kernel is highly optimised for thread management rather than processes. Cygwin has outlined significant challenges in supporting fork(2) semantics on Windows and has extensively documented the associated difficulties. However, I am veering off-topic.

barchar · 2025-12-20T05:42:38 1766209358

I am aware reserving excess memory doesn't commit said memory. But it does reserve memory, which is what we were talking about. The point was that because you can have a lot of threads and restricting reserved stacks to some small value is annoying all systems overcommit stack. Windows initially commits some memory (reserving space in the page file/ram) for each but will dynamically commit more when you touch the guard. This is overcommit. Linux does similarly.

Idle threads do increase the amount of committed stack. Once their stack grows it stays grown, it's not common to unmap the end and shrink the stacks. In a system without overcommit these stacks will contribute to total reserved phys/swap in the child, though ofc the pages will be cow.

> Windows employs a distinct process/thread design, making the UNIX concept of a process foreign. Threads are the primary construct in Windows and the kernel is highly optimised for thread management rather than processes. Cygwin has outlined significant challenges in supporting fork(2) semantics on Windows and has extensively documented the associated difficulties. However, I am veering off-topic.

The nt kernel actually works similarly to Linux w.r.t. processes and threads. Internally they are the same thing. The userspace is what makes process creation slow. Actually thread creation is also much slower than on Linux, but it's better than processes. Defender also contributes to the problems here.

Windows can do cow mappings, fork might even be implementable with undocumented APIs. Exec is essentially impossible though. You can't change the identity of a process like that without changing the PID and handles.

Fun fact: the clone syscall will let you create a new task that both shares VM and keeps the same stack as the parent. Chaos results, but it is fun. You used to be able to share your PID with the parent too, which also caused much destruction.

inkyoto · 2025-12-21T03:51:53 1766289113

> Idle threads do increase the amount of committed stack.

I am not clear on why the stack of an idlying thread would continue to grow. If a previously processed unit of work resulted in large amounts of memory pages backing the thread stack getting committed, then yes, it is not common to unmap the no longer required pages. It is a deliberate trade-off: automatic stack shrink is difficult to do safely and cheaply.

Idle does not actually make stacks grow, put simply.

> The nt kernel actually works similarly to Linux w.r.t. processes and threads.

Respectfully, this is slightly more that entirely incorrect.

Since Linux uses a single kernel abstraction («task_struct») for both processes and threads, it has one schedulable kernel object – «task_struct» – for both what user space calls a process and what user space calls a thread. «Process» is essentially a thread group leader plus a bundle of shared resources. Linux underwent the consolidation of abstractions in a quest to support POSIX threads at the kernel level decades ago.

Since fork(2) is, in fact, clone(2) with a bunch of flags, what you get depends on clone flags: sharing VM, files, FS context, signal handlers, and whether you are in the same thread group (CLONE_THREAD) and that creates a new thread group with its own memory management (but populated using copy-on-write), separate signal disposition context, etc.

Windows has two different kernel objects: a process (EPROCESS) and a thread (ETHREAD/KTHREAD). Threads are the schedulable entities; a process is the container for address space, handle table, security token, job membership, accounting, etc. They are tightly coupled, but not «the same thing».

On Windows, «CreateProcess» is heavier than Linux fork for structural reasons: it builds a new process object, maps an image section, creates the initial thread, sets up the PEB/TEB, initialises the loader path, environment, mitigations, etc. A chunk of that work is kernel-side and a chunk is user-mode (notably the loader and, for Win32, subsystem involvement). Blaming only «userspace» is wrong.

Defender (and third-party AV/EDR) can measurably slow process creation because it tends to inspect images, scripts, and memory patterns around process start, not because of deficiences of the kernel and system calls design.

inkyoto · 2025-12-20T03:33:15 1766201595

> […] because after the fork writes to basically any page in either process will trigger memory commitment.

This is largely not true for most processes. For a child process to start writing into its own data pages en masse, there has to exist a specific code path that causes such behaviour. Processes do not randomly modify their own data space – it is either a bug or a peculiar workload that causes it.

You would have a stronger case if you mentioned, e.g., the JVM, which has a high complexity garbage collector (rather, multiple types of garbage collectors – each with its own behaviour), but the JVM ameliorates the problem by attempting to lock in the entire heap size at startup or bailing if it fails to do so.

In most scenarios, forking a process has a negligible effect on the overall memory consumption in the system.

minitech · 2025-12-20T04:58:40 1766206720

> This is largely not true for most processes.

> In most scenarios, forking a process has a negligible effect on the overall memory consumption in the system.

Yes, that’s what they’re getting at. It’s good overcommitment. It’s still overcommitment, because the OS has no way of knowing whether the process has the kind of rare path you’re talking about for the purposes of memory accounting. They said that disabling overcommit is wasteful, not that fork is wasteful.

barchar · 2025-12-20T05:48:38 1766209718

Yep. If you aren't overcommitting on fork it's quite wasteful, and if you are overcommitting on fork then you've already given up on not having to handle oom conditions after malloc has returned.

inkyoto · 2025-12-21T03:28:23 1766287703

> […] the purposes of memory accounting.

This is a crucial distinction and I agree when the problem is framed this way.

The original statement by another GP, however, was that fork(2) is wasteful (it is not).

In fact, I have mentioned it in a sister thread that the OS does not have a way to know of the kind of behaviour the parent or the child will exhibit after forking[0].

Generally speaking, this is in line with the foundational ethos of the UNIX philosophy where UNIX gives its users a wide array of tools tantamount to shotguns that shoot both forward and backward simultaneously and the responsibility for with the number of deaths and permanent maimings ultimately lies with its users. In comparison, memory management in operating systems that run mainframes is substantially more complex and sophisticated.

[0] In a separate thread, somebody else has mentioned a valid reverse scenario where the child idles by after forking and it is the parent that makes its data pages dirty causing the physical memory consumption to baloon.