Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is really part of the intro, but I wonder if someone knows anyway:

My very naive first guess at making a generic linked list would be, of course, a struct with a 'next,' 'prev,' and the void pointer to the payload (What can I say? At some point my brain was damaged by OO languages). I guess their method of instead defining a struct, embedding the linked list into it, and then using macros would probably interact more nicely with the type system.

Are there other benefits, though? I vaguely suspect their system might tend to prod users toward laying out their memory more nicely.



Nobody's mentioned what is IMO the main benefit of an intrusive data-structure: items within the data-structure can belong to multiple containers.

This allows you to locate an item using more than one criteria: for example, you could store blocks of free memory in two intrusive lists, one sorted by size and one by location. This would allow you to first locate eg. the largest memory block, and then quickly locate blocks adjacent to that one in memory.

With non-intrusive data-structures this is not possible, because while you can go from an iterator -> the item pointed to by the iterator, there is no way to efficiently do the reverse. This is possible in an intrusive data-structure because the iterator and the item itself are the same thing.


So when you remove an element from the list, how do you make sure both lists are updated? Are elements even likely to be removed in this use-case?


> So when you remove an element from the list, how do you make sure both lists are updated?

The remove_from_list() function has to be either payload-specific, or it needs a function pointer parameter which is called to handle the payload-specific unlinking. Assuming the lists are doubly-linked, which they typically are in the Linux kernel.


This is one of the purposes of doubly-linking the nodes.

Say your struct is in two lists. To remove it from both, first follow the next/prev pointers of the first list embedded struct to remove it from the first list, and then do the same thing with the second list embedded struct to remove it from the second list.


Yes, a very important point. In the kernel, it is pretty common for objects to need to be on multiple different lists.


This is called "internal" vs. "external" storage, or sometimes an "intrusive linked list". Fewer allocations and better locality of reference for faster access are some of the biggest benefits. There's more discussion here: https://en.wikipedia.org/wiki/Linked_list#Internal_and_exter...


> Are there other benefits, though?

Yes; you're avoiding an extra pointer indirection between list navigation and element access, which is usually going to be a cache miss. Also if you have a pointer to an element of an intrusive list, you can cheaply navigate to the next element; this wouldn't work with your external list scheme (you'd need to maintain a separate list pointer instead).


Not only avoiding the cache miss. It's also avoiding an allocation. The non-intrusive list that the parent described means the list node requires one allocation, and the content another. Intrusive lists mean there's just one allocation required for the actual element - and hooking it into a list comes for free.

If the struct we want to insert is on the stack we don't even need a single allocation. That is for example be useful on wait nodes, where we put some waiter object on the stack, register it in a shared wait queue, and then wait until the object gets signalled => No allocation is required for any waiter.


Yes. I would caution on inserting stack objects into shared lists, though, because it's such an easy footgun if the reference outlives the stack frame.


It's not really worse than heap allocated objects. For them you also need to make sure the pointer in the list does not outlive the actual object.

The only safe structures are heap allocated plus reference counted ones, where inserting something into a list also increments the refcount.


I think you're completely correct, in a pedantic sense. However, heap object lifetimes aren't bound to control flow in quite the same way.


You could allocate a single novel struct containing the structure of interest and any number of list elements that will contain it in a single malloc().

If the lists don’t need to know about each other, each novel struct could be different.

This would work, but it’s a bit exotic, so I’m not recommending it. But it addresses the locality and multiple allocation issues.


isn't that exactly equivalent to an intrusive linked list in memory except harder to implement?


Not quite. With traditional intrusive linked lists, you'll have to modify the payload struct whenever you want to add it to another or different lists; with OP's approach, maybe not (but it depends on the implementation details).


This was one of my first questions in my OS class in college. The answer was that the kernel implementation takes advantage of the cache.


Cache locality and memory consumption.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: