> the per-function-call overhead is due to dynamic initialization on first use requirement
Thanks - I didn’t realize this was mandated by the standard as opposed to “permitted” as one possibility (similarly to how eg a constructor of a global variable can be called before main or upon first use or anywhere in-between according to the standard). Updated the post with this point
> The desired efficient GOTTPOFF code sequence is only feasible when the shared object is available at program start, in which case you can guarantee that “you would need the TLS areas of all the shared libraries to be allocated contiguously”
Indeed I didn’t mention -ftls-model=initial-exec originally (I now added it based on reader feedback; it can work when it will work, which for my use case is a toss-up I guess…), but my point is that you could allocate the TLSes contiguously even if dlopen was used, and I describe how you could do it in the post, albeit in a somewhat hand-wavy way. This is totally not how things were done and I presume one reason is that you don’t carve out chunks of the address space for a use case like this as described in my approach - I just think it would be nice if things worked that way.
Actually sounds like it isn't mandated by the standard after all; it's mandated for block thread_locals but not for thread_locals in the global scope:
3.7.2/2 [basic.stc.thread]: A variable with thread storage duration shall be initialized before its first odr-use (3.2) and, if constructed, shall be destroyed on thread exit.
This allows the constructor to be called at any point before the first use, similarly to "normal" globals, though implementations made different tradeoffs in these 2 cases
Thanks - I didn’t realize this was mandated by the standard as opposed to “permitted” as one possibility (similarly to how eg a constructor of a global variable can be called before main or upon first use or anywhere in-between according to the standard). Updated the post with this point
> The desired efficient GOTTPOFF code sequence is only feasible when the shared object is available at program start, in which case you can guarantee that “you would need the TLS areas of all the shared libraries to be allocated contiguously”
Indeed I didn’t mention -ftls-model=initial-exec originally (I now added it based on reader feedback; it can work when it will work, which for my use case is a toss-up I guess…), but my point is that you could allocate the TLSes contiguously even if dlopen was used, and I describe how you could do it in the post, albeit in a somewhat hand-wavy way. This is totally not how things were done and I presume one reason is that you don’t carve out chunks of the address space for a use case like this as described in my approach - I just think it would be nice if things worked that way.