Fat pointers in C using libcello (libcello.org)
100 points by dgellow on April 29, 2020 | 109 comments


In my personal experience, "a bit more than a pointer" works best as a pair of (start, end) pointers (where "end" points to just beyond the last element.) The most obvious reasons for this are:

- slices become a total non-issue, since a pair of (start, end) already is a slice and you can just move start and end (see the sketch after this list).

- comparing against an end pointer is generally easier than adding up a length value first, particularly if you're slicing at the same time.

- the end pointer value is independent of the array element type, so if you e.g. cast to uint8_t * (which arguably you shouldn't in most cases) it stays exactly the same. If you store a count, you need to adjust a multiplier. If you store a byte length, you need to do a lot of divides or casts to deal with pointer arithmetic.
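
To make the slice point concrete, here's a minimal sketch of the (start, end) representation, assuming the caller keeps the underlying buffer alive (names are illustrative, not from any particular library):

    #include <stddef.h>

    typedef struct {
        int *start; /* first element */
        int *end;   /* one past the last element */
    } int_slice;

    size_t slice_len(int_slice s) { return (size_t)(s.end - s.start); }

    /* Re-slicing is just moving one of the pointers. */
    int_slice slice_drop_first(int_slice s) {
        if (s.start < s.end)
            s.start++;
        return s;
    }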

Also, this is a huge red flag to me:

https://github.com/orangeduck/Cello/blob/master/include/Cell...

  #define is ==
  #define isnt !=
  #define not !
  #define and &&
  #define or ||
  #define in ,

P.S.: This also is a "try to invent a new programming language without inventing a new programming language" thing. Have your cake and eat it... either it's C or it isn't, and this library is leaving the space of "normal" C.


I don’t think you’re too far off with ‘leaving the space of normal C’, but I think it may help to see the context from which the author was coming when writing Cello [1] and evaluating it in that context.

It’s been a while since I watched the talk but I believe his intention was to do just that, to push the bounds of what could be done in a header file purely for the fun of it. The second half of the talk specifically addresses the “why are you doing this?” in quite a charming way.

1: https://youtu.be/bVxfwsgO00o


huh. Interestingly, for "and", "or" and "not", iso646.h already defines C alternative tokens: https://en.wikipedia.org/wiki/C_alternative_tokens
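
For anyone who hasn't seen them in use, a quick sketch (the header simply defines these as macros):

    #include <iso646.h>
    #include <stdio.h>

    int main(void) {
        int a = 1, b = 0;
        if (a and not b)    /* expands to: a && !b */
            puts("alternative tokens work");
        return 0;
    }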


I know :) ... but have you seen anyone actually use those? I'd be curious to get a pointer or two.


Twice: once for a CTF challenge that stripped out certain symbols, and one other time when I set up an emulator with a sketchy key map and didn’t want to bother figuring out what those symbols were mapped to.


I've only seen wrong use: CS students working on a project in C++, using "and" and "or" as identifiers (which their local MSVC compiler apparently was cool with) and then being surprised that the course's testing environment (using gcc) found some errors.


I have seen people coming to C++ from Python use "and", "or", and "not" in their if-statements :)


Hah - I kid you not, I have the reverse problem! I work a lot with both C and Python (embedded interpreter & CPython extensions) and whenever I switch from a stretch of C coding to doing some Python, I catch myself typing "&&" and "||" instead of "and" and "or" :D

(C is my "native" language, so it doesn't really happen the other way around, but that's just me)

But honestly, I'm pretty sure we can call iso646.h "fringe" ;)


Curiously, & and | are occasionally useful in Python. They do work properly with booleans, so the difference compared to "and" and "or" is that you don't get short circuiting, which in some very rare cases you might not want. ^ is less useful because it's almost the same as != (except stricter about types) - but it documents the intent more clearly.


In programming languages, do we have something akin to a mother tongue?


There’s a fair bit of debate about the impact of learning specific programming styles as your first language. Quite a few people would argue that learning functional programming first is a benefit to understanding the concept of state and how to write clean code, even if you end up using something less pure.

Certainly, speaking as someone brought up on imperative and later OO code, it took a fair bit of unlearning to understand functional programming, and I still have no clue about type theory.


What's the problem? It's legal C++ (https://en.cppreference.com/w/cpp/language/operator_alternat...) and IMHO more readable than && / ||


We're talking about C here, not C++. The keywords don't exist in C unless you #include "iso646.h"


So what? Unless you include std...h you also need to use _Bool, _Complex and so forth, hardly much different.


But the repo doesn't include them - it defines them itself.

Technically, I believe this is actually UB, just like any other #define for a keyword or an identifier from the standard library.


I don't follow.

When I do #include <stdbool.h> it also brings the #define bool _Bool, a keyword from the standard library.

How is it UB?


It's not UB to #include standard headers that do such things. It is UB to #undef or redefine them yourself, though. And it looks like <stdbool.h> gets an explicit exemption, at least in C11:

"Notwithstanding the provisions of 7.1.3, a program may undefine and perhaps then redefine the macros bool, true, and false"

I guess it's because defining these was so common even before C99. But there's no similar verbiage for <iso646.h>, so...


let me quote you the comment I was replying to :

> I have seen people coming to C++ from Python use "and", "or", and "not" in their if-statements :)


Yet curiously absent from that quoted comment is anything about it being a problem, which you nonetheless ask about. I wonder if more context or a more well-intentioned reading is the solution? ;)


I read and reread it and I honestly don't understand how it is possible to come up with an interpretation where that is not seen as an issue by the author.


I'll admit I've never written more than small programs in C, but the criticism that this "isn't C" isn't a fair criticism to me. He's not doing anything more than any other library can do, were he to write a compiler that mapped directly to really boring C, would it be more or less C than this? I don't feel like those questions are useful. We need experiments like this, and for me personally, cello was a revelation when it was posted here years ago. There are no rules that say you can't do it.

I know he's doing a little hackiness by placing the size before the start of the object, but they also take that approach here (https://www.piumarta.com/software/cola/objmodel2.pdf) so maybe it's not that uncommon. How would you do the dual pointer setup? Is there any overhead from that, or is it small enough to not worry about?


The "isn't C" argument is more about the library as a whole, not specifically about the fat pointer suggestions. (Please look at the github repo, IMHO it's obvious.)

Placing a length "before" a pointer is perfectly fine on a _technical_ level. It's also how glibc's malloc works, it has its own data before any allocation it returns. However, hackiness is not the question here - it's whether it's the "best" approach. I simply believe, based on my own experiences, that twin pointers cover/win out in a much larger subset of applicable scenarios.

As for how to implement it - declare a struct with 2 pointers in it. Or just pass 2 pointers around.


Do you, by chance, know where else Piumarta's object model is used? I think the Perl 6 VM "Potion" used it too as a base, but I am unsure.


Unfortunately, no, your info about Perl 6 is news to me. Only ran into it a few months ago, tried it in Python and Nim, but haven't seen it in the wild. It's very cool though.



> In my personal experience, "a bit more than a pointer" works best as a pair of (start, end) pointers (where "end" points to just beyond the last element.)

There’s a PARC paper from the Cedar team benchmarking base-and-bounds vs marker-terminated (e.g. null-terminated) strings, and base-and-bounds won hands down on a variety of use cases. Won on speed, not just on safety grounds. In those days some people actually worried about the space taken up by the extra bounds variable.
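
The difference in miniature (my own sketch, not the paper's benchmark): a marker-terminated string has to be scanned to find its length, while a base-and-bounds string carries it.

    #include <stddef.h>

    /* O(n): scan to the terminator. */
    size_t nul_len(const char *s) {
        const char *p = s;
        while (*p) p++;
        return (size_t)(p - s);
    }

    /* O(1): the bound travels with the base. */
    typedef struct { const char *base, *bound; } BBStr;
    size_t bb_len(BBStr s) { return (size_t)(s.bound - s.base); }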


While people did worry about the space taken up by the bounds-checking information, systems 10 years older than the PDP-11 had enough hardware resources to support high-level systems programming languages with bounds-checked data structures, let alone the beefy PDP-11 (by comparison with those older models).


As I know well as a KA-10 MACLISP programmer!

But most people these days have a hard time realizing what it was like to program a machine with only a few K of memory. PL/1 (the Multics implementation language) had bounded strings. C did not as it was believed the PDP-7 couldn't afford it. Of course by today's standards the GE 645 is an absurdly tiny machine.


Yeah, that was my point, this knowledge is getting lost, only to be re-discovered by those that love to dig into computer history.

Just to think that what is now on an ESP32 die took up a complete desk at the high school computer club and was still weaker than what the ESP32 is capable of; yet that mentality also persists about what 80s hardware was already capable of.


This is how Go does slices, though they are 3 fields:

    type slice struct {
        zerothElement unsafe.Pointer // schematically a *T; `type` is a keyword in Go
        len           int
        cap           int
    }
Though I suspect under the hood the len and cap are ordered first.

If it's good enough for C greybeard Ken Thompson and unix hacker Rob Pike, it's good enough for me.

In fact I've looked for a port of go-style slices to C and haven't found one. Maybe people think sds is good enough?
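
For what it's worth, a hedged sketch of what such a port might look like (hypothetical names, not an existing library):

    #include <stdlib.h>

    typedef struct {
        int *data;
        size_t len;
        size_t cap;
    } IntSlice;

    /* Amortized-growth append, roughly what Go's append does for a
       slice that owns its backing array. Returns 0 on success. */
    int intslice_append(IntSlice *s, int v) {
        if (s->len == s->cap) {
            size_t ncap = s->cap ? s->cap * 2 : 8;
            int *p = realloc(s->data, ncap * sizeof *p);
            if (!p) return -1; /* the old slice stays valid */
            s->data = p;
            s->cap = ncap;
        }
        s->data[s->len++] = v;
        return 0;
    }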


> Though I suspect under the hood the len and cap are ordered first.

A slice is stored in memory as a `reflect.SliceHeader` https://golang.org/pkg/reflect/#SliceHeader ; the pointer does come first.


I actually wrote a (sort of) Go-style slice library. It's a little heavier than Go slices because it allows for dynamic array resizing and tracking the parentage of slices:

https://github.com/jgbaldwinbrown/slice

It's just a proof-of-concept, though, and would need a lot of work to be used in anything serious.


Go has a rather idiosyncratic take on arrays and how they're used, which is reflected in its slices. I can't think of any other language or framework that did it this way.


Rust Vec has the same representation.


Start and end pointers is the approach used by the C++ STL, too.


> Also, this is a huge red flag to me:

Puke! Who would ever want to create that kind of macro abomination.


Bjarne Stroustrup! https://en.wikipedia.org/wiki/C_alternative_tokens

And I'm not even joking :)


Man Bjarne Stroustrup is like the guy in college who flatly refused to conform in any way whatsoever no matter how good or bad it would have been for 'em. It's more of an oppositional personality disorder ;) That kid grew up and invented, because of course he did, C++.


It might have something to do with various symbols that C uses not being convenient to type in many non-US keyboard layouts. In the standard Danish one, ~, | and ^ all require the use of AltGr.


C++ isn't exactly short on symbols haha


Yeah, damn, just found out myself. I thought I knew C inside out, but I guess there are still bits and pieces you just don't encounter very often.

That said, my opinion about those macros is unchanged.



Pretty curious: does someone seriously disagree that this header shouldn't exist? I'd really love to hear the arguments!


Interesting idea, but this implementation has UB:

    typedef void* var;

    struct Header {
      var type;
    };

    // ...

    #define alloc_stack(T) header_init( \
      (char[sizeof(struct Header) + sizeof(struct T)]){0}, T)

    var header_init(var head, var type) {
      struct Header* self = head;
      self->type = type;
      return ((char*)self) + sizeof(struct Header);
    }
The line "struct Header* self = head" is UB: the alignment requirement of the local char array is 1, but the alignment requirement of struct Header is that of void*, which is probably 8.
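
For what it's worth, one C11 way around this (a sketch of a possible fix, not Cello's actual code) is to give the compound literal the strictest alignment by making it a union with max_align_t:

    #include <stddef.h>  /* max_align_t, C11 */

    /* Same shape as the macro above, but the union member forces the
       backing storage to be aligned for any object type, so the cast
       to struct Header* inside header_init is no longer a problem. */
    #define alloc_stack(T) header_init( \
      (union { max_align_t align_; \
               char bytes[sizeof(struct Header) + sizeof(struct T)]; }){0}.bytes, \
      T)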


That's just what I was wondering: are "magic" libraries like this still safe to use, considering modern compilers' UB shenanigans?


Not only that, but you have a pointer to a parameter returned back and used outside its scope ...


It's not a pointer to a parameter, it is just the parameter itself.

var is a typedef for void* and no & appears in the function.


> no & appears in the function.

It's an array, so you don't need & to take its address, it decays into a pointer without &.

Imagine:

    char buf[sizeof(struct Header) + sizeof(struct T)];
    char *p = buf;
Then take away the names so that you are effectively passing p as a parameter... Then returning p.

As in, an anonymous temporary being given to the function, and the function returns its address back.

It's assuming that this temporary parameter buffer will exist after it is used and the function has returned. I'm not sure what the standard says for that but it is crazy sketchy. [Edit: Googling around, it seems like maybe this is illegal in C99 but possibly legal in C11? Or that C11 changed the rules for this. Does not seem like a great thing to rely upon.]


There is no issue here (except the one highlighted by joejev).

Many standard library functions return a pointer which they got as a parameter (or another pointer offset from it, as is the case here). The compound literal is no more "temporary" than a variable that was introduced right before the function call. Lifetimes of compound literals are specified in section 6.5.2.5 of both C99 and C11.

  func((int[256]){ 0 });
  // is mostly equivalent to 
  int __a__[256] = { 0 };
  func(__a__);


> Many standard library functions return a pointer which they got as a parameter

Obviously. But this does not extend the lifetime of the buffer they are passed. Namely you can't use this as a technique to extend the life of automatic storage falling out of scope.

> Lifetimes of compound literals are specified in section 6.5.2.5 of both C99 and C11.

This is what I was missing. So it is valid by the standard. Which is good if you have looked it up. It remains not obvious when reading source without a copy of the standard on hand, or prior knowledge of that section. Passing an expression of that sort and keeping a pointer to it visually looks like the intent is to retain a pointer of more limited scope. Intuitively it would make just as much sense if the lifetime were shorter. If you are seeking clarity of intent this is not a great thing to rely upon.


Perhaps a stupid question: why isn't a vector type similar to { ptr, count } a normal thing to pass around in C? It's what you reach for in any other language, so why did it become idiomatic to pass pointers and lengths separately in C?

The C standard library has a header file for complex math, but it doesn't define a simple fixed-size array struct. Why is that? Is it because such structs become pointless when there are no generics to deal with the stride?


> why did it become idiomatic to pass pointers and lengths separately in C?

I've read that it's because there used to be binary interface issues with structures. They can be returned from functions and passed as parameters but it isn't immediately clear how that happens: is it on the stack, in one register or in several registers? Even today there are compiler options that affect the generated code in those cases:

  -fpcc-struct-return

  Return “short” struct and union values in memory like
  longer ones, rather than in registers.

  -freg-struct-return

  Return struct and union values in registers when possible.
https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html#ind...

https://gcc.gnu.org/onlinedocs/gcc/Incompatibilities.html#in...


> They can be returned from functions and passed as parameters but it isn't immediately clear how that happens: is it on the stack, in one register or in several registers?

Why does it have to be clear? It can be unspecified, and the compiler will do what it thinks is best given the struct: e.g. return `struct {int x, y;}` in registers, return `struct {int x[80];}` as a pointer to memory, or write in place to the caller's stack via RVO.


> Why does it have to be clear?

Because it's part of the binary interface. Changing the binary interface prevents existing programs from interoperating. Everything breaks until the software is recompiled and that can be a major nuisance at best and impossible in the worst case.

Simple, well-defined and stable binary interfaces are a major reason why C is still widely used. Uncertainty in this area is never a good thing so people will actively avoid language features that introduce it. Looks like enums aren't favored for similar reasons: what is the underlying type?


Because then code compiled separately can’t interoperate.


It doesn't have to be a return value though. You could pass pointers to it as parameters.

To answer the question though, a number of people do define structures containing a buffer and a length (and potentially capacity), there just isn't such a structure standardized so everybody who wants to do this has to bring their own.

Some examples from Unix: iovec, sendmsg/recvmsg. Surely there are others I'm just not thinking of right now.

In the Windows world you have UNICODE_STRING and similar structures. SChannel has "PSecBufferDesc". Again, surely there are others.

And prominent libraries might also have their own.
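
For the curious, iovec is exactly such a (pointer, length) pair; a quick sketch of gathering two buffers with writev:

    #include <string.h>
    #include <sys/uio.h>

    /* On POSIX, struct iovec is { void *iov_base; size_t iov_len; }. */
    int write_two(int fd, const char *a, const char *b) {
        struct iovec iov[2] = {
            { .iov_base = (void *)a, .iov_len = strlen(a) },
            { .iov_base = (void *)b, .iov_len = strlen(b) },
        };
        return writev(fd, iov, 2) < 0 ? -1 : 0;
    }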


Passing struct arguments by value is also complicated in some ABIs. The most reliable way is to pass a pointer to struct, but with slices that means double indirection.

I don't think ABI is a concern these days, though, even if things were different 30 years ago. The specs that we have today do cover how to pass and return structs in the standard manner.


Originally C didn't have the ability to pass or return structures. That was added in the V7 period, after some habits were well established.


Because C is a thin layer above assembly language. A pointer fits in one register, { ptr, count } would require two registers. Also if the count is being passed around, it should surely be checked when doing ptr + i, which slows things down further and is unnecessary if the caller knows what they are doing. If you start trying to make C safe and idiot-proof you also make it slower.


Isn’t ptr+count always two registers if passed as two parameters too? It seems like just a syntactic difference.

A ptr to an array of unknown length can't be used without the length being passed around next to it wherever it's passed.

You can't deref ptr+i without knowing that it's under length, etc.

Two bits of data that belong together seems like they would be convenient to pass (or return!) as one argument or return value.

It’s especially terrible with methods that e.g take two lists and return a third. That should be two arguments and a return value, not six arguments.


Yet plenty of assembly languages have opcodes for bounds-checked memory accesses, some of them in computer systems developed in the early 60s, 10 years before C was born.


> Why isn't a vector type similar to { ptr, count } a normal thing to pass around in C?

For one thing, as I recall in the original K&R version of C (before ANSI C89), the language didn't support passing a struct as a function argument or return value.

That means if you did make a struct, then every time you wanted to pass one of these pairs around, you'd have to pass a pointer to the struct, and you'd have to dereference that on every use. Which is arguably just as cumbersome as just passing two arguments, at least in terms of how much code you have to type. Plus it was probably slower.

From there it's no surprise if using separate arguments becomes the normal, idiomatic way to do it.


That’s a good reason. And a weird restriction (with 2000s glasses on, designing an ergonomic language).


Given the circumstances of C's birth, it was, in many ways, optimized to be ergonomic for the compiler writer. If you don't ever need to return structs, then you can always just use the same register to hold the return value. Similarly, if there are no struct-typed arguments, then each argument is either a register, or a single machine word on the stack (C did other things for the latter - e.g. consider the conversion rules for functions called without being declared).


Quite to be expected given its ancestry, BCPL was originally designed only to bootstrap CPL.


It's indeed a weird thing. It should have been an easy addition to the standard library. But instead people either pass around pointer/length pairs all the time - or even worse: They rely on null-terminated strings / arrays.

I had discussions with people who claimed that null terminated strings are the only idiomatic thing to do in C - because that is how C does strings. They assumed that since the standard library only provided methods which acted on those kinds of strings it was a preferred way to do things. Even though that is a lot less efficient than the string/array types that other languages use as defaults.


On the "weird pointer solutions" tangent, there's the ARM authenticated pointers: https://lwn.net/Articles/718888/

Given that years ago we added a mandatory piece of hardware to most systems to implement virtual memory, I'm now starting to wonder what security and/or performance benefits could be achieved by delegating memory allocation to (or through) hardware.


That is quite easy to validate.

For years, Oracle has shipped SPARC Solaris with ADI turned on.

Since the iPhone X, iOS makes use of memory tagging for pointers.

Starting with Android 11, hardware memory tagging is a required feature on ARM platforms on the CPUs that support it, while on other CPUs the kernel will randomly attach GWP-ASan to user processes; it is enabled by default on all system processes during the ongoing preview releases.


That's a fascinating approach - how widely used is it? Would love to know whether it causes any problems for 'existing' code.


Every new (post-2018) iPhone ships with this. iOS developers can build code for the architecture but I believe Apple currently strips it out before distribution, so its use is limited to the OS for now. I would assume at some point they’ll flip the switch to allow it; until then developers can use the toolchain to test if their code still works (generally it does, but messing with function pointers in ways unspecified by the standard can occasionally cause problems). ‘pjmlp is fairly interested in this topic so they might be able to share some more examples of it being used if they drop by the thread.


Besides iOS, Solaris SPARC, and Android 11 onwards.


The concept of "fat pointer" the article is about has been described by Walter Bright (D creator) as "C's Biggest Mistake": https://www.drdobbs.com/architecture-and-design/cs-biggest-m.... It's also an interesting read.

The summary version (from Walter Bright's article) is:

> C can still be fixed. All it needs is a little new syntax:

> void foo(char a[..])

> meaning an array is passed as a so-called "fat pointer", i.e. a pair consisting of a pointer to the start of the array, and a size_t of the array dimension.


It's worth noting that fat pointers didn't originate with Walter Bright or that 2009 article. The oldest C-with-fat-pointers I can think of off the top of my head is CCured from 2002: https://people.eecs.berkeley.edu/~necula/Papers/ccured_popl0...

The paper mentions fat pointers in passing, not putting the term in quotes, not defining it, and not giving a citation -- which makes it clear that the term was already well established at the time.


Fat pointers were part of Pascal (and derivatives), although I'm sure the concept has existed in one form or another going back to the beginning.

edit: Pascal pointers were just a location and size, however, not a slice-type fat pointer. I have always heard any pointer containing more information than a memory address referred to as a fat pointer (except tagged pointers). YMMV.


I don't think Pascal pointers had to be implemented as location+size, since most operations that required checking the size were undefined or implementation-defined anyway. Some implementations might have used them to provide runtime checks, but it was certainly not the case in Turbo Pascal, for example.


Sure, I never said that Walter Bright created the concept. What I’m saying is that the link from Cello that I posted on HN is actually using that definition from Walter Bright’s article.

They even link to it.


They're clearly using a different definition. In Cello, you are not passing a fat pointer ("a pair consisting of a pointer to the start of the array, and a size_t"), but a totally standard pointer that points to the second member of a packed struct containing a size_t and an array.

The difference is massive. Bright's idea would allow me to use existing interfaces with fixed structure: as long as I know the declared length of an array, I could pass a fat pointer to a function that accepts one. The Cello approach would require me to modify the interface to accommodate the new size header that is packed with the array. ABI break, not compatible with existing C code and libraries.

One could justify the name of fat pointer, the other is really just arrays with headers.


> The Cello approach would require me to modify the interface to accommodate the new size header that is packed with the array. ABI break, not compatible with existing C code and libraries.

Isn't the entire point of the article that this is not the case?

My understanding is that a Cello array is laid out like this:

    <metadata header>.<actual payload data in standard C format>
                     ^
                     |
                     this pointer is what you pass to existing C code
The existing C code gets a pointer to data exactly as it expects. It does not get a pointer to the metadata. It could access it by subtracting an offset from the pointer it's given, but it will not do that since it does not expect anything meaningful to be there. "This pointer is fully compatible with normal pointers".
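
For illustration, a minimal sketch of that scheme (my own code, not Cello's, and glossing over the alignment concerns raised elsewhere in this thread):

    #include <stdlib.h>

    /* Allocate a length-prefixed buffer and return a pointer just past
       the header, so existing C code sees an ordinary char buffer. */
    char *lp_alloc(size_t n) {
        size_t *h = malloc(sizeof *h + n);
        if (!h) return NULL;
        *h = n;
        return (char *)(h + 1);
    }

    /* Only valid for pointers obtained from lp_alloc. */
    size_t lp_len(const char *p) { return ((const size_t *)p)[-1]; }

    void lp_free(char *p) { free((size_t *)p - 1); }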


No. The problem is that you have to have that metadata there, where ever it is that you want to point. What if it's not there? Oh yeah, doesn't work. It's just a normal pointer, pointing to an array and header.

If I pick up some library (or system call interface..), and it uses this structure

  struct foo {
    struct bar a;
    char b[64];
  };
The metadata field that cello wants is not there. I cannot use cello's "fat pointers" because they are not fat pointers at all. I would have to modify the struct to have array-with-length-header (which is what cello calls a fat pointer.. talk about confusing pointers and arrays!) , which might be impossible if it's someone else's binary interface that I am using.

If I had actual fat pointers, then the size would be simply passed together with the pointer and I wouldn't have to have modify structures to accommodate an extra field. Of course, I can't pass these pointers to functions that expect normal pointers, but it wouldn't make sense to do that anyway, and if implemented right, doing that would require me to misdeclare a function. You would pass normal pointers to old C code that expects and deals with normal pointers.


You can pass Cello pointers to old C code that expects and deals with normal pointers.

It's true that the internal array inside a struct cannot magically become a Cello fat pointer. So yes, you cannot pass a pointer to that array into a Cello function that expects a fat pointer, but that's the opposite of "pass normal pointers to old C code".


Is this blog post confused or am I confused? It keeps talking about fat pointers but the description looks much more like "arrays with their length stored before their first element," which is a massive difference.


I think the latter. Fat pointers were supposed to be 2 pointers wide, not 1...


It's just using "fat pointer" to refer to the concept of passing around a pointer with extra information concerning the data it points to. I agree that generally people would expect "fat pointer" to imply a larger pointer itself, but I don't think the label is misused egregiously enough to warrant picking at this.


> It's just using "fat pointer" to refer to the concept of passing around a pointer with extra information concerning the data it points to.

Is it actually? Here's a quote: "The trick is to place the value representing the number of items in the array in memory just before the pointer we actually pass to functions. This pointer is fully compatible with normal pointers,"

So suppose I pass this "fully compatible with normal pointers" pointer to a function.. and it ends up being stored in a register. Now where is this location "just before the pointer"? In the previous register?

I don't think this post is describing fat pointers at all. I think it is describing arrays with a prepended length header. There is no data in or before the pointer, there is metadata before the pointee assuming you're pointing at something with metadata prepended to it.. No extra information is passed with a pointer, it's just assumed to be there at the pointee. Call it a fat pointee if you want a fancy name; at least that acknowledges the onus is on the pointee to have the right metadata in the right place. There's nothing in the pointer here.


> Is it actually?

Yes, the extra data being the length that is located in memory before the pointer.

> So suppose I pass this "fully compatible with normal pointers" pointer to a function.. and it ends up being stored in a register. Now where is this location "just before the pointer"? In the previous register?

You’re at the wrong level of indirection; the header is in the memory before what the pointer points to, as in p - 1 rather than &p - 1.

> I think it is describing arrays with a prepended length header.

Correct.

> Call it a fat pointee if you want a fancy name; at least that acknowledges the onus is on the pointee to have the right metadata in the right place.

I would prefer that this was not called a “fat pointer”, but the claim made above is that a pointer with any implicit data associated with it is “fat”.

> There's nothing in the pointer here.

No data, but there is an implicit guarantee that it points to the data portion of a length-prepended array.


> Yes, the extra data being the length that is located in memory before the pointer.

> You’re at the wrong level of indirection; the header in in the memory before what the pointer points to, as in p - 1 rather &p - 1.

You are contradicting yourself. &p - 1 is before the pointer. p - 1 is before the pointee.

> I would prefer that this was not called a “fat pointer”, but the claim made above is that a pointer with any implicit data associated with it is “fat”.

> No data, but there is an implicit guarantee that it points to the data portion of a length-prepended array.

I think that's a borderline useless definition. In my C, there's an implicit guarantee that any pointer, in a context where it may be dereferenced, points to a valid object (as long as it's not NULL and there's no programming error causing it to point at nothing well defined). Usually it points at the start of a struct, sometimes it points at list node (or whatever) embedded in a struct and I might have to work my way back with offsetof. Either virtually all of my pointers are fat or none of them are. I go by that none of them are, because whatever data I have (implicit or explicit) is not a property of the pointers I use but of the data I point them to.


I'd be (just) a little bit more inclined to agree with you if merely the representation was different. But the thing is, so much of the utility of fat pointers is that a fat pointer can refer to any portion of memory, even a slice of a bigger allocation. The entire idea is that it can denote the size of what it points to but still be used like a slim pointer. This thing can't pull off the same tricks. Calling it a fat pointer is almost an insult to fat pointers. It's kinda like buying a motorcycle and calling it a "car" just because it refers to the concept of transportation with wheels, running on gasoline.


I understand their desire to use a library, but there's a faster and safer way to do this that's more C-like if you have access to the compiler:

Just locate anything declared as an array in a particular linker section, so the pointer manipulation can be done with two comparisons (or one, if it's at the top of memory), possibly even against a constant.

If you do this you can even forbid pointer arithmetic except in actual []-declared memory, and can do transparent bounds checking (&array-1 can hold the array length or, possibly faster, the address of the location after the end of the array).

An advantage of this over the library route is you can prevent pointer/array punning but otherwise allow any C program to work fine. And apart from a few corner cases (there are legit non-array uses of pointer arithmetic, though very few), any noncompliant program can be changed to use [] and still work perfectly fine without this option being used.


"This proposal wasn't accepted into the C standard..."

Walter often shows up on HN, so I'll ask: was this proposal merely on the Dr Dobbs article or did it actually go to a committee for review? If the latter, why wasn't it accepted?

Should C reconsider this? Especially now that C++ has std::span<> and std::string_view?


Currently Checked C seems to be the only attempt left, along with a mentality shift to at the very least use the static analysis tools that come with the compilers.

Contrary to common HN wisdom, most C and C++ related surveys show that only up to 50% actually use some kind of analysis tooling.



Because you can never have enough string types in your Windows projects.


Or enough wrappers to safely acquire and free those things in C++ without having to write many extra lines or spend a long time looking for leaks



Previous discussion: https://news.ycombinator.com/item?id=10526159

(Fixed to actually point at discussion of fat pointers; thanks, ‘dgellow.)


The link I posted is the documentation about fat pointers, not the homepage of libcello.


Fair enough; I’ll replace the links above with https://news.ycombinator.com/item?id=10526159, which as far as I can tell points to the same page. Thanks!


This seems to assume you want to actually allocate individual objects using malloc/calloc, when in reality you usually want to pool your allocations somehow, both for performance and for your own sanity.


Might be a nice safety feature to tag the first few bits of the size with a magic sequence so the "free" method can sometimes catch an attempt to free a non-fat pointer passed into it.
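
For instance, a hedged sketch (names and the magic value are made up, and this version keeps the tag as a separate header field rather than packing it into bits of the size):

    #include <assert.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define FAT_MAGIC 0xF47B17E5u

    typedef struct { uint32_t magic; uint32_t len; } FatHeader;

    void *fat_alloc(uint32_t n) {
        FatHeader *h = malloc(sizeof *h + n);
        if (!h) return NULL;
        h->magic = FAT_MAGIC;
        h->len = n;
        return h + 1;
    }

    void fat_free(void *p) {
        FatHeader *h = (FatHeader *)p - 1;
        assert(h->magic == FAT_MAGIC && "not a fat pointer");
        h->magic = 0; /* also catches a double free */
        free(h);
    }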


Another question for language lawyers: if you are given a pointer to char, is it defined behavior to somehow cast the data before that into a correct integer? Assuming you have

   typedef struct MyStr {
      size_t mystr_length;
      char mystr_chars[1];
   } MyStr;
   
   typedef char *PMyStr;

   size_t MyStrLen(const PMyStr p) {
      const MyStr *pmystr = ?;
      const size_t *psize_t = ?;

      return *psize_t;
   }
How do you make a legal cast to get either pmystr or psize_t?


That depends on what p points to. If it’s the address of the char[1] array inside a valid MyStr, then as far as I’m aware you can subtract the correct number of bytes (offsetof is your friend) and cast the resulting pointer to your MyStr, then get the size from there.
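
Roughly like this, as a sketch (it assumes p really is the address of the mystr_chars member of a live MyStr):

    #include <stddef.h>

    size_t MyStrLen(const char *p) {
        /* Step back from the member to its containing struct. */
        const MyStr *pmystr =
            (const MyStr *)(p - offsetof(MyStr, mystr_chars));
        return pmystr->mystr_length;
    }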


Ah, with your information I'm looking into the container_of macro, which I'm not entirely certain is legitimate but which hasn't run into problems.


Many string functions are intended to operate on a substring. These functions would appear to need the original string passed in every time to find the length.


Every pointer into an array could pass the triple {current, start, len}, or even maybe the quadruple {current, start, len, allocated}, which is the kind of minimal efficiency C programmers are looking for.


They use a similar trick with std::string in libstdc++. The string object has a pointer to the null terminated character string, right before that string is a structure that contains the reference count. (I think libcello could add reference counted objects that way)


From a cursory look, this is one of the most unsafe pieces of code I have seen, with complete disregard to memory alignment requirements and the lifetime of temporary objects passed as arguments to functions.

Definitely don't use this in production code.


From the FAQ[0]

> Can it be used in Production?

> It might be better to try Cello out on a hobby project first. Cello does aim to be production ready, but because it is a hack it has its fair share of oddities and pitfalls, and if you are working in a team, or to a deadline, there is much better tooling, support and community for languages such as C++.

[0]: http://libcello.org/home


The author has already made it clear they consider "memory allocation by unaligned offset into temporary char[] cast into a struct pointer" to be a valid strategy, so frankly I'm not very interested in their opinions on whether it's production ready.

I've seen it on HN before, whenever this project gets mentioned. People who don't know much about C confuse it for a really cool thing you can do with C, as if it's just another legit library that you can pick up and use. It's a lot of undefined behavior. People have enough problems writing safe C as it is, and on top of this complaint about alignment and misuse of temporaries, this thing makes the problem worse in other ways too, removing the few safeguards that exist by treating everything as void* for instance.


Well, it shouldn't be used for toy projects either, with all of the hackery that it pulls (even if you don't use atomics, violating alignment requirements alone means that your code can never make use of vector instructions).

The new code on GitHub makes it clear that this library is not about fat pointers, but about writing unsafe C in another language, powered by undefined behaviour and the glory of the C preprocessor.

If you want to have safe arrays in C, just use a proper library for that instead of going the route of abusing undefined behaviour. There are plenty of libraries to choose from: https://github.com/search?l=C&q=vector+array&type=Repositori... (not all of them are serious projects, so you should be careful when you pick one. I personally have been using this one: https://github.com/iscgar/cvec2 because I like the easy interface it provides and especially the fact that type safety is a top priority, but others may have different tastes and priorities).



