
> with a binary format you can transmit NaN, Infinity, -infinity, and -0. You can also create two NaN numbers that do not have the same binary representation. You have to choose single or double precision (maybe a benefit, not always). Etc.

Boy you are really stretching to make this sound complicated. It's not. You transmit 4 bytes or 8 bytes. Serialization is a memcpy().

You don't have to think about NaNs and Infinities because they Just Work -- unlike with textual formats where you need to have special representations for them and you have to worry about whether you are possibly losing data by dropping those NaN bits. If you want to drop the NaN bits in a binary format, it's another one-liner to do so.

It's funny that you choose to pick on floating-point numbers here, because converting floating-point numbers to decimal text and back is insanely complicated. One of the best-known implementations of converting FP to text is dtoa(), based on the paper (yes, a whole paper) called "How to Print Floating-Point Numbers Accurately". Here's the code:

http://www.netlib.org/fp/dtoa.c

Go take a look. I'll wait.

dtoa() is not even the state of the art anymore. Dragon4 (the algorithm from the paper above) was the earlier state of the art, and just in the last few years there have been significant advances, e.g. Grisu2 and Grisu3...

Again, in binary formats, all that is replaced by a memcpy() of 4 or 8 bytes.

(A previous rant of mine on this subject: https://news.ycombinator.com/item?id=17277560 )

> > Length-prefixed binary formats are almost trivial to parse

> They definitely are not, as displayed by the fact that binary lengths are the root cause of a huge number of security flaws. JSON mostly avoids that.

Injection (forgetting to escape embedded text) is the root cause of a huge number of security flaws for text formats. Length-prefixed formats do not suffer from this.

What "huge number of security flaws" are you referring to that affect length-delimited values? Buffer overflows? Those aren't caused by values being length-delimited, they are caused by people deserializing variable-length values into fixed-length buffers without a check. That mistake can be made just as easily with a text format as with a binary format. In fact I've seen it much more often with text.

> JSON currently dominates large parts of that ecology.

JSON wins for one simple reason: it's easy for human developers to think about, because they can see what it looks like. This is very comforting. It's wasteful and full of pitfalls ("oops, my int64 silently lost a bunch of bits because JSON numbers are all floating point"), but comforting. Even I find it comforting.

Ironically, writing a full JSON parser from scratch is much more complicated than writing a full Protobuf parser. But developers are more comfortable with the parser being a black box than with the data format itself being a black box. ¯\_(ツ)_/¯

(Disclosure: I am the author of Protobuf v2 and Cap'n Proto. In addition to binary native formats, both have text-based alternate formats for which I wrote parsers and serializers, and I've also written a few JSON parsers in my time...)



I think people undervalue clean-looking (alphabet-only, few special character) things, things that don't require people to use the symbol-parsing part of their brain. Basically easily human-parseable things. I suspect this phenomenon can be observed in the relative popularity of JSON, TOML, YAML, and Python, plus the relative unpopularity of Lisp, Haskell, Rust, and XML. And if we look at Protobuf in this context, it is not easy for humans to parse, which makes people not want to use it; developers are not

> more comfortable with the parser being a black box

they're more comfortable with the parser being a black box but the format being relatively easy to parse, compared to the parser being easy to understand but the format being basically unreadable to a human.


> I think people undervalue clean-looking (alphabet-only, few special character) things, things that don't require people to use the symbol-parsing part of their brain. Basically easily human-parseable things.

The symbol parsing part of the human brain is what parses letters and numbers, as well as other abstract symbols. The division of symbols into letters, numbers, and others is fairly arbitrary. Most people would call "&" a symbol, but its modern name, "ampersand", is a smoothing over of the way it was recited ("and per se and") back when it was considered part of the alphabet and recited with it.

> I suspect this phenomenon can be observed in the case of relative popularity of JSON, TOML, YAML and Python plus the relative unpopularity of Lisp, Haskell, Rust, XML.

I suspect not: Lisp and Haskell have less use of non-alphanumeric characters than most more-popular general purpose languages, and not significantly more than Python; also, if this was the phenomenon in play, popularity would be TOML > YAML > JSON but in reality it's closer to the reverse.


> The symbol parsing part of the human brain is what parses letters and numbers, as well as other abstract symbols.

I really don't think that's true when you compare someone using the Latin alphabet and words in that alphabet with some other "alphabet" (e.g. {}():!) and the "words" (or meanings) built from it. As a crude example, parsing "c = a - b", where equals and minus are one symbol each and have been taught for years, is different from parsing "c := a << b", where ":=" and "<<" act as separate meanings someone has to learn to understand. Similar to the difference between the Latin alphabet and, say, simplified Chinese.

> also, if this was the phenomenon in play, popularity would be TOML > YAML > JSON but in reality it's closer to the reverse.

There could be a somewhat sigmoid response to the effect: a decreased reaction at either extreme compared to deviations around the average.

I'm not a linguist, so this is just my speculation; don't take it too seriously :D


That's exactly what I was trying to say. Sorry if I wasn't clear.


> Those aren't caused by values being length-delimited, they are caused by people deserializing variable-length values into fixed-length buffers without a check. That mistake can be made just as easily with a text format as with a binary format. In fact I've seen it much more often with text.

Especially since text can be arbitrarily long. From that perspective, length-delimited text (I've seen that before in a few file formats, and more notably HTTP) is probably the worst of both worlds.



