Hacker News

Well, em dashes are not all that common in text that people have written on computers, because em dashes were left out of ASCII. They're common in high-quality text like Wikipedia, academic papers, and published books.
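To make the ASCII point concrete, here is a minimal Python sketch. The names `EM_DASH`, `HYPHEN_MINUS`, and `is_ascii_char` are made up for illustration. It only checks that the em dash (U+2014) sits outside the 7-bit ASCII range while the hyphen-minus on the keyboard sits inside it.

```python
# Illustrative names only; nothing here comes from a particular library.
EM_DASH = "\u2014"      # the em dash, written as an escape
HYPHEN_MINUS = "-"      # the character the hyphen key produces (U+002D)


def is_ascii_char(ch: str) -> bool:
    """Return True if the single character fits in 7-bit ASCII (0..127)."""
    return ord(ch) < 128


for name, ch in [("em dash", EM_DASH), ("hyphen-minus", HYPHEN_MINUS)]:
    print(f"{name}: U+{ord(ch):04X}, ASCII? {is_ascii_char(ch)}")

# Encoding the em dash as ASCII fails; UTF-8 needs three bytes for it.
try:
    EM_DASH.encode("ascii")
except UnicodeEncodeError:
    print("em dash cannot be encoded as ASCII")
print(EM_DASH.encode("utf-8"))
```

So anyone limited to ASCII has to fall back on one or two hyphen-minus characters.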

My guess is that comma-separated lists tend to be a feature of text that is attempting to be either comprehensively expository—listing all the possibilities, all the relevant factors, etc.—or persuasive—listing a compelling set of examples or other supporting arguments so that at least one of them is likely to convince the reader.



I was surprised to learn from your comment that em dashes were left out of ASCII, because I thought I had been using them extensively in my writing. Perhaps I'm just relying heavily on the hyphen key. I mention that because true em dash use (e.g. in the high-quality text you cite) and hyphen use by people like me are likely close enough together in a vector space that LLMs perceive the general pattern, a little horizontal line in the middle of a sentence, as a common writing style.

I find myself constantly editing my natural writing style to sound less like an AI, so this discussion of em dash use is a sore spot. Personally, I think many people overrate their ability to recognize AI-generated copy, since they have no good feedback loop on their own false positives (or false negatives, for that matter).


On typewriters all characters are the same width, typically about ½em wide. Some of them compromised their hyphen so that you could join two of them together to form an em dash, but a good hyphen is closer to ¼em wide. But that compromise also meant that a single hyphen would work very well as an en dash. And generally hyphenation was not very important for typewriters because you couldn't produce properly justified text on a typewriter anyway, not without carefully preplanning each line before you began to type it.

Computers unfortunately inherited a lot of this typewriter crap.

Related compromises included having only a single " character; shaping it so that it could serve as a diaeresis if overstruck; shaping some apostrophes so that they could serve as either left or right single quotes and also form a decent ! if overstruck with a .; alternatively, shaping the apostrophe so that it could serve as an acute accent if overstruck, and providing a mirror-image left-quote character that doubled as a grave accent; and shaping the lowercase "l" as a viable digit "1", which more or less required the typewriter as a whole to use lining figures rather than the much nicer text figures.



