Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Which is exactly why LLMs use these techniques so often. They're very common.


Well, em dashes are not all that common in text that people have written on computers, because em dashes were left out of ASCII. They're common in high-quality text like Wikipedia, academic papers, and published books.

My guess is that comma-separated lists tend to be a feature of text that is attempting to be either comprehensively expository—listing all the possibilities, all the relevant factors, etc.—or persuasive—listing a compelling set of examples or other supporting arguments so that at least one of them is likely to convince the reader.


I was surprised to learn from your comment that em dashes were left out of ASCII, because I thought I've been using them extensively in my writing. Perhaps I'm just relying heavily on the hyphen key. I mention that because it's likely instances of true em dash use (e.g. in the high-quality text you cite) and hyphen usage by people like me are close enough together in a vector space that the general pattern of a little horizontal line in the middle of a sentence is perceived as a common writing style by the LLMs.

I find myself constantly editing my natural writing style to sound less like an AI so this discussion of em dash use is a sore spot. Personally I think many people overrate their ability to recognize AI-generated copy without a good feedback loop of their own false positives (or false negatives for that matter).


On typewriters all characters are the same width, typically about ½em wide. Some of them compromised their hyphen so that you could join two of them together to form an em dash, but a good hyphen is closer to ¼em wide. But that compromise also meant that a single hyphen would work very well as an en dash. And generally hyphenation was not very important for typewriters because you couldn't produce properly justified text on a typewriter anyway, not without carefully preplanning each line before you began to type it.

Computers unfortunately inherited a lot of this typewriter crap.

Related compromises included having only a single " character; shaping it so that it could serve as a diaeresis if overstruck; shaping some apostrophes so that they could serve as either left or write single quotes and also form a decent ! if overstruck with a .; alternatively, shaping apostrophe so that it could serve as an acute accent if overstruck, and providing a mirror-image left-quote character that doubled as a grave accent; and shaping the lowercase "l" as a viable digit "1", which more or less required the typewriter as a whole to use lining figures rather than the much nicer text figures.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: