As a person living with the CJK languages (specifically, I’m Korean), I find that some of these problems remain prominent even in the 21st century.
There is an excessive number of programs that conflate key presses with text input, and ones that don’t consider input methods at all.
I use macOS & Linux, and while the default text-handling system on macOS, the Cocoa Text System, handles input methods well, almost all applications that implement their own, including big apps like Eclipse and Firefox, don’t get this right.
On Linux, it’s terrifying; I’ve never seen any app that allows input systems to work naturally, and after a week of use you get used to pressing space & backspace after finishing every Hangul word. The Unix-style composability they want (apps should work whether or not input methods are used), and the fact that Linux users that use Latin characters don’t seem to use any input methods (as opposed to macOS, where Latin characters are also input through a Latin input system), mean this state will probably persist.
About the emoticons, I’m not that concerned, since most (if not all) users won’t really input the color modifier separately (or even encounter files that have a separate one), so you can just pick a sensible behavior like #2, 3, or 4. Users who understand the color modifiers and the rest of the Unicode fiasco will understand what is happening under the hood, and those who don’t will just think the file is broken; none of the behaviors will make sense to them, whatever you do.
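(For concreteness, the skin tone "color" modifier really is its own code point that rides along with the base emoji; a quick TypeScript illustration:)

    // U+1F44D THUMBS UP SIGN followed by U+1F3FD EMOJI MODIFIER FITZPATRICK TYPE-4
    const s = "👍🏽";
    console.log([...s]);    // ["👍", "🏽"]: the modifier is a separate code point
    console.log(s.length);  // 4, because each code point is a surrogate pair in UTF-16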
For decades I've been saying that all text input should go through OS-level IMEs (input method editors). For more complex scripts, the need is obvious, but even writing in English, we can get great benefits from a system that expands abbreviations, replaces easy-to-type sequences with proper Unicode chars, runs little scripts and inserts the output, gives you quick dictionary/thesaurus lookups, gives you emmet-style powers, etc., whenever you're writing and in any app.
Of course, everyone's preferences will be different, so you'll get default IMEs with default configs, but with the idea that you can reconfigure or replace them entirely with systems that work the way you want to work everywhere you input text. There have been utilities that do this sort of thing for decades, but they've always been treated as clever hacks rather than standard text input.
In other words, instead of powerful text input methods being the exception, they would be the rule, and apps that didn't use them would be the exception.
> I've been saying that all text input should go through OS-level IMEs
This is so true; all commercial OSes that take i18n seriously do this, while most open-source OS communities (whose communication revolves around English) decided that IMEs are an add-on for CJK people.
It's a pity, and this reason alone is enough to keep Linux from ever being adopted by ordinary users in the non-Western world.
That's not really fair, though. Linux users still overwhelmingly use X Windows. There is no standard IME for X. This is just a historical issue. X Windows is OLD -- far older than Windows or the Mac (even the old Mac), for instance.
My biggest problem with IMEs in free software land is that we've had groups like Gnome lean on IME developers and push through their vision of how it should work -- even though the people pushing their vision don't use IMEs. I ended up migrating to FCITX just because it was the last hold out not to cave to pressure.
Old really means there's been more time to identify and solve the problem, and the fact it hasn't been cleanly solved is a lot more indicative of priorities than time or tech.
Microsoft and Apple have the incentive of selling to billion-user markets. The open source community, on average, appears to have demonstrated a lack of interest in opening up the user-base further (and expecting that user-base to just roll their own solution creates multiple catch-22 and tragedy of the commons problems).
Why one of the commercial open-source vendors hasn't taken this on as a core challenge, I do not know.
Technically true, but for practical purposes IBus is the standard. That's what Fedora and Ubuntu use out of the box. Firefox beta telemetry shows about 89% IBus vs. 11% FCITX.
> My biggest problem with IMEs in free software land is that we've had groups like Gnome lean on IME developers and push through their vision of how it should work -- even though the people pushing their vision don't use IMEs. I ended up migrating to FCITX just because it was the last hold out not to cave to pressure.
Insisting that there can only be one input type for the entire session rather than one per window. If you are using multiple languages which each require an IME, Gnome's interpretation is completely broken. I had to stop using IBus because of it. It is possible they have changed their mind since then (several years back), but I haven't followed it. Incidentally, it was also the thing that meant I had to stop using Gnome. Before that I was a happy Gnome Shell user :-(
Pretty much. The W window system for the V distributed system predates the Macintosh (as does the Apple Lisa/1983), and W was ported to Unix in 1983. Their immediate successors - X and the Macintosh - came out in 1984, and Windows 1.0 in 1985.
Windows 1.0 was fairly primitive, but Windows 2.0 supported overlapping windows (!) in 1987, coincidentally the year that X11 was released.
> This is so true, all commercial OSes that take i18n seriously do this,
Does Windows really?
> while most open source OSes' community (which communication revolves around English) decided that IMEs are an add-on for CJK people.
IIRC, Fedora ships an iOS/Android-like Latin-script IME with autocomplete. It's not mandatory, though.
> It's a pity, and this one reason is enough for Linux to be never adopted for ordinary users in the non-western world.
The Korean situation doesn't really generalize. Among CJK, AFAICT, the Korean IBus IME on Ubuntu 18.04 is pretty broken but the Japanese IME and the various Chinese IMEs appear to be at least OK. (It's a rather surprising situation considering that the Hangul part of a Korean IME should be much simpler than the Japanese and Chinese IMEs.)
Yes, the signs that even English passes through Windows' IME infrastructure are pretty minimal by default in Windows 10 (in Desktop mode; Tablet mode immediately turns on a couple more), but at this point Windows 10 makes almost all of it available as opt-in. Some of it is referred to as accessibility tooling from an English perspective, because IMEs are also useful for accessibility.
The "big one" IME for most English users is the Emoji Keyboard accessible with Win+. or Win+; (whichever you prefer). It's really interesting how well emoji have helped Latin script users with further understanding the complexities of Unicode, fixing old ugly bugs in Unicode handling, and even introducing some such users to an IME that they want to use (sometimes every day).
Under Settings > Devices > Typing > Hardware Keyboard you can turn on the IME "Show text suggestions as I type" even on a hardware keyboard in Windows 10, which gives you mobile-style auto-suggestions (you can also turn on mobile-style autocorrect even on a hardware keyboard).
> The "big one" IME for most English users is the Emoji Keyboard accessible with Win+. or Win+; (whichever you prefer).
No; API-wise the on-screen keyboard generates keystrokes for emoji (astral keystrokes, and multiple keystrokes for multi-scalar-value emoji) even to an IME-aware app. In contrast, the emoji picker built into the Windows 10 Pinyin IME enters emoji via the IME API.
> Under Settings > Devices > Typing > Hardware Keyboard you can turn on the IME "Show text suggestions as I type" even on a hardware keyboard in Windows 10, which gives you mobile-style auto-suggestions (you can also turn on mobile-style autocorrect even on a hardware keyboard).
The emoji keyboard may not be entirely using the IME API, but it does do some IME-like things even in English. The big thing I'm thinking of is how it works when you type English words to search the emoji. I think it still most often defaults to passing the keys along to the application as well and then replacing them via keystrokes or selection APIs, but I have seen it sometimes do the IME thing where the text you are typing is shown underlined and not sent to the application. That said, I don't recall the exact combination of app and emoji where I saw that happen, or know precisely why it would vary, well enough to reproduce it just now; maybe it was just a difference between early Insider versions of the emoji keyboard and current behavior, or something similar I'm misremembering.
That said, even if it isn't using the IME APIs directly in most cases, it's still useful as a teaching/analogy/example tool for showing English speakers what an IME can be like to use, even if it's a nice-to-have for an English writer versus a necessary tool for other languages.
On Debian 9/10, IBus works for the Korean IME (both Hangeul and Hanja input).
Windows (10 at least) has native support for these IMEs; setting them up on Windows was much easier than on Linux (which doesn't even have a standard IME). Windows also comes with basic CJK fonts, which different distros may or may not have. On Debian I have to install noto-cjk or Adobe's Source Han fonts.
I originally tried fcitx but had to switch to ibus. Definitely not a great experience.
There isn't a single IME setup experience on Linux.
In my experience, Debian is worse than Fedora, Ubuntu, and openSUSE. Fedora is better than Windows 10 when it comes to IMEs: Fedora installs the IMEs by default. Ubuntu, like Windows 10, installs IMEs when you request the addition of an IME-requiring language. OpenSUSE gives you an IME for the language you use at install time if you install openSUSE using an IME-requiring language.
I haven't tried Debian 10, but when I installed Debian 9 _in Japanese_ with a _Japanese keyboard layout_ chosen, the installer didn't bother to set up a Japanese IME!
Fedora comes with an OK set of Noto CJK fonts by default. Ubuntu comes with a minimal set of Noto CJK fonts by default, but when enabling Chinese, Japanese, or Korean, Ubuntu drops more language-appropriate fonts on the system, like Windows 10.
> For decades I've been saying that all text input should go through OS-level IMEs (input method editors).
Which is why it's infuriating when the built-in widgets of a platform don't include simple operations for obvious workflows like filtering characters, masked input, etc. That's what leads developers to have to roll their own implementation using keydown/keyup.
I can't tell you how many GUI frameworks have forced me to roll my own numeric input widget.
Wouldn't it be enough if there was an obvious event for actual text input? WPF for example has that and it's the obvious and IME-friendly choice, since of course not every keystroke results in a character (dead keys exist too, after all).
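The web analogue would be beforeinput/input plus the composition events rather than keydown. A rough TypeScript sketch of IME-friendly numeric filtering along those lines (the element id and the exact policy are just for illustration):

    // Validate the text being inserted instead of second-guessing key codes,
    // which breaks dead keys and IME composition.
    const field = document.querySelector<HTMLInputElement>("#amount")!; // hypothetical field

    let composing = false;
    field.addEventListener("compositionstart", () => { composing = true; });
    field.addEventListener("compositionend", () => { composing = false; sanitize(); });

    // beforeinput carries the text about to be inserted, whatever produced it
    // (keyboard, IME commit, paste, handwriting); keydown tells you none of that.
    field.addEventListener("beforeinput", (e) => {
      const ev = e as InputEvent;
      if (composing || ev.isComposing) return;                // never fight the IME mid-composition
      if (ev.data && /\D/.test(ev.data)) e.preventDefault();  // reject non-digit insertions
    });

    // Safety net for anything that slipped through (e.g. text committed by the IME).
    field.addEventListener("input", () => { if (!composing) sanitize(); });

    function sanitize() {
      field.value = field.value.replace(/\D+/g, "");
    }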
Totally agree. The best use case here is hooking up your keyboard to a personal database to autocomplete notes you saved, like locations, bookmarks, and so on.
I have a note-taking system that I use to store all this data, and after I set it up years ago I looked at implementing an IME to search my notes and get a link to an entry, but every platform has its own byzantine IME API and I never got anything working. I've been thinking it might be easier to just make a cross-platform program that interacts with the clipboard rather than being a true IME.
In general, one effect I'd like to see from flexible IMEs is that logging and chat could be left to separate apps. Imagine a birdwatching (or whatever) keyboard that lets you pick the birds you saw, their age and behavior, and punch it all in while you're in the field, and have it output in structured form into whatever app is handy.
So that sort of makes sense, but then I realize I am looking for all of those behaviors to be different in different contexts (such as writing code vs. writing an essay).
> we can get great benefits from a system that expands abbreviations, ... emmet-style powers
I can't agree here (except perhaps for the Unicode char replacement, where a non-English speaker would need to comment on the viability of that versus actually having a non-English keyboard, and excepting huge writing systems like Chinese where that approach is already used, AFAIK).
Writing is fundamentally a thinking process; the text entry for a touch typist is relatively quick. If you propose expanding abbreviations, how much time do you expect to save? I mean, have you actually measured it?
> runs little scripts and inserts the output
What is the purpose of this?
> gives you quick dictionary/thesaurus lookups
I use those perhaps once a week or less. If you use it say 3X a day, you'd save little time, perhaps a couple of minutes.
Dunno what emmet powers are though.
MS Word & LibreOffice do some of what you want, and the first time I install them I spend several minutes tracking down each setting and turning them off - they drive me bonkers. They think they know what I want but they don't. Touch typists can hit many keys a second and kind of pipeline their typing. Having the input modified automatically is rarely useful IMO.
Your idea may be good, but like many other ideas such as graphical programming, it doesn't work except in restricted cases. Perhaps if you measured it I'd be convinced, but I can't accept it now as an obviously great benefit.
> Writing is fundamentally a thinking process, the text entry for a touch typist is relatively quick. If you are to propose expanding abbreviations, how much time do you expect to save? I mean, actually measured it?
Not quick enough. I think faster than I type, and I type fast. It's fine until I start getting impatient with myself. Call it micro-impatience, a flash of irritation in which you're suddenly conscious of not having finished typing in the thought. It's distracting.
Honestly, I like parent's idea; a good chunk of Emacs's awesomeness, and the reason people like me use it as an operating system, is exactly that: a unified, fully configurable and extensible text-based interface. I often wish I had something like it system-wide, because standard UIs are very far from optimal, ergonomics-wise. But then again I wouldn't trust Apple or Microsoft to do it right; they'd quickly find a way to dumb it down, or restrict the extensibility in the name of security.
> the reason people like me use [Emacs] as an operating system is exactly that: a unified, fully configurable and extensible text-based interface. I often wish I had something like it system-wide...
I can totally imagine that. Underneath all the GUI layers, every operating system and application has (or has the potential of) a fully text-based interface. There's just no standard or integration, and tools that allow that (like a system-wide middleware) haven't caught on, I guess. Maybe in an alternate historical timeline, such a feature could have been a fundamental layer of an OS.
From the grandparent comment:
> a system [with powerful text input methods] that expands abbreviations, replaces easy-to-type sequences with proper Unicode chars, runs little scripts and inserts the output, gives you quick dictionary/thesaurus lookups, gives you emmet-style powers, etc., whenever you're writing and in any app.
Yes, yes - and the last point: in any app. I picture it like how TCL can script other programs, even ones that weren't designed to be "remote controlled".
Yeah, I had some vague doubts while I was writing that comment. I guess I meant "text-input based", or maybe better to say "keyboard based" with a system-wide/application-agnostic middleware of some kind.
Not that. More like a UX paradigm that's more fixed and forced on applications, but also customizable and user-programmable externally to any given application. So that e.g. you could have a system-wide autocomplete/code completion, whether you're in a code editor or a text editor or in a dialog box of some other program somewhere; that system-wide autocomplete would be configurable and trivial to extend or replace wholesale with another widget.
This is a reality within Emacs (which really is a 2D text OS running lots of applications inside, including a text editor), and being text-based does play a role. When it's very hard to draw arbitrary pixels on screen and most of all apps deal with text, it's easy to make a large set of very powerful interface tools, and it's easy to pull data out of an app and put data into it, whether the app intended it to happen or not.
In the back of my mind, I sometimes wonder how something like Emacs could be made with modern browser canvas, to enable cheap rich multimedia, while retaining the ability for inspection and user-programmability. Introducing arbitrary GUIs is hard, because next thing you know, half of the stuff is drawing to canvas directly and it's all sandboxed away from you.
> user-programmable externally to any given application
I think this is why it reminded me of TCL, specifically the "expect" command that can script apps that know nothing about it. From the Wikipedia page, the TCL Expect extension: "automates interactions with programs that expose a text terminal interface".
So how I imagine this "Emacs as an OS" paradigm you're describing, is that it mediates interactions with any and all apps that expose a text input/edit interface, to allow programmatic customizations.
Like I'd love to script my own shortcuts for Firefox (or other apps) - possibly with multiple steps, taking input from some config file, or sending a link to another app.. Or, as you mentioned, Emmet-style expansions that work in any input field or textarea..
To address just one point: in Emacs you've got dabbrev-expand (bound to M-/). I like it and use it but it is not automatic. I have to invoke it myself which means it can't get in the way.
If you want larger clumps of code then you have various options such as skeleton mode, but again that's something the user has to ensure happens - they remain in control.
> But then again I wouldn't trust Apple or Microsoft to do it right
> I have to invoke it myself which means it can't get in the way.
For the sake of completeness, you can always make it automatic. All it takes is to add a function to post-self-insert-hook, and make it e.g. call dabbrev-expand if you pressed space twice. So you can have it any way you like - manual, automatic, semi-automatic. You're in full control.
> If you want larger clumps of code then you have various options such as skeleton mode
Yes. I currently use yasnippets for code. Still, my favourite yasnippet is one I use in comments - it expands "todo" into: "TODO: | -- [my name], 2019-10-29.", and similarly for "note", "hack" and "fixme". | is where the caret ends after expansion.
That's the kind of flexibility I wish my OS had. Unfortunately, it goes against the commercial interest of mainstream OS providers.
IIUC the parent was suggesting an OS-level system that _supports_ these features natively, as the foundation layer for any number of userland tools to sit atop... vs your compelling arg for why said features must be straightforward to disable. I don't see a conflict.
True, upvoted. My point was that these facilities are of questionable value (I'd like to see how much time they really save, or indeed even lose when triggered accidentally), and that they have to be in easy control of the user. With MS there's too much "we invented it so you're getting it", and bad designers (who always outnumber good) will do the same.
Actual example: I was working in Visual Studio with another guy. Open a bracket and VS automatically added a closing bracket. That is fucking annoying; it saves you a whole keystroke while breaking muscle memory and interfering with our work. We had trouble turning that off.
This seems an awful lot like arguing that, since you are happy at the CLI, there's no need for a GUI. Widespread availability of rich OS-level IMEs wouldn't hurt anyone, and could help everyone. (Even for someone like you who wants to pass through raw hex, it's easier to tell that once to the OS rather than to have to argue with each app individually.)
I was unclear. I'm not against anything; I'm just saying let the user control it, and make sure some great idea actually works (user testing shows many great things aren't; people are complex and so are their mental models) rather than forcing it on users.
I think the point is to get native behavior, you need to use the native functionality. When you try to emulate it, you'll always be missing something.
We see this in every corner of Firefox: the text editing, the toolbar, the context menus, the form controls, etc. Yet Firefox seems to be all about writing everything from scratch. There are many bugs filed for using native functionality, like [1], that have been open for decades with no activity. 20 years ago, it wasn't done because "it's a time thing", and since then it's racked up other bugs as dependencies because stuff's just broken.
This isn't a situation where you can tweak a couple little problems and call it done. This is a fundamental change in the Firefox architecture. Asking for more bug reports is not going to help.
Firefox uses native IMEs. Most of the time they aren't broken in obvious ways. To the extent they are broken in non-obvious ways, it's unhelpful not to say how on the level that would allow the problem to be reproduced.
Firefox can't just use native text edit controls. First, Firefox needs to support contenteditable, which doesn't map to an OS-supplied text box. Second, the multiprocess architecture leads to a situation where the UI process talks to the native IME API but the Web content process hosts the text being edited, so native text boxes don't work even for things that look similar to OS-supplied text boxes.
Hi. Sent in feedback yesterday concerning addon privacy. Specifically, the option to grant or restrict addon access on per site basis.
I'd want to be able to right click an addon icon in the toolbar and click, "Don't run on this site." And have even more options in the extension detail page.
> On Linux, it’s terrifying; I’ve never seen any app that allows input systems to work naturally, and after a week of use you get used to pressing space & backspace after finishing every Hangul word. The Unix-style composability they want (apps should work whether or not input methods are used), and the fact that Linux users that use Latin characters don’t seem to use any input methods (as opposed to macOS, where Latin characters are also input through a Latin input system), mean this state will probably persist.
Living in Japan and using Linux almost all the time, I never remember having any problem with typing in Japanese whatsoever.
The Korean case is totally different from Japanese. I guess Korean is fairly unique in this area, because Korean Hangul characters consist of multiple smaller characters. Input systems need to keep state to allow typing multiple small characters to complete one character. The problems usually come from improper state handling.
Korean is different from Japanese, but as far as IME complexity goes, a Hangul IME should be extremely simple to develop compared to a Japanese IME. (The Hangul part of a Korean IME doesn't need any pop-ups like a Japanese IME does. As far as UI requirements go, the Hangul part of Korean IME can be as UIless as a Vietnamese Telex IME.) That e.g. on Ubuntu 18.04 the Korean IME is broken is not due to anything intrinsic to the writing system.
Yes, Korean (specifically Hangul) additionally suffers from NFC/NFD issues which is not experienced by Chinese or Japanese. I’ve had the privilege (/s) to work with Korean file names in the past and it was a nightmare.
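(A quick TypeScript illustration of the trap; macOS's older HFS+ filesystem famously stores filenames in a decomposed, NFD-style form, which is one common way the two spellings end up being compared against each other:)

    const nfc = "각";                  // one precomposed syllable, U+AC01
    const nfd = nfc.normalize("NFD");  // ᄀ + ᅡ + ᆨ (U+1100 U+1161 U+11A8)
    console.log(nfc === nfd);          // false: visually identical, different code points
    console.log(nfc.length, nfd.length);                         // 1 3
    console.log(nfc.normalize("NFC") === nfd.normalize("NFC"));  // true: compare after normalizing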
Hangeul is not an alphabet. It's an alphabetic syllabary. ㄱ is one character but it needs to be composed into a final glyph like 각, which requires 2+ characters. 가, 각, 갉. 뷁.
Kanji (not to be confused with Hiragana or Katakana) are different because the characters are already composed.
Yeah I had a brain-fart earlier. I totally forgot about that. Typing hangeul is basically the same except there's no need to possibly choose different hanja/kanji. ...unless you press the hanja key after the glyph is typed but before moving onto the next glyph. (Usually F9 or F10, iirc Windows IME defaults to ctrl+space).
Kanji is logographic: each symbol is a complete word, phrase, or idea. Hangul is alphabetic-syllabic: each symbol is a segment (vowel-or-consonant), except that they're written in two- or three-letter blocks each representing a syllable.
I'm not sure it's that different IME wise. I only know Chinese pinyin IME, but I assume since Kanji in contrast to Chinese can map to multiple syllables you probably need to keep the state of the last few syllables as well and then let the user choose the appropriate Kanji (if available).
With pinyin input it's the same in Chinese. You enter the Latin characters and the IME gives you options to select from. Also even abstracting from the single syllables you often can narrow down the selection of compound words in Chinese IME if you continue typing. So again state is important.
Correct, this is why I'm wondering. Input is done through hiragana such as "わたし" (or even a step further removed, transliterated from "watashi") and then the IME is triggered to convert it to "私" (or other matches).
On a laptop, my experience is that romaji -> hiragana is as-you-type due to being unambiguous, while the default trigger for hiragana -> kanji is the spacebar, same as described for Korean. Hence my confusion as to how it's different - the individual characters certainly are, but it sounds functionally identical.
On my phone just now, typing this comment, the experience was switching to a Japanese keyboard and inputting the hiragana directly, then the kanji suggestions appeared where autocorrect suggestions normally would.
Typing Hangul is just like typing Latin, Greek, Cyrillic, Hebrew, Arabic, etc., text: one alphabetic unit at a time. The only thing a Hangul IME does is it groups the typed jamo into syllables. The grouping is unambiguous, so there's no need for popups or space presses to guide the grouping.
(If there had been the kind of rendering technology that is used for Indic text today back when Korean text processing on computers started, chances are that the syllable grouping would be handled as rendering-time shaping and not as an input-time IME issue.)
Additionally, Korean IMEs have a feature to convert a word into Hanja, but it's something you need to take action to invoke as opposed to Japanese IMEs offering to convert to Kanji by default.
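(The grouping is also pure arithmetic: Unicode lays out the precomposed syllables algorithmically starting at U+AC00, so the composition step is just the standard formula. A small TypeScript sketch:)

    // Unicode Hangul syllable composition: S = 0xAC00 + (L*21 + V)*28 + T, where
    // L = leading consonant index (0..18), V = vowel index (0..20), T = trailing index (0..27, 0 = none).
    function composeSyllable(L: number, V: number, T = 0): string {
      return String.fromCodePoint(0xac00 + (L * 21 + V) * 28 + T);
    }

    console.log(composeSyllable(0, 0));     // "가" (ㄱ + ㅏ)
    console.log(composeSyllable(0, 0, 1));  // "각" (ㄱ + ㅏ + ㄱ)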
> The grouping is unambiguous, so there's no need for popups or space presses to guide the grouping.
...that's the opposite of the comments that triggered my question. Multiple people said space is needed to trigger it, then backspace to remove an erroneously-added space character.
Is the answer that they're using it wrong, and are actually inputting a space directly because the IME already acted for them?
This is interesting. Not doubting your story, but my personal experience, as a Cyrillic user, is opposite: at least in early versions of Windows apps I constantly struggled with a random mix of hardcoded assumptions for encodings, key presses, characters and data stored which often produced gibberish on screen.
When I switched to Linux everything just works out of the box: I can copy and paste text between gvim, xterm, etc. with no issues. I admit that this is likely due to app writers, not underlying OS. And my experience is only with single-byte characters. Just my 2c.
Hadn't fully registered till this comment, the degree to which the modern web is anchored to horizontal (usually left-to-right) writing and the design patterns of vertical scrolling that come with that assumption.
It's not just the web, it's all of computing, even down to the hardware design. A vertical scroll wheel is standard on all mice. A horizontal scrolling method of some sort is not, and even on mice that include one, it's usually not as good as the vertical one (e.g. leaning the wheel left and right).
> I use macOS & Linux, and while the default text-handling system on macOS, the Cocoa Text System, handles input methods well, almost all applications that implement their own, including big apps like Eclipse and Firefox, don’t get this right.
What specific IME problems do you have with Firefox on Mac?
> On Linux, it’s terrifying; I’ve never seen any app that allows input systems to work naturally, and after a week of use you get used to pressing space & backspace after finishing every Hangul word.
Do you mean you have to press space twice and erase the second space? With IBus?
There are no spaces between words in Chinese or Japanese.
Pressing space confirms the current selection in the Japanese IME, which is expected behavior. Where some Linux implementations get it wrong is they also insert a space after the word, meaning the user has to select the desired word in the IME with the space bar and then remove the erroneous inserted space.
Edit: Correction based on feedback below. Previously stated that Hangul does not have spaces.
Wrong, there are spaces between words in Korean. It’s in Japanese and Chinese that there aren’t. And in Vietnamese there are spaces between every syllable, even within words.
Not sure your comment on Vietnamese is accurate though. I work in a company with ~35% native Vietnamese speakers and I’ve seen plenty of multi-syllable words.
Are you talking about traditional Vietnamese (when it still used Chinese characters) or modern Vietnamese (post-French-colonialism) which uses the Latin alphabet with accents?
It is accurate, there are spaces between each syllable in modern written Vietnamese, except in foreign words. The syllables can have as many as 7 characters, and you need an IME to type the tone marks. The written language looks like this: https://vi.wikipedia.org/wiki/Vi%E1%BB%87t_Nam
That was not my point. I mentioned Vietnamese because the spacing it uses is interesting.
Also like dfcowell said, Vietnamese used to be written with Hán and chữ Nôm characters (respectively Chinese characters and Chinese-style characters created by the Vietnamese), a lot of which are encoded in Unicode. Hence the existence of the CJKV acronym.
It is actually CJKV to deal with historical Vietnamese.
From Unicode spec:
> Although the term “CJK”—Chinese, Japanese, and Korean is used throughout this text to describe the languages that currently use Han ideographic characters, it should be noted that earlier Vietnamese writing systems were based on Han ideographs. Consequently, the term “CJKV” would be more accurate in a historical sense. Han ideographs are still used for historical, religious, and pedagogical purposes in Vietnam.
Are you using IBus? I'm on Debian and once a character is finished, it automatically moves on to the next character (unless you press space, enter, etc.). All I have to do is ctrl+space to switch input methods.
I've had issues with terminals, like character composition not working in Alacritty. The only major annoyance I've found is having to install and configure ibus/ibus-daemon and CJK fonts.
> On Linux, it’s terrifying; I’ve never seen any app that allows input systems to work naturally
Kind of interested to hear what sort of input method you're using. Even my terminal emulator supports Japanese input methods well, through IBus. Maybe it's just that, as a non-native, I'm more often going to convert chunk by chunk anyway; I have noticed that some input methods do not do bulk conversion well. I must say I never thought Hangeul would even warrant conversion other than composing the syllables -- is it because you mix in Hanja conversion? I think your problems are mostly a matter of the quality of the input method; IBus and the input method system in GTK+ at least don't seem to be preventing anyone from writing better input methods.
I feel it with Firefox though; then again, Firefox is very poor quality software in my experience -- almost everything is at least subtly wrong, and there seems to be more interest in niche feature work than basic work on product quality. I could in some ways say the same for Eclipse: every time an SDK I want to use is only documented in terms of their special Eclipse frontend, I get a bit depressed.
> Linux users that use Latin characters don’t seem to use any input methods
Right, I prefer slim systems and I typically uninstall everything input method related that my distro has chosen to preinstall. I cannot read or memorize a single CJK character, so why would I need that.
Additionally, programming and computers for me mean English, although that is not my mother tongue. I would never install anything in my mother tongue.
Basically I use my mother tongue (and a couple of other European languages I speak) only in Email, chat or maybe some web form. I can feel your pain though, because 10+ years ago we had the same problem with the couple of non-ASCII characters you need in most European languages.
In order to have the situation in Linux improve there just need to be enough CJK contributors to fix existing bugs. And reviewers / unit test cases to make sure we Westerners don't break it again with our next commit.
> Right, I prefer slim systems and I typically uninstall everything input method related that my distro has chosen to preinstall. I cannot read or memorize a single CJK character, so why would I need that.
Yes, I'm exactly talking about this mindset.
This is basically why Linux has such poor input method support.
Because English has the special privilege of not needing an input method, combined with the fact that the majority of Linux application programmers use English only, basically all apps that don't take i18n seriously are 'wrong' by default, as opposed to apps running on Windows/macOS, which are 'right' by default.
> Basically I use my mother tongue (and a couple of other European languages I speak) only in Email, chat or maybe some web form.
Does that mean European languages can be input without special input methods?
> I can feel your pain though, because 10+ years ago we had the same problem with the couple of non-ASCII characters you need in most European languages.
The non-ASCII characters fit in the character-array model that most Western people think in, and as a plus they fit in the upper half of an 8-bit character set.
CJK languages require a different model from the Western ones.
> In order to have the situation in Linux improve there just need to be enough CJK contributors to fix existing bugs.
It's a losing fight. That only works in an ideal world where every program has enough contributors. That's not the case.
>> Right, I prefer slim systems and I typically uninstall everything input method related that my distro has chosen to preinstall. I cannot read or memorize a single CJK character, so why would I need that.
>Yes, I'm exactly talking about this mindset. This is basically why Linux has such poor input method support.
Don't get me wrong. The fact that input methods are useless for me, because I know zero CJK characters, does not mean I think they are useless for everybody or for Linux in general. How would I help CJK users by having something installed that I never use and have no knowledge to use?
> I mean, there is a reason why Windows & macOS both select a similar architecture for text input
Yes, there is a reason. Microsoft and Apple want to make money in CJK countries. And they have architects that make system-wide decisions.
That is not how Linux works. Companies contribute where their business is; that is servers or embedded. After Canonical closed bug number 1 https://bugs.launchpad.net/ubuntu/+bug/1 no big player is interested in Linux on the desktop anymore. Individuals contribute what they are interested in and what they know best. I fear most Westerners don't understand the challenges of CJK and other more "complicated" scripts. I myself "blame" Americans if they do it wrong and something accepts only 7-bit ASCII. I can fully understand if CJK or right-to-left people blame us "8 bit people" (character set, not coding) for doing it wrong. We just don't get it, that's a fact. But I don't think studying Korean etc. is a realistic solution. The only way to change it is to have more people and companies contribute that a) need it and b) really understand the user needs.
> Does that mean European languages can be input without special input methods
I cannot talk about East European languages or really small languages. But for the bigger Western and Central European languages the answer is yes.
Every character is either on the national keyboard or (if typing another language) can be typed using a dead-key accent or AltGr. Sometimes the Compose key is needed, but I need it so rarely that I forget the combinations...
>Additionally, programming and computers for me mean English...
Linux isn't just for programming. I personally hate that I have to dual boot because Ubuntu can't do what Windows or macOS can. Anyhow, English has plenty of words containing non-keyboard characters; they're just infrequent.
This post reminds me of one of Jon Skeet's seminal SO answers[1] on subtracting dates. Things we take for granted are hard. Dates are hard, text is hard, graphics are even harder. We stand on the shoulders of giants. It's important to stop and smell the roses every now and then.
> Things we take for granted are hard. Dates are hard, text is hard, graphics are even harder. We stand on the shoulders of giants.
More like we stand on piles of broken shit. A lot of "hard" things in programming have very little fundamental complexity. They are hard because at some point in time someone made a compromise that no longer is relevant or makes any sense. It is possible to redesign those things to be simple, but modern developers are conditioned to fight this at every step. The idea that the first thing you should try to do when faced with extraordinary complexity is to bypass it is completely alien to people whose daily jobs consist of continuously banging their heads against convoluted frameworks and deficient APIs.
Time handling is a great example. Until you get into general relativity, time itself is extremely simple and uniform. But human representations of time are extremely convoluted, because historically they are based on astronomy. Here is the kicker: 99.99% of all software has nothing to do with astronomy, and yet people chose to inject those complexities in core structures that track time. (For example, Unix timestamps have leap seconds, which is a source of innumerable bugs and ridiculous convolutions in code to prevent those bugs.)
If by 'based on astronomy' you mean 'based on the sun' because the sun is a star, then I agree. But, plenty of software has lots to do with the sun because they have to interact with people, and people like to be synchronized with the sun.
If you don't want software to interact with humans, or you only care about duration rather than human datetime, then go ahead and use UT1. However, as long as you're interacting with people, you're going to need to fix people first. You should start with these primers if you're going to fix how humans interact with time:
> Until you get into general relativity, time itself is extremely simple and uniform.
Just a little niggle, but I think you meant to say special relativity; special relativity is about the relationship between space and time, whereas general relativity (I believe) is more about gravitation and other things that I definitely don't understand.
I remember trying to debug this exact “issue” in a java project years ago. I poked around the library to figure out what was going on, which only raised more questions, before I eventually stumbled upon that exact SO thread. Turns out dates have an enormous amount of complexity.
Date and time are complex because they are social and historical constructs. Everything about them is arbitrary and exists because of arcane historical reasons. No regular system of organizing time maps cleanly onto the movement of heavenly bodies because those movements are not regular (hence: leap seconds). You can describe time in a way that is precise, or in a way that is intuitive, but not both, and woe to anyone trying to convert between the representations.
Time zones are the worst because they are political tools, changed for political reasons, often with very little lead time. I think of them as a one way function mapping universal time onto the local political reality.
It's easy for me to write something in Russian: I just hire a Russian translator to do it for me.
That is, "convert time into time in seconds since the epoch" is the hard part. All you've done is make it someone else's problem, while the question Skeet answered asked specifically about the time difference between 1927-12-31 23:54:08 and 1927-12-31 23:54:09 in Shanghai.
Yes. This is beautifully expressed. And 'hiring a Russian interpreter' is almost always the right way to fix this kind of problem. POSIX would seem to me to be the Russian interpreter of choice here. Is Java a language which has no word for the colour green, in the sense where the eponymous Russian interpreter has to say "concept which cannot adequately be expressed in Russian" or something?
The POSIX spec exists for this reason. This feels like a variation of "omg UTF-8 is hard", which is why people embedded UTF handling in the language (Python 2 -> Python 3).
It seems less than graceful to respond to articles which point out the existence and difficulty of problems by saying it's fixed. The entire point is to show how it's more difficult than many think, not elide the issues.
What does POSIX have to do with anything? Eg, the tz database isn't POSIX.
The POSIX spec of date and time specifies conversions into a neutral form. Maybe it's C-centric thinking, but it felt like a problem which has existed and been solved by writing a 'russian interpreter' solution: convert the values at the edge to a canonical form and use functions which manipulate the canonical form. I only mention the TZ files because, within the limits of modern epoch times and dates, they do a pretty good job of handling the minutiae of what a local time means in any given period.
I'm sorry if I came across as churlish. Do you feel this is a problem of concept in the abstract, or in Java, or in a Java implementation, or what? I feel it's a problem which I solve by standing on the shoulders of smarter people and using the code they wrote. I don't reimplement the wheel because I distrust my own wheelmaking skills.
I'm sure there are heaps of non-trivial corner cases I don't have to deal with and I do not mean to minimise work done to explain a problem.
The problem would be, I think, when your business logic is concerned with some subtle aspect of Russian, you translate into canonical English and the detail you're interested in disappears.
When it comes to timestamps, how do you handle midnight for example? 24:00 today represents the same instant as 00:00 tomorrow. The difference might matter to the user but disappears when converted to Unix timestamps.
> I think it was in Godel, Escher, Bach where he talks about the problem of translating a line from Russian, along the line of "He lived on B____ street." It's possible from the story, which takes place in a real city, to figure out which street that was. Let's say it was "Main Street". Does the English translation keep the original Russian word and initial? Does it translate to "Main Street" and replace the "B" with an "M"?
You can use a Russian-English translator to get a translation into English. But that doesn't help understand the hard problems involved, which was the point of Hofstadter's essay. (See, for example, his "The Shallowness of Google Translate" in The Atlantic; HN comments at https://news.ycombinator.com/item?id=16267363 and in dupes.)
This reminds me of Milan Kundera's "Testaments Betrayed", which expresses the profound challenge (near impossibility) of translations doing justice to original works.
> I always convert time into time in seconds since the epoch
You should still be careful how you define a _second_. A POSIX second is not the same as an SI second. A POSIX second corresponds to 1/(24 * 60 * 60) of a day. An SI second is defined using atomic time measurement. It means that the duration of a POSIX second depends on the length of the day, and some days are longer than others due to leap seconds.
The worst part is that most tools rely on POSIX timestamps instead of TAI timestamps (based on the SI definition). One of the consequences of using POSIX timestamps is that taking the difference between two timestamps does not safely return the number of elapsed seconds.
> A POSIX second corresponds to 1/(24 * 60 * 60) of a day. An SI second is defined using atomic time measurement.
I don't think that is quite accurate. Traditionally Unix Time seconds are SI seconds, that tick in sync with UTC and atomic clocks (modulo clock error). To account for leap seconds, Unix Time has traditionally been discontinuous when a leap second occurs. Each second that ticks is still an SI second, but the sequence of Unix Time around a leap second will be X-2, X-1, X, X, X+1, X+2, etc.
This discontinuity means that every Unix Time day has exactly 86,400 seconds cumulatively, even though some days tick 86,401 elapsed SI seconds. So it is true that computing a difference between two Unix Time values does not give you a reliable measure of how many SI seconds have elapsed, as you say. But the individual seconds are still all SI seconds.
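For instance, with ordinary Date arithmetic (which follows this leap-second-free count), a day that actually contained a leap second still measures 86,400 seconds; a tiny TypeScript illustration:

    // A leap second was inserted at 2016-12-31T23:59:60Z, so 86,401 SI seconds elapsed
    // between these two instants, but the Unix-time difference hides it.
    const start = Date.UTC(2016, 11, 31);  // 2016-12-31T00:00:00Z (months are 0-based)
    const end = Date.UTC(2017, 0, 1);      // 2017-01-01T00:00:00Z
    console.log((end - start) / 1000);     // 86400, not 86401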
More recently it has become popular to smear the leap second instead. Google pioneered this approach in 2008 (https://googleblog.blogspot.com/2011/09/time-technology-and-...) and now advocates making it standard (https://developers.google.com/time/smear). This approach does vary the actual length of a second during the 24 hours surrounding a leap second. But this doesn't come from POSIX, it's a departure from previous practice that is designed to avoid a discontinuity.
Yes, I definitely agree that the reality is even more complicated and there are different implementations. The recent time-smearing approach is closer to the POSIX standard [0]:
> How any changes to the value of seconds since the Epoch are made to align to a desired relationship with the current actual time is implementation-defined. As represented in seconds since the Epoch, each and every day shall be accounted for by exactly 86400 seconds.
Another fact that I find interesting is that the use of discontinuities or time-smearing around leap seconds is actually a modern approach. Nowadays, POSIX and TAI differ by an integer number of seconds on regular days, and by a fractional amount on a smeared leap-second day. At the beginning, the leap second duration between POSIX and TAI was smeared over years from one leap second to the other. So the difference between both timestamps would change continually over these periods!
My main takeaway is that POSIX timestamps are based on calendar days while TAI timestamps are based on physics constants. POSIX timestamps are a tradeoff: they're not suitable for time computations (durations, or even uniquely identifying an instant when discontinuities are involved) but they don't require a leap seconds lookup-table to display time for humans... in UTC... Which means you still need lookup tables for timezones, countries, DST, calendar changes, etc. Basically my opinion is that POSIX time is neither the best to precisely keep track of time nor to format it for humans. It just sits in between these two use cases because it was good enough at the time and then too hard to change.
> At the beginning, the leap second duration between POSIX and TAI was smeared over years from one leap second to the other. So the difference between both timestamps would change continually over these periods!
Do you have any reference to a system that was implemented this way? I have never heard of such a thing and I don't see how it could be implemented.
The insertion of leap seconds is unpredictable and occurs when the International Earth Rotation and Reference Systems Service determines that the difference between UTC and UT1 are approaching 0.9 seconds. Leap seconds are generally only decided six months or so before they occur. So you don't know at the beginning of the year how many leap seconds the year will have. So it doesn't seem possible to smear over a year.
I remember reading about it on Wikipedia some time ago. I looked it up again and I was wrong: this was not applied to POSIX but to UTC itself. The arrival of international networks seems to actually be the reason why they stopped smearing it continually and switched to a linear version with discontinuities. [0]
> UTC is a discontinuous time scale. From its beginning in 1961 through December 1971 the adjustments were made regularly in fractional leap seconds so that UTC approximated UT2. Afterwards these adjustments were made only in whole seconds to approximate UT1. This was a compromise arrangement in order to enable a publicly broadcast time scale; the post-1971 more linear transformation of the BIH's atomic time meant that the time scale would be more stable and easier to synchronize internationally.
[1]: McCarthy, Dennis D.; Seidelmann, Kenneth P. (2009). Time: From Earth Rotation to Atomic Physics. Weinheim: Wiley-VCH Verlag GmbH & Co. KGaA. ISBN 978-3-527-40780-4.
Obviously you would set TZ before up-converting. I spoke in UTC about an input time I specified in UTC. POSIX date-time models understand timezones and, on well configured systems, understand prior states of time zone behaviour at boundaries like summer time, going back into historical past states. The 1990s USENET discussions did at one point veer off into the rotational speed of the earth millions of years ago; I'm not sure many people cared, but astronomers maybe have to. (I don't think dinosaurs used summer time.)
Historically correct timezone handling is one thing, but you also have to consider future timezone changes unknown at this time. For example some EU countries might decide to cancel daylight savings. Now your conversion from UTC is off by an hour.
How much is poor text input eroding the use of non-latin script worldwide? I.e. how many just give up instead of using poor input?
As an example: I use a Swedish keyboard. If I'm in a situation where ö is missing from the input (because the app is stuck in an en-US keyboard layout say, so I have to hit a modifier ¨+o to type an ö) then there is a very good chance my communication with my colleagues would just naturally be in english instead. I'd sigh and just give up using it. Rather than using an o for my ö I'd just type it in english instead.
Is this a thing in e.g. Asia, Israel, or the Arab world? Do kids that speak English communicate more in English in cases where the input doesn't let them communicate easily using their preferred script? Are there new "hybrid" languages popping up in electronic communication where languages that use non-Latin script are written in Latin script in e.g. text messages? (You could argue that emoji is just that but the other way around, I suppose.)
It's the same for me when writing Swiss-style German. It's not only cumbersome to type special characters like ö on a cell phone; spell-checking is also a pain in the ass. For example, spell-checkers are not aware that in Switzerland we do not use the ugly German ß and just write ss instead, so I get false corrections all the time.

Furthermore, spell-checkers usually seem to be designed for languages like English with only a limited number of inflections. English only has singular and plural, but in German the ending of a word can be bent in many more ways, making it rather unlikely that the spell-checker suggests the one you want. I suspect that most spell-checkers do not know that "car" and "cars" are the same word in different forms; instead, they add each form to the dictionary individually. This works well for English, but for many other languages awareness of this would be helpful.

Another detail is that expressions that are short in one language can be long in another and vice versa, so many direct translations of English words are much longer in German, taking more space on buttons and other UI elements and thereby screwing up the layout.
So yes, even when "localized", hardware and software is usually not fully adjusted to the language. And for German, which is relatively close to English, the problems are probably relatively small. I cannot even fathom how big all these issues for more distant languages must be.
And to answer your questions: yes, these pain points lead me to sometimes prefer English and to avoid special characters. I even avoid spaces in file names, as too many programs I worked with in the past struggle with that.
> Furthermore, spell-checkers usually seem to be designed for languages like English with only a limited number of inflections. English only has singular and plural, but in German the ending of a word can be bent in many more ways
Oh, and I'd say German still works great. In Finnish, words have many hundreds of different endings (several concatenated endings plus combinatorics explain how this is possible) and predictive input just doesn't work. It could probably be improved with massive semantic support; I am not an expert in that field.
For Hungarian, people stuck with an English layout usually just leave off the diacritical marks (áéíóöőúüű -> aeiooouuu). While this theoretically leaves some of the meaning ambiguous, and pedants can craft examples that may be ambiguous even with context, it works well enough in practice.
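(Incidentally, that stripping is a one-liner once you decompose the characters; a quick TypeScript sketch:)

    // NFD splits á, é, ő, ... into a base letter plus combining marks;
    // dropping the marks leaves the plain ASCII spelling people type on an English layout.
    const strip = (s: string) => s.normalize("NFD").replace(/\p{Mn}/gu, "");
    console.log(strip("áéíóöőúüű"));  // "aeiooouuu"
    console.log(strip("Győr"));       // "Gyor"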
Switching to English is way overblown a reaction. Two Hungarians chatting in English (unless there are non-Hungarian speakers involved) seems extremely weird to me. It may be partially that English is really foreign for us, while it's pretty close linguistically to Swedish, both being Germanic.
Hungarian is really foreign to EVERYONE :)
Me switching to English is in the context of work, not chatting with peers in my free time. At work in tech, everyone* (few exceptions) communicates in English (emails, bug reports etc.), also between Swedish colleagues, so it's very natural.
I wouldn't send a text message to my wife in English if I happened to struggle with the diacritics on the device I'm on.
> Switching to English is way overblown a reaction. Two Hungarians chatting in English (unless there are non-Hungarian speakers involved) seems extremely weird to me. It may be partially that English is really foreign for us, while it's pretty close linguistically to Swedish, both being Germanic.
For what it's worth, I have a Swedish friend who almost always posts in English, even when he's talking to his family and other people he knows in real life. He'll switch to Swedish occasionally (and Facebook's autotranslation is very understandable), but 90% of the time he uses English.
Yeah, in Swedish leaving them off doesn't work. The diacritics aren't accents; they are distinct letters. An ö is as different from o as u is from e.
Same in Hungarian. One funny example is "főkábel" (fő+kábel, main cable) vs. "fókabél" (fóka+bél, seal intestine). Still, the intended meaning is almost always easy to guess.
Hungarian is also redundant enough that you can even replace all vowels with just one and still be understandable. Most of the meaning is carried by the consonants. E.g. "Szia én vagyok Péter, hogy vagy?" -> Szii, in vigyik Pitir, higy vigy. (sounds obviously wrong, but very understandable) Retaining the vowels but collapsing all consonants to one would be more destructive to the meaning.
Uh, what do you mean? Leaving them off and letting the reader guess the intended word works pretty well in practice and is what many organisations in Sweden resorted to when getting a domain name (ex: riksgälden, åhlens, företagarna).
I can only say about CJK with an example: the web version of Twitter has frequently omitted the last character of CJK messages yet to be committed, and people seem to complain a lot but have adapted somehow. I think this is partly why it doesn't get fixed in time---it is frustrating but not a blocker. [1]
[1] And when the affected user base is small enough, even larger problem can take a lot to fix. I use a third-party IME that is relatively known among programmers, and early this year Google Chrome began to crash seemingly at random when you type Hangul. I tracked the root cause and I believe it is ultimately Chrome's fault, but I was too busy to file an issue at that time and the IME has adapted to Chrome to avoid triggering the crash next month. I believe Chrome still crashes when older versions of the IME are in use.
> I use a third-party IME that is relatively known among programmers, and early this year Google Chrome began to crash seemingly at random when you type Hangul
I'm aware that third-party Chinese and Japanese IMEs are a thing, but what user-facing difference does the third-party IME you mention provide in the Hangul context?
- Support for less-popular or customized keyboard layouts. From time to time, popular OSes have even shipped a wrong version of a supported layout.
- Deterministic autocorrection (sketched below). For example, normally an initial jamo (e.g. ㄱ) should be followed by a medial jamo (e.g. ㅏ) and not vice versa, but many IMEs offer a feature that automatically swaps them. With a carefully designed layout this can catch lots of transposition typos.
- Custom candidates for special characters, as Korean IMEs show special characters as candidates when a jamo is being "converted" to Hanja (popularized by MS IME).
The IME in question [1] also allows an extremely customizable input system, to the extent that it forms a sort of wholesale DSL for Hangul IMEs.
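For the curious, the swap rule from the autocorrection bullet above is roughly this (a toy sketch in Python, assuming keystrokes arrive as Hangul compatibility jamo and ignoring everything a real composition engine has to track):

    # Toy illustration of the "swap transposed jamo" idea described above.
    # Assumption: keystrokes arrive as Hangul compatibility jamo (U+3131..U+3163).

    def is_consonant(ch: str) -> bool:
        return '\u3131' <= ch <= '\u314e'   # ㄱ..ㅎ

    def is_vowel(ch: str) -> bool:
        return '\u314f' <= ch <= '\u3163'   # ㅏ..ㅣ

    def fix_transpositions(jamo: str) -> str:
        out = list(jamo)
        for i in range(len(out) - 1):
            follows_consonant = i > 0 and is_consonant(out[i - 1])
            # A syllable can't start with a vowel; a lone vowel immediately
            # followed by a consonant is most likely a transposition, so swap.
            if is_vowel(out[i]) and not follows_consonant and is_consonant(out[i + 1]):
                out[i], out[i + 1] = out[i + 1], out[i]
        return ''.join(out)

    print(fix_transpositions('\u314f\u3131'))  # 'ㅏㄱ' -> 'ㄱㅏ'

A real IME does this on keystrokes inside its composition buffer and knows the exact layout, so it can be far more precise; the sketch only shows the shape of the rule.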
In Israel I've found that Hebrew speakers exclusively write in Hebrew. Aside from letters, there's nothing else needed. One CAN add vowels, but only children's books have vowels so we're used to reading without them.
I can speak a little on Chinese and Japanese. In neither of these countries is English (or even Latin-script) literacy high enough to replace the domestic language. (And anyway, maybe it's just me, but reading romanized Chinese or Japanese is exhausting.)
With China, the market's so big that there's an entire ecosystem of homegrown (or Chinese-modified) software and websites, everything from office suites to web browsers, and I have to imagine those support Chinese input just fine.
With Japanese, generally the same software and sites are used, but Japanese support is also pretty good across common software.
Some niche or open source software struggles with IMEs in general, but in those cases the solution is just to not use that software.
Not that this addresses your main question, but for most Latin-script languages you'll be fine with always using the altgr-intl variant of en-us.
As for mobile, you can generally find the language-specific ones when pressing and holding some character when you're on a normal qwerty keyboard.
For Scandinavian:
å: altgr + w
ä: altgr + q
ö: altgr + p
æ: altgr + z
ø: altgr + l
Should be available on all OSs. I never have to switch layouts when switching between languages. Still struggling with finding an actually nice IME setup for Japanese and Chinese, though, and I've spent some time...
On Linux: setxkbmap -layout us -variant altgr-intl
Regardless, I really think input methods should be kept out of scope for all apps. There's nothing I hate more than app developers trying to "solve" it for me. There are already people who work full-time on this and users who have their systems set up as they want. There's no way an app developer is going to help more users than they break things for.
The only exception I've seen is Google's web CJK input for Translate, but they fall under the "huge amount of resources" category.
Context: I write in Swedish and Japanese on a daily basis, spent some time learning Chinese and sometimes have to retype things from other languages. Using vanilla iOS and Android on mobile, Linux on desktop. CJK has been the only struggle so far. There's enough tech-literate people with those native languages that there definitely are ways to get good configuration for any OS, for Linux you just have to figure out how, which can be tricky to find if you don't read the language.
I don't even have an AltGr button on my keyboard (!). It would be very cumbersome to write Swedish with AltGr anyway, as those characters are quite common.
Not even a right Alt key - should be the same if you set the layout..? That's a really minimal keyboard, all 60% keyboards I've seen and even some 40% keyboards have that... I'd actually be curious to see your bastard keyboard!
YMMV, but I got used to it pretty quickly and I type Swedish all the time, way less annoying than doing the Alt-shift dance to switch layouts whenever I go from coding to chatting at least ;)
I always have an AltGr key, because I tend to program on Finnish keyboards. (Identical layout to Swedish, but because nobody came up with a term to group the two together, there always seem to be two options to choose from that make no difference...) What I really hate about AltGr / this layout is the curly braces and square brackets. It makes programming so much harder to need AltGr for each of them.
That said, in Linux/X11 you could configure everything to your liking. I must admit I prefer complaining...
Yeah, this is exactly the reason I stopped using the Swedish/Finnish layout altogether. It must be hundreds or even thousands of times that I accidentally typed :( instead of :) in IMs because someone at some point decided the parentheses had to be shifted one step...
Right-alting for åäö is just way less cumbersome than doing the same for @${}[]\~| or having to switch layouts when switching contexts.
What are the cases when your keyboard is stuck in English?
In Ukraine, no, we don’t. Kids don’t generally know English enough that they would want to speak it with peers, and there are simply no reasons to do so.
“Translit” (writing Ukrainian/Russian with English characters) was a thing in SMS days, when typing in Cyrillic would shrink your SMS length in half (because that’s how SMS encoding worked, I guess), but now practically nobody uses it anymore.
> “Translit” (writing Ukrainian/Russian with English characters) was a thing in SMS days, when typing in Cyrillic would shrink your SMS length in half (because that’s how SMS encoding worked, I guess), but now practically nobody uses it anymore.
Ok, that's exactly the type of thing I was thinking about. Funny that it was born from the SMS length limit rather than from actual input difficulty, though.
A dude is still sending me messages in translit over the web every time he's in foreign lands, even though adding a language to the OS's input-switching set takes a few clicks these days.
But back in the 90s to early 2000s, translit was pretty big on the internets due to the dozen different Cyrillic code pages and some software supporting only 7-bit characters (specifically Fidonet exchange software and newsgroup nodes).
Just the fact that I still run into mis-encoded diacritics in movie subtitles to this day shows how ‘US first’ ASCII caused a lot of headaches for many years.
(BTW: you probably have been told by now, but your nickname is a common slang word in Russia.)
In India, it was definitely born because of input difficulty. It's such a big thing that it has given rise to Hinglish (mixing Hindi and English), and translit is the only way 99% of people type here.
As a Swede I use a US keyboard layout and a compose key as you've described. alt gr followed by o and " for example. It was cumbersome to write Swedish fluently for a few days, but after that it felt natural. A good boon is that now I also don't have a problem typing a bunch of characters common in other Latin-like scripts like ç, ø or ß.
That said, I type mostly in English anyway. My reason for using a US layout is that it is more ergonomic for bracket/brace/semicolon heavy programming languages. Braces in particular are an annoying chord with Swedish keyboard layouts. Especially on OS X, which for whatever reason uses three part chords (alt+shift+8 or 9).
> Are there new "hybrid" languages popping up in electronic communication where languages that use non-latin script are written in latin in e.g. text messages?
Yeah, transliterating Arabic to English is a thing:
> With smartphones, the relationship between informal, chatty Arabic and formal written forms has become more complicated, giving rise to a hybrid known as “Arabeezy” – or Arabic written with Latin characters and numbers to represent letters that have no English equivalent.
In Sweden and Finland people typically prefer to omit the diaeresis without adding a trailing e. Due to Germany getting their way, though, the machine-readable part of passports uses the German-style conversion to ASCII.
> Do kids that speak english communicate more in english in cases where the input doesn't let them communicate easily using their preferred script?
As the input methods for CJK languages are significantly different from English input (compared to the relatively small difference for a language like Swedish), every app just defers to the locale's input method.
Nobody tries to communicate with only the English keyboard, because that is outright impossible (while in Swedish it's possible in the worst case).
As I said, I wouldn't write Swedish without a Swedish keyboard, even though it's quite possible (just three missing characters, basically). What I'd do is switch to English instead.
> What I'd do is switch to English instead.
Hmm, I've never encountered a situation where only an English keyboard is available for communication (except when someone is setting up a new Linux machine).
But even if there were such a situation, I don't think that would happen. I'm not sure how similar Swedish is to English, but for us English is a cognitive load; even with people who are proficient in English, I cannot imagine myself communicating in English.
I do remember resorting to online keyboards, though (while setting up Linux).
In industry in Sweden, especially in tech, English is the language used. If you have a team of 10 Swedish developers working on an app, they will almost certainly write all user stories/bug reports/specs/code comments etc. in 100% English. The 11th person to join might be an English speaker, so you can't afford to have your backlog or code in Swedish. The step to using English for email/chat is then very small.
> Do kids that speak english communicate more in english in cases where the input doesn't let them communicate easily using their preferred script? Are there new "hybrid" languages popping up in electronic communication where languages that use non-latin script are written in latin in e.g. text messages?
Both, in my part of India. The latter used to be much more common in the feature-phone era, writing in Tanglish (Tamil words+English script), but seems to have become much less popular now. Not sure why, perhaps swype-type keyboards have made English so much easier to type.
The former certainly still happens a lot - I know a bunch of people (including myself) who would prefer to communicate in Tamil, but there's enough friction (not having a Tamil keyboard, having to get used to new keyboard layouts, the Tamil keyboard being buggy often enough, ...) to make it more appealing to go with English for normal conversation.
AFAICT, a Swedish keyboard has a dedicated key for ö. You just press it. I don't understand how an app can get stuck on keyboard layout if you set the system keyboard layout to Swedish. Could you give an example?
Or are you actually using a US keyboard with a Swedish layout, and there is no ö key on it?
There are multiple things in play here: the physical keyboard, which is slightly different for a Swedish keyboard https://en.wikipedia.org/wiki/QWERTY#Swedish and of course the keyboard layout setting, which maps the key pressed to a character. You can mix and match those.
Lacking the Ö key doesn't matter so long as pressing the key where the Ö key should be actually produces an Ö (which is the case e.g. with a US keyboard but a Swedish layout chosen in Windows).
> Could you give an example?
It was a hypothetical, but in Windows you have a per-app layout (you can choose multiple keyboard layouts in Windows and an icon in the systray lets you pick the active one), and it often spuriously switches, so suddenly you hit the Ö key and it produces a ';'.
A better example is perhaps when I'm travelling and composing an email from someone else's computer at my office in Britain back to my office in Sweden. I'm not going to bother switching the Windows settings just to write that one email on that machine.
Or if I'm logging on to a VM with only an English layout, so hitting my Ö key produces a ';'.
If I needed to type anything inside that VM desktop for whatever reason, I wouldn't bother trying to write Swedish even if the person I was typing to was Swedish.
> Or are you actually using a US keyboard with a Swedish layout, and there is no ö key on it?
Personally I actually do use a US keyboard with a Swedish layout, which works fine. If I hit the key labelled ';' it will show an Ö on screen. The reason is I wanted a specific type of keyboard that didn't exist with a Swedish layout, and the layouts are almost equal. It's just the one key between left Shift and Z that doesn't exist in the US layout.
I can second that particular grief in Windows. I use English input for most things but Spanish for a few; it will randomly switch. This is doubly aggravating when I am in, say, vim and it messes everything up before I figure out why things aren't responding. Or, I go to put in a `[`, get nothing, press again and get `''`, or press a key and get `á`. Let alone if I'm trying to type programming-related chars in my Spanish keymap (taking notes in markup syntax etc.); I then have to remember the modified positions of keys (parentheses are the worst). Dealing with essentially all non-char keys getting hijacked is very confusing; just writing this, I have made several mistakes.
Honestly, English is uniquely suited to computer input. A narrowly defined charset maps well to a situation where you can only fit so many keys on a keyboard or on a screen. I also sympathize with the difficulties of implementing languages that have ligatures, right-to-left reading, or other significant differences. Non-discrete characters just don't map well to a world of ones and zeroes, because representing every combination is the simplest option but has O(n!) complexity in such languages (assuming you can combine everything, which is probably not possible, but you get the point). I have a great deal of sympathy for those maintaining such complexity for what can be a very small part of your user base (that is, users who can communicate only in one, foreign, language).
I assume this is about having to use other people’s devices, which aren’t configured how you’d configure your own.
I personally have muscle-memory for my own programming-optimised adaptation of Type 2 Dvorak, regular German layout, and UK/US/International variants of English keyboards, each in Mac and PC variants. And, I suppose, in mobile phone and tablet screen-keyboard variants.
It always takes a few seconds to adapt to whatever I’m sitting in front of, and I have a lot of practice with this compared to the average person.
At least in German, there are standard substitutions for when umlauts and sharp S aren’t available for technical reasons:
ä -> ae
ö -> oe
ü -> ue
ß -> ss (historically “sz“, which you still occasionally see, particularly where there is Hungarian influence; Swiss German tends to forego ß altogether)
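If you ever need to apply these substitutions mechanically (say, for a filename or a system that only accepts ASCII), it is a plain character-to-string mapping; a minimal sketch:

    # Standard German ASCII fallbacks, as listed above (plus the uppercase forms).
    FALLBACK = {
        'ä': 'ae', 'ö': 'oe', 'ü': 'ue',
        'Ä': 'Ae', 'Ö': 'Oe', 'Ü': 'Ue',
        'ß': 'ss',
    }

    def to_ascii_german(text: str) -> str:
        return ''.join(FALLBACK.get(ch, ch) for ch in text)

    print(to_ascii_german('Straße in Köln'))  # 'Strasse in Koeln'

Note that, as pointed out elsewhere in the thread, Swedish and Finnish conventionally just drop the dots instead, so the table is language-specific.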
It happens to my wife's German USB keyboard in Windows 10. One day it's working fine, the next day pressing Z outputs a Y and Ä outputs a semicolon. I've had tons of problems with Japanese input in Linux and Mac (but not in Windows).
Multilanguage support is a function of how many customers demand it, and aside from English and Japanese, there hasn't been enough economic incentive to take it seriously.
The keyboard sends a scancode, and something, either the text input layer of the OS or the app itself, translates that to an ö. Second, it is possible that the system just does not load the correct keyboard layout and defaults to US, so you lose your familiar keyboard layout precisely when you already have other problems.
Parts of the system hardware might also do scancode-to-text transcoding. When PS/2-to-USB keyboard converter cables were common, I saw quite a few having trouble with keys not being correct for non-ASCII characters, especially when using AltGr.
There is no text encoding in the hardware. The problem stems from the Microsoft-centric design of USB modifiers: there is no true AltGr code, so the convention (at the OS level) is that left Alt is always Alt and right Alt is sometimes AltGr. A few early converters didn't grasp that the left and right versions of the same modifier key could have different meanings, because nobody in their right mind would do that.
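For reference, the modifier state in a standard USB HID boot-protocol keyboard report is a single byte with one bit per modifier, left and right kept separate; whether bit 6 (right Alt) is treated as AltGr is purely a host-side convention. A small sketch of decoding it (my own illustration, not taken from any particular driver):

    # Decode the modifier byte of a USB HID boot-protocol keyboard report.
    # Bit order follows the HID modifier usages E0..E7.
    MODIFIER_BITS = [
        'LeftCtrl', 'LeftShift', 'LeftAlt', 'LeftGUI',
        'RightCtrl', 'RightShift', 'RightAlt', 'RightGUI',
    ]

    def decode_modifiers(byte0: int) -> list[str]:
        return [name for i, name in enumerate(MODIFIER_BITS) if byte0 & (1 << i)]

    print(decode_modifiers(0x40))  # ['RightAlt']
    # Whether 'RightAlt' means plain Alt or AltGr is decided by the OS layout,
    # not by the keyboard or the converter cable.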
I know in Arabic people sometimes write in a way intended for telegraphy when they use phones and don’t want to or can’t switch IME (the main thing I remember seeing is 3 being used as a letter.)
While spending a lot of time with monospace fonts and mostly ASCII characters (programming and writing, terminal emulators, IRC/mail/MUDs/feeds in Emacs) and working on a hobby project involving text rendering and selection (with potentially proportional fonts and Unicode), I keep wondering whether it's even worth all the trouble.
In addition to what's mentioned in those articles, on larger documents rendering speed also matters; a word cache (in addition to the mentioned glyph cache) helps somewhat, but complicates other things even further, and calculating the total document height (for scrolling) still requires calculating all the positions, wrapping the lines, etc., which in turn requires laying out all the text first. Perhaps one can introduce yet another mechanism to only estimate that at first, and then refine the estimate in the background, but that's yet another opportunity for bugs to creep in (and additional complication, of course). By contrast, programs that just use character grids (and perhaps don't handle bidirectional text well) can be both faster and simpler, while still being quite capable of rendering text, even with various styles/decorations and occasional images, in many languages.
There seems to be plenty of accidental complexity even in handwriting, but once it is used for computing and text manipulations are involved, the task appears to be much harder than it has to be.
> While spending a lot of time with monospace fonts and mostly ASCII characters (programming and writing, terminal emulators, IRC/mail/MUDs/feeds in Emacs) and working on a hobby project involving text rendering and selection (with potentially proportional fonts and Unicode), I keep wondering whether it's even worth all the trouble.
Yeah... and that's why Firefox & Eclipse still don't get CJK character input right.
The percentage of people who must use characters not included in ASCII is much bigger than the percentage of those who don't.
It is worth the trouble, please consider users outside of the US & England.
I think it would have been better to have everybody just learn English as their second (computer usage) language.
I'm German and my English was horrible until I switched my computer and my websites to English ones.
Imagine a world where everybody speaks at least one common tongue, we could have so much more peace and understanding.
English becoming the Esperanto of the world is an accident, but I don't care which language it is, as long as it has fewer than 255 letters. (Hawaiian would have fit into a nibble, which is cool.)
> better to have everybody just learn English as their second (computer usage) language
Why not Chinese? It would have many advantages:
- Text layout would be much easier (Chinese fonts are mostly monospace).
- In addition to the first point, Chinese characters can easily be used for vertical text. Looks much nicer in narrow, tall buttons than Latin letters.
- Chinese characters are (mostly) pronunciation independent. No more they/there/their.
- It has been shown that dyslexic children have fewer problems recognizing Chinese characters than Latin letters.
- Chinese characters are part of the cultural heritage of billions of people (both Chinas, Japan, Korea) and not just something they learn at university.
- Old-school C enthusiasts can now have meaningful variable names while keeping their self-imposed three-character limit (or was it four?)
- Finally, everybody will use wchar_t
- Edit: But there is one disadvantage. There is no Chinese character for "/s" which some people here seem to need
Logographic writing systems are just a pain in the ass. You have to create huge fonts, you waste ridiculous amounts of space, and you need phonetic descriptions of new words anyways.
Like I said, Hawaiian would be cool since its alphabet fits inside a nibble's 16 values, although the words tend to get long.
Considering how the brain seems to process words when reading ('glancing in context' more than analyzing each letter), I'm pretty sure the length of words does not make reading much slower. Writing is another matter, but with auto-prediction you could probably shave off half the typing on average.
As a software engineer and cousin of Stitch, Hawaiian gets my vote! :handclap:
> I think it would have been better to have everybody just learn English as their second (computer usage) language.
Hmm... it might not be well known in the Western world, but AFAIK most parts of the world learn English as a second language.
At least in East Asia, English is taught pervasively throughout your life. We start learning English when we're five.
But still, English is not the only language, and we want to communicate with other people.
Just because people can use English doesn't mean they prefer English & Latin characters for communication.
You can't just, er... force people to use Latin characters for the sake of programmers' comfort.
Would you use your terminal if it demanded that you press space & backspace every time you type English characters?
I know that many parts of the world including Asia learn English; that's why I think it should be used as the lingua franca, and that's why I said that I don't care which language it is, as long as it doesn't have weird ligatures or a large number of characters.
For what it's worth, I'd remove all characters that are not from that "computer language", including the German ones.
This is more than programmer convenience; this is about system simplicity, absolute correctness (too many bugs because of Unicode and complex layout), and above all about _forcing_ people to use said lingua franca to become comfortable speaking it.
If you want your own writing system, fine, but put the effort in and build it yourself, apart from the common bug-free code base that the world shares. It should be less buggy overall anyway than a system which tries to speak every language and support every writing system.
> This is more than programmer convenience; this is about system simplicity, absolute correctness (too many bugs because of Unicode and complex layout), and above all about _forcing_ people to use said lingua franca to become comfortable speaking it.
The reason the writing system in computers is simple is that Latin characters are simple, and the majority of early computer users (most of whom were English speakers) didn't feel the need for sophisticated systems.
I (often) find that complexity inside Western culture is treated as warranted, while complexity outside Western culture is not.
While not a great analogy, think of TTS. English TTS is (at least from what I have heard) not that easy (encodable in logic, but not as simple as mapping characters to audio). That resulted in some sophisticated TTS systems, where it can handle lots of different phonetic structures.
Compare this to Hangul (the Korean writing system), where every character is composed of several mini-characters which have a unique mapping to audio. Basically, for a minimal viable product you can just convert text to decomposed form (NFD) and map the code points to audio.
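The decomposition step really is that mechanical; a minimal sketch (Python's unicodedata suffices, since NFD splits precomposed Hangul syllables into their conjoining jamo):

    import unicodedata

    def to_jamo(text: str) -> list[str]:
        # NFD decomposes each precomposed syllable (U+AC00..U+D7A3)
        # into its conjoining jamo.
        return list(unicodedata.normalize('NFD', text))

    for jamo in to_jamo('한글'):
        print(f'U+{ord(jamo):04X}', unicodedata.name(jamo))
    # 한 -> U+1112, U+1161, U+11AB;  글 -> U+1100, U+1173, U+11AF

Each jamo can then be looked up in a pronunciation table, which is the minimal viable product described above; a real Korean TTS still has to handle sound-change rules between syllables, but the text side stays simple.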
Now, let's say that some company (an arbitrary choice, Facebook) decided to make a global TTS product, first in Korea. They just decided to map the code points to audio and post-process the audio. This TTS architecture obviously can't handle English without major architectural changes. Then Facebook decides this is a problem with English, whose phonetic structure is unnecessarily complex, and won't provide TTS to English users.
This is something similar to what CJK users face, that people just won't make products that work reliably with CJK.
As I already argued in a different comment, I find Hangul to be more elegant than Latin, even though I think the syllabic block notation would add unnecessary complexity to an implementation.
The thing with Korean is that "nobody" speaks it. Had early computers been developed in Korea and had everybody spoken Korean, I would now argue that everybody should just learn Korean. But tough luck, it didn't happen. Our best shot at a lingua franca and simple computer systems (simple in the sense of as little complexity as possible) is with English.
I can understand (but don't fully agree with) a claim that F/OSS software should be primarily in English to maximize its community, as many programmers already speak English to some extent out of necessity anyway. But forcing English on ordinary people who don't use English? Plainly absurd.
For starters, while young people may feel more or less comfortable with occasional English, older people don't, and your proposal will make them much more uneasy. Smartphone penetration around the world [1] can easily reach 50% in many countries, and that includes many people unable to understand English---I live in Korea and older people are technologically not inferior to younger people in that regard. That's impossible if you force them to use English.
> I live in Korea and older people are technologically not inferior to younger people in that regard.
You guys truly are wonderful unicorns. How do you do it? (serious question in that regard)
On topic, I think the ultimate solution is to have whatever language input/output be essentially a placeholder, and then some general "switching" system which puts whatever language a user prefers. It should be able to switch on-the-fly totally independently of application state. (same thing for numbers, outputting decimal or hexadecimal should happen on-the-fly, same model value but different user view)
Commercial OSes tend to be pretty invasive these days, but the one thing where we need more of them, and better tooling, is definitely text I/O.
Adults do learn languages well when required, sure. I question whether we should. Making an additional [1] billion or two people learn a new language is not cost-effective compared to a (comparably) small number of software engineers wrestling with i18n (haha). It is sad that English knowledge is highly valued in, for example, several non-English [EDIT] workplaces even when it is not required at all.
> Hangul would also be a nice lingua franca writing system, too bad its very local
By the way, Hangul was tailor-made for Korean with its relatively simple CGVC syllable structure. I doubt it can be generalized much---Hangul is important because it is one of the first featural alphabets, not because it is a universal (or even adaptable) featural alphabet, which it isn't.
[1] English has about 2 billion speakers (400M native + 750M L2 + 700M foreign), so you need another billion speakers or two to make your proposal real.
[EDIT] was "CJK", but I'm not sure about Chinese and Japanese.
Instead of the total number of speakers the better metric would be the diversity of the speakers.
A language is a better candidate for a lingua franca if it's spoken everywhere a little rather than a lot in one place, e.g. Mandarin.
And this is not about cost, but about Bugs in critical infrastructure.
I'd get rid of smartphones and unicode in a heartbeat if it meant bug free command line applications, no malware, and medical and infrastructure that doesn't crash.
I think it's great when non-English workplaces require English. It gives people an incentive, because humans are a lazy species. And once you've learned it you can use it on your next vacation, your next online encounter, and who knows where else.
Having computers only do English would have given a similar incentive, and I'm sad we didn't use that opportunity.
English is the lingua franca, but not more than that. Most communication is still done and will always be done in local languages, and computers should work for them.
They wouldn't be forced; if they want to use a non-ASCII language, they can develop it themselves or pay the absurd amount of money needed to support their language. We have no obligation to give hundreds of thousands of dollars of our time to people for free.
English is a Germanic language. You say "gut" instead of "good", "zu" instead of "to", "wir" instead of "we", "Wetter" instead of "weather", "nichts zu tun" instead of "nothing to do".
You will not be making as much effort as someone from a totally different native language. The CIA calculated that it takes over four times the effort for an English speaker to learn Chinese than to learn French. It also applies in reverse: a Chinese speaker will find it very hard to learn proper English.
Learning English for a German student is as easy as taking the train and spending some days in England.
Now for most Chinese, Indian (they have more than 20 languages), or Arabic speakers, it is not so easy.
The fact is that they are at a disadvantage, not on a level playing field, and they don't like it.
They are the majority of the population in the world, so if they develop their economies they will force everybody else to learn their native language, not the other way around.
By then everybody will speak enough English that it won't matter. The economics of the network effect are quite simple.
And of course I'm at an advantage; the very fact that I'm German would put me at an advantage even if English were as far from German as possible, because I have the resources, I have free education and free healthcare, plus 400 years of the subjugation of the third world at my back. So what. Life isn't fair; the world's top 30 people own as much as the bottom 3 billion, and that's unfair.
Instead of bickering about which language would be fairest to adopt, you should help everyone adopt SOME language so that they can organize and exchange. So that they can understand for themselves what others are saying instead of having to rely on potentially biased news. So that they can talk to each other, for sympathy and empathy and understanding.
That would mean throwing away a heap of literature (books, songs, etc.) and legal documents, which are hard to translate respectfully. And English isn't an excessive language itself, for better or worse. Also, the idea that switching to an Esperanto-English subset would lead to more peace and understanding seems at least debatable to me.
Bottom line, I find it amazing that humans have managed to draw symbols in columns or rows in any direction for centuries, but computers still have trouble with it. Maybe we should blame computer makers, not languages?
Not everything needs to be digitized. My life was happier when bookstores were still around. And when I had paper forms to fill for bureaucracy, because it meant that there was human compassion and wit involved, something that is required for fairness. A digital application of laws does the law no justice.
As for writing direction:
Simple tools allow for complex creation.
Complex tools allow you to only create simple things.
> I think it would have been better to have everybody just learn English as their second (computer usage) language.
That seems a little extreme to me. It seems like it might have been better to just adapt writing systems for computer usage. Originally a lot of computer systems didn't have lowercase letters, Japanese tended to just use katakana, etc. Sure, it kind of forces everyone into an alphabet or syllabary, but language adapting to its medium is hardly a new thing.
While it sounds nice to me too (and neither am I a native English speaker), apparently there was a bit of misunderstanding in another comment, and possibly here too: I rather wonder about a hypothetical more efficient (and computing-oriented) writing system, not just/necessarily/only using monospace latin characters. It doesn't seem very realistic that something like that would happen (judging by the mess that are date formats and units of measurement, which seem much easier to fix), but observing the complexity that doesn't come from fundamental constraints, it's hard to not wonder about that.
The story of line wrapping is mainly that you:
· Break the text into runs.
· Calculate BiDi at the paragraph level.
· Shape every run to get its length and possible line breaking points.
· Arrange the runs (which now have only length, height, the text inside them, and some breaking properties; glyphs and glyph positions are dropped) into lines.
This process is called “measuring”. The real text display is delayed until a line becomes visible: you fetch the text runs in the line and do the actual display.
In most cases (like appending text) you do not need to do much recalculation.
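A very rough sketch of that measuring pass (toy Python: "shaping" is faked as one run per word with width equal to its character count, and BiDi is ignored, which is exactly the hard part a real engine can't skip):

    from dataclasses import dataclass

    @dataclass
    class MeasuredRun:
        text: str
        width: int          # a real engine gets this from the shaper

    def measure(text: str) -> list[MeasuredRun]:
        # Fake shaping: one run per word, width = character count.
        return [MeasuredRun(word + ' ', len(word) + 1) for word in text.split(' ')]

    def break_into_lines(runs: list[MeasuredRun], max_width: int) -> list[list[MeasuredRun]]:
        lines, current, used = [], [], 0
        for run in runs:
            if current and used + run.width > max_width:
                lines.append(current)
                current, used = [], 0
            current.append(run)
            used += run.width
        if current:
            lines.append(current)
        return lines

    for line in break_into_lines(measure('the quick brown fox jumps over the lazy dog'), 16):
        print(''.join(r.text for r in line))
    # Glyphs and positions would only be computed for the lines that become visible.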
Have fun in your anglocentric world, then. I don't think you have sympathy for users for whom monospaced fonts do not work and whose languages' code points ASCII can't encode.
Rendering speed hasn't mattered since years ago. Browsers can render megabytes of pure text (hours of reading material) in seconds. LaTeX can typeset hundreds of pages in seconds.
> I don't think you have sympathy for users for whom monospaced fonts do not work
I do, that's one of the reasons I'm trying to handle Unicode and proportional fonts in the first place. Given the circumstances, it is indeed preferable to do so, but given that it's basically encoding of approximated sounds (well, potentially information in general) on a 2D plane, it is a rather complicated (both code-wise and computations-wise) way to achieve such a task.
> Rendering speed hasn't mattered since years ago. Browsers can render megabytes of pure text (hours of reading material) in seconds.
Chrome only started using the "complex path" for English text a few years ago, employing the word cache mentioned above. Rendering megabytes of text in seconds is indeed achievable (especially if the words are cached and it's not megabytes of integer sequences), but it still complicates the rendering, and it is still slower than opening and navigating the same megabytes of text in, say, less or vim, or even links -g. It's not an issue if you don't have to do it (or don't have to use a program where it wasn't optimized well), but if you do, it is.
I've lightly advocated for a while that emoji shouldn't be part of the Unicode standard at all. I'm sure there are some advantages, and I'm sure there are other considerations I'm not thinking of, but it just seems like a really bad idea to stuff them into the Unicode standard.
I don't know the official name or who came up with it, but I use Slack's entry format (e.g. :cat:) exclusively in every application.
If the application can detect that as an emoji and swap it out, fine. If it can't, I don't change my format. My preference would be if applications left emoji in that format, and just rendered them differently at display time.
The advantage of having emoji just be a purely clientside rendering feature, and behind the scenes all fall back to normal text is:
a) they can be easily aliased across multiple languages (:cat: :gato:)
b) if you paste an emoji into an application that doesn't support them, you don't get an unrecognizable character. Progressive enhancement!
c) it's accessible when copied and pasted as plain text. It's just more accessible to blind users in general.
d) it's more forward compatible. I can use :cthulhu: right now without waiting for it to get added to the standard.
e) get rid of modifiers. Like, seriously, just get rid of them. Emoji aren't programmatically generated, you still need to draw one image for each modifier combination, and you still need to program support for each one. So, what's the advantage of using modifiers over just adding multiple glyphs? They're just there to save space in the character list, which is only a problem because emoji are in Unicode. :smile: :fake_smile:
f) better support for custom emoji in general. Basically every platform has custom emoji, and it's weird because half of your emoji are standardized and half aren't. And then whenever new emoji get added to the standard, if they conflict with your custom emoji your app breaks.
I would hesitate to standardize emoji at all, beyond having a consortium that says, "this is what :thumbsup: means, you can extend on top of this as you see fit."
It feels like extra complexity for no benefit other than, "we need a standard".
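For what it's worth, the display-time substitution being described is tiny; a minimal sketch (the alias table here is made up, and a real client would render an image rather than a codepoint):

    import re

    # Hypothetical alias table; a client could ship one per language and let
    # users add custom entries like :cthulhu:.
    ALIASES = {
        'cat': '\U0001F431',       # 🐱
        'gato': '\U0001F431',
        'thumbsup': '\U0001F44D',  # 👍
    }

    def render(text: str) -> str:
        # Replace :name: when known, otherwise leave the literal text alone
        # (the "progressive enhancement" from point b above).
        return re.sub(r':([A-Za-z_]+):',
                      lambda m: ALIASES.get(m.group(1), m.group(0)),
                      text)

    print(render('nice :cat:, and :cthulhu: stays readable'))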
I think this works great for apps like Slack, in user-land so to speak, but isn't realistic for the Unicode standard, not only because these entry formats are in English.
Modifiers and combinators aren't exclusive to emojis, but apply to all kinds of glyphs in other languages and writing systems as well. Arabic script even has some common ligatures for common expressions.
A lot of complexity simply doesn't stem from emoji in Unicode, a lot of the complexity comes from all the writing systems that Unicode supports. Admittedly, emoji are kind of an oddball addition to Unicode, but they're by far not the most complex part of it.
And even in Slack/IM apps, custom emoji codels only “work” because people aren’t often trying to 1. interoperate with external services using, or 2. parse archived logs of, arbitrary message text.
If either of these were common (e.g. slack bots that tried to parse semantic meaning from regular text rather than responding to commands; or Slack logs of OSS communities being public-access on the web) then you’d see a lot of people up-in-arms around the fact that these custom codels are used.
But since text in these group-chat systems is private, ephemeral, and mostly a closed garden, it never bubbles up into becoming an issue anyone else has to deal with.
(Though, on a personal note, I wrote my own VoIP-SMSC-to-Slack forwarder because Slack is a much better SMS client than any of the ones built into VoIP softphone apps, and I’m irritated every day that Slack auto-translates even Unicode-codepoint-encoded emoji from a source postMessage call, into its own codels in the canonical message stream. I don’t want to send my SMS contact “:thumbs_up:”, I want to send them U+1F44D!)
Think of Unicode like HTML. What’s better for interoperation and machine-readability: a custom SGML entity (like you could use up through HTML4); a custom HTML tag; or a normal HTML tag with an id/class attribute that applies custom CSS styling?
One way to encode a ‘custom emoji’ would be encoding it as a variation of some existing emoji. Use an as-yet-unused variation-selector on top of an existing emoji codepoint, and then “render” that codepoint-sequence on receipt by the client back to an image (but in a way where, if you copy-and-paste, you get the codepoint-sequence, not the image. In HTML, you’d use a <span> with inline-block styling, a background-image, and invisible content.) This is pretty much what Slack was doing with the flesh-tone variation-selectors, before Unicode standardized those. But you can do it for more than just “sub”-emoji of a “parent” emoji; you can do it to create “relatives” of an emoji too, as long as it’d be semantically fine in context to potentially discard the variation selector and just render the base emoji.
Or, if your emoji could be described as a graphical (or more graphical) depiction of an existing character codepoint, you could just use the “as an emoji” variation-selector on that codepoint.
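Concretely, the two standardized presentation selectors are U+FE0E (text style) and U+FE0F (emoji style); at the codepoint level, "using the variation selector" is just appending it to the base character (whether anything renders differently is up to the font):

    BASE = '\u263A'                 # ☺ WHITE SMILING FACE
    TEXT_STYLE  = BASE + '\uFE0E'   # request text presentation
    EMOJI_STYLE = BASE + '\uFE0F'   # request emoji presentation

    for s in (BASE, TEXT_STYLE, EMOJI_STYLE):
        print(s, [f'U+{ord(c):04X}' for c in s])
    # A renderer that doesn't understand the selector should just show the base
    # character; one that does can pick the style, or, for a private selector,
    # a custom image.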
Or, rather than a variation-selector, if you have a whole range of “things to combine with” (i.e. the possibilities are N^2), you could come up with your own private emoji combining character for use with existing base characters. The “cat grinning” emoji U+1F639 could totally have been (IMHO should have been) just a novel “face on a cat head” combining-character codepoint, tacked onto the regular “face grinning” emoji codepoint. Then you could have one such combining-character for any “head” you like! (And this would also have finally allowed clients to explicitly encode “face floating in the void” emoji vs. “face on a solid featureless sphere” emoji, where currently OSes decide this feature arbitrarily based on the design language of their emoji font.)
And, I guess, if all else fails, you could do what Unicode did for flags (ahem, “region selectors”), and reserve some private-use space for an alphabet of combiner-characters to spell out your emoji in. That way, it’s at least clear to the program, at the text-parsing level, that all those codepoints make up one semantic glyph, and that they are “some kinda emoji.” Custom-emoji-aware programs (like your own client) could look up which one in a table of some kind; while unaware programs would just render a single unknown-glyph glyph.
I don’t suggest this approach, though—and there’s a reason the Unicode standards body hasn’t already added it: it’d be much better to just take your set of emoji that you’re about to have millions of people using (and thus millions of archivable text documents containing!) and just send them to the Unicode standards body for inclusion as codepoints. Reserving emoji codepoints is very quick, because the Unicode folks know that the alternative is vendors getting impatient and doing their own proprietary thing. Sure, OSes won’t catch up and add your codepoint to their emoji fonts for a while—but the goal isn’t to have a default rendering for that character, the point is to encode your emoji using the “correct” codepoint, such that text-renderers 100 years from now will be able to know what it was.
So, please, just get your novel emoji registered, then polyfill your client renderer to display them until OSes catch up. Ensure your glyph is getting sent over the network, and copy-pasted into other apps, as the new Unicode codepoint. Those documents will be correct, even if the character doesn’t render as anything right now; if the OS manufacturers think the character is common (i.e. if it ever gets used in text on the web or in mobile chat apps), they’ll provide a glyph for it soon enough. And, even if the OS makers never bother, and you’re stuck polyfilling those codepoints forever, there’ll still be entries in the Unicode standard describing the registered codepoints, for any future Internet archaeologists trying to figure out what the heck the texts in your app were trying to communicate, and for any future engineers trying to build a compatible renderer. (Consider what Pidgin’s developers went through to render ICQ/AIM emoji codels. You don’t want to put engineers through that.)
> A lot of complexity simply doesn't stem from emoji in Unicode, a lot of the complexity comes from all the writing systems that Unicode supports.
Yes, it's not that emoji are doing anything odd compared to lots of real-world languages; it's that emoji are latin-script writers' "first"/"only"/"most likely" interaction with that sort of stuff. The fascinating bit is that if it weren't for emoji, a lot of these problems would still go unfixed for a lot of real languages; but because emoji are fun and everyone wants to use them, we've seen a lot of Unicode fixes brought about by emoji, a rising tide that lifts other Unicode boats.
> Unicode was designed to provide code-point-by-code-point round-trip format conversion to and from any preexisting character encodings, so that text files in older character sets can be converted to Unicode and then back and get back the same file, without employing context-dependent interpretation [1]
The problem is that emoji were already part of a major text encoding, so Unicode needed to adopt them. Once those were added, everyone and their mother suddenly wanted their emoji in Unicode as well.
> It feels like extra complexity for no benefit other than, "we need a standard".
That sums up the whole standard for better and for worse.
That’s exactly the benefit. Think of Unicode as “Archive.org for the semantics of text codels.” Every time someone invents a text codel (like those examples you gave, where Slack invented their own text codels), Unicode takes the semantics behind that codel and standardizes their own codepoint equivalent to it, so that Unicode documents will have a way of encoding that text codel at the Unicode-text level.
If Unicode doesn’t do this, then people have to use other encodings on top of Unicode to specify their text; they come up with incompatible encodings of the same semantic characters; and suddenly we’re back to having to create code-pages to specify what set of incompatible codels each text stream is using. It also means that we go back to having to create OS-specific, or GUI-toolkit-specific rendering encodings to translate those code-pages into a text-layout-system specific “normalized” encoding; and thus, we also go back to having OS/GUI-toolkit specific fonts.
> Emoji aren’t programmatically generated.
No, not usually, but they can totally be programmatically consumed. They’re machine-readable! The modifiers allow emoji to be “structured text” in the ML sense. Since there’s one codepoint that always means “sad face”, an algorithm can attach a meaning to that codepoint apart from its modifiers, to do e.g. sentiment analysis. It’s much harder to learn when you have 100 different “sad face” codepoints; let alone when different documents use different incompatible encodings with different codels to refer to the same “sad face.”
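As a toy example of that machine-readability: stripping the skin-tone modifiers (U+1F3FB..U+1F3FF) and presentation selectors before counting base emoji takes a couple of lines, which is exactly what makes the modifier scheme friendly to this kind of analysis:

    from collections import Counter

    # Skin-tone modifiers plus the text/emoji presentation selectors.
    MODIFIERS = set(range(0x1F3FB, 0x1F400)) | {0xFE0E, 0xFE0F}

    def base_codepoints(text: str) -> list[str]:
        return [ch for ch in text if ord(ch) not in MODIFIERS]

    msg = '\U0001F44D\U0001F3FD \U0001F44D\U0001F3FF \U0001F44D'  # three thumbs-up, two with tones
    print(Counter(base_codepoints(msg))['\U0001F44D'])  # 3: all collapse to the same base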
> better support for custom emoji
That’d be like saying a dictionary should better support private in-jokes. Unicode, like a dictionary, watches for when things become common or important to define, and then defines them. Until then, it considers them “someone’s attempt at injecting a gibberish in-joke into language.” In-jokes don’t need an entry; but as soon as an in-joke becomes a word, it does. Because people don’t look up in-jokes, but they do look up slang (= ascended in-jokes), so if you want your dictionary to be useful, you’d better give them a definition for slang terms.
Actually, that’s a very good equivalence: emoji are to Unicode as slang terms are to dictionaries. In both cases, people think it’s ridiculous that the authors of the text would include them; but in both cases, the usefulness of the work would be hampered if they didn’t.
> having emoji just be a purely clientside rendering feature
Please, please, PLEASE don't do that.
Have you ever tried pasting any kind of code (especially C++) into Skype? It always turns into a fiasco of stupid emojis. And even if you could disable it on your client side (which you can't even do), it would turn up as emoji garbage on the receiving end - and you might not even know/realize!
Just thinking about "let's show everything that might be an emoji as an emoji" makes me shudder...
I'd also note that "you can alias it for different languages" is bound not to work. The same word means different things in different languages - which ones do you accept as aliases? How are native speakers supposed to know when their word for something can't be used because it means something else in any other language? I mean, look up the German adjective that means "fat"...
The major issue I find with it is there aren't limits to how many are added. Emoji are growing at a fantastical pace, much like a tumor, because "if X got added, why didn't Y?", and so on.
As the other child reply mentioned, this is latin-centric. Totally useless for people that can't or don't want to speak English, and having to duplicate it for every language defeats the entire purpose.
> and having to duplicate it for every language defeats the entire purpose.
Does it?
If you want accessible emoji, you need a display label for each emoji in every language that the client supports. Whether or not the emoji is represented in Unicode doesn't change that.
Also when users are entering emoji into a client application, they need a way to quickly filter and get to the emoji that they want -- that requires having a label in their native language, and putting emoji in Unicode doesn't fix that either.
In any practical setting, accessibility/input means you need multiple labels for different languages anyway. So why are we trying so hard to avoid them in the final text representation? If :cat: :gato: :貓: were part of the emoji definition in a big list somewhere, it would only make it easier to support multiple languages, since I wouldn't need to compile my own list of translations/labels.
> Also when users are entering emoji into a client application, they need a way to quickly filter and get to the emoji that they want -- that requires having a label in their native language, and putting emoji in Unicode doesn't fix that either.
This is different from -forcing- people to memorize the specific identifier in order to input it.
How do I find a specific face emoji if I don't know the name? I use the system's emoji picker tool and simply scroll through it. On OSX, it shows me recently used ones, which suffices. I don't really have to think about the names at all.
> use the system's emoji picker tool and simply scroll through it.
I don't see how this would change. You'd pick the cat emoji exactly the same way you do right now, and since your system language is set to English, iOS would insert :cat:, then immediately render it to an image representation.
Users wouldn't need to memorize a label any more than they currently need to memorize the Unicode positions.
Until very recently, I worked on Microsoft Word. The whole problem gets even more complicated when you add support for richer content like formatting, images, comments, etcetera. Sprinkle in requirements for things like three-way merge, simultaneous editing from multiple authors, and undo behavior on top of that, and the amount of cross-cutting complexity for something seemingly simple can be absolutely astonishing.
Hey Nick, I'm glad to hear that you were on the Word team :)
I'm working on a document manager and static site builder based on Microsoft Word (https://docxmanager.com/) (the upcoming version will have a more user-friendly tabbed UI).
· Advanced typography, including...
· About line breaking...
· Hyphenation
· Optimized line breaking (Knuth-Plass, etc.)
· About microtypography
· OpenType features (ligatures, etc.)
· OpenType variations
· Multiple master fonts
· Proper font fallback ← It is not a simple lookup-at-each-character process (see the sketch after this list)
· Advanced Middle East features, including...
· Kashida
· Advanced Far East features, including...
· Kinsoku Shori
· Auto space insertion
· Kumimoji
· Warichu
· Ruby
· Proper vertical layout, including...
· Yoko-in-Tate
· Inline objects and paragraph-like objects, including...
· Images
· Hyperlinks
· Math equations ← This is really hard.
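On the font-fallback point flagged above: the reason it isn't a per-character lookup is that you want to keep runs in a single font for as long as possible, and cluster/script boundaries constrain where you may switch. A toy sketch with made-up coverage sets (real systems consult cmap tables, script itemization, and user preferences):

    # Toy fallback: fonts are just (name, set of covered characters) here.
    FONTS = [
        ('LatinFont', set('abcdefghijklmnopqrstuvwxyz ')),
        ('CJKFont',   set('漢字かな')),
    ]

    def fallback_runs(text: str):
        runs, current_font, start = [], None, 0
        for i, ch in enumerate(text):
            font = next((name for name, cov in FONTS if ch in cov), 'LastResort')
            if font != current_font:
                if current_font is not None:
                    runs.append((current_font, text[start:i]))
                current_font, start = font, i
        if text:
            runs.append((current_font, text[start:]))
        return runs

    print(fallback_runs('abc 漢字 def'))
    # [('LatinFont', 'abc '), ('CJKFont', '漢字'), ('LatinFont', ' def')]
    # A real implementation must also avoid splitting grapheme clusters and
    # pick fallbacks per script, which is where the simplicity ends.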
I once wrote a simple LaTeX renderer for a project, and boy was it hard. And all I had to do was support a subset of the whole thing! Even with clear biases towards left-to-right languages and just a subset of LaTeX, it was a nightmare: you'd do something, then realize it wouldn't render correctly in a certain situation, rewrite the code to include more context or do another layout pass, and hit another issue somewhere else. It was slow work, complicated to debug, and it utterly stymied many refactoring efforts. I can't even imagine how much work must go into doing this quickly and correctly for a language that's much more complex…
I did quite a lot of hacking around the text selection when extending CKEditor for XML editing.
In my opinion there are several reasons why text input and selection are so difficult:
1. The behavior is complex.
2. The behavior resists being formally defined. It differs across the user interfaces, such as URL bar, rich text editor, terminal, etc.
3. The requirements are evolving. The recent big change - the emojis - is still not universally supported.
4. The reasons above tend to add accidental complexity to the API that will only grow over time.
I feel like the best approach would be to create a mathematical theory describing the text input semantics. This approach worked quite well for other complicated areas in CS, such as concurrency or memory management.
This is the first time I've ever seen it explained how input methods allow users to write in languages like Chinese using the Latin letters A-Z: phonetics. I had always wondered about this; it seemed like some arcane magic to me lol, how people knew what to type.
It's now clear that for someone like me - who only understands a few Latin-based languages - a super comprehensive tutorial would be required if one wanted to understand enough about other languages to be able to work on a text editor etc. I guess ideally all teams working on such projects would have an experienced team member whose native language isn't Latin-based.
You can try out how it works with Google Translate, for languages like Chinese, Japanese etc. they have a button in the bottom right corner of the text field where you can toggle to an IME input system.
It's pretty intuitive, basically you just write the word phonetically and then press space, and it'll automatically replace the characters and let you choose alternatives. Smartphones have similar systems but with different layouts.
Chinese here. To be a little picky, phonetics is just one of several methods to input Chinese. (Pinyin is the dominant one, and yes, it is phonetic; other methods use the shapes of the characters and can be a lot faster, see https://en.wikipedia.org/wiki/Wubi_method for example.)
Speaking of input methods, I always wish there were good English input methods; they would be useful too! For example, if the user enters "compre" the input method suggests
1. comprehension 2. comprehensive (with an order depending on the conditional probability -- there is some interesting math behind input prediction, and with its help input can be a lot smoother; however, I only see it used on phones, not computers.)
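The ranking alluded to is essentially P(word | typed prefix), which with a plain frequency list reduces to sorting the matching words by count; a minimal sketch (the frequencies are made up):

    # Hypothetical frequency counts; a real system would use a corpus plus
    # per-user history, and condition on previous words too.
    FREQ = {'comprehension': 120, 'comprehensive': 310, 'compress': 95, 'comprehend': 60}

    def predict(prefix: str, k: int = 3) -> list[str]:
        candidates = [w for w in FREQ if w.startswith(prefix)]
        # Sorting by raw count equals sorting by P(word | prefix), since the
        # normalizing constant is shared by every candidate.
        return sorted(candidates, key=FREQ.get, reverse=True)[:k]

    print(predict('compre'))  # ['comprehensive', 'comprehension', 'compress']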
>however I only see it used on phones not computers //
IDEs do prediction for input.
OpenOffice used to predict words longer than, say, 5 characters, IIRC; it might have been back in the StarOffice iteration. Come to think of it, I write in a word processor so little now that I've only just realised this feature might have been disabled as a default (or removed).
This barely scratches the complexity of IMEs. For example, people in Hong Kong mainly speak a different variety of Chinese, Cantonese, so Pinyin (Mandarin phonetics) is useless there.
Tangential: Microsoft's automatic selection of whole words and sentences is the single biggest productivity drain in my professional career (all my attempts to disable it failed so far).
The setting does not apply to Outlook when writing/editing emails, only when reading emails. Then, from my experience, the setting will be reverted with the next Word update :(.
Gah, I've just started with MS Word (after a long hiatus) and was wondering why it always grabs surrounding punctuation into the selection (even when the punctuation 'points' the wrong way, it seems).
I remember being annoyed by that years ago the few times I used Windows. I am used to triple-click+drag to achieve this when I need it. When I actually don't want to select words, it gets in the way. One can probably use the keyboard for precise things like that though.
I can see triple-clicking+drag being tricky and that most people probably want to select entire words most of the time anyway, so I don't know what the best solution is. I'd argue that selections need handles like on mobile operating systems so it can be edited afterwards.
Not mentioned in the article, but an interesting point of difficulty in editors is the character transposition command (Ctrl-T in most macOS applications, stemming from Emacs, I imagine):
TextEdit and VSCode are the only editors I know of to handle transposing emoji in a reasonable way. VSCode will separate the emoji from the color modifier. TextEdit keeps them together, other editors I know of corrupt the data by slicing surrogate pairs or just no-op the command.
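To make the failure mode concrete: a thumbs-up with a skin tone is two code points (four UTF-16 code units), so any transpose that swaps "the last two characters" without grouping them splits the pair. A small demonstration, with a crude stand-in for real grapheme segmentation (a proper implementation would follow UAX #29 cluster boundaries):

    s = 'a\U0001F44D\U0001F3FD'             # 'a' + thumbs-up + medium skin tone
    print([f'U+{ord(c):04X}' for c in s])   # 3 code points, 2 user-visible characters

    # Transposing the last two *code points* strands the modifier on its own:
    print(s[:-2] + s[-1] + s[-2])

    # A correct transpose must operate on grapheme clusters; here we merely
    # glue skin-tone modifiers (U+1F3FB..U+1F3FF) onto the preceding code point.
    def clusters(text: str) -> list[str]:
        out = []
        for ch in text:
            if out and 0x1F3FB <= ord(ch) <= 0x1F3FF:
                out[-1] += ch
            else:
                out.append(ch)
        return out

    cs = clusters(s)
    cs[-2], cs[-1] = cs[-1], cs[-2]
    print(''.join(cs))                      # thumbs-up (with tone) followed by 'a'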
Wait, there are people who use the ctrl-T command? That thing is the bane of my existence because I have muscle-memory for "new tab" in firefox but when I do it in a text editor for "new file" out of habit it just jumbles text.
VSCode does the "correct" behavior of Bad #3, but doesn't even need to do the "bad" part about pushing the bytewise caret position around, as it logically maintains two characters but visually coalesces both the middle position and the front together. I wonder why it wasn't mentioned.
It's probably bad because there isn't an additional kludge added: decomposition of combined character entities for editing. This would involve a concept of sub-character (code-point) 'tabs' (which ideally would be distinct UTF-8 entities).
Another option would be that deleting the 'a' doesn't completely delete it but instead replaces it with a zero-width space or zero-width non-joiner, so it looks like Bad #1 (but is Unicode-compliant) and hitting delete again gives Bad #3.
The whole example is a bit contrived, though; nobody is going to enter a skin-tone modifier character by hand in daily use. They'll select an appropriately-colored emoji.
IMO, having delete trigger a zero-width-space insertion would be the worst option. I mention in a sibling comment that VSCode gets around this by having two separate logical caret positions combined into a single visual position. So the byte offset of the cursor changes as expected, while still maintaining "Unicode correctness", for whatever that's worth.
Every time I hear someone is implementing even a part of text editing, I begin to snigger. (Ahem Workflowy cough.)
I'll gladly put these two articles in my reading list, to go over some evening while reclined comfortably and sipping wine—so I then can laugh and slap my knees even harder when a new text editor is mentioned.
Some years ago I was working on a project to autocorrect text with high-level written-text recognition. I used a standard text editor control. At one point I thought about rewriting the text field myself, but the angels stopped me. People don't know how hard it is!
This is incredible, thanks for sharing. I had the exact same thoughts the author describes at the beginning: how hard can it be? It turns out: super hard. And one shouldn't forget that it is one of the few input methods we have for our computers. With voice not working properly, and drawing on the mouse pad / touchscreen being too slow, I would argue it is still the number one input method. So the expectations for the UX are extremely high and there should be no faults. I admire anybody who works on any kind of text input mechanism from this day on.
The problem the author describes with Vim is overstated. All you need to do is write two functions that switch the input method to English when entering normal mode, and back to your original input method when entering insert/replace mode. It's only 24 lines of Vimscript, including whitespace and comments.
Out of all the TUI programs out there, Vim, with its modal design, is probably the one program least affected by a conflation between key presses and text input.
This is not really cross-platform. It's also not cross-application, so it's often necessary to do it at the OS level, hence the problems discussed above. At least for Spanish there are a limited number of non-ASCII characters, so I just bind them directly to a modifier.
A common issue in browsers is pages listening for the Esc key. I'm in some modal dialog on a page, typing CJK with the IME. I press Esc to exit the IME or to cancel a conversion; the IME exits, but the page also gets the Esc key and closes the modal dialog. Any text I had entered up to that point is lost.
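One mitigation on the page author's side (my own sketch, not something from the comment above): keydown handlers can check KeyboardEvent.isComposing, plus the legacy keyCode 229 that some browsers report while the IME owns the keystroke, before treating Esc as "close the dialog". Exact IME behaviour still varies across browsers, and the .modal-dialog selector here is just a placeholder:

    // Close a dialog on Esc, but ignore key events that belong to an active
    // IME composition. keyCode 229 is the legacy value some browsers report
    // while the IME is still handling the keystroke.
    const dialog = document.querySelector<HTMLElement>(".modal-dialog");
    if (dialog) {
      dialog.addEventListener("keydown", (event: KeyboardEvent) => {
        if (event.isComposing || event.keyCode === 229) {
          return; // the Esc (or other key) went to the IME, not to the page
        }
        if (event.key === "Escape") {
          dialog.hidden = true; // placeholder "close the dialog" action
        }
      });
    }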
Personally I've always thought RTL switches in an editor were a mistake. An editor displays and lets you edit a character stream, and those characters do not need to be in their final spatially rendered position for editing.
Really appreciate this post. But I’ve gotta admit that I CTRL-Fed it for ligatures in bidirectional text and Czech accents and found no result...and couldn’t figure out whether I was happy or sad about that.
This is one of the best things I have seen on HN in a long time! It really makes me want to go work for a company with a text editor product or to teach a course on implementing simple text editors.
> Our carets will need an extra bit that tells them which line to tend towards. Most systems call this bit “affinity”.
Is it just me, or does the author disprove themselves with the figure in the same section? I feel like the best possible solution is the one depicted: that you just see the cursor split between both lines.
I know, but two cursors sort of looks like a split cursor, and you could visually tweak cursor rendering to make “one cursor split between two lines” a visually-distinct case from having actual multiple cursors (because some text editors do indeed support multiple cursors.) What I’m saying is that I’d prefer a text-edit control that gives you a visual indicator for “one cursor split between two lines”, to one that pretends the cursor is on one line or the other, when it really will act with the navigation semantics of being split between two lines.
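To make that "extra bit" concrete, here is a tiny TypeScript model of an affinity-carrying caret; the type names and the lineStarts representation of soft wraps are invented for illustration, not taken from the article or any particular editor:

    // A caret at a text offset, plus an affinity bit saying which side of a
    // soft wrap it sticks to when the offset sits exactly on a wrap boundary.
    type Affinity = "upstream" | "downstream"; // end of previous line vs. start of next

    interface Caret {
      offset: number; // index into the text, in code units
      affinity: Affinity;
    }

    // lineStarts[i] is the offset where visual line i begins (soft wraps included).
    function visualPosition(caret: Caret, lineStarts: number[]): { line: number; column: number } {
      let line = 0;
      for (let i = 0; i < lineStarts.length; i++) {
        if (lineStarts[i] < caret.offset) line = i;
        // Exactly on a wrap boundary: affinity decides which line we render on.
        if (lineStarts[i] === caret.offset && caret.affinity === "downstream") line = i;
      }
      return { line, column: caret.offset - lineStarts[line] };
    }

    // Text wrapped as "hello " | "world": offset 6 is both end-of-line-0 and
    // start-of-line-1; the affinity bit picks one without changing the offset.
    const lineStarts = [0, 6];
    console.log(visualPosition({ offset: 6, affinity: "upstream" }, lineStarts));   // { line: 0, column: 6 }
    console.log(visualPosition({ offset: 6, affinity: "downstream" }, lineStarts)); // { line: 1, column: 0 }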
To be honest, I can't help but think that we're overcomplicating things for ourselves. I mean: think about a typewriter, hell, even typesetting. And now let's consider modern text editing. It's crazy! And yet, the only thing I need is something even simpler than vi: ability to move between the lines when doing cat - > file.
That being said, I do appreciate the effort the good people of text editing make.
I specifically mentioned Korean typewriters because they had an ingenious design to implement Hangul's combinatorial system: they placed the final jamo back at the caret position without advancing. It even works flawlessly in modern computer typography, provided that you type the correct key for each jamo. To my knowledge, widespread Arabic and Japanese typewriters were more limited (Arabic generally had only initial and isolated forms [1], and Japanese was generally Katakana only; a full Kanji typewriter was more like a mini typesetter).
Even back in the typewriter era, it was not simple. Every time I see people screaming that Unicode and internationalization are overcomplicated, I like to challenge them with counterexamples.
> Right, but you probably mostly deal with documents written in a language with a Latin script. Those who don't or can't do this need better.
So, I absolutely love different scripts, and I think it's awesome how many different ways to write mankind has invented over the centuries.
But if Latin-style scripts are so much easier to input into computers, and if computers are not just the wave of the future but the present as well, maybe it would make sense for more languages to adopt Latin-style scripts. After all, we stopped using fractions everywhere in favour of decimals, because decimals are in many cases (not all!) better. Ditto our switch from Roman to Arabic numerals.
> But if Latin-style scripts are so much easier to input into computers, and if computers are not just the wave of the future but the present as well
But the hard part isn't supporting _a_ script, it's supporting _all_ scripts (or at least, all relevant scripts, in our modern interconnected age). It's only moderately more complicated, even with existing Latin-centric tools, to support only a single script. Heck, if you only support Chinese + Japanese, for example: you can always assume monospacing, which makes layout a breeze; line-wrapping rules are extremely simple [0]; and word-breaking (and associated headaches, like hyphenation and inter-word spacing) is practically a non-factor.
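To illustrate how simple the CJK-only case can get, here is a minimal TypeScript sketch of fixed-width wrapping with a tiny kinsoku-style prohibition list; the rule set is deliberately incomplete and the names are made up, so treat it as a toy rather than a real layout engine:

    // Characters that must not start a line (closing punctuation, etc.) and
    // characters that must not end a line (opening brackets/quotes) -- a
    // deliberately tiny subset for illustration.
    const NO_LINE_START = new Set(["。", "、", "」", "』", "）", "！", "？", "ー"]);
    const NO_LINE_END = new Set(["「", "『", "（"]);

    // Wrap text at `width` characters per line, assuming every character has
    // the same (full) width and that any position is a legal break point
    // unless the prohibition sets say otherwise. A line may run slightly long
    // when a break is forbidden.
    function wrapCJK(text: string, width: number): string[] {
      const lines: string[] = [];
      let line: string[] = [];
      for (const ch of [...text]) {
        const atLimit = line.length >= width;
        const prev = line[line.length - 1] ?? "";
        if (atLimit && !NO_LINE_START.has(ch) && !NO_LINE_END.has(prev)) {
          lines.push(line.join(""));
          line = [];
        }
        line.push(ch);
      }
      if (line.length > 0) lines.push(line.join(""));
      return lines;
    }

    console.log(wrapCJK("あいうえおかきくけこさしすせそ", 8));
    // -> ["あいうえおかきく", "けこさしすせそ"]
    console.log(wrapCJK("あいうえおかきく。けこさしすせそ", 8));
    // -> ["あいうえおかきく。", "けこさしすせそ"]  (the "。" is pulled back, not pushed to a new line)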
"Just drop your language's writing system in favor of the latin alphabet!"
"It's a hard problem (and doesn't affect me) so let's ignore it altogether!"
Must be easy to make remarks like this when your native language happens to use the Latin alphabet...
How about we use our knowledge to build computer systems that respect different writing systems, even if it's hard, instead of enforcing a Latin hegemony upon the rest of the world?
Comparing scripts to numeral notation is comparing apples to oranges. The Latin alphabet does not work for languages that have more sounds than its 26 characters can represent!
I agree with the ironic sentiment. We've pushed up the complexity of a lot of features because we can, not because the value is there. Unicode itself is a great example of this: it's a dumping ground for "everything that resembles text", and while it's easy to consume and there is some benefit to be had, it's also hard to comprehend on any level beyond the very basics. It's quite a long way from the early telegraphy encoding systems.
With respect to selection: selection is key to all forms of interactive editing, because without it you don't have interactivity; you have a linear workflow. As such it does deserve some respect as a fundamental feature, even if it's not critical to fix every last bug. Even if you are working via screen reading and voice-to-text, you can still benefit from having selection tools available to you.
If your goal is, for example, to edit a book, you often need much more than a typewriter can give you. You need figures, equations, centered text, footnotes, rich text styles, tables.
If your goal is even more ambitious, say, publishing the same book online, such that others can use the contents of the book and select it, look it up, share it, edit it etc., you need many of the above to be understood by the text engine, so you end up with shaping, support for all human scripts with all their accumulated idiosyncrasies etc.
Without any of this, you can barely have an academic life on the internet, for example, or support the engineering disciplines, and so on.
"Hate" is a strange choice of word to describe what is merely a complex subject. Lots of different models (emoji + skin tone modifiers is not a bad model in isolation, very much like a letter + combining accent) are developed independently, and then, when we test them with the rest of the players on the field, they don't integrate too well. Yes, they don't, but what would you expect? :) And of course decisions made by different developers will vary in logic, consistency, and completeness. This is the normal state of affairs for the integration part.
It's a phrase. You approach a seemingly simple problem that reveals itself to be a fractal of complexity - at some point you start to feel as if the problem itself had agency and it wanted to make you miserable.
I've naively implemented text rendering, selection, etc. from scratch in a text editor, and if you only support monospace, it's pretty simple. The hardest part in software is to not implement features. One trick is to implement the complex features only for the users who need them, in another VCS branch. Or you make an abstraction layer. But abstraction layers are hard, and if they are not airtight (non-leaky) you are worse off, because then the developer needs to know both layers. Then there is the third strategy, which I would call an anti-pattern: when enough features have built up, in order to get rid of the relics you rewrite it from scratch, preferably in a new framework. I also believe most advancements in science are accidental, like penicillin and lexically scoped modules, so if we did not do these things (new frameworks, reinventing the wheel) the CS field wouldn't advance.
As an almost exclusively Vim user (large-scale Java doesn't work very well in Vim), I find it funny when text editing is a huge pain; if there was pain for me, it was about a month of overall time sunk into getting really good at Vim. After that transition, text editing has mostly been a joy.