
Coordination with the real world is and always has been the pain point with STM. It's fine if your STM universe lives entirely in a subset of your data on a single machine [1], or within a single database [2], or with a single decision-making unit [3].

Most languages have extremely poor support for separating interactions with the real world from interactions that occur entirely within one serializable context (a thread working with thread-local memory, for example). Can you safely replay arbitrary C, C++, etc.? No, because side-effecting code could run at any time and occur in any context. So that is one problem that has to be solved first.
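
To make the replay problem concrete, here is a minimal single-threaded sketch of an optimistic transaction loop. Everything in it (`VersionedCell`, `atomically`, the `interfere` hook that simulates a competing commit) is a hypothetical toy, not any real STM engine's API. It only shows that a replayed transaction body repeats whatever real-world effect it contains.

```python
class VersionedCell:
    """Toy stand-in for a transactional variable: a value plus a version counter."""

    def __init__(self, value):
        self.value = value
        self.version = 0


def atomically(cell, body, interfere=None):
    """Run body(value) optimistically and replay it if the cell changed before commit.

    `interfere` is a hypothetical hook that simulates another thread
    committing between our read and our commit. It fires at most once.
    Returns the number of times the body was executed.
    """
    attempts = 0
    while True:
        attempts += 1
        seen_version = cell.version
        new_value = body(cell.value)
        if interfere is not None:
            interfere(cell)
            interfere = None
        if cell.version == seen_version:
            # Nobody else committed: publish our result.
            cell.value = new_value
            cell.version += 1
            return attempts
        # Conflict: throw new_value away and replay the body.


emails_sent = []


def withdraw_and_notify(balance):
    # A real-world effect inside the transaction body: it cannot be rolled back.
    emails_sent.append("withdrew 10")
    return balance - 10


def concurrent_deposit(cell):
    # Simulated commit by another thread.
    cell.value += 100
    cell.version += 1


account = VersionedCell(50)
attempts = atomically(account, withdraw_and_notify, interfere=concurrent_deposit)
print(attempts, account.value, len(emails_sent))
```

The memory state ends up consistent (one withdrawal, one deposit), but the notification went out once per attempt. Languages that separate the two kinds of code let the runtime refuse to put the `append` inside the replayed body in the first place.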

Suppose you've solved that problem. Well, now for a distributed system, you need to make STM talk to STM. Make one module talk to another over a network, or a filesystem, or another database, or a client browser. Do you have STM working in the JavaScript running on clients' machines? And even if you managed that feat, do you have end-to-end STM from your datastore to your clients' actions?

Distributed STM that wasn't painful to use, either in the code you have to write or in the performance you get, would be a sort of holy grail of distributed computing. I don't think any language or toolchain is there yet.

[1] The STM engines in Haskell, Clojure, et al.

[2] SQL-compliant relational databases such as DB2, Oracle, SQL Server, as well as distributed databases like HyperDex that support true transactions.

[3] Paxos, Raft, and other decision-making algorithms only ever appear consistent from the outside; internally they are complex and might have a tug-of-war going on.



> Can you safely replay arbitrary C, C++, etc.? No, because side-effecting code could run at any time and occur in any context. So that is one problem that has to be solved first.

I spend my time solving this problem. You can do it with a programming model especially designed for it. Functional programming is not required, though it can be convenient.

> Well, now for a distributed system, you need to make STM talk to STM. Make one module talk to another over a network, or a filesystem, or another database, or a client browser. Do you have STM working in the JavaScript running on clients' machines? And even if you managed that feat, do you have end-to-end STM from your datastore to your clients' actions?

STM is the wrong way of thinking about this problem, mainly because replay is not an intrinsic part of the paradigm (instead, users manage that themselves). Rather, go back further to Jefferson's virtual time/Time Warp system [1], which was designed specifically in the context of distributed systems.

[1] http://dl.acm.org/citation.cfm?id=3988

There are also many related systems like Concurrent Revisions, LVars, Bloom, and so on...
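
For a rough feel of the Time Warp idea mentioned above, here is a toy sketch of one process that handles messages optimistically and rolls back when a straggler (a message with an earlier virtual timestamp) shows up late. The class and method names are made up for illustration, and it leaves out most of the real mechanism, notably anti-messages for cancelling output already sent and global virtual time for discarding old snapshots.

```python
import bisect


class TimeWarpProcess:
    """Toy process whose state depends on the order in which messages are applied."""

    def __init__(self):
        self.state = 0
        self.processed = []  # (timestamp, payload) pairs in virtual-time order
        self.snapshots = []  # state saved *before* each processed message
        self.rollbacks = 0

    def receive(self, timestamp, payload):
        # Find where this message belongs in virtual-time order.
        timestamps = [t for t, _ in self.processed]
        index = bisect.bisect_right(timestamps, timestamp)
        redo = self.processed[index:]
        if redo:
            # Straggler: restore the state from before the first later message,
            # forget the work done since then, and replay it afterwards.
            self.rollbacks += 1
            self.state = self.snapshots[index]
            del self.processed[index:]
            del self.snapshots[index:]
        for message in [(timestamp, payload)] + redo:
            self._apply(message)

    def _apply(self, message):
        self.snapshots.append(self.state)
        self.processed.append(message)
        # Deliberately non-commutative, so arrival order would matter without rollback.
        self.state = self.state * 10 + message[1]


process = TimeWarpProcess()
process.receive(1, 1)
process.receive(3, 3)
process.receive(2, 2)  # arrives late, forces a rollback past the timestamp-3 message
print(process.state, process.rollbacks)
```

The point is that replay is built into the model: the process itself owns the snapshots and the re-execution, rather than leaving it to the user.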


> Can you safely replay arbitrary C, C++, etc.? No, because side-effecting code could run at any time and occur in any context. So that is one problem that has to be solved first.

With a few restrictions, it's actually possible to replay arbitrary assembly. Namely, if you don't care about timing and avoid system calls that can't be replayed, you basically have what "checkpoint" does in gdb.



