
>These general data models start to become useful and interesting at around a trillion edges

That is a wild claim. Perhaps for some very specific definition of "useful and interesting"? This dataset is already interesting (hard to say whether it's useful) at a much tinier scale.



It was a widely observed heuristic going back to the days when the Semantic Web was trendy. The underlying reason is also obvious once stated.

Almost every non-trivial graph data model about the world is a graph of human relationships in the population, if not directly then by proxy. Population-scale human relationship graphs commonly pencil out at roughly 1T edges, a function of the population size. People are also typically the highest-cardinality entity. Even if the purpose isn't a human relationship graph, these models all tend to have one tacitly embedded, along with the scale that implies.
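
As a rough illustration of where a ~1T figure can come from, here is a back-of-envelope sketch. The population (~8 billion) and the ~150 ties per person (a Dunbar-style number) are assumptions chosen for illustration, not figures given in the comment above; the point is only that plausible inputs land near 10^12.

```python
# Back-of-envelope size of a population-scale relationship graph.
# ASSUMPTIONS (illustrative, not from the thread): ~8 billion people,
# ~150 relationships per person.
POPULATION = 8_000_000_000
RELATIONSHIPS_PER_PERSON = 150


def estimate_edges(population: int, degree: int, directed: bool = True) -> int:
    """Edge count for a graph in which every person has `degree` ties."""
    total_endpoints = population * degree
    # An undirected tie is shared by two people, so it is counted once.
    return total_endpoints if directed else total_endpoints // 2


directed = estimate_edges(POPULATION, RELATIONSHIPS_PER_PERSON)
undirected = estimate_edges(POPULATION, RELATIONSHIPS_PER_PERSON, directed=False)
print(f"directed:   {directed:.2e}")    # 1.20e+12
print(f"undirected: {undirected:.2e}")  # 6.00e+11
```

Whether edges are stored directed or undirected only moves the result by a factor of two, so the order of magnitude is driven by population times average degree.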

If you restrict the set of human entities, you either end up with big holes in the graph or end up with a graph that is not generally interesting (like one limited to company employees).

The OP was talking about generalizing this to a graph of people, places, events, and organizations, which always has this property.

It is similar to the phenomenon that a vast number of seemingly unrelated statistics are almost perfectly correlated with GDP.


This is not a "general purpose data model", though. A better example would be Wikidata, which at about 100M nodes and 1B edges (three orders of magnitude below that 1T claim) already enables plenty of useful queries about all sorts of publicly available data and entities.



