I'm new to Pebble and have been excited about joining the community; I have a Pebble Time 2 on pre-order. I will certainly cancel the pre-order unless Rebble affirmatively says they are satisfied with the arrangement.
I'm in the exact same position. It's beyond belief that the new (hardware) company wouldn't see itself in a long-term collaboration with the community organization (which provides the services/platform).
Indeed, it bodes rather poorly for the sustainability of Core if they're already behaving as though owning everything is critical to ticking some hypergrowth checkbox. I kind of thought the whole point of the new organization was not to be another startup, but rather to be more like a scaled-up cottage-industry player: making a niche product for nerds and selling it directly to them at a reasonable upfront profit margin, rather than depending on collecting rent from a closed app ecosystem to pay the bills.
I thought this [1] New Yorker profile of the chief justice of Brazil's Supreme Court was a fascinating and thoughtful analysis of how tech giants interact with less-powerful countries. Surely we all agree that free speech is not absolute (e.g. we could probably agree that there should exist some boundary with respect to libel, threats/violent speech acts, national security, corporations as legal persons with free speech rights, the right or duty of platforms to regulate content, influence of money in politics...) and that therefore states have a legitimate interest in regulating free speech.
The "free speech" of tech platforms also comes with colonial power structures in which the tech company makes these decisions and imposes them on countries.
I agree that it can be helpful to think of identity as a trajectory shaped by interactions along the way. However, we also continually shape our environments in large and small ways. TFA ignores this completely. Can this be effectively modeled in RL?
Over 130 years ago, Dewey [1] criticized the model of psychology which looked at human behavior in terms of stimulus -> internal processing -> response. Stimuli don't just come to us; we seek them out and modify the world around us to cause them to occur. Dewey and other pragmatists proposed reframing stimulus/response in terms of "acts" or "habits," or changes to the unified agent+environment. Popper was getting at the same entanglement of agent and environment in "Three Worlds," and Simon in "The Sciences of the Artificial."
I see RL as an elaboration of the stimulus/response paradigm: the agent is treated as separate from the environment. Does RL work well in an environment like Minecraft, where the real game is modifying the relationship between actions and future states? What about in contexts like Twitter, where you're also modifying the value function (e.g. by cultivating audiences or by participating in a thread in a way that conditions the value function of future responses)?
"I agree that it can be helpful to think of identity as a trajectory shaped by interactions along the way. However, we also continually shape our environments in large and small ways. TFA ignores this completely. Can this be effectively modeled in RL?"
You don't need to. All that is necessary for an attraction basin to emerge is an iterative system. If you prefer to model the human being and their entire environment rather than the human being and their input, you'll still get attraction basins. You'll just get two views on the same reality, suitable for different uses and different understandings. It's not like "ah, if we model a human's iterations we get these attraction basins, but if we include environmental interactions suddenly we get a uniformly random distribution of personalities across the total personality space, it's all totally different once you consider the environment as part of the iterative system too".
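A minimal sketch of what I mean, with an arbitrary one-dimensional map (nothing from the article): any iterative system with a couple of stable fixed points sorts its trajectories into basins, no matter how you draw the system boundary.

```python
# Minimal illustration (the map is arbitrary): gradient descent on the
# double-well potential V(x) = (x^2 - 1)^2 is an iterative system with two
# attractors, at x = -1 and x = +1. Where a trajectory settles depends only
# on which basin it starts in.

def step(x, lr=0.05):
    # dV/dx = 4 * x * (x^2 - 1)
    return x - lr * 4 * x * (x**2 - 1)

def settle(x0, iters=200):
    x = x0
    for _ in range(iters):
        x = step(x)
    return x

for x0 in (-1.7, -0.3, 0.05, 0.4, 1.9):
    print(f"start at {x0:+.2f} -> settles near {settle(x0):+.3f}")
```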
Thanks; I agree--both that you could train an agent in these situations, and that "You'll just get two views on the same reality, suitable for different uses and different understandings." I think the latter seriously undercuts the article's attempt to explain these trajectories in terms of personality; they could just as easily be attributed to the power of culture or social structure.
Heh, well, another lesson from chaos mathematics is that in iterative systems you don't really get "explanations" the way we humans like to think of them... the answer to "what caused X", for any X that has taken a long time to develop, is "everything". So rather than culture "or" social structure, I'd say "and", and also a lot of other things, and the culture and social structure are themselves affected by the very personality structures we're trying to discuss.
Determining "causes" isn't as hopeless as that makes it initially sound, but you need something more sophisticated than the normal human concept of "cause" to even approximate useful answers. The good news is, this isn't impossible; we all live in an iterative world and we operate in it even so, which requires us to have certain models that conform to the world. It's one of those cases where I don't really love the "humans are just horribly irrational" gloss; our instincts and intuitions often have greater rationality than we realize, because they were formed in this iterative world, and sometimes it is in fact the particular naive concept of "rationality" we are trying to measure them by that is deficient, whereas if you use a more sophisticated one we look less bad.
(But sometimes humans just act suboptimally, no question about that.)
Another thing that helps is that you aren't generally interested in modelling the entire system. When considering myself and whether I may want to, as the article discusses, make changes in myself, I can take my culture and environment more-or-less as a given; I need some flex to consider options like "well, what if I just up and moved to another country?", but I don't need to consider my own effects on society very much because they are some complex combination of "tiny" and "utterly unpredictable". While society is chaotic, the time frame of the impact on society of my changing from excessively introverted to somewhat less introverted is way, way past my horizon for making decisions.
I agree that the discussion in the blog post is incomplete because it does not consider that we shape the environments that shape us, though it does briefly touch on the fact that other RL agents (people) try to shape us, and we them. But it is certainly more than that.
One thing I've wondered for a while: Is there a principled reason (e.g. explainable in terms of embedding training) why a vector's magnitude can be ignored within a pretrained embedding, such that cosine similarity is a good measure of semantic distance? Or is it just a computationally-inexpensive trick that works well in practice?
For example, if I have a set of words and I want to consider their relative location on an axis between two anchor words (e.g. "good" and "evil"), it makes sense to me to project all the words onto the vector from "good" to "evil." Would comparing each word's "good" and "evil" cosine similarity be equivalent, or even preferable? (I know there are questions about the interpretability of this kind of geometry.)
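To make the question concrete, here's a rough numpy sketch of the two measurements I have in mind (the vectors are just random stand-ins, not from any real embedding model):

```python
# Sketch (with made-up vectors) of the two measurements being compared:
# (1) the scalar projection of each word onto the axis from "good" to "evil",
# (2) the difference of its cosine similarities to the two anchor words.
# In a real experiment these vectors would come from a pretrained embedding.
import numpy as np

rng = np.random.default_rng(0)
dim = 8
emb = {w: rng.normal(size=dim) for w in ["good", "evil", "kitten", "villain"]}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

axis = emb["evil"] - emb["good"]        # direction from "good" toward "evil"
unit_axis = axis / np.linalg.norm(axis)

for w in ["kitten", "villain"]:
    v = emb[w]
    projection = v @ unit_axis          # position along the good->evil axis
    cos_diff = cosine(v, emb["evil"]) - cosine(v, emb["good"])
    print(f"{w:8s}  projection={projection:+.3f}  cos(evil)-cos(good)={cos_diff:+.3f}")
```

The two numbers generally won't coincide, since the projection depends on the word vector's magnitude while the cosines don't, which is really the heart of my question.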
Some embedding models are explicitly trained on cosine similarity. Otherwise, if you have a 512D vector, discarding magnitude is like discarding just a single dimension (i.e. you get 511 independent dimensions).
This is not quite right; you are actually losing information about each of the dimensions and your mental model of reducing the dimensionality by one is misleading.
Consider [1,0] and [x,x]
Normalized, we get [1,0] and [sqrt(.5),sqrt(.5)]. Clearly something has changed: the second vector now has the same components no matter what x was, even though x was an arbitrary value that could have been larger or smaller than 1. As such, we have lost information about x's magnitude which we cannot recover from just the normalized vector.
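A tiny numpy check of that, picking x = 0.5 arbitrarily:

```python
# Concrete check of the example above with x = 0.5 (chosen arbitrarily):
# after normalization the second vector looks the same for any positive x,
# and the original magnitude of [x, x] cannot be recovered.
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([0.5, 0.5])          # x = 0.5

a_n = a / np.linalg.norm(a)       # stays [1, 0]
b_n = b / np.linalg.norm(b)       # [sqrt(.5), sqrt(.5)] regardless of x

print(a_n, b_n)                   # [1. 0.] [0.70710678 0.70710678]
print(np.linalg.norm(b))          # 0.707... -- the information thrown away
```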
Well, depends. For some models (especially two tower style models that use a dot product), you're definitely right and it makes a huge difference. In my very limited experience with LLM embeddings, it doesn't seem to make a difference.
Magnitude is not a dimension; it's information about each value that is lost when you normalize. To prove this, normalize any vector and then try to de-normalize it again.
Magnitude is a dimension. Any 2-dimensional vector can be explicitly transformed into the polar (r, theta) coordinate system, where one of the dimensions is magnitude. Any 3-dimensional vector can be transformed into the spherical (r, theta, phi) coordinate system, where one of the dimensions is magnitude. This is high school mathematics. (Okay, I concede that maybe the spherical coordinate system isn't exactly high school material; in that case, just think about longitude, latitude, and distance from the center.)
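A quick round-trip check of the 2-D case in plain Python:

```python
# Any (x, y) can be rewritten as (r, theta), where r is the magnitude;
# the round trip is lossless (up to floating-point rounding).
import math

def to_polar(x, y):
    return math.hypot(x, y), math.atan2(y, x)

def to_cartesian(r, theta):
    return r * math.cos(theta), r * math.sin(theta)

x, y = 3.0, 4.0
r, theta = to_polar(x, y)
print(r, theta)                   # 5.0, 0.927...
print(to_cartesian(r, theta))     # back to (3.0, 4.0), up to rounding
```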
There's something wrong with this picture but I can't put my finger on it because my mathematical background is too old. The space of all normalized k-dimensional vectors isn't a vector space itself. It's well-behaved in many ways, but you lose the zero vector (which may not be relevant). Addition isn't closed anymore, and if you try to keep sums inside the space by normalizing after addition, distributivity becomes weird. I have no idea what this transformation means for word2vec and friends.
But the intuitive notion is that if you take all of 3D space and flatten/expand it onto just the surface of a sphere, then paste yourself onto it Flatland-style, it's not the same as if you were to Flatland yourself into the 2D plane. The obvious thing is that triangles won't sum to 180 degrees, but parallel lines will also intersect, and all sorts of other strange things will happen.
I mean, it might still work in practice, but it's obviously different from some method of dimensionality reduction because you're changing the curvature of the space.
The space of all normalized k-dimensional vectors is just the unit sphere in R^k. You can deal with it directly, or you can use the standard stereographic projection to map every point (except for one) onto a plane.
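For concreteness, a small numpy sketch of that projection and its inverse (the formulas are the standard textbook ones; the sample vector is arbitrary):

```python
# Stereographic projection from the unit sphere in R^k (minus the "north
# pole" e_k) onto the hyperplane x_k = 0, and its inverse.
import numpy as np

def stereographic(p):
    """Unit vector p in R^k (with p[-1] != 1) -> point in R^(k-1)."""
    return p[:-1] / (1.0 - p[-1])

def inverse_stereographic(q):
    """Point q in R^(k-1) -> unit vector in R^k."""
    s = q @ q
    return np.concatenate([2.0 * q, [s - 1.0]]) / (s + 1.0)

v = np.array([0.3, -1.2, 0.5, 2.0])
p = v / np.linalg.norm(v)                        # a normalized embedding, say
q = stereographic(p)                             # its image in the plane
print(np.allclose(inverse_stereographic(q), p))  # True: the map is invertible
```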
> triangles won't sum to 180
Exactly. The interior angles of a spherical triangle sum to more than 180 degrees.
> parallel lines will intersect
Yes because parallel "lines" are really great circles on the sphere.
So is it actually the case that normalizing down and then mapping to the k-1 plane yields a useful (for this purpose) k-1 space? Something feels wrong about the whole thing but I must just have broken intuition.
So, I first learned about cosine similarity in the context of traditional information retrieval; the simplified models used in that field before the development of LLMs, TensorFlow, and large-scale machine learning might prove instructive.
Imagine you have a simple bag-of-words model of a document, where you just count the number of occurrences of each word in the document. Numerically, this is represented as a vector where each dimension is one token (so, you might have one number for the word "number", another for "cosine", another for "the", and so on), and the magnitude of that component is the count of the number of times it occurs. Intuitively, cosine similarity is a measure of how frequently the same word appears in both documents. Words that appear in both documents get multiplied together, but words that are only in one get multiplied by zero and drop out of the cosine sum. So because "cosine", "number", and "vector" appear frequently in my post, it will appear similar to other documents about math. Because "words" and "documents" appear frequently, it will appear similar to other documents about metalanguage or information retrieval.
And intuitively, the reason the magnitude doesn't matter is that those counts will be much higher in longer documents, but the length of the document doesn't say much about what the document is about. Taking the cosine (which divides by the product of the two vectors' magnitudes) is a form of length normalization, so that you get sensible results without biasing toward shorter or longer documents.
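A toy version of that picture, with documents invented on the spot:

```python
# Bag-of-words cosine similarity: shared terms contribute to the sum,
# words missing from either document drop out, and simply repeating a
# document (making it longer) doesn't change the score.
from collections import Counter
import math

def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    shared = set(a) & set(b)                   # words absent from either doc drop out
    dot = sum(a[w] * b[w] for w in shared)
    norm = lambda c: math.sqrt(sum(v * v for v in c.values()))
    return dot / (norm(a) * norm(b))

doc_math  = bow("cosine vector number cosine vector")
doc_math2 = bow("vector number cosine geometry")
doc_ir    = bow("words documents retrieval words")

print(cosine(doc_math, doc_math2))             # high-ish: shared vocabulary
print(cosine(doc_math, doc_ir))                # 0.0: no words in common
print(cosine(doc_math, bow("cosine vector number cosine vector " * 3)))  # 1.0: length doesn't matter
```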
Most machine-learned embeddings are similar. The components of the vector are features that your ML model has determined are important. If the product of the same dimension of two items is large, it indicates that they are similar along that dimension. If it's zero, it indicates that the feature is not particularly representative of at least one of the items. Embeddings are often normalized, and for normalized vectors the fact that magnitude drops out doesn't really matter. But it doesn't hurt either: each magnitude is one, so the denominator is 1 and cosine similarity reduces to the plain dot product of the vectors.
> the reason the magnitude doesn't matter is that those counts will be much higher in longer documents ...
To make my intuition a bit more explicit: the vector is encoding a ratio, isn't it? You want to treat 3:2, 6:4, 12:8, ... as equivalent in this case, and normalization does exactly that.
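A quick check of that with numpy:

```python
# Count vectors that differ only by a scale factor normalize to the same
# unit vector, so cosine similarity treats them as identical.
import numpy as np

for v in ([3, 2], [6, 4], [12, 8]):
    v = np.asarray(v, dtype=float)
    print(v / np.linalg.norm(v))    # all print [0.83205029 0.5547002 ]
```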
Dunno if I have the full answer, but it seems that in high-dimensional spaces you can typically throw away a lot of information and still approximately preserve distances.
The Johnson-Lindenstrauss (J-L) lemma is at least somewhat related, even though, to my understanding, it doesn't quite describe the same transformation.
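A rough numerical illustration of that flavor of result, using a random Gaussian projection (the dimensions and sample count are made up):

```python
# A random Gaussian projection from 512 down to 64 dimensions keeps the
# pairwise distances of a small point set approximately intact.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 512, 64, 20
X = rng.normal(size=(n, d))
P = rng.normal(size=(d, k)) / np.sqrt(k)   # scaled random projection
Y = X @ P

def pairwise(M):
    m = len(M)
    return np.array([np.linalg.norm(M[i] - M[j])
                     for i in range(m) for j in range(i + 1, m)])

ratios = pairwise(Y) / pairwise(X)
print(ratios.min(), ratios.max())          # both should cluster around 1.0
```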
When I dabbled with latent semantic indexing [1], using cosine similarity made sense, as the dimensions of the input vectors were words (for example, a 1 if a word was present and a 0 if not). So one would expect vectors that point in a similar direction to be related.
I haven't studied LLM embedding layers in depth, so yeah, I've been wondering about using certain norms [2] instead to determine whether two embeddings are similar. Does it depend on the embedding layer, for example?
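One fact I do remember that seems relevant here: for unit-length vectors, squared Euclidean distance and cosine similarity are directly related (||a - b||^2 = 2 - 2*cos), so they rank neighbours identically. A quick numpy check with random stand-in vectors:

```python
# For unit-length vectors, ||a - b||^2 = 2 - 2*cos(a, b), so Euclidean
# distance and cosine similarity give the same nearest-neighbour ordering.
import numpy as np

rng = np.random.default_rng(1)
a, b = rng.normal(size=8), rng.normal(size=8)
a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)

euclid_sq = np.sum((a - b) ** 2)
cos = a @ b
print(np.isclose(euclid_sq, 2 - 2 * cos))   # True
```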
Should be noted it's been many years since I learned linear algebra, so getting somewhat rusty.
Would it be fair to think about this as a shim whose scope of responsibility will (hopefully) shrink over time, as command line utilities increasingly support JSON output? Once a utility commits to handling JSON export on its own, this tool can delegate to that functionality going forward.
It would, but I can still see somebody launching this with great enthusiasm and then losing the passion to fix Yet Another Parsing Bug introduced by a new version of dig.
`jc` author here. I've been maintaining `jc` for nearly four years now. Most of the maintenance is choosing which new parsers to include. Old parsers don't seem to have too many problems (see the GitHub issues), and bugs are typically just corner cases that can be quickly addressed along with added tests. In fact, there is a plugin architecture that lets users apply a quick fix locally so they don't need to wait for the next release. In practice it has worked out pretty well.
Most of the commands are pretty old and do not change anymore. Many parsers are not even commands but standard filetypes (YAML, CSV, XML, INI, X509 certs, JWT, etc.) and string types (IP addresses, URLs, email addresses, datetimes, etc.) which don't change or use standard libraries to parse.
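As a rough sketch of what that looks like from Python (simplified illustration using the high-level `jc.parse` helper; see the docs for the exact library API and output schema):

```python
# Parse a CSV string into structured data with jc's library interface.
import json
import jc

csv_text = "name,shell\nroot,/bin/bash\nalice,/bin/zsh\n"
records = jc.parse("csv", csv_text)     # -> list of dicts, one per row
print(json.dumps(records, indent=2))
```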
Additionally, I get a lot of support from the community. Many new parsers are written and maintained by others, which spreads the load and accelerates development.
Would you recommend any of Child's books to a vegetarian home cook today? (If not, would you happen to have other recommendations for French vegetarian cooking, leaning toward seasonal produce, for the sort of home cook/baker who prefers to weigh their dry ingredients?)
This is an insightful comment. I have been thinking about why I prefer CLI-based tools over GUI tools; one important difference is that with CLI the affordances are more in my head than in the tool (though the distinction is fuzzy). I just got a typewriter for my birthday, and have been reflecting on how different writing feels. When I write on my laptop, I process language less in my head and more on the screen--I type scattered fragments and then clean them up into sentences and paragraphs. Writing with the typewriter is slower and much harder to edit, so I need to do more composition in my head. I lose some of the affordances of the external medium (including whatever AI might contribute), but necessarily devote more attention to interrogating and composing the ideas.