
Please provide an example on the first page.


I’ve often wondered about languages like APL/k: are the programmers actually able to think about problems more efficiently?


As a kdb+/Q programmer I would say it depends on the type of problem.

For example, when working with arrays of data it certainly is easier to think and write “avg a+b” to add two arrays together and then take the average.

In a non-array programming language you would probably first need to do some bounds checking, then a big for loop, a temporary variable to hold the sum and the count as you loop over the two arrays, etc.

Probably the difference between like 6ish lines of code in some language like C versus the 6 characters above in Q.
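
To make that concrete, here's a rough sketch in Python rather than C (the function names are made up, and numpy stands in for the array language), showing the loop and bookkeeping that the array expression hides:

    import numpy as np

    # The "scalar" version: explicit length check, loop and accumulator.
    def avg_of_sums_loop(a, b):
        if len(a) != len(b):
            raise ValueError("length mismatch")
        total = 0.0
        for x, y in zip(a, b):
            total += x + y
        return total / len(a)

    # The array version collapses to roughly the shape of the q expression "avg a+b".
    def avg_of_sums_array(a, b):
        return np.mean(np.asarray(a) + np.asarray(b))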

But every language has features that help you reason about certain types of problems better. Functional languages with algebraic data types and pattern matching (think OCaml or F#) are nicer than switch statements or big if-else-if statements. Languages with built-in syntactic sugar like async/await are better at dealing with concurrency, etc.


Well no, not in a non-array programming language. In any language that has a semi-decent type/object system and some kind of functional programming support, `avg a+b` would just be `avg(a, b)`, which is not any easier or harder, with an array type defined somewhere. Once you make your basic array operations (which have to be made in q anyway, just in the stdlib), you can compose them just like you would in q and get the same results. All of the bounds checking and for-loops are unnecessary; all you really need are a few HKTs that do fancy maps and reduces, which the most popular languages already have.

A very real example of this is Julia. Julia is not really an array-oriented programming language; it's a general language with a strong type system and decent functional programming facilities, plus some syntactic sugar that makes it look a bit array-oriented. You could write any Q/k program in Julia and it would not be any more complex. For a decently complex program Julia will be faster, and in every case it will be easier to modify and read and not any harder to write.


Why would it be avg(a, b)?

What if I want to take the average difference of two arrays?


mean(a - b)


I don't know what you mean by the q array operations being defined in the standard library. Yes there are things defined in .q, but they're normally thin wrappers over k which has array operations built in.


I don't consider an interpreted language having operations "built-in" to be significantly different from a compiled language having basic array operations in the stdlib or calling a compiled language.


Hmm, why not? Using K or a similar array language is a very different experience to using an array library like numpy.


It is syntactically different, not semantically different. If you gave me any reasonable code in k/q I'm pretty confident I could write semantically identical Julia and/or numpy code.

In fact I've seen interop between q and numpy. The two mesh well together. The differences are aesthetic more than anything else.


There are semantic differences too with a lot of the primitives that are hard to replicate exactly in Julia or numpy. That's without mentioning the stuff like tables and IPC, which things like pandas/polars/etc don't really come close to in ergonomics, to me anyway.


Do you have examples of primitives that are hard to replicate? I can't think of many off the top of my head.

> tables and IPC

Sure, kdb doesn't really have an equal, though it is very niche. But for IPC I disagree. The facilities in k/q are neat and simple in terms of setup, but it doesn't have anything better than what you can do with cloudpickle, and the lack of custom types makes effective, larger-scale IPC difficult without resorting to inefficient hacks.


None of the primitives are necessarily too complicated, but off the top of my head things like /: \: (encode, decode), all the forms of @ \ / . etc, don't have directly equivalent numpy functions. Of course you could reimplement the entire language, but that's a bit too much work.

Tables aren't niche, they're very useful! I looked at cloudpickle, and it seems to only do serialisation, I assume you'd need something else to do IPC too? The benefit of k's IPC is it's pretty seamless.

I'm not sure what you mean by inefficient hacks, generally you wouldn't try to construct some complicated ADT in k anyway, and if you need to you can still directly pass a dictionary or list or whatever your underlying representation is.


> None of the primitives are necessarily too complicated, but off the top of my head things like /: \: (encode, decode), all the forms of @ \ / . etc, don't have directly equivalent numpy functions. Of course you could reimplement the entire language, but that's a bit too much work.

@ and . can be done in numpy through ufuncs. Once you turn your unary or binary function into a ufunc using foo = np.frompyfunc, you then have foo.at(a, np.s_[fancy_idxs], (b?)) which is equivalent to @[a, fancy_idxs, f, b?]. The other ones are, like, 2 or 3 lines of code to implement, and you only ever have to do it once.
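
For the common case you don't even need frompyfunc, since the built-in ufuncs already have the .at method. A tiny sketch with made-up arrays:

    import numpy as np

    a = np.zeros(5)
    idx = [0, 1, 1, 4]
    b = [10, 1, 1, 3]

    # Unbuffered in-place "amend at": roughly q's @[a; idx; +; b].
    # Repeated indices accumulate, so a[1] gets bumped twice.
    np.add.at(a, idx, b)
    print(a)   # [10.  2.  0.  0.  3.]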

vs and sv are just pickling and unpickling.

> Tables aren't niche,

Yes, sorry, I meant that tables are only clearly superior in the q ecosystem in niche situations.

> I looked at cloudpickle, and it seems to only do serialisation, I assume you'd need something else to do IPC too? The benefit of k's IPC is it's pretty seamless.

Python already does IPC nicely through the `multiprocessing` and `socket` modules of the standard library. The IPC itself is very nice in most use cases if you use something like multiprocessing.Queue. The thing that's less seamless is that the default pickling operation has some corner cases, which cloudpickle covers.
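
To show what I mean, a minimal sketch (cloudpickle is the third-party package, everything else is the standard library, and the worker/queue names are just for illustration):

    import multiprocessing as mp
    import cloudpickle  # third-party: pip install cloudpickle

    def worker(q_in, q_out):
        # Receive a cloudpickled callable plus an argument, run it, send back the result.
        fn_bytes, arg = q_in.get()
        fn = cloudpickle.loads(fn_bytes)
        q_out.put(fn(arg))

    if __name__ == "__main__":
        q_in, q_out = mp.Queue(), mp.Queue()
        p = mp.Process(target=worker, args=(q_in, q_out))
        p.start()
        # A lambda is one of the corner cases the default pickler rejects.
        q_in.put((cloudpickle.dumps(lambda x: x * 2), 21))
        print(q_out.get())  # 42
        p.join()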

> I'm not sure what you mean by inefficient hacks, generally you wouldn't try to construct some complicated ADT in k anyway, and if you need to you can still directly pass a dictionary or list or whatever your underlying representation is.

It's a lot nicer and more efficient to just pass around typed objects than dictionaries. Being able to have typed objects whose types allow for method resolution and generics makes a lot of code so much simpler in Python. This in turn allows a lot of libraries and tricks to work seamlessly in Python and not in q. A proper type system and colocation of code with data make it a lot easier to deal with unknown objects - you don't need nested external descriptors to tag your nested dictionary and tell you what it is.


Again, I'm not saying anything is impossible to do; it's just about whether or not it's worth it. 2 or 3 lines for all types, for all overloads, for all primitives, etc., adds up quickly.

I don't see how k/q tables are only superior in niche situations, I'd much rather (and do) use them over pandas/polars/external DBs whenever I can. The speed is generally overhyped, but it is significant enough that rewriting something from pandas often ends up being much faster.

The last bits about IPC and typed objects basically boil down to python being a better glue language. That's probably true, but the ethos of array languages tends to be different, and less dependent on libraries.


Which is why C# is the giant ever increasing bag of tricks that it is (unkind people might say bloat…) ;-) Personally, I’m all for this; let me express the problem in whatever way is most natural.

There are limits, of course, and it’s not without downsides. Still, if I have to code in something all day, I’d like that “something” be as expressive as possible.


For some classes of problems that are easily vectorized, using an array-focused language can certainly make thinking about them and their solutions more efficient, since you can abstract over the data structure and iteration details.

As a quant, I used kdb+/q quite a bit for 5+ years for mid-frequency strategies, but as I moved towards higher-frequency trading that required calculations on the order book that couldn't be easily or efficiently vectorized, continuing to use array-focused languages would have only complicated reasoning about those problems.


What did you switch to after that?


I went to this tech talk on Dyalog APL (a modern APL dialect), and the speaker made the argument that the notation allows certain idioms to be recognized more easily:

https://youtu.be/PlM9BXfu7UY?si=ORtwI1qmfmzhJGZX&t=3598

This particular snippet was in the context of compilers, but the rest of the talk has more on Dyalog and APL as a system of mathematical notation. The underlying theme is that optimizing mathematical expressions may be easier than optimizing general code.


Hillel Wayne writes about it on his newsletter every once in a while. He's convinced me that he does in fact think through some problems better in array languages but I still can't really conceive of what that experience is like.


there are several open-source K environments available, some of which even run in the browser:

http://johnearnest.github.io/ok/index.html

if it's something you're interested in trying i'd be happy to point you toward more resources, and i'm sure there are plenty of other arraylang tinkerers reading this thread who could help, too


one nice thing about the array language style is that it's possible to talk about variations on algorithms where the relevant code snippets, being a few characters, fit inline into the discussion; more traditional vertically-oriented languages that take handfuls or dozens of lines to say the same things need to intersperse code display blocks with expository prose


"More efficiently"? Maybe. It opens up a new way to think about solutions to problems. Sometimes those solutions are more efficient, and sometimes they are just different.

It's a useful thing to learn though. And dare I say it, fun. Even if there was zero benefit to it, it'd still be fun. As it turns out, there really are benefits.

For me, the biggest benefit is when I'm working with data interactively. The syntax allows me to do a lot of complex operations on sets of data with only a few characters, which makes you feel like you have a superpower (especially when comparing to someone using Excel to try to do the same thing).


I've found that the challenge is to "think in vector operations" rather than iterating over the same data. The tricky part is figuring out how to get an operator to do the right thing over an array of stuff on the left-hand side and the list/bag/etc. of arguments on the right.


Is there a reference which describes how the current architecture evolved? Perhaps from a very simple core idea to the famous “Attention Is All You Need” paper?

Otherwise it feels like lots of machinery created out of nowhere. Lots of calculations and very little intuition.

Jeremy Howard made a comment on Twitter that he had seen various versions of this idea come up again and again - implying that this was a natural idea. I would love to see examples of where else this has come up so I can build an intuitive understanding.


Roughly:

1) The initial seq-2-seq approach was using LSTMs - one to encode the input sequence, and one to decode the output sequence. It's amazing that this worked at all - encode a variable length sentence into a fixed size vector, then decode it back into another sequence, usually of different length (e.g. translate from one language to another).

2) There are two weaknesses of this RNN/LSTM approach - the fixed size representation, and the corresponding lack of ability to determine which parts of the input sequence to use when generating specific parts of the output sequence. These deficiencies were addressed by Bahdanau et al in an architecture that combined encoder-decoder RNNs with an attention mechanism ("Bahdanau attention") that looked at each past state of the RNN, not just the final one.

3) RNNs are inefficient to train, so Jakob Uszkoreit was motivated to come up with an approach that better utilized massively parallel hardware. He noted that language is as much hierarchical as it is sequential, which suggested a layered architecture where at each layer the tokens of the sub-sequence would be processed in parallel, while retaining a Bahdanau-type attention mechanism in which those tokens attend to each other ("self-attention") to predict the next layer of the hierarchy (there's a small numpy sketch of this attention step at the end of this comment). Apparently the initial implementation worked, but no better than other contemporary approaches (incl. convolution). Another team member, Noam Shazeer, then took the idea and developed it, coming up with an architecture (which I've never seen described) that worked much better; this was then experimentally ablated to remove unnecessary components, resulting in the original transformer. I'm not sure who came up with the specific key-based form of attention in this final architecture.

4) The original transformer, as described in the "Attention Is All You Need" paper, still had a separate encoder and decoder, copying earlier RNN-based approaches, but a separate encoder is unnecessary for language models: Google's BERT kept just the encoder, and OpenAI's GPT kept just the decoder, which is what everyone uses today. With this decoder-only transformer architecture the input sequence is fed into the bottom layer of the transformer and transformed one step at a time as it passes through each subsequent layer, before emerging at the top. The input sequence has an end-of-sequence token appended to it, which is what gets transformed into the next token (last token) of the output sequence.
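
If it helps make (3) less abstract, here is a minimal numpy sketch of single-head scaled dot-product self-attention, the query/key/value form used in the final architecture. It leaves out multi-head attention, masking, and everything else, and the weight matrices are just random placeholders:

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        # X: (seq_len, d_model) token vectors; Wq/Wk/Wv: (d_model, d_k) projections.
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])          # how strongly each token attends to each other token
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
        return weights @ V                               # each output is a weighted mix of value vectors

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 8))                          # 4 tokens, model width 8
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    out = self_attention(X, Wq, Wk, Wv)                  # shape (4, 8)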


Thank you for this summary! Very well explained. Any tips on what resources you use to keep updated on this field?


Thanks. Mostly just Twitter, following all the companies & researchers for any new announcements, then reading any interesting papers mentioned/linked. I also subscribe to YouTube channels like Dwarkesh Patel (interviewer) and Yannic Kilcher (AI news), and search out YouTube interviews with the principals. Of course I also read any AI news here on HN, and sometimes there may be interesting information in the comments.

There's a summary of social media AI news here, that sometimes surfaces something interesting.

https://buttondown.email/ainews/archive/


karpathy gave a good high-level history of the transformer architecture in this Stanford lecture https://youtu.be/XfpMkf4rD6E?si=MDICNzZ_Mq9uzRo9&t=618


Can someone answer CS 101 questions about this, please?

I know there are other methods related to matrix factorization, but I’m asking specifically about quantization.

Does quantization literally mean the weight matrix floats are being represented using fewer bits than the 64 bit standard?

Second, if fewer bits are being used, are CPUs able to do math directly on fewer bits? Aren’t CPU registers still 64 bit? Are these floats converted back to 64 bit for math, or is there some clever packing technique where a 64 bit float actually represents many numbers (sort of a hacky SIMD instruction)? Or do modern CPUs have the hardware to do math on fewer bits?


This is for GPUs, not CPUs. GPUs do have lower precision ALUs to do math on fewer bits. Though not 2 bits - I believe there’s support for 1, 4 and 8 bit computation in modern Nvidia cards.

But even without such support there’s a benefit of model size compression so that bigger models can fit in GPU memory, eliminating costly CPU/GPU data transfers.


Yes but no. The actual values represented by the quantized bits don't use a representation akin to IEEE floating point, but they are able to act like floating-point values due to mathematical transformations during propagation. The floating-point values a quantized value corresponds to are chosen using some kind of precomputation, depending on the quantization method.
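
As a toy version of that idea, here's plain symmetric int8 quantization with a single per-tensor scale; real 2-bit schemes are fancier (per-block scales, smarter rounding, lookup tables), but the store-small-ints-plus-a-scale bookkeeping is the same:

    import numpy as np

    def quantize_int8(w):
        # Keep small ints plus one float scale (symmetric, per-tensor for simplicity).
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        # Recover approximate floats when you need to do the math in higher precision.
        return q.astype(np.float32) * scale

    w = np.random.randn(1024).astype(np.float32)
    q, s = quantize_int8(w)
    print(np.abs(w - dequantize(q, s)).max())  # small reconstruction error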


I’ve tried to understand causal inference several times and failed. Tutorials seem unnecessarily long winded. I wish authors would give simple, to the point examples.

Say I have a simple table of outdoor temperatures and ice cream sales.

What can the machinery of causal inference do for me in this situation?

If it doesn’t apply here, what do I need to add to my dataset to make it appropriate for causal inference? More columns of data? Explicit assumptions?

If I can use causal inference, what can it tell me? If I think of it as a function CA(data), can it tell me if the relationship is actually causal? Can it tell me the direction of the relationship? If there were more columns, could it return a graph of causal relationships and their strength? Or do I need to provide that graph to this function?

I know a wet pavement can be caused by rain or spilled water or that an alarm can go off due to an earthquake or a burglary. I have common sense. I also understand the basics of graph traversal from comp sci classes.

How do I practically use causal inference?

To the authors of future articles on this (or any technical tutorial), please explain the essence, the easy path, then the caveats and corner cases. Only then will abstract philosophizing make sense.


> Say I have a simple table of outdoor temperatures and ice cream sales. What can the machinery of causal inference do for me in this situation?

Not much. Causal inference works over networks of variables, specifically a DAG. But usually you know more than one variable association, so this is more an issue of pedagogy than the tool itself.

Probably the shortest, most persuasive example I can give you is a logical resolution to Simpson's Paradox: when the correlation between two variables can change depending on whether you consider a third variable or not.

The classic example is gender discrimination in college admissions. When looking at admissions rates across the entire university, women are less likely to be accepted than men. But when (in this example) you break that down into departments, every department favors women over men. This is a paradoxical contradiction, and worrying in that your science is only as good as the dimensions your data captures. Worse, the data offers no clean way to say which is the correct answer: the aggregated view or the per-department breakdown. Statisticians stumbled for a long while on this, and it's kind of wild that we were able to declare that smoking causes cancer without a resolution to it.

Pearl wrote a paper on how Bayesian approaches resolve the paradox[1], but it does presume familiarity with terms like "colliders," "backdoor criterion" and "do-calculus." His main point is that causal inference techniques give us the language and tools to resolve the paradox that frequentist approaches do not.

[1]: https://ftp.cs.ucla.edu/pub/stat_ser/r414.pdf


> When looking at admissions rates across the entire university, women are less likely to be accepted than men. But when (in this example) you break that down into departments, every department favors women over men.

If every department favored women then the entire university would also favor women. Parity is guaranteed in that scenario. What happened in the Berkeley case is that not every department favored women, and women applied disproportionately to the departments with lower admissions rates (including some that didn't favor them), while men did the opposite.


Yes, apologies, what I meant by "favored" was that in every department, women applicants were more likely to get an admission than men. But I'm pretty sure the admission rate can still be lower for women overall than men overall, using exactly the same scenario you described. If the sociology department admits 10 percent of applicants and the physics department admits 90, it seems very easy for gender bias in applications to shift women towards 10 and men towards 90, even if the rate is a few percent higher for women.
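
The arithmetic is easy to check with made-up numbers (these are not the real Berkeley figures, just a shape that produces the flip):

    # Two departments: A is hard to get into, B is easy.
    # Format: (admitted, applied). Numbers are invented to show the reversal.
    women = {"A": (95, 900), "B": (20, 20)}
    men   = {"A": (10, 100), "B": (850, 900)}

    def rates(d):
        per_dept = {k: a / n for k, (a, n) in d.items()}
        overall = sum(a for a, n in d.values()) / sum(n for a, n in d.values())
        return per_dept, overall

    print(rates(women))  # A: ~10.6%, B: 100%, overall: ~12.5%
    print(rates(men))    # A: 10%,    B: ~94%, overall: 86%
    # Women do better in every department, yet worse in the aggregate.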


I get your point now. You're quite right that you can construct scenarios that arbitrarily favor men in the aggregate but women in specific departments, given the right ratio of applicants.


> Or do I need to provide that graph to this function?

You need to do that, and the math can help you measure how much each arrow contributes. The idea that you need to provide your model of the world is strangely not a key part of most introductions, but it’s crucial.

> outdoor temperatures and ice cream sales

That’s too simple: a simple regression can handle that. Causal inference can handle cases with three variables, assuming you provide an interaction graph. Say your ice cream truck goes either to a fancy neighborhood or a working-class plaza. After observing the weather, you decide where to go, so you know that wealth and weather influence sales, but sales can’t influence the other two. Assuming you have data for all cases (sunny/poor, sunny/rich, rainy/poor, rainy/rich), then you can separate the two effects.
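
A toy simulation of that setup (invented numbers, and plain stratification standing in for the heavier causal machinery) shows what separating the two effects looks like:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    n = 4000
    weather = rng.choice(["sunny", "rainy"], size=n)
    # The driver prefers the rich neighborhood on sunny days (weather -> location).
    p_rich = np.where(weather == "sunny", 0.8, 0.3)
    neighborhood = np.where(rng.random(n) < p_rich, "rich", "poor")
    # Sales depend on both causes: +30 for sun, +20 for the rich neighborhood.
    sales = 50 + 30 * (weather == "sunny") + 20 * (neighborhood == "rich") + rng.normal(0, 5, n)

    df = pd.DataFrame({"weather": weather, "neighborhood": neighborhood, "sales": sales})

    # The naive comparison mixes the two effects (the sunny/rainy gap comes out near 40, not 30):
    print(df.groupby("weather")["sales"].mean())
    # Stratifying, i.e. holding the other cause fixed, recovers ~30 for weather
    # within each neighborhood and ~20 for wealth within each weather condition:
    print(df.groupby(["neighborhood", "weather"])["sales"].mean())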


> > outdoor temperatures and ice cream sales > That’s too simple: a simple regression can handle that.

Not quite. Regression by itself will not answer the causal (or equivalently, the counterfactual) question.

I strongly suspect you already know this and was elaborating on a related point. But just for the sake of exposition, let me add a few words for the HN audience at large.

Let me give an example. In an email corpus, mails that begin with "Honey sweetheart," will likely have a higher than baseline open rate. A regression on word features will latch on to that. However, if your regular employer starts leading with "Honey sweetheart" that will not increase the open rate of corporate communications.

Causal or counterfactual estimation is fundamentally about how a dependent variable responds to interventional changes in a causal variable. Regression and, relatedly, conditional probabilities are about 'filtering' the population on some predicate.

An email corpus when filtered upon the opening phrase "Honey sweetheart" may have disproportionately high email open rates, but that does not mean that adding or adopting such a leading phrase will increase the open rate.

Similarly, regressing dark hair as a feature against skin cancer propensity will catch an anti-correlation effect. Dyeing blonde hair dark will not reduce melanoma propensity.
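
A quick simulated version of the email example, with the causal story invented so that the greeting has no effect at all:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # The sender being a close contact drives BOTH the greeting and the open rate.
    close_contact = rng.random(n) < 0.2
    greeting_honey = close_contact & (rng.random(n) < 0.8)

    def open_prob(contact):
        # The greeting never enters the mechanism; only the relationship does.
        return np.where(contact, 0.9, 0.2)

    opened = rng.random(n) < open_prob(close_contact)

    # Conditioning (what a regression on the phrase would latch onto) shows a big lift...
    print(opened[greeting_honey].mean())    # ~0.90
    print(opened[~greeting_honey].mean())   # ~0.23

    # ...but intervening to make every email lead with the phrase changes nothing,
    # because opened depends only on close_contact in this story.
    opened_after = rng.random(n) < open_prob(close_contact)
    print(opened_after.mean())              # same baseline, ~0.34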


Your model needs to introduce a third piece of information: whether an email is a corporate communication—or a deliberate intervention.


My understanding (that might be out of date) is that the tools are weak. Ideally you would have tabular data and it would give you a digraph for the causal structure between variables. You can try this but the tools don't work reliably yet. Otherwise everyone would use them.


Agreed. AFAICT, in practice, you set up your own causal graphs and test them. This seems very academic, very 1950s.

Interestingly, folks are finally doing more realistic experiments in the causal equivalent of architecture search, and genAI is giving these efforts a second wind. It still feels like it's at the toy stage, or for academics & researchers with a lot of time on their hands, rather than relevant for most data scientists.

I'm still on the sidelines, but I keep checking in, in case it's finally practical for our users.


Same here, I check in every year or so because it would be fantastic to have.


> Say I have a simple table of outdoor temperatures and ice cream sales.

You have more than that! You have knowledge about the world!

> What can the machinery of causal inference do for me in this situation?

Well, (I’m being purposefully pedantic here) you haven’t really asked a question yet. The first thing it can do is help you while you’re formulating one. It can answer questions like, “how will the things I have and haven’t measured affect the estimates I’m interested in making?”

> If it doesn’t apply here, what do I need to add to my dataset to make it appropriate for causal inference? More columns of data? Explicit assumptions?

The first thing you need to do is articulate what you’re actually interested in. Then you need to be explicit about the causal relationships between things relevant to those questions. The big thing (to me) is that particular causal structures have testable conditional independence structures and by assessing these, you can build evidence for or against particular diagrams of the context.


Judea Pearl's The Book Of Why gives you more practical and easy to understand examples, I recommend that.


It's pretty simple. You cannot infer causality from observational data, no matter how sophisticated your statistical tools are.

You need to perform properly controlled experiments to infer causality. And even then it's hard.

Inferring causality from observational data is cargo cult science.


TL;DR: Causal inference is a complex topic, not a simple tool.

How's the ice cream example better than the sugary snacks example given in the article?

Here's the part about needing to add more columns to the data:

> When dealing with a causal question, it’s crucial to include variables known as confounders. These are variables that can influence both the treatment and the outcome. By including confounding variables, we can better isolate and estimate the true causal effect of the treatment. Failing to add or account for confounding variables may lead to incorrect estimates.


> How's the ice cream example better than the sugary snacks example given in the article?

Not the OP, but because that fails to explain how the basic hypothetical example works(!)

> You want to know how much your sales would be in a parallel world where kids were stuck with bland snacks compared to your sweet treats. This is where causal inference steps in to provide the solution.

(A nice graph follows.)

So how is that done?


> TL;DR: Causal inference is a complex topic, not a simple tool.

The simple version using graphical models and joint probabilities isn't difficult to explain or teach. The issue is that to do anything useful with it at scale you either need MCMC or variational inference, and that's an entirely different can of worms altogether. For medical datasets you rarely have "scale"; instead you have very few sample cases and a large expert model (the doctor/specialist).
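
For what it's worth, the simple version really does fit in a few lines; here's the classic wet-pavement example done by brute-force enumeration of the joint distribution (made-up probabilities, and this enumeration is exactly the part that stops scaling):

    from itertools import product

    p_rain, p_sprinkler = 0.2, 0.1

    def p_wet(rain, sprinkler):
        # Pavement is almost certainly wet if it rained or the sprinkler ran.
        return 0.99 if (rain or sprinkler) else 0.01

    joint = {}
    for rain, sprinkler, wet in product([True, False], repeat=3):
        p = ((p_rain if rain else 1 - p_rain)
             * (p_sprinkler if sprinkler else 1 - p_sprinkler)
             * (p_wet(rain, sprinkler) if wet else 1 - p_wet(rain, sprinkler)))
        joint[(rain, sprinkler, wet)] = p

    # P(rain | pavement is wet): condition by filtering the joint table.
    wet_total = sum(p for (r, s, w), p in joint.items() if w)
    print(sum(p for (r, s, w), p in joint.items() if w and r) / wet_total)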


Can someone help me understand where tools like this fit?

Are they Tableau dashboard replacements?

Are they better than a standard Bootstrap admin theme?


What a great way to say it!


A website which allows you to practice SQL, without having to install databases on your machine: http://sqlforever.com/

I’m eventually planning to do more with it, but I need some free time in my life.


FIX Parser: https://fixparser.targetcompid.com/

A website which allows people in financial trading companies to more easily understand the FIX protocol.

Obviously this is a very niche app, but very useful! It is somewhat well known in the industry (among the type of people who use FIX).

Amusingly, recently a friend forwarded me a website, run by a prestigious financial software company, which is CLEARLY a copy of my website! They are marketing their site on LinkedIn and, I’m sure, other places.

I keep thinking of developing this further. I have several ideas, just lack the time.


Unfortunately most of the comments are about site reliability.

This used to be an absolutely fantastic forum. I was a young comp sci graduate who somehow finished school without taking any programming language theory courses. I used to read this every single day. At one point I had every book ever written on ML (OCaml, SML, etc.) and most written about various Lisps. To this day I love how TAPL was written (Types and Programming Languages by Pierce). I loved the expansive nature of Concepts, Techniques, and Models of Computer Programming by Van Roy. Some books were discussed so often that they were simply referred to by their abbreviations.

There were serious academics, PhD students, industry folks and newbies like myself who could not even understand most abstracts, much less the full papers.

I once asked if a new forum could be created for novices like myself so I could ask my dumb little questions. I was instead encouraged to ask my questions in the main forum :)

For a short while there was a related user group in NYC where people would discuss type theory at random diners.


It was SO good, and no doubt will be again in the future :-) I have so much respect for Ehud Lamm and the other people who run it.

Sadly, two of its best commentators have died recently - John Shutt (famous in some circles for writing about fexprs, and also a brilliant mind on several other topics including quantum mechanics and history of mathematics) and Thomas Lord.

