> The superconducting-like behavior in LK-99 most likely originates from a magnitude reduction in resistivity caused by the first-order structural phase transition of Cu2S. [...] It is important to note that this first-order structural transition differs significantly from the second-order superconducting transition.
This seems to be about easier classification tasks without too many samples, for which TF-IDF also works well (Table 3). But more generally, gzip for text modeling might make sense. Quoting http://bactra.org/notebooks/nn-attention-and-transformers.ht... :
> Once we have a source-coding scheme, we can "invert" it to get conditional probabilities; we could even sample from it to get a generator. (We'd need a little footwork to deal with some technicalities, but not a heck of a lot.) So something I'd really love to see done, by someone with the resources, is the following experiment:
> - Code up an implementation of Lempel-Ziv without the limitations built in to (e.g.) gzip; give it as much internal memory to build its dictionary as a large language model gets to store its parameter matrix. Call this "LLZ", for "large Lempel-Ziv".
> - Feed LLZ the same corpus of texts used to fit your favorite large language model. Let it build its dictionary from that. (This needs one pass through the corpus...)
> - Build the generator from the trained LLZ.
> - Swap in this generator for the neural network in a chatbot or similar. Call this horrible thing GLLZ.
> In terms of perplexity, GLLZ will be comparable to the neural network, because Lempel-Ziv does, in fact, do universal source coding.
Maybe someone on HN will have resources for such an experiment?
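As a toy illustration of the "invert a source coder to get conditional probabilities" idea, here is a minimal sketch in Python using gzip itself (not the proposed LLZ; all function names are my own). The extra bytes gzip needs to encode a candidate character after the context act as a code length, and 2^(-code length) gives an unnormalized conditional probability. gzip's byte granularity and tiny window make this very crude, which is exactly the kind of "technicality footwork" the quote alludes to:

```python
import gzip
import random

def compressed_len(text: str) -> int:
    """Size in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode("utf-8")))

def next_char_distribution(context: str, alphabet: str) -> dict[str, float]:
    """'Invert' the compressor: the extra bits needed to encode c after
    the context act as a code length, and P(c | context) ~ 2^(-length).
    Byte granularity means many deltas are 0, so this is a crude toy."""
    base = compressed_len(context)
    weights = {
        c: 2.0 ** (-8 * (compressed_len(context + c) - base))
        for c in alphabet
    }
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def sample_text(context: str, alphabet: str, n_chars: int, seed: int = 0) -> str:
    """Sample a continuation character by character -- the 'generator'."""
    rng = random.Random(seed)
    out = context
    for _ in range(n_chars):
        dist = next_char_distribution(out, alphabet)
        chars = list(dist)
        out += rng.choices(chars, weights=[dist[c] for c in chars], k=1)[0]
    return out
```

A real LLZ would replace `gzip.compress` with an unbounded-dictionary Lempel-Ziv and work at a finer coding granularity, but the inversion recipe is the same.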
Content aside, this is terrible editing on BBC’s part:
> However, the regulator hit back, saying: "It is the CMA's job to do what is best for the people, businesses and economy of the UK, not merging firms with commercial interests."
> But the Competition and Markets Authority (CMA) said its job was not to serve the interests of merging firms.
> Mr Smith said the CMA's decision market "probably the darkest day in our four decades in Britain".
I assume they meant to say "marked" rather than "market". The editorial standards at the BBC are frankly a disgrace these days; errors like this are common.
If you only care about identically distributed test data, test-set overfitting doesn't happen that fast: if you evaluate M models on N test samples, the overfitting error is on the order of sqrt(log M / N). And even as this error becomes noticeable, the relative ranks among the models are more stable still, since you can apply small-variance bounds. This has actually been verified empirically on models proposed for CIFAR-10.
No. I was referring to the "standard concentration bound" in that paper, which applies when you have separate validation and test sets. I think the argument can usually be improved by applying small-variance inequalities such as Bernstein's to excess-risk-like quantities such as l(f_hat(x), y) - l(f_ref(x), y), to show that the accuracy difference / relative rank enjoys better guarantees. For ImageNet we can use the 0-1 loss and set f_ref to a SoTA classifier which, while having its loss bounded away from 0, is "mostly similar" to most f_hat's and thus leads to a small excess risk.
The CIFAR experiments I mentioned were https://arxiv.org/pdf/1806.00451.pdf. The paper doesn't actually contain this argument (unfortunate wording on my part), but its results appear to support it well.
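The sqrt(log M / N) scaling is easy to check with a quick simulation (a sketch of my own, not from the paper): evaluate M "models" that all truly guess at 50% accuracy on one reused N-sample test set, and compare the best model's spurious gain against the Hoeffding/union-bound envelope.

```python
import math
import random

def max_overfit_gap(num_models: int, num_samples: int, seed: int = 0) -> float:
    """All M 'models' truly have 50% accuracy. Reusing one N-sample test
    set, return the best observed accuracy minus the true 0.5 -- i.e.,
    the spurious gain from test-set reuse."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(num_models):
        correct = sum(rng.random() < 0.5 for _ in range(num_samples))
        best = max(best, correct / num_samples - 0.5)
    return best

def hoeffding_envelope(num_models: int, num_samples: int) -> float:
    """Union-bound scale for the max of M sample means: sqrt(log M / 2N)."""
    return math.sqrt(math.log(num_models) / (2 * num_samples))
```

With, say, M = 200 and N = 2000, the envelope is about 0.036, and the simulated best-model gain stays within a small constant factor of it.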
The equation example is artificial. In practice there would be no curly brackets around these single-token sub/superscripts, nor would the \left / \right in this example be needed. With whitespace added properly, the equation becomes quite legible.
For more complex equations, people use line breaks and indentation, and/or macros.
(note that, using my editor defaults, none of this equation's lines are truncated or wrapped)
The \g... and \pdF... are trivial ad hoc macros defined in the document. Producing the same document by repeatedly copy/pasting the tensor components and partial derivatives would have been considerably more time-consuming and error-prone.
Also notable is the align environment: type \\ for a manual line break and & at each point that is to be aligned.
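Since the actual equation and its \g... / \pdF... macros are truncated above, here is a generic stand-in (my own, not the original document's) showing the same mechanics: a one-line ad hoc macro, \\ for manual line breaks, and & marking the alignment points.

```latex
% Illustrative only -- not the original macros.
\newcommand{\pd}[2]{\frac{\partial #1}{\partial #2}}

\begin{align}
  \pd{f}{x} &= 2x + y, \\  % \\ ends the line
  \pd{f}{y} &= x           % & sits before each aligned = sign
\end{align}
```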
I just tried to reproduce this with the "WYSIWYG" equation editor in Word, and I can't figure out how to do it — right-clicking on plus and equal signs gives an "Align at this Character" option, but this appears to be a special case, as the option doesn't appear when clicking on anything else in the example.
In particular, had the first line involved (implied) multiplication instead of addition, there apparently wouldn't have been any acceptable point at which to declare alignment!?!
I'm also not sure what it means to align "at" a character, as characters have width, and characters to be aligned aren't necessarily the same width. Does it align the left sides? The right sides? The center?
Moreover, the default key bindings in the Word equation editor are counterintuitive: the usual binding for "manual line break" instead splits the equation in two, and there doesn't appear to be any default binding for the "Insert Manual Break" equation editor command at all.
Sure, LaTeX has a learning curve, but it's not at all obvious to me that Word is any better in this respect.