> The superconducting-like behavior in LK-99 most likely originates from a magnitude reduction in resistivity caused by the first-order structural phase transition of Cu2S. [...] It is important to note that this first-order structural transition differs significantly from the second-order superconducting transition.
This seems to be about easier classification tasks without too many samples, for which TF-IDF also works well (Table 3). But more generally, gzip for text modeling might make sense. Quoting http://bactra.org/notebooks/nn-attention-and-transformers.ht... :
> Once we have a source-coding scheme, we can "invert" it to get conditional probabilities; we could even sample from it to get a generator. (We'd need a little footwork to deal with some technicalities, but not a heck of a lot.) So something I'd really love to see done, by someone with the resources, is the following experiment:
> - Code up an implementation of Lempel-Ziv without the limitations built in to (e.g.) gzip; give it as much internal memory to build its dictionary as a large language model gets to store its parameter matrix. Call this "LLZ", for "large Lempel-Ziv".
> - Feed LLZ the same corpus of texts used to fit your favorite large language model. Let it build its dictionary from that. (This needs one pass through the corpus...)
> - Build the generator from the trained LLZ.
> - Swap in this generator for the neural network in a chatbot or similar. Call this horrible thing GLLZ.
> In terms of perplexity, GLLZ will be comparable to the neural network, because Lempel-Ziv does, in fact, do universal source coding.
Maybe someone on HN will have resources for such an experiment?
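As a toy illustration of the "invert a source coder to get conditional probabilities" idea, here is a minimal sketch in Python using gzip itself (not the proposed LLZ; all function names are my own). The extra bytes gzip needs to encode a candidate character after the context act as a code length, and 2^(-code length) gives an unnormalized conditional probability. gzip's byte granularity and tiny window make this very crude, which is exactly the kind of "technicality footwork" the quote alludes to:

```python
import gzip
import random

def compressed_len(text: str) -> int:
    """Size in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode("utf-8")))

def next_char_distribution(context: str, alphabet: str) -> dict[str, float]:
    """'Invert' the compressor: the extra bits needed to encode c after
    the context act as a code length, and P(c | context) ~ 2^(-length).
    Byte granularity means many deltas are 0, so this is a crude toy."""
    base = compressed_len(context)
    weights = {
        c: 2.0 ** (-8 * (compressed_len(context + c) - base))
        for c in alphabet
    }
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def sample_text(context: str, alphabet: str, n_chars: int, seed: int = 0) -> str:
    """Sample a continuation character by character -- the 'generator'."""
    rng = random.Random(seed)
    out = context
    for _ in range(n_chars):
        dist = next_char_distribution(out, alphabet)
        chars = list(dist)
        out += rng.choices(chars, weights=[dist[c] for c in chars], k=1)[0]
    return out
```

A real LLZ would replace `gzip.compress` with an unbounded-dictionary Lempel-Ziv and work at a finer coding granularity, but the inversion recipe is the same.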
Content aside, this is terrible editing on BBC’s part:
> However, the regulator hit back, saying: "It is the CMA's job to do what is best for the people, businesses and economy of the UK, not merging firms with commercial interests."
> But the Competition and Markets Authority (CMA) said its job was not to serve the interests of merging firms.
> Mr Smith said the CMA's decision market "probably the darkest day in our four decades in Britain".
I assume they meant to say "marked" rather than "market". The editorial standards at the BBC are frankly a disgrace these days; errors like this are common.
If you only care about identically distributed test data, test-set overfitting doesn't happen that fast: if you evaluate M models on N test samples, the overfitting error is on the order of sqrt(log M / N). And even as this error becomes noticeable, the relative ranks among the models are more stable still, since you can apply small-variance bounds. This has actually been verified empirically on models proposed for CIFAR-10.
No. I was referring to the "standard concentration bound" in that paper, which applies when you have separate validation and test sets. I think the argument can usually be improved by applying small-variance inequalities such as Bernstein's to excess-risk-like quantities such as l(f_hat(x), y) - l(f_ref(x), y), to show that the accuracy difference / relative rank enjoys better guarantees. For ImageNet we can use the 0-1 loss and set f_ref to a SoTA classifier which, while having its loss bounded away from 0, is "mostly similar" to most f_hat's and thus leads to a small excess risk.
The CIFAR experiments I mentioned were https://arxiv.org/pdf/1806.00451.pdf. The paper doesn't actually contain this argument (unfortunate wording on my part), but its results appear to support it well.
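The sqrt(log M / N) scaling is easy to check with a quick simulation (a sketch of my own, not from the paper): evaluate M "models" that all truly guess at 50% accuracy on one reused N-sample test set, and compare the best model's spurious gain against the Hoeffding/union-bound envelope.

```python
import math
import random

def max_overfit_gap(num_models: int, num_samples: int, seed: int = 0) -> float:
    """All M 'models' truly have 50% accuracy. Reusing one N-sample test
    set, return the best observed accuracy minus the true 0.5 -- i.e.,
    the spurious gain from test-set reuse."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(num_models):
        correct = sum(rng.random() < 0.5 for _ in range(num_samples))
        best = max(best, correct / num_samples - 0.5)
    return best

def hoeffding_envelope(num_models: int, num_samples: int) -> float:
    """Union-bound scale for the max of M sample means: sqrt(log M / 2N)."""
    return math.sqrt(math.log(num_models) / (2 * num_samples))
```

With, say, M = 200 and N = 2000, the envelope is about 0.036, and the simulated best-model gain stays within a small constant factor of it.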
The equation example is artificial. In practice there would be no curly brackets around these single-token sub/superscripts, nor would the \left / \right in this example be needed. With whitespace added properly, the equation becomes quite legible.
For more complex equations, people use line breaks and indentation, and/or macros.
(note that, using my editor defaults, none of this equation's lines are truncated or wrapped)
The \g... and \pdF... are trivial ad hoc macros defined in the document. Producing the same document by repeatedly copy/pasting the tensor components and partial derivatives would have been considerably more time-consuming and error-prone.
Also notable is the align environment: type \\ for a manual line break and & at each point that is to be aligned.
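Since the actual equation and its \g... / \pdF... macros are truncated above, here is a generic stand-in (my own, not the original document's) showing the same mechanics: a one-line ad hoc macro, \\ for manual line breaks, and & marking the alignment points.

```latex
% Illustrative only -- not the original macros.
\newcommand{\pd}[2]{\frac{\partial #1}{\partial #2}}

\begin{align}
  \pd{f}{x} &= 2x + y, \\  % \\ ends the line
  \pd{f}{y} &= x           % & sits before each aligned = sign
\end{align}
```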
I just tried to reproduce this with the "WYSIWYG" equation editor in Word, and I can't figure out how to do it — right-clicking on plus and equal signs gives an "Align at this Character" option, but this appears to be a special case, as the option doesn't appear when clicking on anything else in the example.
In particular, had the first line involved (implied) multiplication instead of addition, there apparently wouldn't have been any acceptable point at which to declare alignment!?!
I'm also not sure what it means to align "at" a character, as characters have width, and characters to be aligned aren't necessarily the same width. Does it align the left sides? The right sides? The center?
Moreover, the default key bindings in the Word equation editor are counterintuitive: the usual binding for "manual line break" instead splits the equation in two, and there doesn't appear to be any default binding for the "Insert Manual Break" equation editor command at all.
Sure, LaTeX has a learning curve, but it's not at all obvious to me that Word is any better in this respect.