
By this reasoning, aphantasiacs should be incapable of drawing anything from their mind.


They can, but the representations are much simpler, often lacking visual detail and leaning on written labels:

https://www.biorxiv.org/content/biorxiv/early/2019/12/05/865...


Link to the OG paper: https://arxiv.org/abs/2503.01781


Maybe fight fire with fire and respond by default with a nonsensical question, then see if the bug reporter responds genuinely confused or happily tries to engage in a nonsense convo…


I am working on a visual search & exploration engine: https://digger.lol

The goal is to create beautiful and useful maps of interesting data, empowering the user to explore more intuitively, guided by semantic similarity. No user data needs to be tracked for this to work; the data speaks for itself.

This roughly works by translating semantic (visual or textual) similarity into spatial proximity. Digger's major features are semantic mapping, text search and image search. The text and image search work bidirectionally, letting you search for images (e.g. product images) using text and for text (e.g. books) using images.
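
In spirit it's something like this minimal sketch, assuming a CLIP-style joint text/image embedding model plus a stock 2D projection (illustrative model name and data, not the actual production pipeline):

  from sentence_transformers import SentenceTransformer, util
  from sklearn.manifold import TSNE

  # Joint embedding model: text and images land in the same vector space.
  model = SentenceTransformer("clip-ViT-B-32")

  titles = ["hiking boots", "trail running shoes", "espresso machine"]
  embeddings = model.encode(titles)  # one vector per item

  # Map layout: project to 2D so semantic similarity becomes spatial proximity.
  coords = TSNE(n_components=2, perplexity=2).fit_transform(embeddings)

  # Bidirectional search: a text query can score against image embeddings too,
  # since both live in the same space.
  query = model.encode("footwear for mountains")
  scores = util.cos_sim(query, embeddings)  # highest score = best match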



I thought import names and PyPI names are not always equal, so this can't work reliably, right?


It uses this file, which maps import names to package names, but it only covers 1152 packages and I'm not sure how it was generated: https://github.com/bndr/pipreqs/blob/master/pipreqs/mapping


In fairness, that only seems to list packages where the import name doesn't match the package name (e.g., it doesn't include numpy), so its overall coverage is a lot larger than that.


Even worse, there are some different PyPI packages with the same import name. For example, `import snappy` is probably referring to the compression library https://pypi.org/project/python-snappy/ but it could be this maths package https://pypi.org/project/snappy/
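
In other words, the best such a tool can do is roughly this lookup — a hypothetical sketch, with a few real examples of mismatched names; the identity fallback is why packages like numpy don't need an entry:

  # Hypothetical sketch of a pipreqs-style lookup, not its actual code.
  IMPORT_TO_PYPI = {
      "cv2": "opencv-python",
      "bs4": "beautifulsoup4",
      "sklearn": "scikit-learn",
      # Ambiguous imports can't be resolved from the name alone:
      # "snappy" could mean python-snappy or the SnapPy maths package.
  }

  def pypi_name(import_name: str) -> str:
      # Most packages (e.g. numpy) match their import name, so fall back to it.
      return IMPORT_TO_PYPI.get(import_name, import_name)

  assert pypi_name("cv2") == "opencv-python"
  assert pypi_name("numpy") == "numpy"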


I agree a loading indicator (a spinner, for example) is needed; I was confused when nothing happened. Also a big yes to a reset/delete feature.


I agree that a perfectly consistent dataset won't completely stop statistical language models from hallucinating, but it will reduce it. I think it is established that data quality is more important than quantity. Bullshit in -> bullshit out, so a focus on data quality is good and needed IMO.

I am also saying LM outputs should cite sources and come with confidence scores (reflecting how far the output is in or out of the training distribution).


I think the problem is you need an extremely large quantity of data just to get the machine to work in the first place. So much so that there may not be enough to get it working on just "quality" data.


How would confidence scores work? Multiple passthroughs and a % attached to each statement according to how often it appeared in the generated result?

If so, building this could be quite complex depending on the domain. In the legal field, even one changed word can have large consequences.
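
Concretely, that reading would look something like this sketch (generate() is a hypothetical stand-in for any LM sampling call, not a specific API):

  from collections import Counter

  def confidence_scores(prompt: str, generate, n_samples: int = 10) -> dict:
      # Sample the model several times at non-zero temperature...
      answers = [generate(prompt) for _ in range(n_samples)]
      # ...and treat the fraction of samples agreeing on an answer as confidence.
      counts = Counter(answers)
      return {answer: n / n_samples for answer, n in counts.items()}

Even then, attaching a % per statement rather than per whole answer requires fuzzy statement matching, which is exactly where domains like law get hairy.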


What’s a non-statistical language model?

And I think looking to the training data for sources is a little silly - that’s the training data for intuitive language use, not true statements about the world. If you haven’t checked it out yet, two terms you’d love are “RAG” and “Manuel De Landa”
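
To sketch what RAG buys you here: retrieve supporting documents first, then generate from them, so the sources to cite are known by construction. embed() and generate() below are hypothetical stand-ins, not any specific library's API:

  import numpy as np

  def retrieve(query_vec, doc_vecs, docs, k=3):
      # Cosine similarity, assuming all vectors are L2-normalized.
      scores = doc_vecs @ query_vec
      top = np.argsort(scores)[::-1][:k]
      return [docs[i] for i in top]

  def answer_with_sources(question, embed, generate, docs, doc_vecs):
      sources = retrieve(embed(question), doc_vecs, docs)
      prompt = "Answer using only these sources:\n" + "\n".join(sources)
      return generate(prompt + "\nQ: " + question), sources  # answer + citations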


I wrote up this blog post in 30 minutes, which is why it reads a little rough. I could not find explicit research on the impact of contradictory training data, only on the general need for high-quality training data.

Maybe it is a pipe dream to drastically improve on hallucinations by curating a self-consistent data set, but I am still interested in how much it actually impacts the quality of the final model.

I described one possible way to create such a self-consistent data set in this very blog post.
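
Roughly, the idea is a contradiction gate at ingestion time. A minimal sketch, using an off-the-shelf NLI model as the checker (illustrative, not the exact method from the post):

  from transformers import pipeline

  # Off-the-shelf NLI model repurposed as a contradiction checker.
  nli = pipeline("text-classification", model="roberta-large-mnli")

  corpus = ["The Eiffel Tower is in Paris."]

  def admit(candidate: str) -> bool:
      # Reject a document if it contradicts anything already accepted.
      for accepted in corpus:
          verdict = nli([{"text": accepted, "text_pair": candidate}])[0]
          if verdict["label"] == "CONTRADICTION":
              return False
      corpus.append(candidate)
      return True

  admit("The Eiffel Tower is in Berlin.")  # rejected: contradicts the corpus

Pairwise checking scales quadratically with corpus size, which is part of why this is easier said than done at pre-training scale.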


I like pixi (https://pixi.sh/latest/). It lets me pin the Python version and install packages from both conda and PyPI. And it's written in Rust.


It looks really interesting, but it is hard to really invest in yet another ecosystem that tells you to curl and pipe into bash, and then tells you to eval arbitrary command output.


For what it's worth, you can install pixi with Cargo. The current invocation is:

  cargo install --locked --git https://github.com/prefix-dev/pixi.git pixi

I try new versions of pixi from time to time because I have a project that depends on LAVIS and EasyOCR. My default project-management tool, Poetry, has problems with PyTorch. Right now, I use pip-tools for the project. While Conda worked, I didn't like the tooling that much. What is currently blocking me from using pixi is the PyPI integration (https://github.com/prefix-dev/pixi/issues/1295). I can evaluate pixi in earnest when it is solved.


Thanks for the link. Is it faster than conda?


It's orders of magnitude faster than conda.


yes


I find pixi great. If anyone uses conda, pixi is a drop-in replacement where the environment is associated with the git/project directory, similar to devbox/devenv/flox.

The story is a bit complicated. There was conda, written in Python by the Anaconda company. Then the open source ecosystem grew around conda-forge, a conda channel with CI build bots. Then came mamba, under the same conda-forge umbrella: a drop-in replacement for conda written in C++ (a true drop-in, in that `alias conda=mamba` should just work). And now conda itself uses libmamba as its solver to speed it up.

Then the author of mamba spun it off into pixi, a rewrite in Rust with a different philosophy on how environments should be located and activated, while staying fully compatible with conda environments.

Conda has always supported installing packages from PyPI via pip (when a package isn't available from conda channels, for example), and pixi supports PyPI packages via uv. That makes pixi fast. (There are other optimizations, outlined in their blog post, that make it much faster than even mamba.)

If you use any non-pure-Python packages, then conda is the way to go. The choice of package manager (conda/mamba/pixi) is secondary.

The problem with PyPI is the lack of gatekeeping. That, coupled with the lack of a standard way to package non-pure-Python code, makes environments leaky (see the comments here on errors in exotic or old environments) and/or non-reproducible (especially when people distribute source only and do crazy things in setup.py to bootstrap their environments, including compilers).

In conda land, the conda-forge channel has pretty good gatekeeping to ensure quality: packages are constrained properly, licensed properly (PyPI maintainers sometimes don't include the necessary license file in the distribution), isolated from the environment properly, etc. It's not bulletproof, though: there is an official bot that maintainers can use to auto-merge changes from PyPI, which can carry wrong version constraints, for example.

The problems that no tool solves right now center on PyPI: dealing with packages not available in conda, and the fact that releasing a package virtually mandates releasing on PyPI first.

When you install a PyPI-only package through conda tooling, some of its dependencies may still be available through conda. AFAIK, no package manager will use conda packages to fulfill the PyPI package's dependencies. You can manually add the conda packages to resolve the dependencies, at the risk of not subjecting them to the right version constraints.

And when you author an open source Python package, even if your own setup relies on conda channels only, you most probably need to release it on PyPI first (releasing on the conda-forge channel virtually mandates a presence on PyPI first). Then you need non-conda tools to help you. This is why Rye would still be useful to people like me, and worth checking out.


