
Can someone explain why the weights are posted via a Bittorrent magnet link? I have no way to check the size at the moment, but isn't that a bit unusual? There's also only 21 seeders right now according to https://checker.openwebtorrent.com/



Because BitTorrent is an outstanding tech for delivering large files. The more I think about it, the more I'm surprised it isn't taken advantage of more.


It's been criminalized to hell by IP holders and Hollywood. Such a shame they killed the best tech of the previous decade. It could have revolutionized how we distribute content, approach CDNs, and even streaming.


So many countries still have data caps, even supposedly first-world countries.

Even without Hollywood, people wouldn't have wanted to use any platform that spends their data and battery/power sharing with everyone else.

It's the main reason why torrenting platforms have introduced ratios and made it hard to get invited.


In what way is the bittorrent protocol criminalized?


Scheme 1: agents for copyright holders continuously scan for IP addresses that host copyrighted content and start legal action.

Scheme 2: criminal groups infect copyrighted content with malware to exploit downloaders of that content.


Neither of those is against the BitTorrent protocol itself. Lots of software, like Ubuntu, is legally available via BitTorrent, and I've never seen anything done to restrict that.


It may become a tradition, since weights are so large. Perhaps it started when the Llama torrent link leaked; then Mistral decided to release their weights via BitTorrent.


I'm not sure why you wouldn't tbh. That's a lot of bandwidth.



Distributing 300GB via torrent is cheaper than direct, assuming even a few other people seed


> Can someone explain why the weights are posted via a Bittorrent magnet link?

I think the best way to get an answer to that question is to try to host it yourself and see what happens.


Spreads the burden/cost of distributing a 300+GB file.


How else could/should it be done?


I would have assumed they could just upload it to GitHub. If it has restrictions on file size, I'm sure they could make multi-part compressed files.

Torrents can unfortunately die after a period of time if no one continues seeding them, or if they don't use a permanent web-based seeder, which doesn't appear to be the case.


GitHub have a soft repository size limit of 5GB, documented here: https://docs.github.com/en/repositories/working-with-files/m...

Soft size limit means "If your repository excessively impacts our infrastructure, you might receive an email from GitHub Support asking you to take corrective action." - I know people who have received such emails.

Most model releases happen through Hugging Face which does not have such a size limit.


They'd probably just charge you for it. They sell "data packs" for LFS.

https://docs.github.com/billing/managing-billing-for-git-lar...


It would be super expensive to use LFS to distribute this:

> Each pack costs $5 per month, and provides 50 GiB of bandwidth and 50 GiB for storage

So they would need to pay for 6 data packs (or $30) for every 300 GB download.

(https://docs.github.com/en/billing/managing-billing-for-git-...)
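Back-of-the-envelope, as a rough Python sketch that takes the quoted $5 per 50 GiB of bandwidth at face value:

  import math

  size_gib = 300 * 10**9 / 2**30    # 300 GB is about 279.4 GiB
  packs = math.ceil(size_gib / 50)  # 6 data packs of bandwidth needed
  print(packs, packs * 5)           # 6 packs, $30 per full download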


I'd bet Hugging Face would be happy to host these canonically too, so I'm not sure why that doesn't happen more.


The model is also at https://huggingface.co/xai-org


The great thing about torrents is that you (or anyone else who cares) can single-handedly solve the problem you're complaining about by seeding the torrent.
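And it's roughly a one-script job. A minimal sketch using the libtorrent Python bindings (the magnet URI below is a placeholder, and this assumes you can leave a process running):

  import time
  import libtorrent as lt

  MAGNET = "magnet:?xt=urn:btih:..."  # placeholder, not the real link

  ses = lt.session()
  params = lt.parse_magnet_uri(MAGNET)
  params.save_path = "./weights"
  handle = ses.add_torrent(params)

  # Downloads first; status flips to seeding once the payload is complete.
  while not handle.status().is_seeding:
      time.sleep(30)

  print("Seeding. Keep this process alive.")
  while True:
      time.sleep(3600)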


No, git would be impossible. I've never seen a repo even a few GB in size; if you are uploading non-code files you really should not be using git. Git is version management software for code. I often see repos with images and even videos checked in. Please don't; there are far better and more performant solutions out there.

The other approach would be to use AWS S3 or another cloud provider, which would cost them money every time someone downloads it, and that's not something they should have to pay for when they are releasing it for free. Torrents seem like the only good solution, unless someone hosts this in the cloud for free for everyone.


Scott Chacon (GitHub cofounder) mentioned in a recent talk that the Windows repo is 300 GB: https://youtu.be/aolI_Rz0ZqY?si=MOo2eS6dsKKAxmsP


Interesting, I had no idea git had a VFS or that MS used a monorepo. I guess git is much more capable than I thought, but the average user really should just be uploading code to GitHub.


Hugging Face will disagree with "impossible", as their models are available via git, sometimes broken up into .pth files.

Still, as far as sentiment goes, yeah, git for model weights is an impedance mismatch for sure!


> No, git would be impossible. I've never seen a repo even a few GB in size; if you are uploading non-code files you really should not be using git

It's not actually a limitation of git itself, especially if you use Git LFS. People use Git for Unreal projects, and big ones can be half a terabyte or more in size.
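For context, tracking a file type with LFS comes down to a one-line .gitattributes entry; the extension here is just an illustration:

  *.safetensors filter=lfs diff=lfs merge=lfs -text

Running `git lfs track "*.safetensors"` writes that line for you, and from then on those files are stored as small pointers in the repo with the blobs living in LFS storage.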


Others have pointed out that GitHub doesn't allow that, but

> Torrents can unfortunately die after a period of time if no one continues seeding them, or if they don't use a permanent web-based seeder, which doesn't appear to be the case.

So too can web links, especially when they are 300 GB and egressing out of AWS at $0.09/GB or worse (in non-US regions). Each full download would cost $27 at that rate; 10,000 downloads would cost $270,000.

Sure, you could go for something with a better cost model like R2, but you can't beat using one or two unmetered connections on a VPN to constantly seed over BitTorrent. The pricing would be effectively free, and reliability would be higher than if you just exposed an HTTP server to the Internet like that.
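Sanity-checking those numbers in Python, assuming the $0.09/GB figure:

  size_gb = 300
  rate = 0.09                        # $/GB, AWS internet egress in US regions
  per_download = size_gb * rate
  print(per_download)                # 27.0 dollars per full download
  print(per_download * 10_000)       # 270000.0 dollars for 10,000 downloads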


> and egressing out of AWS at $0.09/GB

There are a lot of seeders on the torrent that are actually AWS IPs too, all with similar configurations, which makes me believe it's probably xAI running them.

> on a VPN

That's unnecessary; you don't need a VPN.


No you don't, but if you wanted to host it from your gigabit office IP, you probably would want to.


Why?


GitHub may choose to throttle downloads or remove the files simply because they're taking up too much bandwidth.

A torrent is less likely to go down in the short term.


This is not some crappy DVD rip on The Pirate Bay. It will be seeded as long as it's relevant.

Twitter/X has their own massive infrastructure and bandwidth to seed this indefinitely.


Yeah, they can just leave some server running somewhere and let it seed forever.


My optimistic explanation is that we are going back to the 2000s internet, but probably we are not.


Let's hope so.


It's likely over 100 GB of data, so I wouldn't say it's necessarily unusual to spread the bandwidth across multiple hosts.


Thanks! I searched and searched for a tool that would show me info about a magnet link via the web, but nada.
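The closest I got was doing it locally: a rough sketch with the libtorrent Python bindings that fetches just the metadata from the swarm and prints the total size (the magnet URI is a placeholder):

  import time
  import libtorrent as lt

  MAGNET = "magnet:?xt=urn:btih:..."  # placeholder for the actual link

  ses = lt.session()
  params = lt.parse_magnet_uri(MAGNET)
  params.save_path = "."
  handle = ses.add_torrent(params)

  # Wait for peers to send the metadata (no payload downloaded yet).
  while not handle.status().has_metadata:
      time.sleep(1)

  info = handle.torrent_file()
  print(info.name(), round(info.total_size() / 2**30, 1), "GiB")
  ses.remove_torrent(handle)  # stop before it starts pulling the payload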


Why not? Mistral was the first to do it, and it has become a tradition.


BitTorrent is just an objectively superior method of delivering a lot of data to a lot of people.


I believe it was Llama 1 that notoriously got leaked with a torrent on 4chan.


It wasn't much of a leak. Facebook was pretending to keep it private for PR reasons but putting approximately zero effort into actually keeping it private.


Mistral did it too when they released their first open model. They just posted a magnet link on Twitter.


I don't understand why you're being downvoted for asking a legitimate question. People not familiar with model weights might be surprised that they are often in the tens of gigabytes, and in this case even more.



