Pijul: Commutation and Scalability

dooglius · on Dec 21, 2020

The article defines its sense of commutation as "meaning that if the two changes could be written without knowledge of each other, they can be applied in any order".

But, this seems to require semantic level understanding of the code, which generally can't be done. For instance, say Alice adds a call to foo() while Bob changes foo to require some precondition that doesn't hold in Alice's case. That's going to appear to commute as these changes would touch completely separate files, but actually the resulting code won't work. In a normal git flow, this would be a case of whoever gets the commit in and passes CI first "wins", and the second commit fails CI and requires the issue to be fixed. This method seems to rely inherently on the fact that git commits are ordered. How would this work with a pijul-based flow?

pmeunier · on Dec 21, 2020

It is actually defined as "you get the same file contents and directory structure, regardless of the order in which you apply the changes", which is quite good.

> But, this seems to require semantic level understanding of the code

No: you may very well have two different orders yielding the same output, even though the output is incorrect.

> This method seems to rely inherently on the fact that git commits are ordered. How would this work with a pijul-based flow?

Good question. Our CI over at https://nest.pijul.com/pijul/pijul/ci works in the same way. Patches are ordered in Pijul too, but the order is local. In your example, if Alice wins, Bob will pull her patch, fix his code, which will produce another patch. He will then send it to CI again (or directly to main if he's careless), and Alice will pull both Bob's original patch and the fix.

In the end, they will have different histories, but the exact same contents.
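To make the distinction concrete, here is a minimal sketch of the Alice/Bob scenario. It is a toy model, not Pijul's data model: the repository is a plain dict, and `alice_change`, `bob_change` and `run` are hypothetical helpers. Both application orders give identical contents, yet the combined code fails at run time.

```python
# Toy model: a repository is {path: contents}; each change rewrites one file.
# All names here are illustrative assumptions, not Pijul APIs.

def alice_change(repo):
    """Alice adds a call to foo() in main.py."""
    new = dict(repo)
    new["main.py"] = "from lib import foo\nresult = foo(-1)\n"
    return new

def bob_change(repo):
    """Bob makes foo() require a non-negative argument in lib.py."""
    new = dict(repo)
    new["lib.py"] = (
        "def foo(x):\n"
        "    if x < 0:\n"
        "        raise ValueError('precondition violated')\n"
        "    return x\n"
    )
    return new

def run(repo):
    """Execute lib.py, then the body of main.py; return True if nothing raises."""
    namespace = {}
    exec(repo["lib.py"], namespace)
    # The import line is dropped because foo is already in the namespace.
    main_body = repo["main.py"].replace("from lib import foo\n", "")
    try:
        exec(main_body, namespace)
    except ValueError:
        return False
    return True

base = {"lib.py": "def foo(x):\n    return x\n", "main.py": ""}

alice_first = bob_change(alice_change(base))
bob_first = alice_change(bob_change(base))

same_contents = alice_first == bob_first   # the two orders commute textually
combined_works = run(alice_first)          # but the result is still broken
```

Each change passes on its own; only the combination fails, which is why CI still has to run on the merged state.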

vlovich123 · on Dec 21, 2020

If there’s no global history to the changes, how do you support signing off on changes with crypto keys? How do you do a post-mortem analysis of the order of events that led to a problem?

pmeunier · on Dec 21, 2020

There are global states in Pijul, using cryptographic primitives to handle the commutative nature of Pijul history.

I explained it in a previous blog post: https://pijul.org/posts/2020-11-07-towards-1.0/#version-iden...
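As a rough illustration of what an order-independent state identifier can look like, the sketch below sums SHA-256 digests of the patches modulo 2^256. This is an assumption made for illustration only: it is not the construction described in the linked post, it is not meant to be secure, and `patch_hash` and `state_id` are hypothetical names.

```python
import hashlib

MODULUS = 2 ** 256

def patch_hash(patch_bytes: bytes) -> int:
    """Hash one patch to an integer (illustrative choice: SHA-256)."""
    return int.from_bytes(hashlib.sha256(patch_bytes).digest(), "big")

def state_id(patches) -> int:
    """Combine patch hashes with a commutative operation (modular addition),
    so the identifier depends on which patches are applied, not on their order."""
    total = 0
    for p in patches:
        total = (total + patch_hash(p)) % MODULUS
    return total

alice_view = state_id([b"patch A", b"patch B", b"fix"])
bob_view = state_id([b"patch B", b"fix", b"patch A"])
```

Here `alice_view == bob_view` even though the two histories list the patches in different orders.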

ddevault · on Dec 21, 2020

This web server seems to be broken.

https://web.archive.org/web/20201220135644/https://pijul.org...

nix23 · on Dec 21, 2020

Ah thanks, same problem here though it's probably uBlock, Privacy Badger or LocalCDN.....or Firefox ;)

Ygg2 · on Dec 21, 2020

Hi pmeunier, very nice article, I quite enjoyed it.

> The complexity of apply is in O(p c log |H|), where p is the size of the change, c is the size of the largest conflict in which p is involved, and H is the number of edits made since the start of the repository.

Is it possible to lazy fetch the repository so you can keep H as low as possible? Like only look at H that are in conflict or something?
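For a sense of scale, the quoted bound can be evaluated for a few history sizes. The sketch assumes a base-2 logarithm and a constant factor of 1, neither of which is stated in the quote; `apply_cost_bound` is a hypothetical helper.

```python
import math

def apply_cost_bound(p: int, c: int, h: int) -> float:
    """Evaluate p * c * log2(h), ignoring constant factors (assumed, not given)."""
    return p * c * math.log2(h)

small_history = apply_cost_bound(p=10, c=1, h=1024)       # 10 * 1 * 10
large_history = apply_cost_bound(p=10, c=1, h=1_048_576)  # 10 * 1 * 20
```

Under these assumptions, a history 1024 times larger only doubles the bound, so the dependence on H is mild compared with the dependence on p and c.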

pmeunier · on Dec 21, 2020

> Is it possible to lazy fetch the repository so you can keep H as low as possible? Like only look at H that are in conflict or something?

Something like that is announced at the end of the post ;-) It will hopefully be possible within a few days. The design is mostly done (at least for the basic feature), but my solution changes the implementation quite a bit, so I would expect temporary breakages (we're still in alpha, after all!).

Ygg2 · on Dec 21, 2020

Not gonna lie. Pijul sounds revolutionary. It basically combines the best of both worlds: the distributed nature of Git with the partial checkout of SVN.

antpls · on Dec 21, 2020

I tried Pijul last weekend, on debian stable.

It installed successfully, I did "pijul init", I recorded some changes, but then "pijul log" showed nothing. I tried all the combinations of add/record, but pijul log stayed empty. I stopped trying because I didn't have much time.

It would be nice to have a better first experience. I'm not sure what I did wrong.

pmeunier · on Dec 21, 2020

On the other hand, it's still in alpha. What version did you try?

Also, pijul log still depends on less, and there are some problems related to that. This is going to be fixed soon.

antpls · on Dec 21, 2020

There is no disclaimer on the front/download page or installation page saying in big bold "this software is alpha stage", so what should we expect? :-)

I don't remember the exact version, but I installed it following the instructions on the pijul website for Debian, with Nix.

I had to use a different set of options to install pijul through Nix than the ones on the website (after some Googling and luck, because I had no idea how to use Nix), maybe it had an impact on the installation.

Maybe a simpler way to distribute the software would be a simple zip with a static binary that works on most Linux distributions without installation. A Docker image is also an option, though maybe overkill (but more people know Docker than Nix).

Anyway, I will try again in some months, the project is definitely interesting.

dilap · on Dec 21, 2020

Very cool stuff! I think built-in excellent support for large files & large repos could be a huge driver of adoption vs git (which, of course, has poor support for those things).

(At work we use git, but split into many repos, which is a pain, and with git-lfs, which is also a (mild) pain.)

IshKebab · on Dec 21, 2020

Git doesn't have bad support for large files. It just doesn't make special exceptions for not recording their history like Git LFS does.

I don't see anywhere where Pijul would be different to be honest.

dilap · on Dec 21, 2020

True, you can put large files in git, but after a while your .git will become huge, clones will become extremely slow, and (if you're hosted on github), github will threaten to delete your repo. ;-)

Then you have to use something like git lfs, which isn't terrible, but also isn't 100% smooth either.

Something that handled this as part of the native experience out of the box would be much nicer.

Edit: Pijul would be different by not having clone become slow, as I understand the blog post.

IshKebab · on Dec 23, 2020

Sure but all of those things are a consequence of Git doing its job of recording all history forever. Git LFS is only "better" because it lets you delete history later and it enforces getting shallow clones.

Maybe Pijul solves that magically somehow (e.g. can you change history without changing commit hashes?) but they haven't explained how IMO.

dilap · on Dec 24, 2020

Well, I can just report my experience with a git repo w/ a bunch of binary files in it.

It got really really slow to clone (like multiple days -- we gave up on cloning and just copied the checked-out folder around); git lfs fixed this. I should note most of the binary assets had no changes and were not deleted; it seemed to be something inherent in git's handling of the binaries that made it slow, not just downloading the data. (I.e., I don't believe most of the wins in git-lfs were simply in doing a "shallow clone" as it were.)

So as I see it the situation is as follows:

- Even for a full clone, git becomes inefficient with a lot of large binaries

- For doing a partial clone, things like git-lfs work, but are clunky because git-lfs is not natively integrated with git

- The partial clone abilities of git-lfs are strictly "time-based" -- you go to a particular checkout, and then it goes off and downloads the files. There's no way to say, "I have a big monorepo with a bunch of binaries, but I'm only interested in some subset w/o the binaries."

So these are all areas where I think git can be improved on.

IshKebab · on Dec 24, 2020

That sounds like a bug to be honest. If the files never changed then you'll only have one copy of them and Git really shouldn't have to do any extra work to deal with them.
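The "only one copy" point follows from Git's content addressing: a blob's id is a hash of a short header plus the file contents, so identical contents map to the same object. A minimal sketch, assuming the default SHA-1 object format (`git_blob_id` and the dict-based `store` are illustrative, not Git internals):

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the id Git assigns to a blob: SHA-1 over 'blob <size>\\0' + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# A toy object store keyed by blob id: the same bytes are stored only once,
# no matter how many commits or paths refer to them.
store = {}
asset = b"\x89PNG...pretend this is a large binary asset..."
for _commit in range(100):
    store[git_blob_id(asset)] = asset

copies_stored = len(store)
```

This only shows why unchanged files do not multiply in the object store; it says nothing about what made the clone described above slow.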

dilap · on Dec 24, 2020

Certainly possible. & of course git is always improving. But still, I do think ability to do path-based partial checkouts and have something like git-lfs integrated natively remain compelling potential features.

digikata · on Dec 21, 2020

I saw some ways mentioned to represent binary patches, but wonder if there is also some sort of chunk-level deduplication?

pmeunier · on Dec 21, 2020

Not yet, but this is just for lack of time to implement the diff algorithm. There's nothing hard there, but there are other priorities at the moment.
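For readers unfamiliar with the term, chunk-level deduplication can be sketched as follows. This is a generic illustration with fixed-size chunks, not something Pijul implements (per the answer above, it does not yet); real systems often use content-defined chunk boundaries instead. `store_chunks` and `restore` are hypothetical names.

```python
import hashlib

def store_chunks(data: bytes, store: dict, chunk_size: int = 4) -> list:
    """Split data into fixed-size chunks, store each distinct chunk once,
    and return the list of chunk hashes needed to rebuild the data."""
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)  # no-op if this chunk is already stored
        recipe.append(key)
    return recipe

def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original bytes from a list of chunk hashes."""
    return b"".join(store[key] for key in recipe)

store = {}
v1 = store_chunks(b"AAAABBBBCCCC", store)
v2 = store_chunks(b"AAAAXXXXCCCC", store)  # only the middle chunk changed
distinct_chunks = len(store)               # AAAA, BBBB, CCCC, XXXX
```

Two versions of a 12-byte file share two of their three chunks, so only four chunks are stored instead of six.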

j0e1 · on Dec 21, 2020

Conflict resolution is one of my least favorite parts of Git but I don't complain because there ain't no such thing as free lunch. Does Pijul reduce the number of conflicts that would otherwise be faced by a Git VCS? And if so, to what extent?

pmeunier · on Dec 21, 2020

Yes it does, because Pijul has an internal model of conflicts. In particular, conflicts happen between two patches, and are solved by a patch.

This means for example that when you solve a conflict on a branch, and pull from the same remote, the conflict won't reappear (no `git rerere` needed).

Or that you can push a conflict resolution to a remote, even if that remote is in another state: as long as the remote has the conflicting patches, the resolution will work.
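One way to picture "a conflict is solved by a patch" is a resolution patch that depends on the two conflicting patches and nothing else. The sketch below is a simplified model under that assumption; `Patch`, `can_apply` and `apply_patch` are hypothetical and do not reflect Pijul's internal representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Patch:
    name: str
    deps: frozenset = frozenset()  # names of patches that must already be applied

def can_apply(patch: Patch, applied: set) -> bool:
    """A patch is applicable when all of its dependencies are present."""
    return patch.deps <= applied

def apply_patch(patch: Patch, applied: set) -> set:
    """Return the new set of applied patch names, or raise if deps are missing."""
    if not can_apply(patch, applied):
        missing = sorted(patch.deps - applied)
        raise ValueError(f"missing dependencies: {missing}")
    return applied | {patch.name}

# The resolution depends only on the two conflicting patches A and B.
resolution = Patch("resolve-A-B", deps=frozenset({"A", "B"}))

remote_in_other_state = {"A", "B", "unrelated-1", "unrelated-2"}
remote_missing_b = {"A", "unrelated-1"}

ok = can_apply(resolution, remote_in_other_state)
not_ok = can_apply(resolution, remote_missing_b)
```

The remote's other patches do not matter: the resolution applies to any state that contains both A and B.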

rstarast · on Dec 21, 2020

With files not identified by file name, I'm curious whether pijul is able to reconcile two independent additions of the same file. I suppose there'll have to be some custom logic to deal with this kind of conflict? E.g.:

Alice creates change

A: add file /README (identified by hash x)

Bob creates change

B: add file /README (identified by hash y)

C: adds a copyright header to README

Now Bob pulls from Alice, and wants Alice's README contents (change A). Does pijul notice conflicts between A & B? If Bob records a change resolving the conflict between A & B by keeping A, will it be possible to apply C on top of this?

pmeunier · on Dec 21, 2020

Pijul will treat that as a conflict indeed, in particular because it has no way to tell that Alice and Bob meant the same thing when adding that file. A README is a simple example, but what about a module split from a larger module during a refactoring? Even if the filenames added by Alice and Bob match, how can we know they're meant to be the same?
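A small sketch of why this registers as a conflict when files are identified by the change that added them rather than by name. The representation is an assumption for illustration, with `name_conflicts` as a hypothetical helper and `x` and `y` standing in for the identifiers from the question.

```python
def name_conflicts(additions):
    """Given (file_id, path) pairs, return the paths claimed by more than one
    distinct file identity."""
    by_path = {}
    for file_id, path in additions:
        by_path.setdefault(path, set()).add(file_id)
    return {path: ids for path, ids in by_path.items() if len(ids) > 1}

# Alice's change A and Bob's change B each add their own /README.
additions = [("x", "/README"), ("y", "/README"), ("z", "/src/main.rs")]
conflicts = name_conflicts(additions)
```

In this model the two READMEs are distinct files that happen to want the same name, so nothing can decide automatically that they were meant to be the same file.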

olau · on Dec 21, 2020

This seems (somewhat?) orthogonal to what you're doing with Pijul at the moment, but I feel that auto-merging to avoid manual handling of obvious cases is somewhat underdeveloped in modern version control systems.

Perhaps that's just because people aren't talking about it - I've actually never looked at git's interface for automerging. Even though I have a couple of examples of spurious conflicts that come up regularly.

pmeunier · on Dec 21, 2020

There are two separate things:

- Git does have spurious conflicts coming back for no reason. The problem is so real that there's even a `rerere` command to take care of it.

- Pijul has "minimal" conflicts, in the sense that these are conflicts you can't possibly make sense of without human guidance. You get the guarantee that fixed conflicts don't come back. Unfortunately, "obvious" cases are never completely obvious: let's say Alice and Bob both add a case to a C function, and both change `#define NUMBER_OF_CASES 42` to `#define NUMBER_OF_CASES 43`.
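The `NUMBER_OF_CASES` example can be made concrete with a hypothetical line-level merge rule that accepts identical changes from both sides. This rule is an assumption used to illustrate the pitfall; the thread does not say that Git or Pijul merges this way.

```python
def merge_line(base: str, ours: str, theirs: str) -> str:
    """Naive three-way merge of a single line."""
    if ours == theirs:   # both sides made the same edit: looks "obvious"
        return ours
    if ours == base:     # only their side changed
        return theirs
    if theirs == base:   # only our side changed
        return ours
    raise ValueError("conflict: both sides changed the line differently")

base = "#define NUMBER_OF_CASES 42"
alice = "#define NUMBER_OF_CASES 43"  # Alice added one case
bob = "#define NUMBER_OF_CASES 43"    # Bob added a different case

merged = merge_line(base, alice, bob)
cases_after_merge = 42 + 2            # both new cases are kept in the function
```

The rule happily produces 43, while the merged function now has 44 cases, so the "obvious" merge is silently wrong.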

jeffbee · on Dec 21, 2020

Hrmm. I was interested enough to try to compile this, but it doesn't build using the given `cargo` invocation. Has anyone any real world experience with this tool?

pmeunier · on Dec 21, 2020

Did you try the following:

cargo install pijul --version 1.0.0-alpha.24

Difficulties compiling have been reported on Windows; we're still trying to figure out a Windows CI pipeline.

jeffbee · on Dec 21, 2020

That gets rid of all but one of the problems. The directions on your site are to use alpha.17.

The remaining problem is the use of recently stabilized Rust features, which means distro-packaged cargo can't install it.

pmeunier · on Dec 21, 2020

I fixed that on the site, thanks.

About recent features, I used to be very careful about using a stable Rust that was at least a few weeks old, or one generation behind. But the latest release got an unexpected number of contributors to the project, and we didn't even have a CI system!

We do have a basic CI now (https://nest.pijul.com/pijul/pijul/ci), so that's one thing we definitely want to check.