
I use Claude Sonnet for coding and it's better than GPT-4 most of the time. Something I am realising is that LLMs don't have any moat. Today OpenAI, tomorrow someone else.


I agree. My personal experience is that 80% of the time Opus is better than GPT-4 on coding.

Honestly the only thing that still makes me sometimes prefer GPT-4 is the UI. I like being able to Edit my messages, and to Stop the model if I gave it the wrong prompt. Please improve Claude's UI!

The interoperability between LLMs right now is amazing. When I write a program I can quickly test it with each of GPT, Claude, and Gemini to see which works better for what I'm doing. Here's hoping nobody figures out how to create a moat any time soon!


Claude's UI for handling files, though, is far superior.

Each of them does some things better.


Now we just need an ensemble model.


Tomorrow my desktop computer hopefully


I really doubt it.

"Tomorrow" your desktop computer might be twice powerful but at the same time the "good model of tomorrow" will be four or ten times larger - I'd expect that the gap between what can be done locally versus what is offered as a service will grow, not shrink.


The diminishing returns from model scale mean that if your personal computer improves twofold in the time a datacenter improves fivefold, you may still have narrowed the gap in terms of quality.

That doesn't mean you'll be able to run the best model, but I'm relatively optimistic about the gap not growing out of control.


Well, sure, one thing is that the absolute numbers do increase, so for any given notion of "good enough", every device will at some point reach the level where it can run it.


I run a home media server and can't wait to be able to add my own LLM service. It's just a matter of time before it's something I can install over a weekend with proper hardware.


Have you tried https://ollama.com/ ? You may find you already can.


  git clone https://github.com/ggerganov/llama.cpp
  cd llama.cpp
  make
  ./server -m models/7B/ggml-model.gguf -c 2048
I don't think it'll take you the whole weekend :)
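Once the server is up, it exposes a small HTTP API on port 8080. A minimal Python client for its /completion endpoint might look something like this (the endpoint and JSON fields are from llama.cpp's server docs; the prompt and token count are arbitrary):

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:8080"  # llama.cpp server's default address


def build_payload(prompt, n_predict=128):
    """Request body for llama.cpp's /completion endpoint."""
    return {"prompt": prompt, "n_predict": n_predict}


def complete(prompt, n_predict=128):
    """POST the prompt and return the generated text."""
    req = urllib.request.Request(
        SERVER + "/completion",
        data=json.dumps(build_payload(prompt, n_predict)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]


# With the server running:
#   print(complete("Write a haiku about home media servers:"))
```

From there it's trivial to wrap in whatever front end your media server already uses.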


One moat might be access to recent training data, as evidenced by the NYT lawsuit and recent deals in that space.


Do you have an IDE integration? What I find so great about Copilot/GPT-4 is how it's integrated into VSCode/JetBrains and can use the context you're in, like knowing what line you highlighted, what documents you have open, etc. Do you copy-paste into whatever chatbot you're using?


I had the same problem of copying and pasting code into LLM web UIs, so I built a small tool to streamline the process and add source code to the prompt: https://prompt.16x.engineer/

You can't rely on IDE auto context since the entire codebase is too large to feed into an LLM (maybe Claude 3's 200k-token context can take it, but that's too expensive). And RAG is not smart enough to figure out which parts of the code are relevant.
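The core of a tool like that is just gathering the hand-picked files into one prompt. A minimal sketch (the file-marker format and function name here are my own invention, not how prompt.16x.engineer actually works):

```python
from pathlib import Path


def build_prompt(question, paths):
    """Concatenate the selected source files, then the question, into one prompt."""
    parts = []
    for p in map(Path, paths):
        # Label each file so the model can tell them apart.
        parts.append(f"--- {p.name} ---\n{p.read_text()}")
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)
```

Pasting the result into any chat UI sidesteps per-IDE integrations entirely; the tradeoff is that you pick the context by hand instead of trusting heuristics.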


> You can't rely on IDE auto context since the entire codebase is too large to feed into an LLM

It's not feeding in the entire codebase, just whatever you've selected or whatever files you have open; at least that's what the UI suggests. So I ask it "what does this line do" and I get an answer that uses the whole file to explain what the line does.


Yeah, for context within a single file, GitHub Copilot is good; I use it all the time.

But if you use it to do something across multiple files (DB schema, service, controller, HTML, JavaScript), it becomes less accurate (a precision-vs-recall problem), since, like you said, it uses your open windows or some heuristics to decide what the context is.

With an IDE as the interface, it is just not intuitive, UX-wise, to "open tabs" to signal to GitHub Copilot which files should be included in the context.


That works, but for more complex questions taking into account different files and the whole architecture of the app, Copilot fails. I've been trying to RAG my repos to accomplish this, but the parent comment said that's not possible.


I believe https://aider.chat/ is working on RAG over codebases.


Is GitHub Copilot using GPT-4 or 3.5? I've tried to find out for sure, but I can't seem to find the information anywhere.


I think 3.5; that was the last official note.

Copilot Chat uses 4, but it's suspiciously free of confirmation that it's also used in the more contextual Copilot (non-chat).


GitHub Copilot uses OpenAI Codex, which is a much older model fine-tuned from GPT-3.

Definitely not GPT-4; otherwise it would not be less than $10 a month for constant usage.


The chat part (mostly) uses GPT-4, you can also see which model is called in the request logs. Here is the official announcement: https://github.blog/changelog/2023-11-30-github-copilot-nove...


Okay thanks for pointing that out.

I figure if they do this, they have to throttle or nerf it somehow, since it is cheaper than ChatGPT Plus, which also gives access to GPT-4.


It won’t answer questions that aren’t somehow related to code or computing. I usually don’t need anything else, so I haven’t really tested the limits of that so far.


I'm sure one can just ask Claude to code the integration, it has to be so good.


Do you have any tips? I find Copilot so much worse when trying to use it in VSCode, even with the integration.

It just seems to do a much worse job than pasting your code into the chat UI.

Like, its answers are just profoundly bad in comparison.


Using Copilot in both VSCode and VS2022, I see vast differences, but they usually come down to the language I'm working with.

I've noticed that in Visual Studio (the IDE), Copilot gives better answers if I physically view an interface or implementation; then I get "okay" results. But it struggles with larger, more abstract projects.

VSCode is better for sure, but there I'm usually working on smaller projects or in interpreted languages.

The Vim Copilot extension is probably the best of the bunch, but again, I'm not working with dotnet in Vim.


Cursor (an "AI-first" VSCode derivative) gives you contextual IDE integration with the main LLMs, including OpenAI's and Claude. I haven't tried it myself, but I've heard good things.


> Something I am realising is that LLMs don't have any moat. Today OpenAI, tomorrow someone else.

I think you are correct, for chat. But for audio, video, and 3D, it will never be that easy for a newcomer.


Is generative AI easier than I assumed it'd be (which isn't to say easy), and more limited by training data and training hardware than by model complexity?


Yes

Edit: although on some level the training only gets you the general capabilities of the model. How you fine-tune it to be a specifically useful bot is a very important element. That's not really model complexity so much as design thinking and experimentation.



The last paragraph gets to this, but the ways engineers and scientists imagine the mind works are nothing like how it actually works.


We spent >$10K last month with OAI.

Their moat right now is developer tooling. They allow fine tuning + an easy API to use their llm.

No one else does that right now. By the time others do, so much tooling and infrastructure will have been built around OAI that the switching costs will be high.

It will get to the point that if you want your LLM to beat OAI in the market, it won't be enough to be as good or even a bit better. You'll need to be very, very significantly better than OAI. For an extreme example of this, see Windows: the network effects keeping it in place are so strong that the platform becoming abandoned adware hasn't been enough to push users to significantly better platforms like the Mac.

Now, I've fine-tuned the hell out of gpt-3.5 and I'd love to see how my app would perform on a fine-tuned Opus. I went to their website and I can't seem to fine-tune their model yet. Meh. My guess is that by the time they make it available, I won't have a strong reason to even try anymore.
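For a sense of what that tooling looks like: the fine-tuning flow is essentially "upload a JSONL file of chat examples, then start a job". A rough sketch of the data-preparation half using only the stdlib (the commented-out SDK calls at the end follow the OpenAI Python client; file names and the system prompt are placeholders):

```python
import json


def to_finetune_record(user_msg, assistant_msg,
                       system="You are a helpful assistant."):
    """One JSONL line in OpenAI's chat fine-tuning format."""
    return {"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": user_msg},
        {"role": "assistant", "content": assistant_msg},
    ]}


def write_dataset(pairs, path):
    """Write (user, assistant) example pairs as a JSONL training file."""
    with open(path, "w") as f:
        for user_msg, assistant_msg in pairs:
            f.write(json.dumps(to_finetune_record(user_msg, assistant_msg)) + "\n")


# Uploading and launching the job (requires the `openai` package and an API key):
#   client = openai.OpenAI()
#   f = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
#   client.fine_tuning.jobs.create(training_file=f.id, model="gpt-3.5-turbo")
```

That the whole flow is a couple of calls is exactly the point: the model quality aside, the ergonomics are what you end up building your pipeline around.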


Not in my experience. Sonnet gets to the point quicker, but hallucinates more than GPT-4. I'm keen to try Opus.


How do you use it for coding? I'd like my own little app that doesn't share my stuff online.



