The RLHF is happening too late, I think. I think the reinforcement learning needs...

samrus 5 months ago \| on: I watched Gemini CLI hallucinate and delete my fil...

I think the RLHF is happening too late. The reinforcement learning needs to happen during the initial next-token prediction. On that note, we need something richer than just language to represent a complex world state.