Speaking from my own experience, which may differ from the grandparent comment's: I'll ask ChatGPT (on GPT-4) for some analysis or a factual lookup, and I'll get back a kind of generic answer that doesn't address the question. If I then prompt it again with a "please look it up" type message, the next reply will have the results I would have expected initially.
It makes me wonder if OpenAI has been tuning it to skip web queries unless they clear some threshold of "likely to improve the reply."
I'd also say ChatGPT's replies have gotten slowly worse with each passing month. I suspect that as they tune it to avoid bad outcomes, they're inadvertently chopping out the high points too.