
Sounds like point 2 was a negative SEO attack. It could be that your /?s page is being cached and getting picked up by crawlers.

You can avoid this by not caching search pages and by applying noindex via the X-Robots-Tag header: https://developers.google.com/search/docs/crawling-indexing/...
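A minimal sketch of the X-Robots-Tag approach, assuming a hypothetical helper that decides which extra headers a response should carry (the path prefixes here are invented for illustration):

```python
def robots_headers(path):
    """Return extra response headers for a given request path.

    Hypothetical helper: any internal search URL gets an
    X-Robots-Tag header so crawlers drop it from the index.
    """
    headers = {}
    if path.startswith("/search") or path.startswith("/?s"):
        headers["X-Robots-Tag"] = "noindex, nofollow"
    return headers
```

The advantage of the header over a meta tag is that it also works for non-HTML responses, and you can set it in one place (the web server or framework) rather than in every template.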



Cache has nothing to do with this

But yes, just noindex search pages, as they already said they did.


I think the question is "how is the behavior of random spammers on your search page getting picked up by the crawler?" The assumption behind "cache" is that one user's searches were being cached so that the crawler saw them. Other alternatives I can imagine are that your search page is powered by Google, so it gets the search terms and indexes the results, or that you show popular queries somewhere. But you have to admit that the crawler seeing user-generated search terms points to some deeper issue.


You just link to that page from a page that Google crawls. Cache isn't involved, unless you call links caching.


Ah that makes sense, thanks for clarifying.

Not sure how search result pages can be crawled unless they are cached somewhere?


If I'm reading correctly, it's not that your search results would be crawled, it's that if you created a link to www.theirwebsite.com/search/?q=yourspamlinkhere.com or otherwise submitted that link to google for crawling, then the google crawler makes the same search and sees the spam link prominently displayed.
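The attack described above needs nothing more than a crawlable link somewhere on the web. A hypothetical example (domains invented for illustration):

```html
<!-- Posted anywhere Googlebot crawls. When the crawler follows it,
     the query string is echoed back as prominent content on the
     victim's own search results page -->
<a href="https://victim.example/search/?q=buy+pills+at+spamsite.example">cheap pills</a>
```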


Yikes.

What could Google do to mitigate?


You noindex search pages, or anything user-generated; it's really that simple.


Not enough. According to this article (https://www.dr.dk/nyheder/penge/pludselig-dukkede-nyhed-op-d... — you'll probably need to translate it), it's enough to link to an authoritative site that accepts a query parameter: Google's AI picks up the query parameter as a fact. The article is about a Danish company probably circumventing sanctions, and how Russian actors manipulate that "fact" and spread it around via Google's AI.


Yeah, all pages should have a proper canonical tag, which would solve this too.
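For reference, a canonical tag is a one-liner in the page head; with it, query-parameter variants of a URL consolidate to one clean page (example.com is a placeholder here):

```html
<!-- In the <head> of the search page: all variants like
     /search/?q=spam are treated as duplicates of the clean URL -->
<link rel="canonical" href="https://example.com/search/">
```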


In this case, all I had to do was let the crawler know not to index the search page. I used the robots noindex meta tag on the search page.
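That meta tag is a single line in the search template's head; a sketch of what it presumably looks like:

```html
<!-- In the <head> of the search results template only -->
<meta name="robots" content="noindex">
```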


I don't know what you mean by cache but you aren't using it correctly...



