Maybe instead of hard-coding these preferences in the search engine, or having it try to guess for you based on your search history, you could opt in to download and apply such lists of ranking modifiers to your user profile. Those lists would be maintained by third parties and users, just like e.g. adblock blacklists and whitelists. For example, Python devs might maintain a list of search terms and associated URLs that get boosted, including Stack Exchange and their own docs. "Learn Python" tutorials would recommend you set up your search preferences for efficient Python work, just like they recommend you set up the rest of your workflow. Japanese Python devs might have their own list that boosts the official Python docs and also whatever the popular local equivalent of Stack Exchange is in Japan, which gets recommended by the Japanese tutorials. People really into 3D printing can compile their own list for 3D printing hobbyists. You could apply and remove any number of these on your profile at a time.
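To make that concrete, here's a rough sketch (in Python, with made-up list names, domains, and boost factors) of what one of those downloadable ranking-modifier lists and its application to a profile might look like:

```python
# A rough sketch of a community-maintained ranking-modifier list and how a
# profile might apply it. The list name, domains, and boost factors are all
# made up for illustration.
python_dev_prefs = {
    "name": "python_dev_prefs",
    "boosts": {          # multiply a result's score up...
        "docs.python.org": 3.0,
        "stackoverflow.com": 2.0,
    },
    "demotes": {         # ...or down
        "content-farm.example": 0.2,
    },
}

def apply_pref_lists(results, active_lists):
    """Re-score results using every preference list currently enabled on the
    profile, then re-sort. Each result is assumed to carry a 'domain' and a
    baseline 'score' from the engine."""
    rescored = []
    for result in results:
        score = result["score"]
        for prefs in active_lists:
            factor = prefs["boosts"].get(
                result["domain"],
                prefs["demotes"].get(result["domain"], 1.0))
            score *= factor
        rescored.append({**result, "score": score})
    return sorted(rescored, key=lambda r: r["score"], reverse=True)
```

Enabling or disabling a list would then just be adding or removing an entry from the set of active lists on your profile.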
I like this idea! I think the biggest difficulty with it - which is also probably the most important reason that engines like Google and DDG are currently struggling to return good results - is that the search space is just so enormously large now. The advantage of the suggestion in the blog post is that you trim down the possible results to a handful of "known good" sources.
As I understand it, you'd want to continue to search the whole "unbiased" web, then apply different filters / weights to every search. I really do like the idea, but I imagine we'd be talking about an increase in compute requirements of several orders of magnitude for each search as a result.
Maybe something like this could be made a paid feature, with a certain set of reasonable filters / weights made the default.
This may be a very dumb question, but could the filtering be done client-side? As in, DDG's servers do their thing as normal and return the results, then code is executed on your machine to weight/prune the results according to your preferences.
Maybe this would require too much data to be sent to the client, compared to the usual case where it only needs a page of results at a time. If so, would a compromise be viable, whereby the client receives the top X results and filters those?
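Roughly what I'm picturing for the "top X" compromise, as a toy sketch (assuming the server returns a score and a domain with each result, which it may well not):

```python
def rerank_top_results(results, weights, page_size=10):
    """Client-side pass over the top X results the server already returned.
    'weights' is a {domain: multiplier} map built locally from the user's
    preference lists; the server never needs to see it."""
    rescored = [
        {**r, "score": r["score"] * weights.get(r["domain"], 1.0)}
        for r in results
    ]
    rescored.sort(key=lambda r: r["score"], reverse=True)
    return rescored[:page_size]
```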
This would work if you had a blacklist of domains you didn't want to see. But the idea in the post is closer to a whitelist: the highest priority sites (tier 1) should be set manually, and anything after that should be weighted by how often it's referenced by the tier 1 sites. For a lot of searches you're going to have to pull many results to fill a page with stuff from a small handful of domains, and in fact you might not be able to get them at all. And that's before you start dealing with the weighting issue, which would require quite a bit of metadata to be sent with each request.
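To make the metadata problem concrete, here's a toy version of the tier weighting as I read the post, where the reference counts would have to come from a crawl of the tier-1 sites, i.e. exactly the data the client doesn't have:

```python
def weight_by_tier1(results, tier1_domains, reference_counts):
    """Toy version of the post's scheme: tier-1 domains are chosen manually
    and always sort first; every other domain is weighted by how many times
    the tier-1 sites reference it."""
    def key(result):
        if result["domain"] in tier1_domains:
            return (2, result["score"])              # tier 1: top band
        refs = reference_counts.get(result["domain"], 0)
        if refs == 0:
            return (0, result["score"])              # unreferenced: bottom band
        return (1, refs * result["score"])           # tier 2: ranked by references
    return sorted(results, key=key, reverse=True)
```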
I have had a similar idea; what you're proposing is essentially ranking/filtering customisation. The internet is a big stage, and on it we have companies and their products, political parties, ad agencies, and regular users. Everyone is fighting for attention and clicks. Google has control over a ranking and filtering system that covers most searches on the internet. FB and Twitter hold another ranking/filtering sweet spot for social networks.
The problem is that we have no say in ranking and filtering. I think it should be customisable both on a personal and community level. We need a way to filter out the crap and surface the good parts on all these sites. I am sure Google wouldn't like to lose control of ranking and filtering, but we can't trust a single company with such an essential function of our society, and we can't force a single editorial view on everyone.
As we have many newspapers, each with its own editorial views, we need multiple search engine curators as well.
Unfortunately I suspect that if it were a premium feature, not enough groups would put the requisite time into compiling and maintaining the site ranking lists. This sort of thing really has to become a community effort in order to scale, I think.
This is a great idea. It's like a modern reboot of the old concept of curated "link lists", maintained by everyone from bloggers to Yahoo. Doing it at a meta level for search-engine domains is a really cool thought.
I got signed up for Goodreads (a book review site), and I get tons of spam. It's not quite the same as your idea, but it is a curated list. I don't know how you stop spammers from adding bogus links to the Python interest list (to use an example).
Like any other list, it depends on who maintains it. You basically want to find the correct BDFL to maintain a list, much like many awesome-* repositories operate.
As a hack until then, I've found Google's Custom Search Engine feature to work well enough for my use cases. I just add the URLs that are "tier 1" for me. https://programmablesearchengine.google.com/cse/all
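If you'd rather hit it from a script than the web UI, the same engine is also queryable through the Custom Search JSON API; something like this (the key and engine ID are placeholders, and the engine itself, with your "tier 1" URLs, is still configured in the console):

```python
import requests

def search_tier1(query):
    # YOUR_API_KEY / YOUR_ENGINE_ID are placeholders for your own credentials.
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": "YOUR_API_KEY", "cx": "YOUR_ENGINE_ID", "q": query},
        timeout=10,
    )
    resp.raise_for_status()
    return [item["link"] for item in resp.json().get("items", [])]
```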
I mean, Hacker News can probably also identify users based on which articles they click on, and how often they jump straight to the comments. I hope they don't.
But a system such as I'm describing is probably the only one that can satisfy the two disparate requirements of fully anonymizing users and still being useful to both programmers and ophiologists, who are looking up very different things called "python".
what you are describing is relevance based on user input (be that cookies, search history, interests, a preference for x over y) that may be used as identifying information, which vastly de-anonymises the service. if a search query is too ambiguous then it can be refined. if the user knows they want a programming language and not a snake, they can let the search engine know themselves. don't sacrifice their anonymity for perceived usefulness
Presumably, the most common search preference lists would be used by very large numbers of people -- for example, almost all programmers would rather see Python (language) results than Python (snake) results, and would probably all be using whichever preference lists become the most popular and well-maintained, like "mit_cs_club.json". A subset of those would also be into anime and enable their anime search preferences (probably more particular), and some of them will also like mountaineering, pottery, and baking, and will have those preferences configured as well. Yes, that might be enough to identify you (just like searching for your own name would be), but those preferences don't need to be attached to you, just to your query, and you could disable or enable any of them at any time.
It would basically be like sending a search query in this form:
"Python importerror help --prefs={mit_cs_club, studioghiblifans_new, britains_best_baking_prefs, AlpineMountaineersIntl}"
If you like baking, anime, and mountaineering, it's probably convenient to leave all those active for your searches, even your purely programming-focused searches. But you could toggle some of them off if articles about "helping to protect imported mountain pythons" are interfering with your search results, or if you want to be more anonymous. If you're especially paranoid you could even throw in a bunch of random preferences that don't affect your query but do throw off attempts to profile you. You could pretty easily write a script that salts every search with a few extra random preference lists, for privacy or just for fun, and make that an additional feature. The tool doesn't need to maintain any history of your past activity to cater to your search, so I think it would be a good thing for privacy overall.
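That salting script would only be a few lines; a sketch, using the made-up list names and --prefs syntax from above:

```python
# Toy version of the "salting" script: every real query goes out with the
# user's actual preference lists plus a few decoys drawn from a public pool.
import random

PUBLIC_POOL = [
    "studioghiblifans_new", "britains_best_baking_prefs",
    "AlpineMountaineersIntl", "mit_cs_club", "vintage_synth_repair",
]

def salted_query(query, my_prefs, decoys=2):
    extras = random.sample([p for p in PUBLIC_POOL if p not in my_prefs], decoys)
    prefs = ", ".join(my_prefs + extras)
    return f"{query} --prefs={{{prefs}}}"

print(salted_query("Python importerror help", ["mit_cs_club"]))
```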
> Presumably, the most common search preference lists would be used by very large numbers of people
the more anonymous among us tend to opt for common IP addresses and common user agents to become the tree among the forest. adding a profile to that would, well, only add to a digital fingerprinting profile
> those preferences don't need to be attached to you, just your query
that's not how it works. preferences are by their nature personal. every transaction would have your interests and hobbies embedded, on top of metadata
you have voluntarily made yourself the birch among the ebony
> If you're especially paranoid you could even throw in a bunch of random preferences that don't affect your query but do throw off attempts to profile you
how would they not affect the query? they complement the query. or rather, unnecessarily accompany the query. your results depend on your input. it doesn't matter what colour glove you wear to pull the trigger if you bury the gun with the body
> write a script that salts every search with a few extra random preference lists, for privacy or just for fun
just the latter. fuzzing would be pointless since the engine will have already identified you by now
it sounds like an annoying browser extension at best. to label it a pro-privacy tool would be ludicrous