Maybe instead of hard-coding these preferences in the search engine, or having it try to guess for you based on your search history, you could opt in to download and apply such lists of ranking modifiers to your user profile. Those lists would be maintained by third parties and users, just like e.g. adblock blacklists and whitelists. For example, Python devs might maintain a list of search terms and associated URLs that get boosted, including Stack Exchange and their own docs. "Learn Python" tutorials would recommend you set up your search preferences for efficient Python work, just like they recommend you set up the rest of your workflow. Japanese Python devs might have their own list that boosts the official Python docs and also whatever the popular local equivalent of Stack Exchange is in Japan, which gets recommended by the Japanese tutorials. People really into 3D printing can compile their own list for 3D printing hobbyists. You could apply and remove any number of these on your profile at a time.
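To make that concrete, here's a rough sketch (in Python, with made-up list names, domains, and boost factors) of what one of those downloadable ranking-modifier lists and its application to a profile might look like:

```python
# A rough sketch of a community-maintained ranking-modifier list and how a
# profile might apply it. The list name, domains, and boost factors are all
# made up for illustration.
python_dev_prefs = {
    "name": "python_dev_prefs",
    "boosts": {          # multiply a result's score up...
        "docs.python.org": 3.0,
        "stackoverflow.com": 2.0,
    },
    "demotes": {         # ...or down
        "content-farm.example": 0.2,
    },
}

def apply_pref_lists(results, active_lists):
    """Re-score results using every preference list currently enabled on the
    profile, then re-sort. Each result is assumed to carry a 'domain' and a
    baseline 'score' from the engine."""
    rescored = []
    for result in results:
        score = result["score"]
        for prefs in active_lists:
            factor = prefs["boosts"].get(
                result["domain"],
                prefs["demotes"].get(result["domain"], 1.0))
            score *= factor
        rescored.append({**result, "score": score})
    return sorted(rescored, key=lambda r: r["score"], reverse=True)
```

Enabling or disabling a list would then just be adding or removing an entry from the set of active lists on your profile.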
I like this idea! I think the biggest difficulty with it - which is also probably the most important reason that engines like Google and DDG are currently struggling to return good results - is that the search space is just so enormously large now. The advantage of the suggestion in the blog post is that you trim down the possible results to a handful of "known good" sources.
As I understand it, you'd want to continue to search the whole "unbiased" web, then apply different filters / weights to every search. I really do like the idea, but I imagine we'd be talking about an increase in compute requirements of several orders of magnitude for each search as a result.
Maybe something like this could be made a paid feature, with a certain set of reasonable filters / weights made the default.
This may be a very dumb question, but could the filtering be done client-side? As in, DDG's servers do their thing as normal and return the results, then code is executed on your machine to weight/prune the results according to your preferences.
Maybe this would require too much data to be sent to the client, compared to the usual case where it only needs a page of results at a time. If so, would a compromise be viable, whereby the client receives the top X results and filters those?
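Roughly what I'm picturing for the "top X" compromise, as a toy sketch (assuming the server returns a score and a domain with each result, which it may well not):

```python
def rerank_top_results(results, weights, page_size=10):
    """Client-side pass over the top X results the server already returned.
    'weights' is a {domain: multiplier} map built locally from the user's
    preference lists; the server never needs to see it."""
    rescored = [
        {**r, "score": r["score"] * weights.get(r["domain"], 1.0)}
        for r in results
    ]
    rescored.sort(key=lambda r: r["score"], reverse=True)
    return rescored[:page_size]
```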
This would work if you had a blacklist of domains you didn't want to see. But the idea in the post is closer to a whitelist: the highest priority sites (tier 1) should be set manually, and anything after that should be weighted by how often it's referenced by the tier 1 sites. For a lot of searches you're going to have to pull many results to fill a page with stuff from a small handful of domains, and in fact you might not be able to get them at all. And that's before you start dealing with the weighting issue, which would require quite a bit of metadata to be sent with each request.
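To make the metadata problem concrete, here's a toy version of the tier weighting as I read the post, where the reference counts would have to come from a crawl of the tier-1 sites, i.e. exactly the data the client doesn't have:

```python
def weight_by_tier1(results, tier1_domains, reference_counts):
    """Toy version of the post's scheme: tier-1 domains are chosen manually
    and always sort first; every other domain is weighted by how many times
    the tier-1 sites reference it."""
    def key(result):
        if result["domain"] in tier1_domains:
            return (2, result["score"])              # tier 1: top band
        refs = reference_counts.get(result["domain"], 0)
        if refs == 0:
            return (0, result["score"])              # unreferenced: bottom band
        return (1, refs * result["score"])           # tier 2: ranked by references
    return sorted(results, key=key, reverse=True)
```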
I have had a similar idea; what you're proposing is essentially ranking/filtering customisation. The internet is a big stage, and on it we have companies and their products, political parties, ad agencies, and regular users. Everyone is fighting for attention and clicks. Google has control over a ranking and filtering system that covers most searches on the internet. FB and Twitter hold another ranking/filtering sweet spot for social networks.
The problem is that we have no say in ranking and filtering. I think it should be customisable both on a personal and community level. We need a way to filter out the crap and surface the good parts on all these sites. I am sure Google wouldn't like to lose control of ranking and filtering, but we can't trust a single company with such an essential function of our society, and we can't force a single editorial view on everyone.
As we have many newspapers, each with its own editorial views, we need multiple search engine curators as well.
Unfortunately I suspect that if it were a premium feature, not enough groups would put the requisite time into compiling and maintaining the site ranking lists. This sort of thing really has to become a community effort in order to scale, I think.
This is a great idea. It's like a modern reboot of the old concept of curated "link lists", maintained by everyone from bloggers to Yahoo. Doing it at a meta level for search-engine domains is a really cool thought.
I got signed up for Goodreads (a book review site), and I get tons of spam. It's not quite the same as your idea, but it is a curated list. I don't know how you stop spammers from adding bogus links to the Python interest list (to use an example).
Like any other list, it depends on who maintains it. You basically want to find the correct BDFL to maintain a list, much like many awesome-* repositories operate.
As a hack until then, I've found Google's Custom Search Engine feature to work well enough for my use cases. I just add the URLs that are "tier 1" for me. https://programmablesearchengine.google.com/cse/all
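If you'd rather hit it from a script than the web UI, the same engine is also queryable through the Custom Search JSON API; something like this (the key and engine ID are placeholders, and the engine itself, with your "tier 1" URLs, is still configured in the console):

```python
import requests

def search_tier1(query):
    # YOUR_API_KEY / YOUR_ENGINE_ID are placeholders for your own credentials.
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": "YOUR_API_KEY", "cx": "YOUR_ENGINE_ID", "q": query},
        timeout=10,
    )
    resp.raise_for_status()
    return [item["link"] for item in resp.json().get("items", [])]
```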
I mean, Hacker News can probably also identify users based on which articles they click on, and how often they jump straight to the comments. I hope they don't.
But a system such as I'm describing is probably the only one that can satisfy the two disparate requirements of fully anonymizing users and still being useful to both programmers and ophiologists, who are looking up very different things called "python".
what you are describing is relevance based on user input (be that cookies, search history, interests, a preference for x over y) that may be used as identifying information, which vastly de-anonymises the service. if a search query is too ambiguous then it can be refined. if the user knows they want a programming language and not a snake, they can let the search engine know themselves. don't sacrifice their anonymity for perceived usefulness
Presumably, the most common search preference lists would be used by very large numbers of people -- for example, almost all programmers would rather see Python (language) results than Python (snake) results, and would probably all be using whichever preference lists become the most popular and well-maintained, like "mit_cs_club.json". A subset of those would also be into anime and enable their anime search preferences (probably more particular), and some of them will also like mountaineering, pottery, and baking, and will have those preferences configured as well. Yes, that might be enough to identify you (just like searching for your own name would be), but those preferences don't need to be attached to you, just to your query, and you could disable or enable any of them at any time.
It would basically be like sending a search query in this form:
"Python importerror help --prefs={mit_cs_club, studioghiblifans_new, britains_best_baking_prefs, AlpineMountaineersIntl}"
If you like baking, anime, and mountaineering, it's probably convenient to leave all those active for your searches, even your purely programming-focused searches. But you could toggle some of them off if articles about "helping to protect imported mountain pythons" are interfering with your search results, or if you want to be more anonymous. If you're especially paranoid you could even throw in a bunch of random preferences that don't affect your query but do throw off attempts to profile you. You could pretty easily write a script that salts every search with a few extra random preference lists, for privacy or just for fun, and make that an additional feature. The tool doesn't need to maintain any history of your past activity to cater to your search, so I think it would be a good thing for privacy overall.
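That salting script would only be a few lines; a sketch, using the made-up list names and --prefs syntax from above:

```python
# Toy version of the "salting" script: every real query goes out with the
# user's actual preference lists plus a few decoys drawn from a public pool.
import random

PUBLIC_POOL = [
    "studioghiblifans_new", "britains_best_baking_prefs",
    "AlpineMountaineersIntl", "mit_cs_club", "vintage_synth_repair",
]

def salted_query(query, my_prefs, decoys=2):
    extras = random.sample([p for p in PUBLIC_POOL if p not in my_prefs], decoys)
    prefs = ", ".join(my_prefs + extras)
    return f"{query} --prefs={{{prefs}}}"

print(salted_query("Python importerror help", ["mit_cs_club"]))
```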
> Presumably, the most common search preference lists would be used by very large numbers of people
the more anonymous among us tend to opt for common IP addresses and common user agents to become the tree among the forest. adding a profile to that would, well, only add to a digital fingerprinting profile
> those preferences don't need to be attached to you, just your query
that's not how it works. preferences are by their nature personal. every transaction would have your interests and hobbies embedded, on top of metadata
you have voluntarily made yourself the birch among the ebony
> If you're especially paranoid you could even throw in a bunch of random preferences that don't affect your query but do throw off attempts to profile you
how would they not affect the query? they complement the query. or rather, unnecessarily accompany the query. your results depend on your input. it doesn't matter what colour glove you wear to pull the trigger if you bury the gun with the body
> write a script that salts every search with a few extra random preference lists, for privacy or just for fun
just the latter. fuzzing would be pointless since the engine will have already identified you by now
it sounds like an annoying browser extension at best. to label it a pro-privacy tool would be ludicrous