Uncertain Future for Marginalia Search

theobeers · on April 30, 2022

This is a great search engine. I entered "Persian transliteration," since that's what I was working on today. It sent me to the readme for a program written in 1996,[0] which takes a Latin-script transliteration of some Persian text, and generates ASCII art that resembles the way that text would be written in Persian script. Useful? Eh… Delightful? 100%. It would never have occurred to me that such a program would exist.

Best wishes to you, Marginalia developer.

[0]: http://www.payvand.com/gerdsooz/README.html

imiric · on April 30, 2022

I've been following the project for a while now, and while I don't use it yet, we need it and more like it to succeed if we ever hope to loosen Google's chokehold on the web.

Best of luck to you and to the project!

I'm curious about a few things:

1. What's your (planned) business model?

2. Have you tried asking for sponsorships, either from companies or individuals? You should have an easy way for people to donate. I'm sure you'd have some support there, especially if your day job situation is unstable.

3. Is it just you working on it right now? Have you considered open sourcing it to get community contributions, or hiring more devs (once donations pick up or maybe someone would be willing to work on it on their free time as you do)? I can imagine that writing a search engine is a gargantuan effort, and doing it alone must be close to impossible.

marginalia_nu · on April 30, 2022

> 1. What's your (planned) business model?

Dunno. In general I don't have a lot of faith in the profitability of search engines. Ads can work if you're Google-scale, the other option is subscriptions, but in that case, you need to be really good and my search engine just isn't, outside of some areas. That's actually one of my bigger design problems, how to let people understand which queries are likely to be useful. It looks like Google, and people assume it has the affordances of Google. It doesn't, and if you go in with those assumptions, you'll be disappointed.

My model, as far as I've planned one, is just to keep the operation as cheap as possible and subsist on donations and maybe partnerships with other search engines. A big part of what I'm exploring is ways of doing as much as possible with low power hardware. I think rather than indexing 1 billion documents, 90% of which are garbage that will never be a good search result for any query ever, if I can index 100 million 50% of which are potentially good hits, then maybe that goes a decent way.
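As a back-of-the-envelope check on those figures, a minimal sketch, assuming the round numbers above are illustrative rather than measured (the function name is made up for this example):

```python
def good_documents(index_size: int, good_fraction: float) -> int:
    """Number of indexed documents that could plausibly be a good hit."""
    return round(index_size * good_fraction)


# "90% garbage" means roughly 10% of the big index is potentially useful.
big_index = good_documents(1_000_000_000, 0.10)
# The smaller, curated index: 100 million documents, half of them useful.
small_index = good_documents(100_000_000, 0.50)

print(big_index)                # 100000000
print(small_index)              # 50000000
print(small_index / big_index)  # 0.5
```

Under these assumptions, an index a tenth of the size would still hold about half as many potentially good documents.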

> 2. Have you tried asking for sponsorships, either from companies or individuals? You should have an easy way for people to donate. I'm sure you'd have some support there, especially if your day job situation is unstable.

I haven't really been fishing for this. I honestly didn't see having to change jobs as I am right now. I do have a donations page from before, but all of this was fairly sudden, so I haven't really gone over that whole process all too much yet.

> 3. Is it just you working on it right now? Have you considered open sourcing it to get community contributions, or hiring more devs (once donations pick up or maybe someone would be willing to work on it on their free time as you do)? I can imagine that writing a search engine is a gargantuan effort, and doing it alone must be close to impossible.

It's been just me up until now. Solo work can be ridiculously efficient when beginning a new project, especially when doing the sort of exploratory programming this has been. I also haven't felt I have had enough bandwidth to manage an open source project. But I am approaching a point where it's becoming a bit much to do all by myself, especially given this isn't my only project.

So I am considering open sourcing it or bringing more people in, just need to think a bit about a good format for such a collaboration. It's relatively high maintenance and requires manual operations to keep going. As it stands, a lot of the code isn't trivially testable, running it (even with few documents) requires large language models and so on.

20after4 · on May 1, 2022

If you do end up open sourcing it I'm still interested in contributing. Both in form of code and also infrastructure.

djbusby · on May 1, 2022

There is a tasteful spot between FOSS engine and ad-support that it feels like Marginalia could capture. Would likely need a "CEO" type who wasn't also an MBA type to make this work.

mitchbob · on May 1, 2022

Are there services that are at that "tasteful spot" now? Examples would be helpful!

RistrettoMike · on May 1, 2022

I’ve really enjoyed using Marginalia search over the last few months whenever I’m looking for honest-to-goodness “small-web” content. It’s always a breath of fresh air, provided I don’t look too niche.

I took the opportunity to pitch in a few dollars to the Patreon each month. I hope this time & a bit of extra funds from folks is helpful in finding what direction you feel driven to take search in.

Thanks for making a really neat web thing in a web less and less full of neat things!

marginalia_nu · on April 30, 2022

Hopefully this will turn out to be a good thing. Maybe having some time to work on the project full time is exactly what's needed to push it forward.

Still a bit uncomfortable how sketchy it feels in the longer term. But whatever. All I can do about it is do a good job.

O_H_E · on April 30, 2022

This might be intentional on your part, but I couldn't find your Patreon linked anywhere from the blog.

This might be a good time to start linking that in obvious places.

Fwiw it was very easy to find it through Google, but ironically not through marginalia.

I wish you the best in your endeavors.

marginalia_nu · on April 30, 2022

Yeah I have it linked from the search engine as a top link[1], but I can only have 2-3 of them so I haven't linked to it anywhere in the blog.

Haven't really been a priority to get donations since I've had more than plenty income.

Maybe I should look over the design.

[1] https://memex.marginalia.nu/projects/edge/supporting.gmi

kumarsw · on April 30, 2022

Using Marginalia always reminds me just how much we have lost since the golden age (2000-2010) of the internet. Thanks for bringing it back in a small way.

SemanticStrengh · on April 30, 2022

Couldn't HN do a fundraising campaign ? HN according to the wikipedia page has been created as a place to preserve the eternal september [1] https://en.wikipedia.org/wiki/Eternal_September

Marginalia is therefore something we must protect at all cost

cookie_monsta · on May 1, 2022

> HN according to the wikipedia page has been created as a place to preserve the eternal september

Sorry, could you point out the bit where the wiki page says anything like that?

SemanticStrengh · on May 1, 2022

I was referring to the HN wiki page https://en.wikipedia.org/wiki/Hacker_News#:~:text=Graham%20s...

cookie_monsta · on May 1, 2022

Oh, ok thanks because you linked to another page. But still the HN wiki page says

> Graham stated he hopes to avoid the Eternal September that results in the general decline of intelligent discourse within a community

How does that make it a place created to preserve the Eternal September?

SemanticStrengh · on May 1, 2022

I actually meant to preserve the pre-eternal september state.

t-3 · on May 1, 2022

Eternal August?

1vuio0pswjnm7 · on May 1, 2022

Users of non-mainstream browsers are now being required to solve CAPTCHAs.^1 If you hit the Cloudflare CAPTCHA for HTML results, it appears JSON results are still available. The rate limit is unpublished.

   curl https://api.marginalia.nu/public/example.com

1. Or users of mainstream browsers that turn off Javascript. Yikes.

Hope the author finds a new job that satisfies his requirements. Maybe he will get some help from fans of marginalia.nu.

It was nice to see an on premises, non-commercial search engine project. Proof of what's possible and the quality of results that can be achieved on a relatively small budget.

marginalia_nu · on May 1, 2022

The CAPTCHAs should be limited to only some IP ranges. I saw a spike in botspam a while back and had to dial up the bot mitigation a bit to be able to maintain service. Will reduce it when they give up.

sundarurfriend · on May 1, 2022

What do bots do with a search engine?

I understand botspam on the content side where they just generate keyword-"rich" pages trying for SEO, but it sounds like they're running searches on your search engine in this case, and I don't understand the purpose of that.

marginalia_nu · on May 1, 2022

I'm honestly not entirely sure what they are trying to do.

My best guess is possibly it's an attempt at manipulating a suggestions algorithm, but I don't even persist query data so it's a bit of a waste. But maybe they think I'm backed by Google or Bing?

vosper · on May 1, 2022

> it can be difficult to find the sort of queries that produce interesting and worthwhile results.

This is a hard problem to solve. I’ve worked at a couple of places that have built products on top of fulltext search provided by Elasticsearch or MySQL. Both companies struggle with the interface - users find it hard to understand, and hard to craft good searches. What’s the state of the art here? Is it Google with a magic text box? What if people need more control?

marginalia_nu · on May 1, 2022

I think design of search is still sort of stuck in 1998.

The problem is that an empty search box does nothing to tell you what could fruitfully be typed into it. Good type-ahead suggestions can help assist the query formulation a bit, but that's a band-aid at best.

You could reformulate basically any incidence of a search engine returning bad results as the visitor entering a bad query. The emphasis has been almost entirely on improving the capabilities of the search engine to interpret bad queries, rather than working with the interface design to offer the user an intuition for what is a good one.

Realistically you need to approach it from both ends, but I think the work has been fairly lopsided in search and discovery.

vosper · on May 2, 2022

Yeah I think I agree with you. The other place I've run into difficulty is explaining results. When you have complex searches and they're excluding or including certain results, you often want to know which part of the search is responsible. Because often searches are long-lived (outside of Google) and people need to tune them over time, but it can be really hard to figure out what to change.

IIRC for ES, at least, "explain" exists, but it's very hard to interpret.
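For what it's worth, the explanation Elasticsearch returns is a nested tree of nodes with `value`, `description` and `details` keys, and flattening it into an indented outline can make it a little easier to read. A minimal sketch, where the helper name and the sample scores are made up for illustration and are not real output:

```python
from __future__ import annotations


def flatten_explanation(node: dict, depth: int = 0) -> list[str]:
    """Turn a nested explanation tree into indented 'score  description' lines."""
    lines = [f"{'  ' * depth}{node['value']:.3f}  {node['description']}"]
    for child in node.get("details", []):
        lines.extend(flatten_explanation(child, depth + 1))
    return lines


# Hypothetical explanation tree in the value/description/details shape.
sample = {
    "value": 1.5,
    "description": "sum of:",
    "details": [
        {"value": 1.0, "description": "weight(title:search)", "details": []},
        {"value": 0.5, "description": "weight(body:search)", "details": []},
    ],
}

outline = flatten_explanation(sample)
print("\n".join(outline))
```

This only restructures what the engine already reports; it does not say which clause to change, which is the harder part of the problem described above.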

daxfohl · on April 30, 2022

Surprised Elon bought that dumpster fire instead of something like this.

marginalia_nu · on April 30, 2022

Yeah, it would be far cheaper too :-/

NeutralForest · on April 30, 2022

Very important project, I hope you'll be able to settle into something comfortable!

georgehill · on May 7, 2022

I am a Google search fan, but I recently started falling back to Marginalia because some technical search queries yield very good search results. I don't want this search engine to die.

Why not try to raise some money?

ianbutler · on May 1, 2022

As someone developing alternative search engines myself, I hope the project pulls through. Good luck finding another job that gives you similar freedoms!

benwills · on April 30, 2022

In a very different way, I'm also involved in a search-related project. (edited to add: also going solo on my project as well) If you ever want to bounce ideas around, I'd totally be up for that.

Related: you mention other sources than Common Crawl for WARC data. Is there a list of those somewhere?

marginalia_nu · on April 30, 2022

Sure, my email is in my profile if you want to chat.

Some WARCs that go into IA get published on archive.org, not all of them, but some: https://archive.org/search.php?query=warc

It's also an all-around useful format as you can produce it from wget and other common tools. But the big reason I'm moving toward something relatively homomorphic to WARCs is to be able to (in the future) publish my own crawls.
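To give a sense of how simple the format is, here is a minimal sketch of reading records from an uncompressed WARC stream using only the standard library. It assumes well-formed records without folded header lines, the function name is made up for this example, and the sample record is simplified (a real record also carries fields such as `WARC-Record-ID` and `WARC-Date`):

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        version = stream.readline()
        if not version:
            return  # end of stream
        if not version.strip():
            continue  # blank lines separate one record from the next
        if not version.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {version!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # an empty line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The payload is exactly Content-Length bytes long.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


# Simplified, hand-written sample record for illustration only.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

records = list(iter_warc_records(io.BytesIO(sample)))
for headers, payload in records:
    print(headers["WARC-Target-URI"], len(payload))
```

Files written by wget are gzip-compressed unless `--no-warc-compression` is given, so in practice the stream would likely need to be opened with `gzip.open` first.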

benwills · on April 30, 2022

Thanks for that link. I've done a bit of work with the Common Crawl data (and proposed moving to ZSTD with a proof of concept and performance metrics in C a few years ago).

I'll send you an email later this weekend to connect.

ColinHayhurst · on April 30, 2022

I wish you well and we welcome what you are doing with marginalia. As you know search needs a shakeup. One vital approach to a real shakeup is true independence of crawler and index. If it's any encouragement, Marc our founder started Mojeek as a hobby project back in 2004.

marginalia_nu · on April 30, 2022

Thanks, man.

yuhong · on April 30, 2022

When PCs still needed spinning rust to boot from. Even SATA SSDs are not as good as NVMe.

jmclnx · on April 30, 2022

I never heard of it, but looks good. I hope they can succeed. And good luck to you too.

hahnchen · on April 30, 2022

I just use google to search hn, “site:news.ycombinator.com <query>”
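A minimal sketch of building that kind of query URL programmatically; the function name is made up for this example:

```python
from urllib.parse import urlencode


def hn_site_search_url(query: str) -> str:
    """Build a Google search URL restricted to news.ycombinator.com."""
    params = {"q": f"site:news.ycombinator.com {query}"}
    return "https://www.google.com/search?" + urlencode(params)


url = hn_site_search_url("marginalia search")
print(url)
```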