WikiSearch, a Wikipedia Search Engine

Its origins were deceitful, duplicitous, and downright dumb but Google needs a competitor and maybe it’s not such a bad idea.

I’ve been focused on Wikipedia lately, following my article Wikipedia’s Ties to Big Tech, published by the Institute for New Economic Thinking (INET).

That put me in touch with lots of senior Wikipedia people, past and present. None of them are Wikimedia people who, as I’ve mentioned, repeatedly ignored questions. Before my article, they asked me to submit them by email then proceeded to entirely blow them off, transparency apparently being more aspirational than operational. After publication, Jimmy Wales responded with a not-so-small freakout that he’d never said something he had indeed said, in writing, that was quoted in the same article.

Still, that article led me to virtually meet some interesting and clearly very smart past and present senior Wikipedia editors, people who know Wikipedia and Wikimedia well.

Some definitions are needed. Wikipedia is the website we know and sometimes love, the sprawling “encyclopedia” that’s virtually always the first or second result after entering a noun into Google. Wikimedia is the parent non-profit that owns Wikipedia. Repeating the financials of my first story, Wikimedia has at least $180 million in reserves plus another $90 million in an “endowment” held by the non-profit Tides Foundation. There’s lots more to this and I’d urge anybody who hasn’t already read the INET article to do so.

The people who write and maintain the various Wikipedia websites (each language has its own) are a group of volunteers collectively referred to as “the community.” Individual community members are informally called Wikipedians, a term common enough that my spell checker knows it. Little of the almost $300 million pot of gold Wikimedia has collected from donations flows to Wikipedians.

In speaking with the longer-term Wikipedians, one element that came up repeatedly was a turning point in Wikipedia’s history: Wikimedia’s decision to create a Knowledge Engine. That project was cooked up by former Wikimedia head Lila Tretikov. It looked like a search engine, probably because that’s exactly what it was.

Consistent with the type of transparency I’ve seen, Lila and Wikipedia co-founder and figurehead Jimmy Wales not only kept the project secret but repeatedly obscured it from the community. That’s a bad idea in any organization, a worse idea in a non-profit, a terrible idea in a non-profit that sets transparency as a core operating principle, and a godawful, inexcusably horrible idea in a non-profit where a not-so-small army of volunteers creates 100% of the product that anybody cares about.

Lila secretly applied for and received a grant from the Knight Foundation, first asking for millions then, after being rejected, less. Wikimedia was eventually granted $250,000 to create a study. As that progressed, the entire project came to light. The two apparently claimed, falsely, that the grant application had to stay secret due to donor privacy, until a Wikipedian contacted Knight, which clarified that not only was it not secret but that Knight preferred it be out in the open. When the lies became clear, much drama ensued as both community members and Wikimedia employees felt betrayed.

Compounding an already bad situation is a feeling I’ve heard repeatedly that Lila picked up people management skills in her native USSR with a more authoritarian tone than many were accustomed to. After a number of defections at both Wikimedia and Wikipedia, Lila resigned with a golden parachute on February 25, 2016 effective March. Shortly before, in January 2016, Jimmy set out to create an endowment to raise $100 million, an amount that seems like it’d be a good start for a limited specialized search engine.

In Lila’s place, Wikimedia promoted then-head of public relations Katherine Maher to the top job. Maher has an NYU degree in Middle Eastern and Islamic studies she earned in 2005. After that, she worked with the UN and HSBC bank, wrote about the Arab Spring, and interned at the Council on Foreign Relations, where she remains a life member. Somehow, those jobs prepared her for a job as head of PR then CEO at Wikimedia less than a decade after graduation.

Fast forward five years. Today, April 15, 2021, is Maher’s last day. Wales’ endowment, which Wikimedia promised to move to a separate charity from Tides when it reached $33 million, is near or past its $100 million goal and remains at Tides. Questions about the endowment were met with “we’ll get back to you” then, days later, crickets. Finally, there’s a new for-profit Wikimedia Enterprise offering to streamline data feeds to Big Tech because Google, the undisputed king of parsing web pages, apparently needs help electronically reading Wikipedia pages.

This last point ties to something Wikimedia’s new CTO, Grant Ingersoll, said in their open house a couple weeks ago when discussing Wikimedia Enterprise: he’s never talked to Google, Apple, or Amazon about the offering. Lane Becker, head of the project, also said he’s never spoken to them about their needs or how much they’d be willing to pay. As I’ve pointed out, Wikimedia has repeatedly refused to answer questions about how much time staffers are spending supporting Big Tech. Wired Magazine wrote that “point people” are in discussions with Big Tech, which is the opposite of what the point people said when asked. It’s almost like Wired’s information came from a public relations firm rather than from people tasked with doing the work. You’d almost think press releases are being “rebranded” as news, an idea that once upon a time would’ve repulsed reporters.

Obviously, Google is more than able to electronically read the web, Wikipedia and all the rest. Google needs help reading and parsing Wikipedia pages like Simone Biles needs help walking a balance beam. Amazon and Apple aren’t far behind. There is no strain at all on Wikipedia’s servers as evidenced by the low web hosting fees Wikimedia pays (about $2.4 million/year to host a top-ten website). Coupled with the blank stares from those in the know I’m going to take a wild guess about how much Wikimedia pays supporting Google, Amazon, Apple, and the rest… nothing, or close to it.

I’m sure there’s some collaboration: hallway conversations at conventions, informal meetings … maybe the occasional lunch in San Francisco. But I don’t think there is any substantive work, any genuine drag on Wikimedia resources.

However, I do think the idea that there is work, a substantive out-of-pocket subsidy, is a great motivator to enrage and rally the community. They could, would, and should demand that Wikimedia not subsidize Google, Amazon, and Apple. The idea that very rich Big Tech is making seemingly small Wikimedia subsidize them is a good way to rally Wikipedians. Except I’m pretty sure it’s nonsense. It’s Wikimedia that’s arguably taking advantage of the Wikipedians by creating an enormous cash hoard. Google and the rest just get the fruits of that arrangement; they’re freeloaders, not leeches.

This leads me to a conclusion I suspect will be unpopular: maybe Wikimedia should create Jimmy and Lila’s search engine, assuming they’re not already doing so. Instead of a Google killer, it’d return 10-20 accurate, relevant, hand-curated results for the more common searches. A googol, which Google takes its name from, is a one followed by 100 zeros, the number of pages Google hoped to eventually read and index. But what if the vast majority of those results aren’t all that applicable to the mass of searchers? What if, instead, that enormous number of pages actually gets in the way of more casual searchers? What if a vastly smaller number of hand-cultivated pages was better for the vast majority of searchers?

And, of course, it’s obvious that Wikipedia (with its content, community of content builders, links to various interesting external articles, userbase, and brand) could release an offering exactly like that at far lower development costs than anybody else. That leads to a question: why shouldn’t Wikimedia/pedia create a search engine where the top result for any noun is the noun’s own webpage, if it has one, the second result is Wikipedia, and the next two results are hand-curated, high-quality, open-content websites? Do we really need the vast majority of searches to return hundreds of millions of results? Quoting Monty Python, “What’s wrong with a kiss, boy, hmm?” Oftentimes, less is more.

Would this satisfy every search? No, and it shouldn’t. It should produce a small number of good results, arguably as good as or better than Google’s, for the vast majority of common searches. Would it work for me, a professional researcher? No. But most noncustomers aren’t professional or even amateur researchers; they just want some basic, easy-to-understand information.

Ironically, there was a search engine like that way back when the web first launched: Yahoo. It was a hand-curated listing of web pages and, difficult as it is to believe now, it once ruled the web. Eventually, the yahoos at Yahoo decided to professionalize and the company hired Hollywood hacks who promptly burnt the place to the ground. They charged sites to be listed in the directory and, not long after, let them bid their way to the top. This eroded credibility, search quality, and eventually the ability of Yahoo search to provide meaningful value.

Interestingly, the strategy that obliterated Yahoo isn’t all that different from Google’s current strategy. Google frequently fills top results with ads and internal links to other parts of Google. A notable exception is the ever-present link to Wikipedia which, conveniently for Google, doesn’t support a competing ad platform to lure advertisers to eyeballs beyond Google’s reach.

Obviously, if WikiSearch didn’t answer what a person was looking for, they could fail over to Google, which is largely how Google got its start at Yahoo in the first place. And, just as obviously, Google would pay for the placement, possibly more money than Wikimedia takes in altogether for that service alone. But, with the hand-curated results, most people wouldn’t need to fail over. They’d get what they need curated by regular intelligence, not the artificial flavor.

What about gaming the system to manipulate results? Like I wrote in my original article, Wikipedia manipulation is a very real problem. Undisclosed conflicts of interest and state actors are especially troubling in addition to the secretly paid editors and self-promoting blowhards. Then again, Google has the same problem. There’s a whole army of “search engine optimization” specialists who exist to game Google search results. Wikipedia at least pays lip service that these behaviors are unacceptable whereas Google outright supports them by allowing them to run advertisements.

The idea that Lila and Jimmy may have been on to something worthwhile is anathema to most Wikipedians and, honestly, it doesn’t sit well with me either. I don’t know Lila but, in my experience, Jimmy’s a jerk with his baseless ad hominem attacks. Still, obnoxious narcissists are sometimes right and this might be one of those times. Maybe WikiSearch would be a good strategic move to challenge Google’s dominance in search.

Would Google mind that? I’m not sure. On one hand, if WikiSearch went well it might give them breathing room against antitrust regulators focused on their current search monopoly. On the other, a genuine competitor that takes meaningful traffic might not be what they had in mind. There’s a chance for a classic disruptive move: the barely good enough offering that eventually comes to dominate and overtake the high-cost incumbent.

Finally, there’s the ethics of the whole thing. WikiSearch is only possible with more Wikipedians spinning the gerbil wheel for free while vast amounts of Wikimedia money pile up in reserve accounts. Wikimedia argues they can’t collect too much from for-pay businesses because of charity rules but this is nonsense: plenty of charities are dominated by single donors (we’re looking at you, Firefox). They’d just need their accountants and lawyers to do that voodoo they do so well.
