Centralized versus Distributed Web

Jesse Stay, founder of SocialToo, says the web is no longer open. He argues that only a few large entities own the flow of information through both the social and the searchable web. DeWitt Clinton, a Google software engineer, responded with an eloquent description of what open means to him, and of how even small-budget businesses could construct a highly functional search engine (I keep bugging DeWitt and others about open semantic processing tools and interfaces).

I spent a few cycles thinking about the way businesses currently monetize the web. Companies that build better indexes of social and standalone content win. While I celebrate the free market, I see a flaw in centralizing social and web information in giant silos. I also think monetization is nowhere near where it could be if we embraced a different model. I explain why in the comment I left for both Jesse and DeWitt:

If owning the web comes down to knowing what's in it, Jesse, it's not a stretch to create a decentralized index. Trusted distributed crawling :)
While we rely on massive datacenters (Goog, Bing, etc.) and web crawling bots now, there will come a time when each node will be part of layers of distributed indexes based on privacy/ownership. Any attempts to restrict information flow simply get routed around, and most of the revenue generated from web businesses has a short lifetime (a decade or two, tops?). I predicted the end of big centralized social services within ten years. The forces that enable all of us to own our own social data (and lease it out in our favor) are the same ones that will enable all of us to own our own content (and lease it out to our benefit). Sharing all of my data publicly is of course my choice, and then it's "fair game" for remote entities to use.
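To make the idea of layered, ownership-based indexes a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the Node class, the public/private split, and the federated_search helper are my assumptions, not an existing system. Each node indexes only the documents it owns, answers remote queries from its public layer, and keeps its private layer for its owner.

```python
from collections import defaultdict


class Node:
    """Hypothetical participant that indexes only the documents it owns."""

    def __init__(self, name):
        self.name = name
        self.public_index = defaultdict(set)   # term -> doc ids shared with anyone
        self.private_index = defaultdict(set)  # term -> doc ids visible to the owner only

    def add(self, doc_id, text, public):
        """Index a document into the public or the private layer."""
        index = self.public_index if public else self.private_index
        for term in text.lower().split():
            index[term].add(doc_id)

    def search(self, term, requester):
        """Return matching doc ids; private hits only go to the owner."""
        term = term.lower()
        hits = set(self.public_index.get(term, ()))
        if requester == self.name:
            hits |= self.private_index.get(term, set())
        return hits


def federated_search(nodes, term, requester):
    """Ask every node and merge the answers, keyed by the owning node."""
    results = {}
    for node in nodes:
        hits = node.search(term, requester)
        if hits:
            results[node.name] = sorted(hits)
    return results


alice, bob = Node("alice"), Node("bob")
alice.add("a1", "open web index", public=True)
alice.add("a2", "private web notes", public=False)
bob.add("b1", "distributed web crawling", public=True)

as_bob = federated_search([alice, bob], "web", requester="bob")
as_alice = federated_search([alice, bob], "web", requester="alice")
```

Here bob sees only the documents alice chose to publish, while alice also sees her own private notes. A real system would need trust, authentication, and routing between nodes, none of which this toy attempts.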

Social is already implemented with protocols like PuSH (PubSubHubbub) over open source StatusNet. I see that Laconi.ca users can subscribe to public Buzz feeds with PuSH; this is a great beginning.
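For readers who have not seen a subscription of this kind, here is a rough sketch of the form body a subscriber sends to a hub, assuming "Push" above refers to PubSubHubbub. The parameter names (hub.mode, hub.topic, hub.callback, hub.verify, hub.lease_seconds) are the ones I recall from the 0.3 draft of that spec, so check them against the spec before relying on them. The function name and the example.com URLs are made up for illustration, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode, parse_qs


def build_subscription_body(topic_url, callback_url, lease_seconds=None):
    """Build the form-encoded body of a PubSubHubbub-style subscribe request.

    Parameter names follow the 0.3 draft as I recall it (an assumption).
    """
    params = {
        "hub.mode": "subscribe",       # ask the hub to start a subscription
        "hub.topic": topic_url,        # the feed we want updates for
        "hub.callback": callback_url,  # where the hub should deliver new entries
        "hub.verify": "async",         # let the hub confirm the callback later
    }
    if lease_seconds is not None:
        params["hub.lease_seconds"] = str(lease_seconds)
    return urlencode(params)


# Hypothetical URLs, for illustration only.
body = build_subscription_body(
    topic_url="https://example.com/feeds/public.atom",
    callback_url="https://subscriber.example.com/push/callback",
    lease_seconds=86400,
)
decoded = parse_qs(body)
```

The subscriber would POST this body to the hub advertised in the feed, and the hub would then confirm the request with the callback before pushing updates to it.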

As developers we want clear standards, but de facto standards generated without external collaboration usually benefit the parent organization over others. For me, social simplifies to one-to-one, one-to-many, and many-to-many relationships. Knowledge of the data index for the web is a one-to-many relationship today. I suspect that won't be so for long.

There's a lot more to both sides of the discussion, so I highly suggest reading both posts. If you're short on time, here are excerpts that capture Jesse's and DeWitt's different perspectives.
Here's a quote from Jesse describing what he sees as the problem:

So we’ll soon have 3 ways of identifying our websites on the “open” web. I can identify my site through Facebook, as you see by the Facebook Connect login buttons scattered around. I can identify myself in the Google SocialGraph APIs, which, if you view the source of this site you’ll see a 'rel="me"' meta tag identifying my site so Google can search it. Who knows what Twitter will provide to bring my site into its network. Each network is providing its easiest ways of identifying your site within their own Social Graph, and calling it “open” so other developers can bring their stuff into their networks easily, without rewriting code.

I think it’s time we stop tricking ourselves into thinking the web is open at all. Google is in control of the web – they have it all indexed. Now that we are seeing that he who owns the Social Graph has a new way of controlling and indexing the web, which we are seeing by Facebook’s massive growth (400+ million users!), I think Google feels threatened. They’ll play every “open” term in the book to gain that control back. Of course the new meta tags are beneficial – is it really beneficial to “everybody” though? I argue the one entity it benefits most is Google. Yeah, it benefits developers if we can get everyone to agree on what “open” is, but that will never happen. I think it’s time we accept that now that the web is controlled and indexed by only a few large corporations, it is far from “open”. “Open” is nothing more than a marketing term, and I think we can thank Google for that. No, that’s not a bad thing – it’s just reality.

DeWitt Clinton, a Google software engineer who works on and supports Buzz (Google's latest social network), responded to Jesse's post:

1) Open protocols and formats mean two specific things to me:

The first is licensing of the protocols themselves, with respect to who can legally implement them and/or who can legally fork them. This involves patent and copyright licenses (and sadly yes, lawyers). While a small number of us are always debating the finer details of how it works, eventually there's a binary aspect to it: a protocol has to be formally licensed for reuse for it to be open.

The second is the license by which the data itself is made available. (The Terms and Conditions, so to speak.) The formal definitions are less well established here (thus far!), but it ultimately has to do with who owns the data and what proprietary rights over it are asserted.

In an ideal, interoperable, and decentralized world, implementors can both clone and/or fork the protocols as desired (without asking permission), and users can get their own data back out without needing to follow someone else's restrictions about how they use it.

It's important to look at both aspects above when judging if a system is open. Can I legally fork and/or clone it? And, am I entering into an arrangement that places limitations on my rights to use my own data? (And the corollary, are other people entering into an arrangement that puts limitations on their rights to share their own data with me?)

If the answer to either or both is "no", then no matter what we may want to believe, regrettably it's not an open system. (And don't be misled: even the worst data silos are obviously going to enable some way to get data out, otherwise no one would put anything in. The question is what do you have to give up to get it back out? It's a question I believe more people should be asking, and asking it before they turn their data over to some network.)

So when I say "open" I don't just throw the word around casually. I mean those two very precise things: what is the license, and what are the Terms. It's not hand-waving, and it's not marketing. It's technical, and it's legal. Boring to some, easy to ignore for most, but that doesn't make it less important to understand or less important to get right.
...
3) Google absolutely benefits when the web benefits. That's not something to be ashamed of—that's something to be proud of. It's a wonderful business model. I wish more companies were similarly aligned with the health of the web.

4) And as to why Google doesn't implement some particular spec or work with some particular data provider? Well ... the answer usually is: you're asking the wrong guy. You really should be asking them. :)

Victus Spiritus
