I have finally finished reading The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture by John Battelle.

I recommend reading it (just as Pepe, thanks again, recommended it to me).

Don't miss the chapter devoted to Bill Gross, the bitter comments from Louis Monier, and the scattered notes about WebFountain and Archive.

Spurred on by the book, I had an idea, and then the further idea of sharing it with Google and with John Battelle.

Battelle has already replied with a laconic:

Thanks. I think Google Base is step one in this.

Now the idea is, once again, to share it with you. Here is the substance of the email I sent them, which I wrote in English.

## Problem statement:

I find the process of building a map of the web a really hard one: the crawler, the index, the algorithms and so forth. It is even harder to put semantics on top of it. Mr. Battelle writes about IBM's WebFountain in his book.

Mr. Battelle even writes about how your mission statement is to organize the world's information, and about the likelihood that everything, at some point even someone's luggage, will be indexed.

That would make the problem even harder, I guess.

## Hint I take from Mr. John Battelle's book:

The Power of Many.

I don't know if he puts the blogosphere at the tip of the Power of Many spear to help solve the perfect search problem. Apparently he does.

From my point of view, blogging only makes the problem more complex. Searching inside the blogosphere is quite a difficult task. Perhaps the best approach I know of is the one represented by Technorati Tags.

## The idea (a very raw one):

Let Google give people the means to publish content in an already indexed form.

Let us say Google has an editor that allows users to create their documents (with as good an editing experience and facilities as any other editor offers, which is a plus) and to tag them easily (and semantically).

Given the author's clickstream, or with better inference software, this tagging could perhaps be done automatically, without the author having to do much.

This process runs on the author's PC.

The page goes directly (and correctly) into the index: no crawling needed, no more hidden web.

The page ends up in the index either because the author's content is already stored on Google's own physical infrastructure, or because the editor, with the author's permission, signals the page's intent to Google.

Let us say Google offers publishers some benefit for using this editor: social respect for contributing to a better-indexed web, or a little money, or even money that depends on the success (rank) of what is published.

I think it would greatly benefit the perfect search project if the problem were tackled right at the beginning of the process: publishing.

Let Google be your 22nd-century aid for publishing correctly on the web.

In the end, I think the solution to the problem lies somewhere along these lines.

Thanks for the attention given so far.
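
To make the idea a bit more concrete, here is a minimal sketch in Python. It is not part of the email and not anything Google offers: `Document`, `PushIndex`, `publish` and `search` are hypothetical names I made up, and the "index" is just a toy in-memory inverted index. It only illustrates the point that an editor could push an already tagged page into the index at publish time, so no crawler has to discover it.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Document:
    url: str
    text: str
    # Tags attached by the author (or inferred) inside the hypothetical editor.
    tags: Set[str] = field(default_factory=set)


class PushIndex:
    """Toy inverted index that documents are pushed into, with no crawler."""

    def __init__(self) -> None:
        self._by_term: Dict[str, Set[str]] = defaultdict(set)
        self._by_tag: Dict[str, Set[str]] = defaultdict(set)

    def publish(self, doc: Document) -> None:
        # Called by the editor at publish time: the page enters the index directly.
        for term in doc.text.lower().split():
            self._by_term[term].add(doc.url)
        for tag in doc.tags:
            self._by_tag[tag.lower()].add(doc.url)

    def search(self, term: str, tag: Optional[str] = None) -> Set[str]:
        # Plain term lookup, optionally narrowed by an author-supplied tag.
        hits = set(self._by_term.get(term.lower(), set()))
        if tag is not None:
            hits &= self._by_tag.get(tag.lower(), set())
        return hits


index = PushIndex()
index.publish(Document("http://example.org/jaguar-car", "the jaguar is fast", {"cars"}))
index.publish(Document("http://example.org/jaguar-cat", "the jaguar is fast", {"animals"}))

# The same words, but the author's tag disambiguates what the page is about.
print(index.search("jaguar", tag="animals"))
```

The two example pages contain identical text, so a term-only search cannot tell them apart; the tag supplied at publishing time is what separates them.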

What do you think?

(I predict 0 comments on this post too)