Slashdot Mirror


Google Developing Database Service

QuantumT writes "Ars Technica has the details on the unannounced Google Base service that will allow anyone with a Google Account to post information and other types of data into a massive, Google-run database. Ars believes that the company is gearing up to take on eBay and Craigslist, which makes sense given the Google Payment service that is in development. Google has commented, saying, 'This is an early-stage test of a product that enables content owners to easily send their content to Google. Like our web crawl and the recently released Google Sitemaps program, we are working to provide content owners an easy way to give us access to their content.' There are a few screenshots as well."

23 of 269 comments

  1. Content is king by BWJones · · Score: 5, Interesting

    Perhaps more importantly, this move positions Google as potentially the pre-eminent publishing house with a built-in search engine. Anything that goes into the database will be "intimately" searchable. From my perspective as a bioscientist, the ability to search journal articles not just for text, but also for image data or graph data would be absolutely huge.

    Google has previously posted their position about Google Print here where they documented superficially their desire to enable people to search for "books". However, more importantly, it is the content within the "books" that will become more ubiquitous and more available.

    --
    Visit Jonesblog and say hello.
    1. Re:Content is king by holloway · · Score: 2, Interesting

      There are several points here:

      Firstly, people usually publish metadata, and domain-specific metadata, by following standards within their industry (de facto standards/proprietary/open/whatever). This doesn't necessitate holding the information locally; that's just a file location. What's important is having access to that information. If Google can help people get more files online, that's a good thing, but it's no different than if the donor put the file on their own site.

      Secondly, there are metadata standards and ways of getting information out of files. There are the obvious title / author / subject tags in HTML, and equivalents in MS Word files, OpenDocument, Dublin Core, etc. Because there's often a blurry line between content and metadata (title / author / subject are typically both), it's then a question of domain-specific languages and whether search engines can index them. Take XBRL, for example, which expresses financial reporting information; industry search engines can trawl it and let you search fields and see trends. More industry-specific formats will occur in time, consolidate, and we'll get rich data. It's taking its own sweet time, but we've got more structured documents now than we did 5 years ago.
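To make the "getting information out of files" point concrete, here is a minimal sketch of pulling title / author / subject metadata out of an HTML page using only Python's standard library. The names `MetadataParser`, `extract_metadata`, and the sample page are hypothetical and purely illustrative; this is not how any particular search engine does it.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attrs.get("name")
            content = attrs.get("content")
            # Skip <meta> tags that lack either attribute (e.g. charset declarations).
            if name is not None and content is not None:
                self.metadata[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.metadata["title"] = self.metadata.get("title", "") + data


def extract_metadata(html):
    """Return a dict of metadata found in the given HTML string."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


# Made-up sample page, including a Dublin Core style meta tag.
SAMPLE = """<html><head>
<title>Quarterly Report</title>
<meta name="author" content="A. Example">
<meta name="DC.subject" content="finance">
</head><body><p>Body text.</p></body></html>"""

print(extract_metadata(SAMPLE))
```

Domain-specific formats such as XBRL would need their own parsers; the sketch only covers the generic HTML case mentioned above.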

      Third, it's not a sure thing that categories and metadata are even the way to help you find things. When it comes to categories vs. tagging, I'll take tagging any day for finding the relationships between things rather than formally expressed categories. Formal categories are hard to maintain and don't scale. This is the lesson of Google's search being built (largely) around link terms, and why Yahoo Directory is so out of date: statistical analysis beats category hierarchies (well, most of the time).

    2. Re:Content is king by Ruis · · Score: 3, Interesting

      This whole thing sounds like the CIC database in Snow Crash.

    3. Re:Content is king by ozmanjusri · · Score: 4, Interesting

      Secondly, there are metadata standards and ways of getting information out of files. There are the obvious title / author / subject tags in HTML, and equivalents in MS Word files, OpenDocument, Dublin Core, etc.

      This is going to be the interesting part, and is probably why Google has been showing so much interest in Open Office/OpenDocument. When the pages of this web are XML served by a Google database, and the browser is an XML reader/editor based on OOo or equivalent, you have a much richer, more collaborative internet. A rich web, layered on top of the existing net.

      Google will be in on the ground floor of this too, and because huge amounts of the metadata will be part of the structure of the rich web, they'll be able to index it and deliver the aggregate information (which is their product) an order of magnitude more effectively than before.

      --
      "I've got more toys than Teruhisa Kitahara."
  2. Deep Search by evw · · Score: 4, Interesting

    They've said in the past that the next big step in search is searching databases that other people own. This would seem to be the interface to make that possible: rather than web crawling to harvest data, they have people push it to them. That sidesteps the copyright and robots.txt problem. If you want your data to be searchable, then you push it to Google.

  3. Excellent news for contract service providers. by stimpleton · · Score: 2, Interesting


    I'm just drawing up a reply to an RFI from a health provider. They are upgrading their medical records database.
    My solution included Oracle on Linux servers.

    I'll just use this instead... but just say I'm providing the infrastructure.

    Yassah.

    --

    In post Patriot Act America, the library books scan you.
  4. Data is only as good as its source... by Colz+Grigor · · Score: 4, Interesting

    Again, just because information is out there doesn't mean that it's accurate or complete. Providing a tool to capture more data will tend to dilute the accuracy of the available information.

    If this data is ever going to become useful, Google will need to create a system for moderating informational accuracy and usefulness. Their page-ranking mechanism is a good start, but I just don't trust it to tell me that the first few results on a subject I'm researching are accurate.

    This is why Google also needs a trust network. They certainly could begin to leverage Orkut to do this. I'd give more credence to an information source if I knew that someone in my trust network also gave credence to it.

    Google doesn't seem to have a unified and communicated vision. Sure, they can hire the most talented engineers and they can keep cranking out the coolest toys, but what would actually move the internet forward is a way to combine all of those toys into a single, simple platform. For example, combine Orkut and page ranking. Rank my search results differently than someone else's because they have different trust relationships. In my opinion, Google has had only one real hit so far, and that's Google Earth. With that much corporate intelligence, I'd like to see Google doing more.
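The "combine Orkut and page ranking" idea can be sketched as a toy re-ranker that blends a global relevance score with a per-user trust score. Everything here is an illustrative assumption: the function name `rerank`, the `alpha` weighting, the score ranges, and the sample data are hypothetical, and nothing is implied about how Google or Orkut actually work.

```python
def rerank(results, endorsements, trust, alpha=0.5):
    """Blend a global relevance score with a per-user trust score.

    results:      {url: global score in [0, 1]}
    endorsements: {url: set of people who vouch for that source}
    trust:        {person: how much the searching user trusts them, in [0, 1]}
    alpha:        weight given to the global score (hypothetical tuning knob)
    """
    def trust_score(url):
        # The most trusted endorser counts; 0.0 if no trusted person vouches.
        endorsers = endorsements.get(url, ())
        return max((trust.get(person, 0.0) for person in endorsers), default=0.0)

    blended = {
        url: alpha * score + (1 - alpha) * trust_score(url)
        for url, score in results.items()
    }
    # Highest blended score first.
    return sorted(blended, key=blended.get, reverse=True)


# Made-up global scores and endorsements.
results = {"a.example": 0.9, "b.example": 0.6, "c.example": 0.5}
endorsements = {"b.example": {"alice"}, "c.example": {"bob"}}

# Two users with different trust relationships see different orderings.
print(rerank(results, endorsements, {"alice": 1.0}))
print(rerank(results, endorsements, {"bob": 1.0}))
```

Taking the maximum over endorsers is only one possible choice; a sum or an average would reward sources that many loosely trusted people vouch for.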

    ::Colz Grigor

  5. Re:Baffling! by Anonymous Coward · · Score: 1, Interesting

    What data is not considered information, and vice-versa?

    Random strings maybe.

  6. Isn't Google Base redundant ? by Anonymous Coward · · Score: 1, Interesting

    Ye Olde Way : create content -> host it yourself/at an ISP -> various search engines (including Google) will index it -> others can search it.

    Ye Google Base Way : create content -> submit to Google Base -> others can search it only through Google.

    Google would be wasting massive resources in this, if they ever launch it, and their only benefit would be that they would in a way 'own' that content. I don't believe they would be making this content available to MSN or Yahoo. The stench of evil is just too much, Google !

  7. you give up freedom, but by circletimessquare · · Score: 1, Interesting

    you gain ease of use

    doing it on your own is hard and expensive

    basically, google is now acting as your website

    i'm just waiting for the google-hosted porn sites, like yahoo groups

    --
    intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
  8. AOL+ by glengyron · · Score: 2, Interesting

    Let's just call it AOL+

    You take the world's most successfully decentralised network, and for convenience and searchability you umm.... centralise it...

    Take all the power of anyone being able to interconnect which allows free speech to flourish all over the world (even in China if you're wise enough) and then umm.... put it all into the control of one corporate entity in the United States.

    Remember the situation with China... Google (as a corporation) complied with the law and handed over private Gmail information to the Chinese authorities trying to stifle free speech... now imagine if _everything_ is subject to that control mechanism?

    Google is already so powerful that if your business isn't listed easily in the results you might as well pack up and go home... this just makes that problem even worse.

    Basically Google wants to kill the Internet, to make it work better. AOL didn't die... the whole internet became AOL....

  9. GoogleBase.com ? by AaronCampbell · · Score: 2, Interesting

    gmail.com is to mail.google.com as http://googlebase.com/ will be to base.google.com?

  10. Re:Legal questions? by ZachPruckowski · · Score: 2, Interesting

    Well, if it is publicly searchable, then all Google has to do is let the FBI search for watch words. Which ought to be easy enough. Even if it isn't publicly searchable, then it'll be just like gmail, they have to let the Feds in when the law says they do.

    But Google is itself immune from prosecution under the Betamax decision, and the Grokster case, since all it needs is a legitimate primary use, unless Google like publicly supports the use of the software for illegal purposes. Or something like that. IANAL, nor am I pre-law.

  11. Re:Google have taken their eyes off the ball by jacksonj04 · · Score: 3, Interesting

    I think Google have done anything but take their eye off the ball. Remember how Froogle and Google Local were once beta projects, and are now integrated with google.com search? And then Google Maps was slipped into the equation. define: has been moved out of a little-known backwater of the site and integrated with google.com...

    Google having a foot in all the doors simply means they are finding the best way to index and search that information. It won't surprise me if they all end up integrated somewhere with just plain Google Search, to the extent that they lose their own 'section'. Google Base is simply (from what I can tell) a huge database of everything, which (chances are) will end up integrated.

    I want to be able to log in to Google and have all my own data at my fingertips, easily searchable, and for the engine behind it all to know what I'm after. At the moment, powerful though other web searches may be, Google is the only company to attempt to unify everything for the users. If Google can provide what I'm after, I would be willing to pay a significant amount of money to have them organise all my data, be it news, emails, contacts, files, web history, chats, driving directions, cinema times... the list goes on.

    --
    How many people can read hex if only you and dead people can read hex?
  12. calendar too? by karthik_r085 · · Score: 2, Interesting

    calendar.google.com redirects me to google.com, while other links like audio.google.com or browser.google.com say "siteurl could not be found. Please check the name and try again."
    Is Google also developing a calendar?

  13. Re:Only one question... by kebes · · Score: 4, Interesting

    As I said in another comment, an example of a problem I sometimes have is that I have some content that I would like to share with the world, but no decent way of doing it. Sometimes I can mesh it into Wikipedia or something... but other times there's no place to put it. Or maybe putting it somewhere else is complicated. Like I have a recipe or a cool trick to solve a problem in Linux. I could make an account with some recipe website or with Linuxforum.org or whatever, but that's a pain. I just want to make the information available to people. I could make my own mini-website and host it, but no one would ever find it.

    But if GoogleBase exists, and I just upload content, and let Google index it for me, I'm done. I can refer friends to it (either via URL or even by describing it, and letting them just do a search for it). I can even upload (non-private) files that I often need to refer to... and then they are always accessible. In fact, since GoogleBase will probably have a private mode, I can use this as a network drive that is accessible anywhere in the world. Not only that, but it does automatic backups and is automatically indexed and searchable. So for semi-private documents that I always need access to, it's great. I post my CV and then I can casually refer somewhere to where it is located. I don't have to pay for webspace.

    Many people use the GMail File System hack so that they can use their GMail account as if it were a hard drive. Google is formalizing it so that we can have access to data easily. I think this solves a lot of problems for a lot of users. The tradeoff is that I get free web-hosting and even free network storage, as long as I agree to have them index it. Many people are willing.

  14. Re:why? by umeshunni · · Score: 3, Interesting

    This is Microsoft's scrapped Hailstorm initiative all over again. Except that it's Google doing it. It's interesting to note that two of Hailstorm's key architects (Mark Lucovsky & Adam Bosworth) now work at Google.
    I suppose they think the same idea would work if a different company did this.

  15. Re:why? by daviddennis · · Score: 2, Interesting

    Google will never charge for raw search results (as opposed to adwords). Google has plenty of competition that does exactly that, and uses underhanded methods (i.e. spyware) to direct people to their sites. Despite these tricks, those sites are nowhere near as popular as Google and don't make the kind of money Google does. Google is not going to mess that up.

    Your observations would appear to mean that Google Adwords are effective advertising.

    My business partner and I have a business here, and even though it's geographically focused to a specific area, AdWords has been our most effective advertising, comfortably surpassing TV and radio advertising and even exceeding our second most effective method, blanket handing out of flyers.

    It's hard to get away from monopolies, especially in small markets like ours. For example, Comcast, the cable provider we used for our TV ads, is a monopoly, too. Google Adwords is effectively a monopoly. Whichever one is effective is what we'll continue to use; it may be an evil monopoly, but it's saved us from the evil of the other guys. It was a lot cheaper than TV ads.

    D

  16. Yahoo by meehawl · · Score: 4, Interesting

    Google may not be aiming to become Big Brother, but they're certainly aiming to provide every single service they possibly can.

    And so the transformation of Google into Yahoo is almost complete... I actually had the pleasure of predicting this to a couple of Google managers a few years ago when I was car pooling with them back up 101. I was the only non-Googler in the car. The conversation eventually got around to how to add more services while maintaining the "simplicity". I predicted that eventually, all services would end up doing the same kind of portal crap as Yahoo/AOL/MSN/Excite, etc. Remember, those services became portals before the word "portal" was ever invented. I also predicted that the real rot would set in after the IPO, when Google attracted a lot of people from other companies who wanted to add that sort of stuff, because that was how they had done it in their previous jobs. And that was what the market expected. And once you're a public fad stock, shareholders demand "growth" stories to keep the high valuation and want you to add functionality, no matter how orthogonal that growth might be to your core business. It's feature creep, writ large.

    The rest of the trip was a bit frosty.

    --

    Da Blog
  17. Re:Baffling! by Eil · · Score: 2, Interesting


    What data is not considered information, and vice-versa?

    Data is a set of raw facts. (A stream of bits, for example.) After you apply some sort of algorithm to it, it becomes information. (A digitized image, for example.) After you mentally process the information and consider it within the context of the situation, it becomes knowledge. (Goatse.cx, for example.)

    Of course, there are some kinds of knowledge most people would rather not have.

  18. Google Payments? by bluephone · · Score: 3, Interesting

    Am I the only one to miss the official announcement of this? I've heard rumors about it for years, but when did it become a given?

    --
    jX [ Make everything as simple as possible, but no simpler. - Einstein ]
  19. Sarver.org - Quickbase by rsarver · · Score: 2, Interesting

    I am glad to see someone is reading my http://www.sarver.org/site ;) ... but to elaborate ... To me, what Google is doing is not "innovative" or "novel" in terms of the logic behind what you see as the GoogleBase application... as it has been done many times over. Quickbase, Intranets.com and so many others have done the same thing (http://www.google.com/search?q=web+database). However, what is novel is the fact that they are positioned to take full advantage of the information that you are inputting to their system and use it in such a way that allows them to leverage their existing search infrastructure to better index your content. Previous to this attempt, each company was solely positioned as a software product company, not as a search, or integrated search company... nor had any company opened their kimono in such a way as to allow other ASPs to use their back end as THE backend. Kudos...

  20. Re:Google have taken their eyes off the ball by adpowers · · Score: 2, Interesting

    I agree. I've been to a number of Google tech talk/recruiting sessions and they really emphasize the small groups inside the company. Most projects seem to be 2-4 people. Only when a product nears launch do more people get involved (lawyers, UI designers, translators, etc.). I thought they said that Gmail was done by about four people for most of the time. When you have this many groups, of course there will be lots of diversity. Sometimes when I see a new product or one-box coming out of the Googleplex, I get worried that they are losing their focus and relevance (for a while they had two competing "definitive answer" one-boxes that sometimes both showed up for one query; I can't remember the names exactly, but I thought it looked pretty bad... where was the coordination?). However, most projects get fixed up and made better, and I regain faith in Google. Right now I'm most skeptical about the Web Accelerator (just because of all the problems it has had).

    Also, you are correct. Google has just started rolling out phase two of a three-phase update to their ranking engine (dubbed the Jagger update). A lot of SEO folk are complaining about losing placement, and if they are complaining, I think that means we are getting better results :)

    Andrew