P2P Web searches
prostoalex writes "Researchers at UCLA are looking for easier ways to implement Web searches by using peer-to-peer techniques to decrease the workload. 'Queries need to be passed along only a few links rather than flooded throughout the network, which keeps search-related traffic low,' reports Technology Research News."
I'm sick of all this hype about p2p. It's a good technology, but it's not like we have to use it for everything. The old ways of doing things still work.
"It is not how things are in the world that is mystical, but that it exists." -Ludwig Wittgenstein
haha, it's not slashdotted yet, but who the hell cares?!!!?!
Simple search lightens Net load
September 8/15, 2004
By Kimberly Patch, Technology Research News
Researchers working on finding better ways to search the Internet are increasingly turning to methods that require individual nodes, or servers, to know a little bit about nearby servers, but don't require servers to look much beyond their own neighborhoods.
This type of co-operation, which is also found in many natural networks such as insect communities, uses local rules in such a way that the system as a whole has predictable global properties.
Researchers from the University of California at Los Angeles have devised a fast search algorithm that uses local rules to find nodes and content in randomly-formed, scale-free networks such as the Internet. Scale-free networks have a few nodes that have many connections to other nodes, and many nodes that have far fewer connections.
"Without global knowledge of the network, and only doing local operations, we can make the cost of searching an entire network grow less than linearly [with] the size of the network, and still have the query be very fast," said Vwani Roychowdhury, a professor of electrical engineering at the University of California at Los Angeles.
The search algorithm could be used to increase Internet efficiency by making it easier to find routes between hosts, said Roychowdhury. It could also be used to reduce traffic in peer-to-peer networks running on the Internet that allow people to exchange files, like Kazaa and Gnutella.
Queries need to be passed along only a few links rather than flooded throughout the network, which keeps search-related traffic low, said Roychowdhury.
"Many networks are known to be scale free... our search algorithm could be applied to all of them," said Roychowdhury. The researchers' simulations have shown that the algorithm could reduce Gnutella traffic by one or two orders of magnitude, he said.
In 2001, researchers at Stanford University and Hewlett-Packard Laboratories developed a simple, light-weight random search algorithm for peer networks. That algorithm forwards queries one node at a time whereas the UCLA researchers' algorithm operates in parallel, said Sarshar.
The researchers' algorithm is based on the bond percolation threshold, or the smallest probability that a message is guaranteed to reach a core sub-network of highly-connected nodes, said Roychowdhury.
As connections randomly percolate through the network at a low rate, only small, isolated islands form. Once the bond percolation threshold is passed, the core of the network becomes connected. The threshold is an abrupt phase transition, like the quick transformation that takes place when water boils or freezes.
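The threshold behaviour described above can be explored with a small simulation. This is only a sketch, not the researchers' code: it assumes a preferential-attachment graph as a stand-in for a scale-free network, and the function names and parameters (`preferential_attachment_graph`, `largest_component`, `percolation_curve`, `n`, `m`, `probs`) are made up for illustration. Each link is kept with probability p, and the size of the largest connected island is reported for each p.

```python
import random

def preferential_attachment_graph(n, m, rng):
    """Grow a graph where each new node links to m existing nodes chosen
    in proportion to their degree, which yields a few highly connected hubs."""
    edges = []
    endpoints = []  # each node appears once per incident edge
    # Seed with a small clique of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i):
            edges.append((j, i))
            endpoints += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((t, new))
            endpoints += [t, new]
    return edges

def largest_component(n, edges):
    """Size of the largest connected component, using union-find."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]
    return max(size[find(x)] for x in range(n))

def percolation_curve(n=2000, m=2, probs=(0.0, 0.1, 0.2, 0.4, 0.7, 1.0), seed=1):
    """Keep each link with probability p and measure the largest island."""
    rng = random.Random(seed)
    edges = preferential_attachment_graph(n, m, rng)
    # One random mark per link, so the links kept at a low p are a subset
    # of the links kept at any higher p.
    marks = [rng.random() for _ in edges]
    return [(p, largest_component(n, [e for e, u in zip(edges, marks) if u < p]))
            for p in probs]

if __name__ == "__main__":
    for p, size in percolation_curve():
        print(f"p={p:.1f}  largest island: {size} nodes")
```

Running it prints one line per probability; where the jump from small islands to a connected core happens depends on the assumed graph and seed.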
The algorithm involves three basic steps: content caching, query implantation, and bond percolation, said Roychowdhury.
Content caching happens when a node joins a peer-to-peer network and performs a one-time short random survey, or walk, of nearby nodes and adds its content directory to each of these neighboring nodes.
The query implantation step is similar, but happens at the beginning of every query. When a node has a query, it performs a short random walk and passes the query along to each node it encounters.
These random walks are long enough that any given node will almost surely encounter at least one highly-connected node, said Roychowdhury. "So after these two steps one of the high-degree nodes has a copy of a node's directory, and a query is implanted at one of the high-degree nodes."
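The two random-walk steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the UCLA implementation: the adjacency-dict graph, the `caches` structure and the names `random_walk`, `cache_content` and `implant_query` are all hypothetical, and the sketch only checks the nodes a query walk visits directly (the percolation step is left out).

```python
import random

def random_walk(adj, start, steps, rng):
    """Return the nodes visited by a short random walk from start."""
    node, visited = start, []
    for _ in range(steps):
        node = rng.choice(adj[node])
        visited.append(node)
    return visited

def cache_content(adj, caches, owner, items, steps, rng):
    """Content caching: leave a copy of owner's directory at every
    node the walk passes through."""
    for node in random_walk(adj, owner, steps, rng):
        caches.setdefault(node, {}).setdefault(owner, set()).update(items)

def implant_query(adj, caches, asker, item, steps, rng):
    """Query implantation: hand the query to every node on a short walk
    and collect the owners whose cached directories list the item."""
    hits = set()
    for node in random_walk(adj, asker, steps, rng):
        for owner, items in caches.get(node, {}).items():
            if item in items:
                hits.add(owner)
    return hits

if __name__ == "__main__":
    # Toy network: node 0 is a hub, nodes 1-3 are leaves attached to it.
    adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    caches = {}
    rng = random.Random(0)
    cache_content(adj, caches, owner=1, items={"song.ogg"}, steps=1, rng=rng)
    print(implant_query(adj, caches, asker=2, item="song.ogg", steps=1, rng=rng))
```

In the toy star network every one-step walk from a leaf lands on the hub, so the directory and the query meet there; on a real network the walk length would have to be chosen so that reaching a hub is likely.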
Once this has been set up, the bond percolation step makes sure that the directory and query connect.
In this last step, all of the initially queried nodes percolate the query throughout the network so that the query is guaranteed to reach a core sub-network of highly-connected nodes. "Since a copy of the query is in one of the nodes in the core network, a
The searching load on servers might be reduced, I suppose. But in my experience, P2P searches are long and slow. How would this help exactly?
Maybe in future Google will implement a small server in our "Gmail notifier" application, and each time we search for something on Google, it will cache some of the results, and should anyone close by ask for it, just forward the old results to them.
:D
Save the server load on the main google server!
**Plus maybe some smart guy will figure out how to trade mp3s over the GoOgLe-P2p network!
Online backup with Mozy, sounds like Ozzie, but more!
Google still works.
Results 1 - 10 of about 6,290,000 for p2p [definition]. (0.19 seconds)
webpage
This is how we prevent Slashdotting in 2004:
http://www.trnmag.com.nyud.net:8090/Stories/2004/090804/Simple_search_lightens_Net_load_090804.html
that all the peers would know what I was searching for? No thanks.
I have 6 gmail invites to give away.
GETPKG - Package Management for Slackware
From a quick read of the article, it sounds like what they've done is implement a slightly more sophisticated/less deterministic version of the ultrapeer/hub system already in use by Gnutella/G2. Basically, queries are routed such that they are guaranteed to reach a "highly-connected node", which is the equivalent of an ultrapeer/hub node. The main difference is that the folks at UCLA have come up with a novel method of picking ultrapeers, but the end result isn't much different.
The query lifestyle is a sin!
Q: What is $search_term and how does it work?
A: A simple google search shows that $search_term is $blahblah and you use it like $this (repeated a hundred times)
Add another hundred replies about how the poster should search before submitting, and how AskSlashdot is degenerating into AskPeopleToGoogleForYou, and there you have it. P2P searching in all its glory.
Ceci n'est pas une sig
That wouldn't solve the problem of local areas of users that are disconnected from everyone but themselves. I know this is an issue with other p2p apps. You can only connect to someone who's in your area, and sometimes that just isn't good enough. I know China is in many respects isolated from the rest of the internet.
If life gives me a beating, I will endure it. If life gives me a beating, I will wake it up.
A group of researchers from UCLA have been hired by Google Corporation with enticing payrolls and stock options.
Infrasearch was working on this, until Sun paid $8M for the company, then had them work on something else, then Gene Kan committed suicide. Be careful what you work on.
Google, Yahoo etc of course crawl the web at large, but even if you want to throw a peer network at crawling, aren't you mitigating freshness?
What I can see is a DNS-like system for propagating metadata into the interior of the network, and maybe a caching mechanism as a result... not sure if this is what they mean.
It's called grub.
Feel free to shoot full of holes as needed....
Every website has DNS servers so what if that same company that ran the DNS servers indexed the pages of the sites that it hosted? Daily?
Wouldn't that then provide a complete index of the web?
Start a search and somehow get the results back through that distributed method. Haven't figured that out yet... but if you can...
PROFIT!!!!!
What is Google Calculate for? Isn't this the same kind of thing?
How would this affect the DNS? Would you need central servers for name lookups anymore?
It's an article describing a new p2p query routing method. Nothing more. There are already a lot of such algorithms out there. This one seems to exhibit some nice completeness properties that hold in idealized scale-free networks. But I'm not convinced such a theoretical property would hold in the real world. While p2p networks tend to be roughly scale free, the "roughly" and "tend to be" qualifiers are what make such theoretical properties unlikely to hold in practice.
Nice to see they plan to release some software based on the technique though.
A peer-to-peer program, Ants P2P, has just implemented a distributed search engine. Ants P2P is based on ant routing algorithms, so it needed a way to find files on its network, and it found one that works. The network also has an HTTP tunneling feature, and its developer, Roberto Rossi, is creating a search solution based on similar methods to search Web pages published on the network.
Ants P2P is designed to protect the identity of its users by using a series of middle-men nodes to transfer files from the source to destination. As additional security, transfers are Point to Point secured and EndPoint to EndPoint secured.
1. Distributed search Engine - Each node performs periodic random queries over the network and keeps an indexed table of the results it gets. When you do a query you will get files with or without sources. If you get files simply indexed (without a source), you can schedule the download. As soon as Ants finds a valid source, it will begin the download. This will also solve the problem of unprocessed queries. This way you will get almost all the files in the network that match your query with a single search.
http://sourceforge.net/projects/antsp2p/
I'm so sick of companies wanting to push off their crap onto us. If I want something from them they should offer it to me on terms I find acceptable.
In this case a couple of text links which may interest me (Google reference: check).
I don't want to have to share my bandwidth with 50 other people so they can do the same. If you want to run a service, website or game server you should pay for it. Don't start passing off the bandwidth bill onto us users.
Either get used to the heat (price) or get out of the kitchen (market).
I like muppets.
--
Try Nuggets , the mobile search engine. We answer your questions via SMS, across the UK.
Step 1) Find established technology which is working more or less happily as-is
Step 2) Add the word 'p2p' in front of it.
Step 3) ???
Step 4) Profit
I assume Step 3) is now as simple as "show name of new product with 'p2p' in the subject and explain how it's NOT related to pirating movies or music" (to increase investor confidence they're not going to get taken to town by the RIAA/MPAA), then it's just sit back and watch the fat investment/grant dollars roll in!
I wonder if I was alone in thinking about something like this when reading the title? :-)
Beware: In C++, your friends can see your privates!
They can always link to Google's very own cache. :-)
Well, actually they might be on to something as I said in a comment on a post some months ago (Why can't I peruse all my comments? (sans subscription)) and also, I noted that a p2p encrypted backup technology would be a good idea, which was then taken off and written about
I said, it'll be peer to peer everything. (in this case, p2p raid, for redundancy, not performance) using certs.
#hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
An article about research which showed that random network crawlers gave increased performance on P2P networks... perhaps this means that better performance could be managed if a Skynet-esque "self-aware" (i.e. third-party knowledgeable) layer of the network existed to facilitate each node's searching.
Sorry, I hope that makes sense in context.
#hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
"The proxy contains an index-sharing p2p-based algorithm which creates a global distributed search engine. This spawns a world-wide global search index. The current release is a minimum implementation of this concept and shall prove it's functionality."
--http://www.anomic.de/AnomicHTTPProxy/index.html
Volunteers.html
"If the index-sharing someday works fine, maybe the browser producer like Opera or Konqueror would like to use the p2p-se to index the browser's cache and therefore provide each user with an open-source, free search engine."
--http://www.anomic.de/AnomicHTTPProxy/
Just keep comin' round.
Harvest
BugBear
Ignorance is curable. Stupid is forever.
Which reminds me of an interesting long-term monitoring idea: track Google responses for the same query over a long time, and monitor the response time (e.g. 0.19 seconds in the above example). Is anyone doing this?
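A minimal sketch of the bookkeeping side of that idea, assuming you already have the statistics line from a results page as text (how you obtain it is left out here, and the names `TIMING`, `response_seconds` and `summarize` are made up for this example):

```python
import re
from statistics import mean

# Matches the trailing "(0.19 seconds)" part of a results statistics line.
TIMING = re.compile(r"\((\d+(?:\.\d+)?) seconds\)")

def response_seconds(stats_line):
    """Extract the reported response time in seconds, or None if absent."""
    match = TIMING.search(stats_line)
    return float(match.group(1)) if match else None

def summarize(stats_lines):
    """Return (min, mean, max) of the reported times that could be parsed."""
    times = [t for t in map(response_seconds, stats_lines) if t is not None]
    if not times:
        return None
    return min(times), mean(times), max(times)

if __name__ == "__main__":
    line = "Results 1 - 10 of about 6,290,000 for p2p [definition]. (0.19 seconds)"
    print(response_seconds(line))
```

Logging one such line per day for the same query, along with a timestamp, would be enough to plot the trend.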
Simpy
we all know gnutella had a stinking algorithm for searching files.
basically it was a big, fat broadcast of all queries to all hosts, regardless of whether it mattered to that host or not. only very few clients could cope with the linearly growing bandwidth requirement. the others just "missed" the queries and so the net fragmented.
there were a lot of people who knew this.
one of the first "academic" solutions that came up (at least to my knowledge) was p-grid (http://www.p-grid.org/), which uses extremely interesting algorithms, but was never released to the public. i think kademlia uses those algorithms presented in the papers.
edonkey, kazaa, and all the other offsprings of the hype had better solutions and of course this advantage is the reason they succeeded.
these "new" algorithms are well-established, and it makes sense to seek new applications for them.
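to put a rough number on the broadcast cost described above, here is a toy sketch of TTL-limited flooding. it is a simplification and not the actual gnutella protocol: the name `flood` and the adjacency-dict graph are assumptions, and a node here forwards to all of its neighbours, including the one it heard the query from.

```python
from collections import deque

def flood(adj, source, ttl):
    """Flood a query from source with a hop limit.
    Returns (number of nodes reached, number of messages sent)."""
    seen = {source}
    messages = 0
    frontier = deque([(source, ttl)])
    while frontier:
        node, hops_left = frontier.popleft()
        if hops_left == 0:
            continue
        for neighbour in adj[node]:
            # Every forward costs a message, even if the neighbour
            # has already seen the query.
            messages += 1
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, hops_left - 1))
    return len(seen), messages

if __name__ == "__main__":
    triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    print(flood(triangle, 0, 1))
    print(flood(triangle, 0, 2))
```

on the three-node example the second call reaches no new hosts but sends three times as many messages, which is the kind of waste the comment is complaining about.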
Only morons moderate based on a sig.
If you need text styles to communicate then you don't have a message.
There's always WebGoggles. http://webgoggles.com/