Re:Wow, open source search engines. on Lucene in Action · Score: 1
It sounds like you may be interested in Nutch, a sub-project of Lucene. Nutch is a full search engine package (fetcher, indexer, searcher, etc.), made to work in a cluster, etc. Of course, at the core of indexing and searching functionality is Lucene.
Funny you mention delicious:) I run Simpy (see the signature or use the demo/demo account), which has some notable advantages over delicious, especially in the search area (surprise, surprise).
Bookmarks solution from the book author on Lucene in Action · Score: 1
Hello from Otis, one of the co-authors of Lucene in Action. It is interesting that the book review starts with a problem with bookmarks in the browser, because I run Simpy, a fairly popular social bookmarking service. I started the service a few years back because with a few keywords + search I could locate a bookmark far more easily and much faster than by traversing my bookmark folder hierarchies.
Anyhow, I just wanted to connect these 3 islands - Lucene in Action + bookmark problem + Simpy. I'll go back to reading the rest of the review now...
RSS indeed dominates the feed scene, but Atom 1.0 has just been reviewed and approved by the Atompub Working Group (part of IETF, the same group that standardized HTTP, SMTP, and many other RFCs).
Thus, I wouldn't be so quick to claim RSS' victory. Tim Bray is a big supporter of Atom, and here is a recent report titled RSS 2.0 and Atom 1.0 Compared. Over at Simpy (feel free to use the demo/demo account if you don't have an account yet), I am happily supporting RSS and Atom (as well as RDF).
I believe Atom also has a "push" component, and not just the "pull" that RSS has. That is, I believe the Atom spec also specifies Atom as a way of making requests to web services, while RSS, I think, only lets you publish the data passively and have clients actively pull it. I can't find good references to this now, but maybe somebody else can find them and reply to this thread.
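For anybody curious what the "pull" side looks like from a client, here is a minimal sketch in Python using only the standard library. The two sample feeds and the feed_items helper are made up for illustration (they are not Simpy's actual feeds or code); the only spec detail relied on is the Atom 1.0 namespace, http://www.w3.org/2005/Atom.

```python
import xml.etree.ElementTree as ET

# Atom 1.0 elements live in this XML namespace; RSS 2.0 elements have none.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample documents, standing in for what a client would pull over HTTP.
RSS_SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>First post</title><link>http://example.com/1</link></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>
<entry><title>First post</title><link href="http://example.com/1"/></entry>
</feed>"""


def feed_items(xml_text):
    """Return a list of (title, link) pairs from an RSS 2.0 or Atom 1.0 document."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # RSS 2.0: the link is the text content of <link>.
        return [(item.findtext("title"), item.findtext("link"))
                for item in root.iter("item")]
    if root.tag == ATOM_NS + "feed":
        # Atom 1.0: the link is the href attribute of <link>.
        items = []
        for entry in root.iter(ATOM_NS + "entry"):
            link = entry.find(ATOM_NS + "link")
            href = link.get("href") if link is not None else None
            items.append((entry.findtext(ATOM_NS + "title"), href))
        return items
    raise ValueError("not an RSS 2.0 or Atom 1.0 document")


if __name__ == "__main__":
    print(feed_items(RSS_SAMPLE))
    print(feed_items(ATOM_SAMPLE))
```

Either way the client does the asking; the "push" part I mentioned would be a separate protocol on top, which this sketch does not attempt.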
If PostgreSQL's full-text indexing is anything like MySQL's, I urge you not to use it. The things I've heard about MySQL's full-text index are horrible! Instead, integrate Lucene with your application/database. If you need a book, there is Lucene in Action, with free code and a cheap eBook version. Full disclosure: I'm one of the authors. Simpy is a good example of PostgreSQL + Lucene integration.
Oh, and if you want a non-Java solution, there are several Lucene ports available: C++, Python, Perl, C#, Ruby...
I've been in the industry for about 10 years now, always surrounded by H-1B workers. They've all been WELL paid (software field). Moreover, INS (now USCIS) has a prevailing wage requirement for H-1B workers. I believe that wage is about $75K for software engineers currently. Thus, any employer offering a salary below this rate to a software engineer should/would be denied the H-1B visa.
So, I'm not sure if what you are saying is really true, or.... I'm not going to get into that.
Just read the Slashdot blurb, but this sentence implies this is an Apples and Oranges comparison:
"The average salary for a programmer in California is $73,960, according to the OES. The average salary paid to an H-1B visa worker for the same job is $53,387; a difference of $20,573"
Apples : programmer in CA
Oranges: H-1B visa worker
Not all H-1B visa workers are programmers in CA. I've got plenty of non-programmer friends on H-1B visas - designers, bankers, advertisers, marketing people, etc.
See http://lucene.apache.org/nutch/ and look for Nutch NDFS (something similar to Google's FS you mentioned). I use Nutch over at Simpy (think Web 2.0) and am very happy with it.
For those using Java and wishing for a RoR-like package for Java, look at Trails - https://trails.dev.java.net/
Computers emit heat, and there are LOTS of them around the world.
Let's see what ratio of on/off answers we get here on Slashdot. Same question for monitor.
I'll start:
Computer: off
Monitor : off
(I also turn off my cable modem and wi-fi router)
This has to be taken with a grain (or few grains) of salt. Remember, this is the head of the same company that was once laughing at Web Portals and said they would stay focused on search. So, Yes Office Suite Google! It's just a matter of time and surprise.
I saw the terminals built into seats of Song (a Delta Company) airline reboot the other day, and they, too, are running Linux.
Are they trying to prove that 2 wrongs make a right?
It's time for them to start doing this... :)
I say bring the guy to NYC, and power a fleet of his cars for months using our high-energy subway and park rats at $0/liter!
:)
Seriously, for those of you in NYC, when you find yourself at Union Square after dark, take a few minutes to stop and observe the lovely park animals. Rats, of course. South-west corner, right by the intersection is especially active. People are walking by, not paying any attention (the NYC style), but there are whole families of rats in that corner (also not paying attention to humans - NYC style rats).
So, bring the German dude to NYC.... please!
This is a type of question that, unfortunately, cannot be answered correctly. Well, it can: it depends. But that's not what you are after. As Hans himself pointed out, there are some fsync performance problems with ReiserFS. If you look at PostgreSQL config files, you'll notice a "fsync" setting, and if you look at pgsql-performance mailing list, you'll see frequent mentions of fsync. Obviously, fsync affects DBs (not just PostgreSQL), and ReiserFS may not currently be so great for DBs. However, it is apparently good for large directories (1 directory, lots of files in it). So, it depends how you use your FS.
Describe how you use your FS, and maybe somebody can provide good feedback.
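To get a rough feel for what fsync costs on your own file system, here is a small sketch in Python (standard library only). The timed_writes helper, the record size, and the write count are my own choices for illustration, not anything taken from PostgreSQL or ReiserFS, and the numbers will vary a lot with the FS, the disk, and the mount options.

```python
import os
import tempfile
import time


def timed_writes(n_writes, use_fsync, payload=b"x" * 512):
    """Append n_writes small records to a temp file.

    Returns (elapsed_seconds, final_file_size). With use_fsync=True every
    record is forced to disk before the next one is written, which is roughly
    what a database does when it must not lose a committed transaction.
    """
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(n_writes):
            os.write(fd, payload)
            if use_fsync:
                os.fsync(fd)  # block until the data has reached stable storage
        elapsed = time.perf_counter() - start
        size = os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.remove(path)
    return elapsed, size


if __name__ == "__main__":
    for flag in (False, True):
        elapsed, size = timed_writes(200, flag)
        print(f"fsync={flag}: wrote {size} bytes in {elapsed:.4f}s")
```

Run it on a directory that lives on the FS you care about (tempfile honors the TMPDIR environment variable) and compare the two lines. It is a toy, not a benchmark, but it shows why DB people keep bringing up fsync.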
There are 2 other significant players in the FS field that Hans doesn't mention: XFS (from SGI) and GPFS (from IBM).
GPFS has a different focus, but XFS seems to be aimed at solving similar problems as ReiserFS (scalability, high performance, journaling).
I'm a very happy PostgreSQL user - see the sig.
Today is a big shopping day, and when that happens I love watching the buzz spread. Here are some graphs that show the spreading:
- eBay AND Skype
- Oracle AND Siebel.
- the above graphs combined.
This blog post has a nice graph showing the "eBay buys Skype" buzz spike.
This is a bit of a moot point considering Google acquired a minority stake in Baidu back in 2004.
This is very similar to what happens to popular tech/geek sites and their audience over time. At the beginning we have early adopters, and those tend to be technically savvy people and geeks. Web server log analysis shows high percentage of Firefox and Safari browsers. Time passes, and the site becomes known to less techy people. Web server log analysis starts showing a decline in Firefox users and increase in Internet Explorer users, despite Firefox slowly taking over and spreading among the typical Internet users. This must have happened with Google, and this is now happening to Simpy, an increasingly popular social bookmarking and personal web service.
The same phenomenon happened in the world of blogs, where bloggers like Steve Rubel said they wake up at 4-5AM in order to beat the rest of the blogging crowd and blog the news first. Of course, that can't last very long. When people like that run out of steam, regular, more normal and more numerous bloggers enter the stage.
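If you want to watch the browser shift in your own web server logs, something along these lines is enough to start with. It is a rough Python sketch: the browser_family and browser_share helpers and the sample User-Agent strings are made up for illustration, and the substring matching is deliberately crude (a real log analyzer does much more).

```python
from collections import Counter


def browser_family(user_agent):
    """Crude classification of a User-Agent string by substring match."""
    ua = user_agent.lower()
    if "firefox" in ua:
        return "Firefox"
    if "msie" in ua:
        return "Internet Explorer"
    if "safari" in ua:
        return "Safari"
    return "Other"


def browser_share(user_agents):
    """Return {browser family: percentage of requests} for a list of User-Agents."""
    counts = Counter(browser_family(ua) for ua in user_agents)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical sample; in practice these come from the User-Agent field
    # of each line in the access log.
    sample = [
        "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6",
        "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/412 (KHTML, like Gecko) Safari/412",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    ]
    for name, pct in sorted(browser_share(sample).items()):
        print(f"{name}: {pct:.1f}%")
```

Run it once per month of logs and compare the percentages; the early-adopter effect shows up as the Firefox and Safari share shrinking while the Internet Explorer share grows.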
This is what happened to the guy who spread the news early on:
http://www.flickr.com/photos/smash/36648272/
http://www.flickr.com/photos/smash/36659424/in/photostream/
Perhaps it was only a rumour, but it had a very positive effect on Meetro:
http://www.alexa.com/data/details/traffic_details?q=&url=meetro.com
However, it's not necessary for that trend to continue. For instance, look at the Dodgeball spike:
http://www.alexa.com/data/details/traffic_details?q=&url=dodgeball.com
Nice. Do they also make minime monitors I can put in my other pocket?