It's impossible for a university to teach you everything you need to know to handle some particular business situation. The amount of specialization that would have to exist in a degree such as computer science would be enormous, and a staggering number of similar degrees would have to exist as a result. What's more important is the value of that specialization in an economy like the one we are experiencing now.
My brother has gone the IT route. He has earned numerous certifications (A+, Novell, MCSE, etc.) and in today's economy they are pretty much worthless. He has attended about four job fairs in the past month and has learned the harsh truth: the industry pumped out more specialized IT workers than it can support. Every single company he talked to just threw his resume on the mountain of similar "IT professionals".
I, on the other hand, obtained a computer science degree at PSU. I also ended up with a job that doesn't use *anything* from computer science. I teach Cisco Networking at a local high school and perform IT work (maintaining parts of the computer network). The CS degree got me in the door for this job, which affords me enough free time to work on my own software development company. This is probably one of the better examples of a non-specialized degree enabling you to function in occupations not directly related to the degree.
I can't say the same for the more specialized education my brother received. The truth is that you obtain a degree at a university because you need to learn how to *learn*. A computer science degree, for example, lets you adapt to more situations than a specialized degree does. Furthermore, you end up with a deeper understanding of how to solve problems.
That is something that will help you throughout your entire life.
---
Michael Tanczos
Gamedev.net
http://www.gamedev.net
When developing Sparkseek we used checksums as a very, very basic way to check for duplicate pages when spidering. A great deal of research has been done on detecting similar pages, particularly by people like Sergey Brin of Google and a number of people from Digital (the AltaVista guys, who did a lot of work on document clustering).
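To show how basic the idea is, here is a minimal sketch of checksum-based duplicate detection during a crawl. It is not Sparkseek's actual code: the `DuplicateFilter` class, the `page_checksum` helper, and the choice of CRC-32 are all assumptions made for illustration.

```python
import zlib


def page_checksum(body: bytes) -> int:
    """Return a CRC-32 checksum of the raw page body (assumed checksum choice)."""
    return zlib.crc32(body)


class DuplicateFilter:
    """Hypothetical helper that remembers the checksum of every page seen so far."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, body: bytes) -> bool:
        """Return True if an earlier page had the same checksum, else record it."""
        checksum = page_checksum(body)
        if checksum in self._seen:
            return True
        self._seen.add(checksum)
        return False


# Example: the third page is a byte-for-byte copy of the first.
dedupe = DuplicateFilter()
pages = [b"man page for ls", b"man page for cp", b"man page for ls"]
flags = [dedupe.is_duplicate(page) for page in pages]
```

This only catches exact copies. A single changed byte gives a different checksum, and a 32-bit checksum can occasionally collide on unrelated pages, which is why it is only a very basic check.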
What is surprising to me is that using a checksum to check for changes is such an obvious application of an algorithm that I find it funny it was awarded a patent at all. It would be like awarding a patent to someone who thought of sorting page result IDs using radix sort.
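For what it's worth, checking a page for changes with a checksum fits in a few lines. This is a rough sketch under my own assumptions, not the patented method or anyone's production code: `content_digest`, `has_changed`, the `last_seen` dictionary, and the use of SHA-256 are all hypothetical choices.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Return a hex digest of the page body (SHA-256 is an assumed choice)."""
    return hashlib.sha256(body).hexdigest()


def has_changed(last_seen: dict, url: str, body: bytes) -> bool:
    """Compare the page against the digest stored from the previous fetch.

    Returns True for a new or modified page, and always records the new digest.
    """
    digest = content_digest(body)
    changed = last_seen.get(url) != digest
    last_seen[url] = digest
    return changed


# Example: fetch the same (placeholder) URL three times.
last_seen = {}
url = "http://example.com/"
first = has_changed(last_seen, url, b"version 1")   # never seen before
second = has_changed(last_seen, url, b"version 1")  # same content again
third = has_changed(last_seen, url, b"version 2")   # content was edited
```

Here `first` and `third` come back true and `second` comes back false, which is the whole technique.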
Yes, this is an obvious idea. Yes, it has been done before. Prior art on this one should be much more of a cakewalk than the Amazon.com 1-click deal.
--- Michael Tanczos
I'm one of the authors of Sparkseek, a remotely-hosted search service. I'm also a student at Pennsylvania State University. I want to give you an idea of what kind of problems researchers in the field of internet text retrieval have to deal with.
Larry Page, one of the co-developers of the Google search engine, said in his 1997 research paper, "The Anatomy of a Large-Scale Hypertextual Web Search Engine", that the primary benchmark for information retrieval, the Text Retrieval Conference, uses a fairly small, well-controlled collection for its benchmarks. The largest benchmark they have available is only 20GB, compared to the 147GB from Google's crawl of 24 million web pages. Today, Google has over 1.4 billion web pages in its database and a reported 4,000-node Linux cluster.
One of the problems I have encountered, and have found difficult to deal with, is the sheer amount of redundancy in web content. Anybody who has ever searched for any Linux command has no doubt encountered hordes of duplicate man pages in the results.
Not only that, but I honestly don't believe that more is better when it comes to search engines. I have noticed over the past 6 months, as Google has greatly increased the size of its index, that results have consistently become worse. Search engines really need to begin narrowing the focus of their indexes and creating multiple indexes. Educational institutions should be separated from commercial establishments. If I'm performing research on some subject, the last thing I want is to arrive at a commercial establishment pitching some product.
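As a rough illustration of what separate indexes could look like, the sketch below routes each crawled URL to a named index. Everything in it is an assumption for the sake of the example: the `index_for` function, the index names, and especially the use of the top-level domain as the signal, which is a crude heuristic and not how any particular search engine works.

```python
from urllib.parse import urlparse


def index_for(url: str) -> str:
    """Pick a (hypothetical) index name for a URL based on its top-level domain."""
    host = urlparse(url).hostname or ""
    if host.endswith(".edu"):
        return "educational"
    if host.endswith(".com"):
        return "commercial"
    return "general"


# Example: group a few crawled URLs by the index they would land in.
crawled = [
    "http://www.psu.edu/",
    "http://www.amazon.com/",
    "http://www.gamedev.net",
]
indexes = {}
for page_url in crawled:
    indexes.setdefault(index_for(page_url), []).append(page_url)
```

A research query would then be run against the educational index only, so commercial pages never enter the result set.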
Also, the method Google uses to create its indexes creates a huge scalability problem. Its indexes are updated less frequently than ever, and if you read the paper they published in '97, it's not hard to see why.
Michael Tanczos