Domain: scu.edu.au
Stories and comments across the archive that link to scu.edu.au.
Comments · 30
-
Re:what a waste of article
Pretty sure MS-DOS 5.0 was rock-solid enough. It only faltered because things running under it had bugs that overwrote random bits - the kernel itself would be stable.
Here are some known MS-DOS bugs in 5.0 and later (the last one is amusing in its own way):
* DriveSpace (DoubleSpace, a.k.a. what was intended to compete against Stacker) bugs in anything earlier than 6.20: https://en.wikipedia.org/wiki/DriveSpace#Bugs_and_data_loss
* FDISK bugs of several types across several versions: http://www.msfn.org/board/topic/159631-testing-ms-dos-limitations/
* SCANDISK bugs in 6.20 and earlier (Cirrus Logic IDE controllers, when used with VERIFY=ON, resulted in erased drives): http://www.mail-archive.com/survpc@softcon.com/msg01557.html
* INT 21h AH=30h (Get MS-DOS Version function) bug: returns AL=$06 AH=$14 (i.e. 6.20) on 6.21 -- fixed in 6.22 (AL=$06, AH=$16): http://spike.scu.edu.au/~barry/interrupts.html#ah30
-
Re:Small, yes, but keep some perspective...
Yes, leverage.
Screen display, and keyboard and disk I/O were handled by DOS or deferred to the BIOS. INT-21h, anyone?
Now, don't make me get my cane!
-dZ.
-
Australian evidence: interbreeding with Aborigines
There is evidence in the DNA record of some regional tribes of Australian Aborigines that there was interbreeding with:
* Malacca / Macassan (modern-day Indonesian) fishermen who frequented the fishing grounds of the north-west of the continent, probably from the mid-1500s and arguably earlier;
* Portuguese discovery of the Australian landmass in the early 1500s, and contact of Portuguese sailors stranded by shipwreck from the early 1600s with local tribes;
* disputed but arguable Portuguese 'discovery' of the east coast of Australia, including ship wrecks and habitation, and contact with Aborigines in various areas;
* documented contact between early British settlers and Aborigines in eastern Australia, after 1788 into the late 1800s, included many 'tolerated' inter-marriages due to the lack of suitable female partners in the early convict/colonial days.
-
Re:Do this guys know the definition of user lock-i
Followup to my comment above about OnAustralia. Just found this old 1995 news:
Microsoft can lay claim to 80 percent of the computer users worldwide (Crow & Zampetakis, 1995) and a global revenue of $US4.7 billion in fiscal 1994 (Advertising Age, 1994). Its planned venture with Telstra in Australia (Microsoft Network or OnAustralia), will offer 'filtered' access to the Internet through Telstra's AUSTPAC. This move assumes that the Trade Practices Commission approves this $AUD9 million joint venture. (Crowe, 1995.) Given the convergence of News Corporation's Fox Television and production facilities with the carrier Telstra in Australia, and the formation of a Pay-TV company, Foxtel, one might rightly expect a crossing of the bridge between these technologies. Also, one should not forget that Microsoft's Bill Gates has stated plans to launch over 800 orbiting satellites in competition with Motorola's 64 low-orbit satellites, among others, thus illustrating even further the convergence that is taking place.
-
Re:Bonsai Bush
-
Re:TiVo is a victim
In the Betamax case the US Court system found that home copying for personal use is not infringement. This law has also been applied to PVRs. Frankly, it is common sense. You do have a right to record what you see on TV. I see no difference between this and the home use of a home DVD disk recorder (as has been mentioned in this forum). TiVo's competition would have to include some means (even if it were not as good) of "backing up" the material on their hard disks or seriously lose market share to TiVo. That makes it a temporary save in my book.
-
Google? Anyone?
The Anatomy of a Large-Scale Hypertextual Web Search Engine, has produced Google. Why make this post longer than it need be?
-
Directly from the Creators mouth to your brain
Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page's importance or quality. PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page. PageRank is defined as follows:
We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one.
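As a rough illustration of the formula quoted above, here is a minimal Python sketch that iterates it on a tiny three-page graph. The graph, the function name `pagerank`, and the iteration count are assumptions made up for the example, not anything taken from the paper. One caveat: with the formula exactly as written, the ranks sum to the number of pages rather than to one (assuming every page has at least one outgoing link); dividing the (1-d) term by the page count would give values that sum to one.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to. Every page is
    assumed to have at least one outgoing link (no dangling pages).
    """
    pages = list(links)
    pr = {p: 1.0 for p in pages}  # arbitrary starting values
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum the contributions of every page T that links to `page`.
            incoming = sum(pr[t] / len(links[t]) for t in pages if page in links[t])
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical toy graph: A links to B and C, B links to C, C links to A.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_links)
total = sum(ranks.values())
print(ranks, total)
```

In this toy graph C ends up ranked above B, since C receives B's whole rank plus half of A's, while B only receives half of A's.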
Link Here -
Re:guess that proves
I wonder if the Australian police accidentally stormed the next door neighbours and shot the innocent occupants in the face with a 12 gauge shotgun?
As has happened in the past, by the now disbanded Tactical Response Group.
-
Re:Maybe they're tweaking
No offense, but your argumentation leads me to believe you have no idea what you're talking about. I suggest you pick up a book about information retrieval, e.g. Modern Information Retrieval by Baeza-Yates and Ribeiro-Neto. Or at least the seminal paper about Google, The Anatomy of a Large-Scale Hypertextual Web Search Engine. It's a fun read.
-
Try reading: The Anatomy of a Large-Scale Hypertextual Web Search Engine
http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm
-
Re:Retaliation
And yes, the stone fish is the world's most venomous.
Looking at Google, I'm seeing lots of conflicting stories about the good old stone fish. Most state that it is the most venomous, but "can often be deadly". I've definitely seen Aussie documentaries that are perhaps understating the dangers. This one states to immerse foot in hot water, which I have heard disables the venom.
This site suggests that death by heart failure can occur.
The deadly (female sydney) funnelweb is a fairly small spider though, and I'm pretty sure it's not very closely related to the tarantula
I saw something on TV recently that claimed that they were. This site suggests it, along with many others, are being incorrectly called tarantulas. Here's an interesting site, stating that they're not tarantulas.
A hard earned thirst needs a big cold beer!
-
Didn't get your ranking?
Complaints of bogus Google rankings are, I think, quite entertaining. What, AltaVista ranked your site higher than Google?
See, Google is a really unique entity. Most successful companies are driven by business types, suits. Google is a big collection of computer scientists doing research, and taking a no-compromises approach to product quality. They decided to go for long-term value -- having happy, well-served customers, instead of the many sites that went with pop-up ads, corporate tie-ins, sponsored portal links and the like during the dot-com era to boost short-term profit.
As a result, Google is on top. And they got on top by doing the Right Thing, unlike almost everyone else in the industry. It's an excellent example of the quality-through-competition-and-enormous-market that Internet visionary types have been trumpeting since the dawn of the Internet.
Of course, not everyone is happy about this. Competing search engines, the ones that frequently have far more money backing them, yet still can't keep up, complain bitterly. The marketing types that used to be able to trick the simple algorithms the old search engines used, or buy positioning in the searches, can no longer do that. I constantly hear bitter complaining about that as well.
But you know what? Despite all the mudslinging I've seen from these types, I've yet to see Google blow up. They consistently provide near-magical search accuracy, finding what I'm looking for. They have a simple interface that is built around what the Web was intended to look like (i.e. not pixel-positioned, invisible-table-laden crap). They cost me nothing, other than a few simple text based ads (which are small and have helped me occasionally). Google is absolutely incredible. They happened to be in the right position at the right time, and as consumers flock happily to using Google rather than remembering DNS entries for websites, a lot of companies feel unsettled. In their traditional world, they could *buy* a DNS name for a load of money. They could sue anyone with a competing name. All of a sudden, they're thrown into a world where *they may have to compete for recognition with their smaller competitors*. It's what the Internet had promised for ages -- the ability of the little business to compete with the large one, where incumbents have no inherent advantage. A lot of companies dislike this intensely, hence all the bogus lawsuits and claims that Google has falsified search results.
Google has always claimed that they wouldn't muck with search result ordering because it would cause customers to move away from their then-inferior product. I think that they're true to that, but it doesn't matter -- if they aren't, eventually people will migrate to whatever better search engine pops up. The sort of folks at Google understand trends and systemwide numerical movements based on small factors -- I doubt they'd make an argument like this without it being reasonable.
Google has even put out a whitepaper describing how their search engine works.
So we have a free service that has fewer ads than almost any commercial website, has uncanny accuracy, does *not* (unlike rivals who openly sell them) sell page rankings, has a science/engineering culture (instead of a business one), and is fantastically successful.
Finally, Google is under no onus to do anything. They are not a meaningful monopoly. The entire point of a monopoly is that you can erect barriers to competition by using your clout. You can always easily go to another website, and Google even published a fair bit of the foundational technology in their engine. You can't really go much further than they did to be open, free, and competitive. The point is that they have a superior product, and they are unwilling to screw their customers over to gain short-term bucks.
Contrast this to Microsoft, where you have a vast array of monopolies, compatibility and technical information issues that are viciously used to guard their markets, secrecy, inferior products, and a willingness to gouge the customer and do everything possible to keep them in line. And yet, Microsoft gets a slap on the wrist. If that's acceptable, Google sure as hell is.
When I search for "Altavista" on Google, I get Altavista. When I get something else, *then* I'll start being suspicious.
Finally, you claim that Google returns poor search results. I disagree. I have found that Google consistently returns the most useful results of any search engine I've used, and does a fantastic job of shoving "junk" results well after the "useful" results.
-
Re:This sounds very much like...
Schoenberg created what is called Serialism. The basic concept that Schoenberg thought up was to constrain the tones by making a strict rule: in his case you must use all 12 notes in some sort of sequence, but no notes may be used twice until the sequence has terminated. This is known more commonly as 12-Tone Music.
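The rule as described can be stated as a simple check. A minimal Python sketch, assuming notes are written as pitch classes 0-11 (0 = C, 1 = C#, and so on); the function name `is_tone_row` and the example sequences are made up for illustration:

```python
def is_tone_row(pitch_classes):
    """Return True if the sequence uses each of the 12 pitch classes exactly once."""
    return len(pitch_classes) == 12 and set(pitch_classes) == set(range(12))


# Hypothetical examples: the chromatic scale is a (dull) valid row;
# repeating a note before all twelve have sounded breaks the rule.
chromatic = list(range(12))
repeated = [0, 1, 2, 0, 4, 5, 6, 7, 8, 9, 10, 11]
print(is_tone_row(chromatic), is_tone_row(repeated))
```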
However, this idea can be extended to other aspects of music: tempo, rhythm, dynamics, etc. The idea of putting formulaic constraints like this is called Total Serialism. The guy in this article is just another total serialist.. using the platonic dice to constrain the music. The idea of constraining music with math is not new.. it's been around since the 1920's.
-
But we know the page-ranking algorithm
-
Real Programmers Use FORTRAN
For your reading pleasure, I'm posting an excerpt from the story "Real Programmers Don't Use PASCAL" which should give you a hint of what you may be in for:
The easiest way to tell a Real Programmer from the crowd is by the programming language he (or she) uses. Real Programmers use FORTRAN. Quiche Eaters use PASCAL. Nicklaus Wirth, the designer of PASCAL, gave a talk once at which he was asked "How do you pronounce your name?". He replied, "You can either call me by name, pronouncing it 'Veert', or call me by value, 'Worth'." One can tell immediately from this comment that Nicklaus Wirth is a Quiche Eater. The only parameter passing mechanism endorsed by Real Programmers is call-by-value-return, as implemented in the IBM/370 FORTRAN-G and H compilers. Real programmers don't need all these abstract concepts to get their jobs done -- they are perfectly happy with a keypunch, a FORTRAN IV compiler, and a beer.
- Real Programmers do List Processing in FORTRAN.
- Real Programmers do String Manipulation in FORTRAN.
- Real Programmers do Accounting (if they do it at all) in FORTRAN.
- Real Programmers do Artificial Intelligence programs in FORTRAN.
And my personal favorite:
A Real Programmer might or might not know his wife's name. He does, however, know the entire ASCII (or EBCDIC) code table.
The whole story can be found here.
Take the time to become a "Real Programmer." You'll be glad you did.
-
Versioning on the web
Although this problem is an especially serious one when it comes to journalism, it's a general problem with the WWW. Sometimes one wants to link to a specific version of a webpage or examine the changes that have been made. One solution is to use RCS to keep track of page versions, and use a web server extension (such as an Apache module) that allows access to the changelog and to past versions. I would love to see this implemented widely...
I hacked up a little Perl script demonstrating the idea. Now each of my web pages can have a "this page contains version information" link to its changelog.
And then there's VMS, which has versioning built into the filesystem...
-
Re:How Google Makes Money
Half of Google's revenue comes from selling text-based ads
According to this Sept 2001 article, 2/3 of Google's revenue is from advertisements.
Google can't survive without ads, but it's ironic considering the founders Brin and Page once said "...we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers...[A]dvertising causes enough mixed incentives that it is crucial to have a competitive search engine that is...in the academic realm."
-
Price performance
The crucial question is overall price performance. Google has an inverted index for all the web data, and they have really high throughput requirements. This means that bandwidth is crucial.
The average disk can sustain between 100-200 IOPs, while the average memory module can sustain about 10,000,000 IOPs (100ns latency). At $120/disk, this works out to between $0.60 and $1.20/IOPs, and at $250 per 1GB memory module, this works out to $0.000025/IOPs.
Google currently claims to index about 2G pages. If one assumes on average each page is 4KB, and that the inverted index takes half the space of the original text, then this means 4TB of index. 4TB of RAM at $250/GB is 4K memory modules for $1M. Assuming their motherboards can hold 2GB each, this means 2K machines at perhaps $120 each for another $250K. Now, those 4K memory modules on 2K motherboards can sustain something like 40G IOPs. $1M of disk is roughly 8K disks for 1.6M IOPs. In a real system, load is never evenly distributed so you are almost never able to approach the theoretical limit.
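The back-of-envelope numbers above can be re-derived in a few lines. This Python sketch uses only the figures stated in the post (disk price and rate, memory price and rate, page count and page size); the 1GB module size is an assumption implied by "4TB of RAM ... is 4K memory modules", sizes are rounded to powers of ten, and the variable names are mine.

```python
# Figures taken from the post; sizes use round powers of ten.
DISK_PRICE = 120            # dollars per disk
DISK_IOPS = (100, 200)      # sustained random I/Os per second, low and high
MODULE_PRICE = 250          # dollars per memory module (assumed to be 1GB)
MODULE_IOPS = 10_000_000    # one access per 100ns

# Cost per unit of random-access throughput.
disk_cost_per_iop = [DISK_PRICE / rate for rate in DISK_IOPS]  # slow disk, fast disk
memory_cost_per_iop = MODULE_PRICE / MODULE_IOPS

# Index size: 2G pages * 4KB per page, with the index at half the text size.
pages = 2_000_000_000
index_bytes = pages * 4_000 // 2
modules = index_bytes // 1_000_000_000      # number of 1GB modules needed
memory_cost = modules * MODULE_PRICE
memory_iops = modules * MODULE_IOPS

# The same money spent on disks instead, at the optimistic 200 IOPs each.
disks = memory_cost // DISK_PRICE
disk_iops = disks * DISK_IOPS[1]

print(disk_cost_per_iop, memory_cost_per_iop)
print(modules, memory_cost, memory_iops, disks, disk_iops)
```

Whatever the exact per-IOP figures, the point of the comparison is the gap: for the same spend, memory delivers several orders of magnitude more random accesses per second than disk.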
For more details on the (original) Google implementation, please see The anatomy of a large-scale hypertextual web search engine, by Sergey Brin and Larry Page.
From dim memory, to do a search, you need to:
- look up the word (or words) in a dictionary
- from the dictionary you get a pointer to the list of all word appearances. (Actually, Google keeps more than one list, and it only traverses as much of the list as it needs to.)
- look up the document's page rank
- rank the hits
- look up the document and generate the hit
Each of the lookups (dictionary, inverted index, page rank, document) is a random access (IOP). So, to make a long story short, memory is cheaper for Google because throughput (and latency) is critical to their business and their access patterns are generally random and the cost of enough memory to hold the index is less than the comparable cost of enough disk to support the IO rates they require.
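The lookup steps listed above can be sketched with ordinary dictionaries standing in for the real data structures. This is a toy illustration, not Google's implementation: the documents, the `page_rank` values, and the function name `search` are all invented for the example, and hits are ordered by page rank alone.

```python
# Hypothetical toy corpus and precomputed ranks.
documents = {
    1: "real programmers use fortran",
    2: "quiche eaters use pascal",
    3: "fortran compilers and pascal compilers",
}
page_rank = {1: 0.5, 2: 0.2, 3: 0.3}

# Build the inverted index: word -> set of ids of documents containing it.
inverted_index = {}
for doc_id, text in documents.items():
    for word in text.split():
        inverted_index.setdefault(word, set()).add(doc_id)


def search(query):
    """Return the texts of documents containing every query word, best rank first."""
    hits = None
    for word in query.split():
        # Dictionary lookup, then fetch the list of appearances for the word.
        postings = inverted_index.get(word, set())
        hits = postings if hits is None else hits & postings
    # Look up each hit's page rank and order the hits by it.
    ranked = sorted(hits or [], key=lambda doc_id: page_rank[doc_id], reverse=True)
    # Look up the documents themselves to generate the results.
    return [documents[doc_id] for doc_id in ranked]


print(search("fortran"))
print(search("use pascal"))
```

Each dictionary access here corresponds to one of the random accesses the post counts as an IOP, which is why the cost of random access dominates the comparison.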
Cheers,
Carl Staelin
-
Perhaps the government...
... is worried that displaying the URLs will show how ineffective it has been on this?
The censorship laws were a joke when first proposed - a joke that could damage Australian content providers, but which could have little or no impact on Australians' access to illegal materials. At the recent ACIS 2001 conference, a paper was given (full text available as pdf) arguing that the whole thing was pointless as far as pornographic sites were concerned, as they were all offshore already (due, in part, to expensive hosting on Australian servers) and therefore outside of Australia's jurisdiction.
I can only think of two good reasons for not releasing this material - they fear that examination of the material will show that many of the sites should not have been blacklisted (as per peacefire's work), or that they fear it will show how ineffective the legislation is. :)
-
Here's a funny link
Here is a widely circulated piece about nostalgia and Fortran: Real Programmers Don't use Pascal.
-
Magnitude of Problem
Hello,
I'm one of the authors of Sparkseek, a remotely-hosted search service. I'm also a student at Pennsylvania State University. I want to give you an idea of what kind of problems researchers in the field of internet text retrieval have to deal with.
Larry Page, one of the co-developers of the Google search engine said in his 1997 research paper entitled "The Anatomy of a Large-Scale Hypertextual Web Search Engine" that the primary benchmark for information retrieval, the Text Retrieval Conference, uses a fairly small, well controlled collection for their benchmarks. The largest benchmark they have available is only 20GB compared to the 147GB from Google's crawl of 24 million web pages. Today, Google has over 1.4 billion web pages in their database and a reported 4,000 node linux cluster.
One of the problems I have encountered, and that I've found difficult to deal with, is the sheer amount of redundancy in web content. Anybody who has ever tried a search for any Linux command has no doubt encountered hordes of duplicate man pages in their results.
Not only that, but I honestly don't believe that when it comes to search engines, more is better. I have noticed over the past 6 months, as google has made great increases in its index sizes, that results have consistently become worse and worse. Search engines really need to begin narrowing the focus of their index and creating multiple indexes. Educational institutions should be separated from commercial establishments.. if I'm performing research on some subject, the last thing I want is to arrive at a commercial establishment pitching some product.
Also, the method Google utilizes when creating their indexes creates a huge scalability problem. Their indexes are updated less frequently than ever, and if you read their document that was published in '97, it's not hard to see why.
Michael Tanczos
-
some thoughts
They have the AdWord program where as an advertiser you pay for display of your ad on search results for a selected keyword. However this is not the case with "Quake" (there is no ad).
Links with "RealNames" seem to be redirected too (try "amazon") however in a different way (and only for the first result that actually links to the holder of the TM).
Maybe they are just trying to extend their formula for PageRank: users will more likely follow links that look promising, leaving out the obviously misleading ones. Statistical information on followed links could then be incorporated into the PR. -
not a revolution
Surely not that revolutionary? Many people have their bookmarks file on their webspace (I do) and engines such as google rank pages by how many links they have. They even take into account the "quality" of the page that is doing the linking.
The PC World article specifically criticises google for being "unable to organize" the links very well, yet seems to use the same technique itself. What gives?
A paper about the inner workings of google is available in HTML.
I went to look at the HotLinks link in the original article and it was all blank :-( oh I see it uses some form of scripting to redirect you ... whatever happened to 30x return codes?
-
Re:Ranked by referring pages
Excellent description. I can only top that by providing links which go over the research underlying this stuff.
The classic algorithm of this type is called HITS, by J. Kleinberg.
IBM's 'Clever' is an enhancement to 'HITS'.
Part of the success of these is that they can be mapped on to well known matrix solving problems...there's enough information in the documents above for you to work out how to write one.
One wrinkle Restil doesn't mention is that the technique is not purely based around link structure. You _seed_ the process with content-ranked pages (hoping the process 'crawls' to the best set independently of the seed), and subsequently you may select the most relevant 'communities' of pages by content ranking. So if you are already in the top 100, say, you may be able to content-mangle yourself up the list, but you need good linkages to get in first!
A further criterion used is response time (I strongly suspect Google use this; I got hooked on it when I found that its sites _responded_ rather than hanging as most AltaVista sites did at the time). Again there's publications on this stuff: the shark search algorithm is a spider with this feature.
-
Re:A matter of time
With enough experimenting someone can find out how the system works. Either through keywords, page text, bribing .. etc whatever. People will find out how it works. Just a matter of time.
No need to waste tons of time experimenting, Google is documented well enough in this paper (presented by the founders of the company at the 7th WWW conference) that someone could implement a look alike system. Of course, since most of the technology described in the paper is patented, actually implementing a system would be illegal.
daniel
-
Re:Democracy and Google explained.
Brief technical follow up: (described more in depth in the original paper of Brin and Page) the PageRank^tm system is pretty cool actually - it's smarter than just ranking each page based on the *number* of other external pages which link to it, the algorithm works effectively like this:
It can actually be described well as a recursive algorithm which takes each page, and ranks it based on the number of external pages which link to it. Then go through all of the pages again, and this time *weight* the links to a page based on their ranking in the previous run through (all the time normalizing based on the number of links *out* from the page that links to you).
Hmm... I am probably not describing this well, but as it's really a rather elegant mathematical relationship between the pages and the links, it's probably best described by the simple formula itself - the PageRank (R) of page P_j (which has O(P_j) links out from it [sorry - *not* O as in "Big O"], and N_j links to it (pages P_{a_i}, i running from 1...N_j) ) is:
R(P_j) = (1-d) + d ( Sum_{i=1->N_j} R(P_{a_i}) / O(P_{a_i}) )
(d is a damping factor, I'm not sure why they included it, between 0 and 1; they keep it at .85-.9)
So if a well linked to site links to you, and not many other places, then your site jumps in rank, as do all sites you link to (but your relative "reputation" is divided evenly among the folks you link to - note the O(P_{a_i}) in the denominator above). This makes it pretty resistant to porn sites making a bunch of pages all link to one main site to get it raised in rankings - if the secondary sites aren't linked to by anyone, their ranking is abysmal, and won't help the main site much...
So I wouldn't quite call it "democracy"... maybe just popularity.
-
Re:a more technical article anywhere? -- Yes
Read the original Google paper. It includes some description of Google's architecture.
-
Java in Industry.
can anyone point me to a real-world application or website that actually uses Java? I mean properly, not just a tiny applet showing the time or something.
Off the top of my head, let me see Mail.com uses Java to serve its pages. Does Oracle's new Enterprise database count?
And from Sun's page of industry news, we have companies like RSA, Oracle, Netcom, SAAB, Delta Air etc. using Java in mission critical situations on a daily basis.
Posts like this make me wonder about who composes slashdot's readership. Because only script kiddies and so-called web developers (HTML and javascript kiddies) use Java as a web app language. Also no one in his right mind uses Java for GUI development if the application has any degree of complexity. But as a middleware development language it is practically untouchable. When it comes to speed of development, maintainability and expandability for business applications few things beat Java. Add a native GUI or web interface depending on your application and a rock solid app has been created.
PS: Myth dispel mode Oh yeah, by the way Java pages are faster or at the very least as fast as CGI, it has to do with being memory resident a la the VM as opposed to being read from disk. Here's a benchmark and a link or two.