Domain: waikato.ac.nz
Stories and comments across the archive that link to waikato.ac.nz.
Comments · 54
-
Give WEKA a try
WEKA is Open Source, has an adequate GUI, many different kinds of algorithms available, and a "knowledge flow" visual designer for you to chain it all together. I've used it in a few personal and professional projects to find things like which variables most strongly influence an outcome, decision trees, derived formulas and expressions that accurately predict outputs from inputs, and various kinds of data visualizations for clustering data samples. Code is in Java so I presume you could embed it within a system to automatically perform analysis and swap algorithms on the fly. Best of all, since this is software under your control, and not a Corporate-offered service, your valuable data never leaves your control.
I think WEKA already did a lot to make these kinds of data analysis accessible as Microsoft is aiming to do. No matter who provides it to you, there is something totally awesome about being able to click a few buttons and get some interesting results to munch on. -
Re:FANN Neural Net
I did some testing using classifiers in the WEKA package but was quite disappointed with the results. My next attempt was to leverage PNN (Probabilistic Neural Network) and got somewhat better results. In the test runs with noisy audio files with Morse code I got up to 90% accuracy in classifying dits and dahs. I have not used the FANN package a lot though I installed it on my development machine 1-2 years ago. What are your thoughts about FANN exactly? How would you go about using the package?
-
Not particularly new?
Reading one of the articles ( http://researchcommons.waikato.ac.nz/bitstream/handle/10289/3622/Hogg%20Intcal09%20and%20Marine09.pdf ) seems to make it clear that the Lake Suigetsu project is a player, but only one of many, in the project to develop a better INTCAL chronology. It may be obvious to some, but any single dataset is not particularly useful until it is corroborated with many others. The Suigetsu project has been at work for several years and, although there has been some revision made to their baseline data, it hardly seems like headline news.
OTOH, it's always great to hear what scientists have been up to, regardless of the field.
-
R or WEKA ... Wait, What Exactly Are You Doing?
R is my personal favorite but you're going to have to get down and dirty with some high level programming (scripting). Check out the data import package (you would probably export your spreadsheets to flat txt files and import although the functionality is ever increasing). There's no user interface in this suggestion ... what there is, however, is a massive collection of packages for statistical analysis. Very well maintained, constantly updated and ever expanding.
The other suggestion has a better GUI but is really heavyweight. WEKA has helped me time and time again perform advanced statistical calculations on data sets and it's in Java so runs on just about anything. Their interface occasionally improves too, they now have an explorer that I use to prep data and remove outliers/null data (don't worry, this isn't climate data). It's well documented.
These (probably) require an intermediate data transformation step but are open source and extensively supported. Any examples of what you wanted to do? Simple stuff like standard deviation or complex stuff like principal component analysis (PCA)? I guess if it was just simple stuff, that'd be built into Excel, right? Maybe your problems are simple enough to just need a good macro writer to tackle? Whatever happens, good luck! -
Want to be a mathematics and Open Source hero?
Write a new PCB autorouter for KiCAD! If that is not appealing you could integrate the Topological autorouter with KiCAD.
http://anthonix.resnet.scms.waikato.ac.nz/toporouter/
I really like KiCAD and use it often. However the built in autorouter needs some serious rework and someone with a mathematical background to fix the PCB autorouting. It is a very complicated problem.
Thanks!
Andrew Lynch
PS, yes, I am aware of and use FreeRouting.net. They are great but are a closed source proprietary tool.
-
Weka
If you just want to experiment with some machine learning/pattern recognition stuff without too much programming, give Weka a try. It is a suite of open source machine learning algorithms packed in a pretty usable interface.
-
Re:Jeff Albertson
I saw that too.
As you mentioned "data mining" I had a reflex to reply with a link to WEKA, a state-of-the-art data mining and statistics tool, written in Java.
http://www.cs.waikato.ac.nz/ml/weka/
That may not be what you want for a small shop, but it may be what you need. -
Re:Dating service
How does the stereotypical broad, flattish nose of Africans give them an advantage for their environment?
Damn interesting, I'd say. There was a study about nose shapes and sizes and climate.
In a humid, warm, jungle environment, the air is pretty much perfect for your warm, humid lungs, so you don't want too much nose structure constricting your oxygen/carbon dioxide exchange. A short nose with wide nostrils helps air exchange. If you look at our great ape relatives who live in the jungle, such as the gorilla or chimp, they have almost no 'nose' at all -- just flat nostrils stuck on their face.
Our basic anatomical structure probably evolved on the plains of Africa, which get pretty dry. That's probably where our tubular nose developed. This site says that Homo erectus was the first hominid to have a protruding nose. His fossils are found in Asia and Africa. A long, thin nose humidifies the air more before it goes into the lungs. People in dry climates and cold climates ( cold air is dry ) have longer, thinner noses. This is why you see long, thin noses in desert climates, and in cold, arctic regions. Small nostrils reduce the surface area exposed to the cold, so you don't lose as much heat to the outside air.
Of course, there isn't a perfect correlation. And human beings really like to mate with whomever they can find, so you find a broad spectrum of features in almost any environment. -
Re:Of course.
BrainMaker (respected neural network software): basic version, $195. Professional, $795. No free trial.
Have you tried the free and open source Weka ? -
Shameless Weka Plug
So I can already anticipate people being concerned about their identities being tracked through clicks online.
You don't have to worry about this, however, as it is easy to distinguish two different users but probably difficult to pick you out of a crowd. Furthermore, if they're tracking your clicks, they probably already know your IP address. The number of sessions probably raises to a problematic number if you are trying to identify one user out of one thousand. Therefore, this will only be useful in identifying different behavior between two users -- or specifically identifying when it is highly likely that someone who is logged in is significantly different from the click profile associated with that account (as the article states).
There's a lot of discussion about this in the paper. It mentions that the priors are set at 50% for 2 users but at 1% for 100 users (obviously). And also that:
In an experiment involving 42 user profiles, Monrose and Rubin (1997) shows that depending on the classifier used, between 80 to 90 percent of users can be automatically recognized using features such as the latency between keystrokes and the length of time different keys are pressed.
They go on to say that the method they suggest for detecting a fraudulent user "do not require that users have truly unique profiles."
I read a bit of the paper and I identified Weka's decision tree method being used to classify the users (if you've ever used the ID3 algorithm or its brethren C4.5 in classification, imagine exploring methods of developing different decision trees).
Indeed the paper states:
We chose weka's J4.8 as the classifier since classification trees in general have been shown to be highly accurate classifiers.
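For anyone who hasn't met ID3, here is a rough sketch of the split criterion it uses: pick the feature whose split removes the most entropy from the class labels. This is plain information gain only. C4.5 (and so J4.8) builds on it with gain ratio and pruning, which are left out here, and the feature names and sessions below are invented for illustration, not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy removed by splitting the rows on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical click-session features (made up for this example).
rows = [
    {"browser": "A", "time_of_day": "day"},
    {"browser": "A", "time_of_day": "night"},
    {"browser": "B", "time_of_day": "day"},
    {"browser": "B", "time_of_day": "night"},
]
labels = ["alice", "alice", "bob", "bob"]

# The root of the tree is the feature with the highest gain.
best = max(rows[0], key=lambda f: information_gain(rows, labels, f))
print(best)
```

In this toy data "browser" separates the two users perfectly while "time_of_day" tells you nothing, so the tree would split on "browser" first.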
I'll take this opportunity to recommend two open source projects. Torpark for those of you concerned about your identity and also Weka -- the easy to use collection of data mining software in Java! Also something to note is that Weka has recently become part of Pentaho, a project of open source business intelligence products. Explore the valuable tools that are out there and enjoy! -
I Hope They Don't Know About Weka!
Damn, I hope they don't abuse the hell out of the Weka Project, that's one slick open source engine I've used time and again. It'd be a crying shame to see it put to use of ill repute!
The researchers predict that this will be extremely hard to detect, but they do offer a few suggestions for combating it.
Like what? Capital punishment for spammers? -
GPL data mining software
I have played around with WEKA a bit. I am interested in data classification and WEKA has many classifiers you can try out on your sample datasets.
-
Yawn
I could do the same thing with any of several open-source data-mining packages (see Weka, http://www.cs.waikato.ac.nz/ml/weka/) and a spare weekend. There's simply so much data available in the form of Billboard charts extending over 65 years that it would be trivial to define a bunch of parameters and run any of several well-defined models to generate a simple binary classification.
-
Re:I'm a pastafarian
You ask what the difference is between scientific law and scientific theory:
http://sci.waikato.ac.nz/evolution/Theories.shtml
Couple related examples explained by me...
Law of Gravity:
It is a fact that if I jump off a building there will be gravitational force on me and I'll fall unless there are enough opposite forces exerted on my body to prevent that (regardless, gravity still affects me). This has been observed, made into a theory/hypothesis, tested, and proven. We can measure it (dependent on the mass of the bodies being attracted to each other).
Evolution:
Evolution is a theory about how known things (like micro evolution) combine to explain some other thing (macro evolution). I don't go so far as to even consider "macro evolution is how we came to be" a theory, but more a hypothesis as it is a statement about the past. The past (unless/until we can time travel) cannot be observed, tested, and proven in the same way that we can observe other complex systems (like birds common to this age) work. -
Re:Quadruple independent redundancy.
Is it worth paying twice as much to add another "nine" to the uptime?
The stock exchange can place the blame on nobody but themselves. They have NO redundancy. A single connection provider, through a single firewall. They're not even peering at one of the peering points, unlike the National Library who were able to get a connection through Telstra Clear (with whom they had no previous relationship) up and running long before the fault was resolved. See this NZNOG post for details.
When it keeps your stock exchange running? Sure. -
In New Zealand...not that you'd care...
Well, in 2004, when I graduated at Waikato University http://www.cs.waikato.ac.nz/ (which does offer hardware majors) the number of people that graduated with a hardware major per year could be counted on one hand. They also had no trouble getting jobs (and getting good pay - humbug!)
Software majors (like myself) experienced the opposite; plenty of graduates, but trouble finding a job.
There seems to have been a shift over the last few years from people choosing hardware majors to people choosing software majors (or, god forbid, 'information technology' - ugh)
I don't know why though...
I know of a number of IT companies here in Auckland that are currently looking for hardware people, and just cannot get them. We might have to import a couple dozen :) Eastern Europe and Russia are quite popular for this at the moment... -
Larry Page Should Seed the K-Prize
Since Larry Page is on the X-Prize Board of Trustees, and since Google is pushing the envelope of what is needed to index and compress the entire content of the Internet, Page should consider providing seed funds and then matching funds for any donations to a compression prize with the following criterion:
Let anyone submit a program that produces, with no inputs, one of the major natural language corpuses as output.
S = size of uncompressed corpus
P = size of program outputting the uncompressed corpus
R = S/P ... or the Kolmogorov-like compression ratio.
Previous record ratio: R0
New record ratio: R1=R0+X
Fund contains: $Z at noon GMT on day of new record
Winner receives: $Z * (X/(R0+X))
Compression program and decompression program are made open source.
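To make the payout rule concrete, here is a small sketch of the formula as stated: the winner gets the fund Z scaled by the improvement X as a fraction of the new record ratio R0+X. The function name and the example figures are mine, and returning nothing when the record is not beaten is an assumption the proposal does not spell out.

```python
def payout(fund, previous_ratio, new_ratio):
    """Award Z * X / (R0 + X), where X = R1 - R0 is the improvement."""
    improvement = new_ratio - previous_ratio
    if improvement <= 0:
        # Assumption: no award unless the previous record is actually beaten.
        return 0.0
    # R0 + X is the new record ratio R1.
    return fund * improvement / new_ratio


# Hypothetical numbers: a $10,000 fund, record improved from 4.0 to 5.0.
print(payout(fund=10_000, previous_ratio=4.0, new_ratio=5.0))
```

Because the award is proportional to X/(R0+X), a small improvement only takes a small slice of the fund, and the rest stays in for the next record.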
If Larry has any questions about the wisdom of this prize he should talk to Craig Nevill-Manning.
If, in the unlikely event, Craig Nevill-Manning has any questions about the wisdom of this prize, he should talk to Matthew Mahoney, author of "Text Compression as a Test for Artificial Intelligence"
"The Turing test for artificial intelligence is widely accepted, but is subjective, qualitative, non-repeatable, and difficult to implement. An alternative test without these drawbacks is to insert a machine's language model into a predictive encoder and compress a corpus of natural language text. A ratio of 1.3 bits per character or less indicates that the machine has AI."
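The bits-per-character figure in that quote is easy to measure for any compressor. The sketch below uses zlib from the Python standard library purely as a stand-in: it is a general-purpose compressor, not the predictive language model the paper has in mind, and the repeated sample sentence is far more compressible than a real corpus would be.

```python
import zlib


def bits_per_character(text):
    """Compressed size in bits divided by the number of input characters."""
    compressed = zlib.compress(text.encode("utf-8"), 9)
    return 8 * len(compressed) / len(text)


# Placeholder input; substitute a real natural language corpus to get a meaningful number.
sample = "the quick brown fox jumps over the lazy dog " * 200
print(round(bits_per_character(sample), 3))
```

Lower is better: uncompressed ASCII text sits at 8 bits per character, and the quote puts the bar at 1.3 or less.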
This "K-Prize" will bootstrap AI.
OK, so he can christen it the "Page K-Prize" if he wants.
-
From Cringely's 17 March Column
On the same topic this week, Cringely speculates...
"there are other dirty tricks available to broadband ISPs. Telecom New Zealand, for example, is reportedly planning to alter TCP packet interleaving to discourage VoIP. By bunching all voice packets in the first half of each second, half a second of dead air would be added to every conversation, changing latency in a way that would drive grandmothers everywhere back to their old phone companies. This is because phone conversations happen effectively in real time and so are very sensitive to problems of latency. Where one-way video and audio can use buffering to overcome almost any interleaving issue, it is a deal-breaker for voice."
This has certainly pissed off a few Kiwis, as seen on the NZNOG list: http://list.waikato.ac.nz/pipermail/nznog/2005-March/thread.html/ -
Re:Axis of Evil
I'm more worried about what is going to happen to the guys who wrote those tools http://www.cs.waikato.ac.nz/ml/
-
Pivot tables? Yawn
From what I can tell, these "pivot tables" are just a primitive form of concept analysis (data mining).
Anybody reading this article who actually thinks these pivot tables sound "powerful" should look into some of the real row-based data mining tools out there. For starters I suggest looking at Weka and Orange.
Weka in particular is extremely easy to use and you don't have to be a researcher to figure it out.
-
Re:I have a few questions about WinFS
The article also states "WinFS uses a direct acyclic graph of items (DAG)."
Here in Australia (and NZ, I think), DAG is the um.... technical term for the lump of flyblown crap that hangs off the end of a sheep's arse.
The Acronym seems appropriate, somehow. -
Re:Sourcecode?
You could start here. They have very nice data mining software. They implemented Bayesian networks, support vector classifiers, J48 trees and many more!
-
Re:DjVu
For scanned documents, tic98 compresses even better than DjVu. It's free software and you can even read the author's PhD thesis about it.
-
Re:The real inventors of the airplane.
Besides, the people who believe that NZ claim didn't watch the closing statements of the documentary; it was a hoax.
Film of Pearse 'doctored'
PETER JACKSON
DOUBLE FEATURE
But you guys would of course remember this, as it was mentioned in a /. discussion on Peter Jackson. -
Weka
There are already several excellent open source machine learning toolkits available. The one I have the most experience with is Weka, a Java-based system. In addition to providing an API, it has both command line and GUI tools.
With that and a decent ML book, I imagine most programmers could get up to speed rather quickly. -
CityLink's webcams: Anycast
It's interesting to note (on a site supposedly containing "News for Nerds", anyway) the preparations CityLink have made to avoid running up a massive internet traffic bill with traffic to their webcams: they're using anycast to distribute content from the server nearest the viewer (thus reducing load on NZ's less than amazingly cheap undersea cables).
They've got servers in Wellington and Auckland, plugged into both of the (CityLink-run) peering exchanges there, and they've got a box in the US advertising the same prefix (202.7.4.0/24) - so if you're in NZ, and your ISP isn't crap, you'll get the local servers: everyone else will get the box(es?) in the US.
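The anycast part happens in BGP routing, not in application code, so there's nothing to "program" as such. But if you want to see what that /24 covers, here is a quick sketch with Python's ipaddress module. The prefix is the one quoted above; the host addresses are arbitrary picks for illustration.

```python
import ipaddress

# Prefix quoted in the comment above; every site advertises this same block.
ANYCAST_PREFIX = ipaddress.ip_network("202.7.4.0/24")


def in_anycast_prefix(address):
    """True if the address falls inside the advertised prefix."""
    return ipaddress.ip_address(address) in ANYCAST_PREFIX


print(in_anycast_prefix("202.7.4.17"))   # arbitrary host inside the block
print(in_anycast_prefix("202.7.5.17"))   # neighbouring block, not covered
print(ANYCAST_PREFIX.num_addresses)
```

Which physical server answers for an address in that block depends entirely on which advertisement your ISP's routers prefer.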
There are more details here.
Anycast is also used for other stuff: the F (IIRC) DNS root server is anycasted for redundancy, and one of the IPv4 to IPv6 transition mechanisms uses anycast to locate a nearby hop-on point to the IPv6 network. -
Re:Panic
Big difference between documentaries like Forgotten Silver or the one you mentioned and a live broadcast news "hoax" like War of the Worlds.
The latter is simply not possible any more without some extreme cooperation between news networks -- who would believe the news story on Channel 1 when Channel 2 is running its regularly scheduled programming? You might wonder for 5 minutes what is going on, but you'd pretty quickly "click" to the joke.
A mockumentary (or hoax documentary) is easier because we don't expect other channels to cover it.
Of course, it would be great if the networks could all come together to play one big joke like WotW on the viewers. I'd love to see that just to find out what the general public's reaction would be, in fact, just to find out what my reaction would be. But it won't happen because it would just cause panic and other bad things to happen. There's no fun when people die.
Of course, that doesn't mean you can't imagine what you would do if you turned on the TV in an hour to find that a fleet of UFOs has positioned itself over the capital of every country.
It might happen one day, it just might happen.
-
Re:I guess that explains my firewall activity
I'm seeing a tonne of icmp ping request traffic also. Can anyone confirm this is related to the new worm?
http://list.waikato.ac.nz/pipermail/nznog/2003-August/006762.html
-
Amusing as fuck..
About 15 years ago when I had only recently been thrown out of university, my flatmate found a largish number (40-50) of unsold textbooks in a dumpster behind the on-campus bookshop. He managed to sell at least 30 through the university's own second-hand bookstore before they became aware of the situation, and I think he sold a few more via noticeboard adverts.
More recently, me and a friend found well over 200 Windows manuals with licences for windows98 in a paper-recycling bin, and sold most of them through an auction site with no hassles whatsoever.
So what exactly is the difference between dumpster-diving an unsold sewing pattern, magazine or software licence, and dumpster-diving a slightly damaged but easily repaired "probably returned-under-warranty" monitor?
Or is that 'piracy' too? -
Scanned pages
This story is a good opportunity to plug some free software you could use to help digitize books.
Stuart Inglis's tic98 is a lossless compressor designed for black-and-white scanned documents. It achieves better compression ratios than anything else, or at least it did a couple of years ago. If you have scanned documents to make available online, it's fairly simple to write a CGI script to convert tic98 on the fly to PDF.
Hopefully someone else will reply to this comment with a recommendation of good free OCR software. -
Here's research about it
The WAND Research group did a lot of research about this several years ago, when NZ's bandwidth was a piece of string and people were investigating using satellite for most of NZ's traffic. Their publications are available on their website. You probably want to look at all the ones that mention a high bandwidth delay product. Basically, the issues you have are not having a large enough TCP window size, and the latency on connection setup/tear down. The TCP window size can be easily tuned on most OSes (including Windows); the latency on connection setup can be resolved by using proxies at both ends that forward from one to the other and keep their connections open.
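A quick back-of-the-envelope sketch of why the window size matters: the bandwidth delay product is the number of bytes that have to be in flight to keep the pipe full, and a sender limited to one window per round trip cannot go faster than window/RTT. The link speed and round trip time below are assumed figures for a generic satellite hop, not numbers from the WAND publications.

```python
def bandwidth_delay_product(bits_per_second, rtt_seconds):
    """Bytes that must be in flight to keep the link busy."""
    return bits_per_second * rtt_seconds / 8


def max_throughput(window_bytes, rtt_seconds):
    """Upper bound in bits per second when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_seconds


# Assumed figures: a 10 Mbit/s satellite link with a 600 ms round trip time.
LINK_BPS = 10_000_000
RTT = 0.6

print("window needed (bytes):", bandwidth_delay_product(LINK_BPS, RTT))
print("ceiling with a 65535-byte window (bit/s):", max_throughput(65_535, RTT))
```

With these numbers the unscaled 64 KB window tops out well under a tenth of the link capacity, which is why tuning the window (or splitting the connection with proxies) pays off.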
-
Block Based Network File Systems
One of the lecturers at the local uni did some research into how to have multiple machines interact with a disk over a network without stepping on anyone's toes. A Block-Based Network File System
-
Re: Photo of disk platter...
... I took a photo of this when I visited California:
http://www.cs.waikato.ac.nz/~jrm21/images/platter-lowres.jpg
I put a US one dollar bill on the display case for size comparison ;) There is also a clipping from a newspaper of the time saying how Stanford was suing over warranty issues (such as high unavailability) but it doesn't say what the outcome was... -
Forgotten Silver - Link
Waikato University's media school has some good resources regarding the Forgotten Silver documentary.
-
Re:This is news to me.
The biggest fault in it is a hole in the type system you can drive a bus through. They try to patch it up with global dataflow analysis, but that hack only half works, and makes separate compilation a PITA.
(Basically, arguments to functions are covariantly typed, when they should be contravariantly typed. This means that the arguments of a method in a subclass may be further specialized than the arguments of the method overridden in the superclass. This means that at runtime it can throw an error. Instead, the arguments a subclass method can take should only be allowed to be generalized.)
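Here is a small sketch of that hole, written in Python rather than the language under discussion, so take it as an analogy: Python doesn't enforce the annotations at all, which makes it easy to show what goes wrong when an override narrows its argument type. All class names are made up for the example.

```python
class Animal:
    pass


class Cat(Animal):
    def purr(self):
        return "purr"


class Vet:
    def treat(self, patient: Animal) -> str:
        return "treated"


class CatVet(Vet):
    # Covariant override: the argument is narrowed from Animal to Cat.
    def treat(self, patient: Cat) -> str:
        return patient.purr()


def checkup(vet: Vet, patient: Animal) -> str:
    # Looks fine against Vet's signature, which accepts any Animal.
    return vet.treat(patient)


print(checkup(Vet(), Animal()))
try:
    # A CatVet is a Vet, so this call type-checks under covariant arguments...
    checkup(CatVet(), Animal())
except AttributeError as err:
    # ...but fails at runtime because a plain Animal cannot purr.
    print("runtime failure:", err)
```

A checker that insists on contravariant arguments would reject CatVet.treat at definition time, which is the behaviour the comment is asking for.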
Sather is very similar to Eiffel, and gets this right. But there is only one compiler (though a couple of variants), and it hasn't been updated in a while.
http://www.cs.waikato.ac.nz/sather/
http://www.gnu.org/software/sather/
http://www.icsi.berkeley.edu/~sather/ -
Re:Myth: Viral nature of the GPL
Your entire argument is based upon a technicality and does not hold true for real-world users. While code compiled using GCC does not include any derivative code from the GCC compiler itself, it will almost certainly include code from one of the many libraries which are distributed with GCC.
Any software project larger than "Hello World!" is going to be dependent upon libraries which are licensed under the GPL. While the LGPL was created to address some of these issues, not every library is licensed under the LGPL, so the problem still exists. This includes the very prevalent C++ library which includes the cin and cout operators which I'm sure every programmer has used at some point in his or her career.
Please see here, here, or here for documentation supporting my claims. The last link goes to an article written by Stallman himself addressing this problem, so please don't try and tell me that my information is based on Microsoft FUD.
-
Re:Safari is your friend
Update: tic98, the tightest (lossless) compressor for scanned documents, is back online.
-
Weka- statistics and machine learning
I use an open-source data analysis package called Weka.
It was developed in Java, and it's quite easy to modify and extend as you see fit. Solid documentation available on the website. Excellent CLI, decent GUI, decent graphics. Really useful for doing basic statistical analysis and using some of the more interesting machine learning techniques.
-
Re:ED-209
http://phys.waikato.ac.nz/research/microelectronics.shtml
The publications link has a list of Master's theses which seem to be part of the project - e.g. "Laser Range Finding for an Autonomous Mobile Security Device"
-
Re:Dale Carnegie?
OK, I've found a few non-informative links, which at least indicate that the story is not a hoax: Here is a link at the University in question, indicating that work really is being done on a "robocop", and here is the home page of Dr. Dale A. Carnegie, the person behind the project, who however (unfortunately) does not seem to have a list of publications so that one could find out a bit more about his robocop...
Enough kiwi hunting for today. Maybe some other slashdotter can find more info, I'm giving up.
Ron Obvious
-
Dale Carnegie?
Anyone else find it interesting that the machine's creator is (allegedly) named Dale Carnegie?
As in, how to win friends and influence people? At 35 km/h?
Another question: Anyone got a link to the people doing this research to find out if this thing is more than a toy? I mean, we had an automated post-cart at the company I used to work at in the 80's... Is this really much more? I've found the URL for the Univ. where this supposedly was developed: http://www.waikato.ac.nz/, but, although they have a search page, I've yet to find any mention of their "mechatronic lab" or this project --- but then, I have only begun to search...
Ron Obvious
-
Re:-1 redundant
DjVu is for scanned documents. There is no major accepted file format in use for this kind of data. This will be a huge market once bureaucracies around the world start digitizing their tons of documents. OTOH, DjVu is there for quite a while already and I don't see it having succeeded. Plus, when I installed the plugin under IE 5 a year ago, it was in some dubious beta state. Not nice to work with.
Lossy / lossless image compression types. You cannot compare PNG to lossy schemes. PNG cannot beat a lossy method because the goals are different. Lossless: Compress as small as possible (but the exact original must be restorable). Typically, the algorithms that throw more resources (CPU and memory) at it are better. Lossy: For a given file size, reach the best quality. You can easily beat PNG with a lossy scheme by simply choosing very bad quality.
Open source. There are a few programs out there. Try TIC. It's GPL'd and beats JBIG-1 by about 40 percent on scanned images, according to the website.
Resources: Image Compression Resources, The Data Compression Library. -
Re:Wot?
Waikato, New Zealand.
The course (bar two papers, I believe) could be done solely by tests. Of course, the tests weren't just ones regular students would do - they covered much more.
But with your reputation, Sir, perhaps they'd consider international study.
-
Re:USes for a dish
1. A stir fry dish
Of course, you could go the other way. Some friends of mine turned their unused wok into a satellite dish on their roof. It was only for a laugh because Waikato Uni (which was next door) had put up another huge satellite dish for internet pointing almost directly at them. Presumably there was some geo-synchronous satellite just above the horizon.
-
Re:The best BOfH site I've found yet...
Here is the definitive Bastard stuff, his own web site:
http://prime-mover.cc.waikato.ac.nz/Bastard.html -
WEKA?
WEKA? Various machine learning/data mining tools. Java. GPL.
And you'll keep getting assorted answers in various directions unless you're more specific...
-
OCR position, that was me! :-)
Hi all,
The department of Computer Science at Waikato University in New Zealand is offering the position. It's a great place to live, they pay well and you get to write GPL'd software. They even pay to fly you there!
If you want more information write to Professor Ian Witten and ask about the GNU OCR job! It's a University so they're not after people with tens of years' experience, they're after hackers that can get stuff done :-)