kaggle.com · Domains · Slashdot Mirror

Re:This is fantastic by DidgetMaster · 2019-02-09 14:03 · Score: 1 · on Python Developer Survey Shows Data Analysis More Popular Than Web Development (jetbrains.com)

How easy is it to use Python and Pandas to do these kinds of analytics? I have been exploring the Chicago crime data set as an example (see: https://www.kaggle.com/boldy71...) and I am interested to know how much expertise and time it takes to do something like this. I am building a data analytics tool that will give non-programmers the ability to do simple analysis of large data sets using a point-and-click interface. I use this crime data set to test things out, but I want to explore more in-depth analysis to see if it can help even more than it already does. A 4-minute video demonstrates our tool. https://www.youtube.com/watch?...

Re:pre-trained machine learning by invalid_user · 2018-03-07 21:30 · Score: 1 · on Next Big Windows Update Will Bring Hardware-Accelerated AI (zdnet.com)

C'mon. As much as I despise them, I'm sure there are script-kiddies who can code Keras or PyTorch at Microsoft.

https://www.kaggle.com/gaborfo...
https://github.com/Cadene/pret...

Kaggle by zifuhumexa · 2017-12-03 07:19 · Score: 1 · on Can Researchers Detect Irregular Heart Rhythms with the Apple Watch? (usatoday.com)

There is a dataset of recorded heartbeats at Kaggle free for everyone to download. Not that many results there, though. I wonder how much better the algorithms have gotten and if Apple can actually do something useful with their device.

Re:This is tailor made for public funding by ShanghaiBill · 2017-06-04 01:55 · Score: 1 · on When Sentencing Criminals, Should Judges Use Closed-Source Algorithms? (technologyreview.com)

Even better, just put the raw anonymized recidivism data on Kaggle and let everyone compete to come up with the best model.

The dataset appears to be missing by innocent_white_lamb · 2017-04-30 10:26 · Score: 2 · on Massive Tinder Photo Scrape Has Users Upset (techcrunch.com)

The article links this as being the dataset "consist[ing] of six downloadable zip files, with four containing around 10,000 profile photos each and two files with sample sets of around 500 images per gender."

https://www.kaggle.com/scolian...

Which gives a 404.

Re:Not one example? by ShanghaiBill · 2016-02-20 07:35 · Score: 4, Informative · on Tiny, Blurry Pictures Find the Limits of Computer Image Recognition (arstechnica.com)

Here is a page with some examples.

Here is a PDF of the paper, which has more examples.

I don't think it means much. Instead of showing that humans see better than computers, it really just shows that this one researcher is bad at programming computer vision systems. If he took his dataset, and made it a Kaggle Competition, I think someone would design a computer vision system that would do much better than his.

Kaggle by ZahrGnosis · 2015-12-28 07:22 · Score: 1 · on Ask Slashdot: How To Get Into Machine Learning?

The single most motivating thing for me, personally, was to find real problems to solve and real examples and help on how to solve them. Bonus points for variety and competition and even prizes.

Enter Kaggle -- data mining competitions with an absurd number of examples, datasets, community posts, forums, and curated examples. I really cannot emphasize enough how much I've learned in this community. Join and try one of the example competitions (the Titanic one is popular), follow the getting started guides, and go from there.

I'm sure there are many other ways, and it may not be for everyone, but this has really been a great resource for me.

Re:Great experience by michaelmath · 2015-08-27 18:35 · Score: 1 · on Google May Try To Recruit You For a Job Based On Your Search Queries

They asked me a bunch of graph theory / number theory problems; if you didn't know the algorithms and couldn't implement them quickly you'd be SoL.

For me, programming challenges were my way to get a foot in the door without a degree. I remember getting interviews from Google after their first Code Jam, and from Facebook after I solved a bunch of programming puzzles they released. But TopCoder was my real salvation from my parents' basement.

Even though I have a career now, I find solving them lots of fun. Sometimes I come to the solutions when I'm drowsy before bed and have to get up and write them down before I forget!

Re:So misleading. by A_Lost_Frenchman · 2014-08-12 23:08 · Score: 1 · on New Watson-Style AI Called Viv Seeks To Be the First 'Global Brain'

This is so misleading. No program can do anything outside what it is explicitly programmed to do.

You are the misleading one.

Machine learning and Optimization are the science of getting programs to do things they are not explicitly programmed to do.

Evidence:

The Merck molecular activity challenge was won by data scientists who did not themselves have the capacity to perform the task.
http://blog.kaggle.com/2012/10...
As described on Wikipedia: "Machine learning is a subfield of computer science (CS) and artificial intelligence (AI) that deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions".
http://en.wikipedia.org/wiki/M...
Artificial evolution for instance is a special kind of Optimization algorithm.
http://en.wikipedia.org/wiki/E...

The whole point of machine learning is to program learning rules, not the explicit final program. The behaviour of the program is then determined by the data used to train it.
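To make that concrete, here is a minimal sketch using the classic perceptron learning rule (my choice of illustration, not something named in the comment above). The function names and the AND/OR toy data are assumptions made up for this example. The same training code is run twice, and only the training data decides which behaviour the resulting program has.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: adjust the weights only when a prediction is wrong."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred  # -1, 0 or +1
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b


def predict(model, x):
    """Apply a trained (weights, bias) pair to one input."""
    w, b = model
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Hypothetical toy data: nothing in train_perceptron mentions AND or OR.
AND_DATA = [(x, x[0] & x[1]) for x in INPUTS]
OR_DATA = [(x, x[0] | x[1]) for x in INPUTS]

and_model = train_perceptron(AND_DATA)
or_model = train_perceptron(OR_DATA)

print([predict(and_model, x) for x in INPUTS])
print([predict(or_model, x) for x in INPUTS])
```

Neither truth table is written into the code as a rule; each one is recovered from the examples the learner was shown.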

Re:Something else? by ZahrGnosis · 2014-05-07 01:57 · Score: 1 · on Ask Slashdot: Beginner To Intermediate Programming Projects?

While I agree with parent in the case you actually are interested in newt farming, I actually code mostly just for the fun of coding, and focus on the type of code rather than the end product. To give an alternate approach, then, depending on what type of code you like there's probably a hackathon or a set of "challenges" or some competition that can provide motivation if you just want random problems to solve. I'm mostly an algorithms guy, so I do a lot in Kaggle, and Project Euler. Project Euler for example has hundreds of problems that more or less increase in difficulty, making it relatively easy to find something that will increase your skill, and the Kaggle forums are full of code examples from past projects to help you get on your way.

If you're interested in graphics or UI programming these examples may be less help, but I'm sure there are similar things out there. The results of hackathons are great places to start because the code is generally written by competent programmers, but they have no time to do clean up nor to build the spaghetti that years of updates often bring... bug fixes and hacks are common, so the code needs some TLC, but it typically has very few hands in it and so has some good consistency. iosDevCamp (from a quick Google search) has links to GitHub code for some of its results.

Re: Your (excellent) questions. by SigHolmes · 2013-11-30 06:18 · Score: 1 · on Ask Slashdot: DIY Computational Neuroscience?

My questions:

(1) What are some interesting computational neuroscience simulation problems that an individual with a workstation class PC can work on?

** These come up more frequently than you might think. Even what you'd think of as a regular home or office PC can do a lot with 8-16 gigs of memory, let alone amounts beyond that. I'd suggest that you start looking at http://www.kaggle.com/ as a place to start. Also, start looking at the discussion groups that you can find on (I hate it, but use it) LinkedIn. I prefer the discussion groups that you can get at the American Statistical Association, and even the listserve discussion groups for various statistical software packages (e.g., R, Stata, SAS).

(2) Is it easy for a non-academic to get the required data?

** It depends on the problem being examined, and who "owns" the data. For Kaggle competitions, the data is given to you. For other projects, a lot of data is becoming "open sourced" so that people can get to it publicly. So, that's a qualified yes for some things, and a no for others.

(3) I am familiar with (but not used extensively) simulators like Neuron, Genesis etc. Other than these and Matlab, what other software should I get?

** I tend to lean on Stata and R. Will be moving over to R after finishing current research project. It depends on the areas you want to examine. If you're willing to deal with the "learning curve" for R, I'd go with that. It's free and has a fantastic community.

(4) Where online or offline, can I network with other DIY Computational Neuroscience enthusiasts?

** I hate LinkedIn, but I use it in my own field. You might try that, as well as G+ initially. I'd also be looking at the American Statistical Association and related professional groups. The listserves for various statistical software packages are good, but they get nasty about off topic posts (tangential to the use of the software)

** I think that the related StackOverflow forums would be very good. I've had good results with them.

Competitions, trading by gregor-e · 2013-02-12 07:06 · Score: 2 · on Ask Slashdot: Making Side-Money As a Programmer?

You could try your hand at various programming competitions such as those offered on TopCoder or Kaggle. Some of the prizes in these competitions amount to serious dough.

Alternatively, you could try algorithmic trading. Several online brokerages offer an API, such as Interactive Brokers and TradeStation.

Automatic creation of features by michaelmalak · 2012-11-24 17:06 · Score: 4, Insightful · on A.I. Advances Through Deep Learning

I wonder how much of these improvements in accuracy are due to fundamental advances

I was wondering the same thing, and just now found this interview on Google. Perhaps someone can fill in the details.

But basically, machine learning is at its heart hill-climbing on a multi-dimensional landscape, with various tricks thrown in to avoid local maxima. Usually, humans determine the dimensions to search on -- these are called the "features". Well, philosophically, everything is ultimately created by humans because humans built the computers, but the holy grail is to minimize human involvement -- "unsupervised learning". According to the interview, this one particular team (the one mentioned at the end of the Slashdot summary) actually rode the bicycle with no hands and, to demonstrate how strong their neural network was at determining its own features, did not guide it, even though it meant their also-excellent conventional machine learning at the end of the process would be handicapped.
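As a rough illustration of the hill-climbing picture, the sketch below climbs a made-up one-dimensional landscape (a real model would have many dimensions) and uses random restarts as one example of a trick for escaping local maxima. The landscape values, function names, and seed are all assumptions for this example and do not come from the interview.

```python
import random


def hill_climb(heights, start):
    """Greedy ascent: keep stepping to the higher neighbour until none is higher."""
    pos = start
    while True:
        neighbours = [p for p in (pos - 1, pos + 1) if 0 <= p < len(heights)]
        best = max(neighbours, key=lambda p: heights[p])
        if heights[best] <= heights[pos]:
            return pos  # stuck on a peak, which may only be a local maximum
        pos = best


def hill_climb_with_restarts(heights, restarts, seed=0):
    """Climb from position 0 and from several random positions; keep the best peak."""
    rng = random.Random(seed)
    starts = [0] + [rng.randrange(len(heights)) for _ in range(restarts)]
    peaks = [hill_climb(heights, s) for s in starts]
    return max(peaks, key=lambda p: heights[p])


# Hypothetical landscape with local maxima at indices 1 and 7 and the global one at 4.
LANDSCAPE = [1, 3, 2, 5, 9, 4, 6, 7, 3]

single = hill_climb(LANDSCAPE, 0)
best = hill_climb_with_restarts(LANDSCAPE, restarts=5)

print("single climb from 0 stops at index", single, "height", LANDSCAPE[single])
print("with restarts the best peak has height", LANDSCAPE[best])
```

A single climb from the left edge gets stuck on the first small bump. Restarts can never do worse than that climb, because its starting point is included, but they do not guarantee finding the global maximum either.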

The last time I looked at neural networks was circa 1990, so perhaps someone writing to an audience more technically literate than the New York Times general audience could fill in the details for us on how a neural network can create features.

Re:Not a problem with resolution by tgd · 2012-10-23 06:58 · Score: 1 · on Microsoft Prepares To Push Kinect Everywhere Windows Is

The Kinect Gesture challenge over at Kaggle was a competition where the goal was to match gestures with a specified dictionary of previously-recorded gestures.

The problem isn't the resolution, it's the recognition algorithm.

It's a little bit of both, actually. The problem isn't resolution, from a hardware standpoint -- it's the point density on the IR projector and the lens on the IR camera that limits how close you can be to a Kinect and still have any accuracy. Once your depth cues go wonky, gesture recognition becomes much harder.

Gesture recognition, while not trivial, is not intrinsically more complicated than whole body tracking. The way Kinect does it is very clever, knowing basically "where can the body have moved from where it last was", which makes the matching process very efficient computationally. Gestures are the same thing. Each of your joints can only move in one of a limited set of ways from where it was. Just like handwriting recognition is dramatically easier for computers when they can see the order of strokes, the same is true of gestures.

Not a problem with resolution by Okian+Warrior · 2012-10-23 06:46 · Score: 1 · on Microsoft Prepares To Push Kinect Everywhere Windows Is

The Kinect Gesture challenge over at Kaggle was a competition where the goal was to match gestures with a specified dictionary of previously-recorded gestures.

The problem isn't the resolution, it's the recognition algorithm.

A human looking at the videos could easily distinguish between gestures and interpret the meaning. The problem was even easier for a human because you only had to choose the closest match from within the dozen-or-so gestures in the dictionary. This leads me to believe that it's not a problem with the resolution, or the hardware in general.

Despite this, gesture recognition is a very difficult problem. Aspects which humans would naturally interpret as similar can be wildly different for the computer. Hold your hand up and wave - if the hand is in a different position (relative to the torso), the angle of waving is different, the body is waving back and forth instead of still, the number of waves is different, the time cadence of the waving is different... all of these confuse the heck out of a match algorithm.
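To illustrate just one of those confusions, the difference in time cadence, here is a small sketch that picks the closest dictionary entry using dynamic time warping (DTW). DTW is a technique swapped in for illustration; nothing here says the competition entries used it. The one-dimensional "gestures", their names, and the numbers are hypothetical, and DTW does nothing about the position, angle, or background problems mentioned above.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i items of a with the first j of b
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def closest_gesture(sample, dictionary):
    """Return the name of the dictionary entry with the smallest DTW distance."""
    return min(dictionary, key=lambda name: dtw_distance(sample, dictionary[name]))


# Hypothetical dictionary: each gesture reduced to one hand coordinate over time.
DICTIONARY = {
    "wave": [0, 1, 0, 1, 0],
    "raise": [0, 1, 2, 3, 4],
}

# The same wave performed at half speed: every frame appears twice.
slow_wave = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0]

print(closest_gesture(slow_wave, DICTIONARY))
```

A frame-by-frame comparison would not even be defined here, since the two recordings have different lengths. Warping the time axis lets the slow wave line up exactly with the dictionary entry.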

(One video had curtains in the background, apparently waving ever so slightly in the breeze - causing lots of motion for the camera. Another video (color channel) contained an intricate flower pattern, which was very complex to match against.)

Finger position and motion have limited resolution (they form only a small part of the input field), but a human could still interpret various ASL hand signs to a large extent. Perhaps very similar hand signs would be difficult to discriminate, but certainly many of the ones shown were recognizable.

This is pretty much an aspect of hard AI. We're not that close to solving this problem, and breakthroughs are not expected any time soon.

Kaggle is unverifiable by Okian+Warrior · 2012-09-12 08:33 · Score: 5, Informative · on Turning Data Science Into a Spectator 'Sport'

I've entered a couple of Kaggle competitions, but I'm kinda put off by the opaque results.

After the first one ended (predict HIV progression), the released full dataset indicated that the data had been sorted before it was separated into train and test sets. IOW, after being sorted by length, all the short sequences were put into the training set, and the longer ones into the test set. This mistake may have invalidated the competition, and I strongly suspect it would have invalidated any paper written about the results.
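A minimal sketch of why that matters, using made-up sequence lengths rather than the actual competition data: cutting an already-sorted list in two gives a training set and a test set that share no lengths at all, whereas shuffling before the cut is the usual way to avoid that. The function names and numbers below are assumptions for illustration.

```python
import random


def split_in_order(items, train_fraction=0.5):
    """Cut the list at a fixed point, keeping whatever order it already has."""
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


def split_shuffled(items, train_fraction=0.5, seed=0):
    """Shuffle a copy first, then cut, so both halves sample the whole range."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return split_in_order(shuffled, train_fraction)


# Hypothetical sequence lengths, sorted shortest to longest before splitting.
lengths = sorted([12, 15, 18, 20, 22, 40, 45, 50, 55, 60])

train, test = split_in_order(lengths)
print("sorted split:   train", train, "test", test)

train_s, test_s = split_shuffled(lengths)
print("shuffled split: train", sorted(train_s), "test", sorted(test_s))
```

With the sorted split every training sequence is shorter than every test sequence, so a model is scored only on a kind of input it never saw during training.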

More recently, the organizers of one competition stated flatly in the forums that they would release the entire data set once the competition had ended, but then didn't. I inquired about this, and a Kaggle data scientist replied saying "we almost never release the test data".

I'm not sure that Kaggle is all that scientific. If the full dataset can't be examined after the competitions close, there's no way to verify the results.