Go to the source: they only timed the compute time. Neither cudaMalloc nor cudaMemcpy was part of the timing.
Sorry, I only count 'time to solution'. That is all I'm interested in. I thought it was strange that a less than O(n**2) algorithm was faster on the GPU.
It is like writing benchmarks that ignore disk IO, ignore reading from system memory, L3 cache, etc. Only time stuff that is in the registers.
So, does that mean that if they went out and got the fastest Xeon processor (they used the fastest GPU, excluding the C2070), wrote parallel code and used the Intel compiler (writing to it), the speedup over the CPU would be less than one?
After having just looked at their code: also remove the CPU STL shit (actually any templates, since they don't vectorise). If you are writing speedy code for the GPU, then for a fair test you also have to write the CPU code appropriately.
hahahahaha
This gets better and better...
They only timed the compute time. Neither cudaMalloc nor cudaMemcpy was part of the timing.
Sorry, I only count 'time to solution'. That is all I'm interested in. I thought it was strange that a less than O(n**2) algorithm was faster on the GPU.
It is like writing benchmarks that ignore disk IO, ignore reading from system memory, L3 cache, etc. Only time stuff that is in the registers.
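To show what I mean by 'time to solution', here is a minimal sketch of a timing harness that reports both numbers. It is plain Python, not CUDA: the stage functions (allocate, copy_in, compute, copy_out) are hypothetical stand-ins for cudaMalloc, the two cudaMemcpy calls and the kernel, so the absolute times mean nothing. The point is only which stages get counted.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical stand-ins for the stages of an accelerator run.
def allocate(n):
    return [0.0] * n                 # stands in for cudaMalloc


def copy_in(buf, data):
    buf[:] = data                    # stands in for cudaMemcpy, host to device
    return buf


def compute(buf):
    return [x * x for x in buf]      # stands in for the kernel itself


def copy_out(result):
    return list(result)              # stands in for cudaMemcpy, device to host


def run(data):
    """Run every stage and record how long each one took."""
    stages = {}
    buf, stages["alloc"] = timed(allocate, len(data))
    buf, stages["copy_in"] = timed(copy_in, buf, data)
    out, stages["compute"] = timed(compute, buf)
    out, stages["copy_out"] = timed(copy_out, out)
    return out, stages


data = [float(i) for i in range(100_000)]
out, stages = run(data)

compute_only = stages["compute"]          # what the benchmark reported
time_to_solution = sum(stages.values())   # what the user actually waits for

print(f"compute only:     {compute_only:.6f} s")
print(f"time to solution: {time_to_solution:.6f} s")
```

Quoting only the first number hides the allocation and both copies, which is exactly the complaint.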
I have a system at home that's dual boot ... I like to game occasionally. But don't make me choose. My professional career depends on Linux and I need it for productivity.
I realise that I could easily do a search to work out what is going wrong and fix it, but, really I couldn't be fucked spending time on it. I (like many others) could spend that time doing other things.
It is in Microsoft's best interest to make sure that applications that run on their OS fix this.
It's more efficient if the vector holds pointers instead of data.
It's more efficient if you don't use std::vector.
[quote]I sorta like it.[/quote]
I have mod points but couldn't find 'dad joke'
85 thousand rows and 25 columns of floats is not what I consider a large dataset. Also, due to the scaling of the y-axis on those graphs, it is not immediately apparent that even PyTables doesn't scale. What happens when you scale up by a couple of orders of magnitude? I'm not trying to be flamey, but how does it work when you have 85x10^6 or 85x10^9 rows?
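For a sense of scale, here is the back-of-envelope arithmetic. It assumes 8-byte floats and ignores any index or compression overhead; both are my assumptions, not figures from the article.

```python
def table_bytes(rows, cols=25, bytes_per_value=8):
    """Raw size of a dense table of floats (assumed 8 bytes per value)."""
    return rows * cols * bytes_per_value


for rows in (85_000, 85 * 10**6, 85 * 10**9):
    size_gb = table_bytes(rows) / 1e9
    print(f"{rows:>15,} rows -> {size_gb:,.3f} GB")
```

Under those assumptions the benchmarked table is about 17 MB, which fits in cache-and-RAM territory on any workstation. The two larger cases come to roughly 17 GB and 17 TB, and that is where scaling behaviour actually matters.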
One dataset I have is a timeseries of thousands of square kilometres of points where there is a density of around 400 points per square metre. Each point has 12 dimensions. Honestly, a flat file structure with well-named directories and filenames, along with metadata stored in 2 files (txt and XML - I know, I don't like XML either, but they are not big files - a couple of KBs each) in each directory works best for us. Files are currently compressed with lzo (parallel decompression, buffered and sequential - seems to work ok for now).
We can search the repo quickly for a desired ROI using any programming language we want. There are no problems such as version changes.
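A rough sketch of what such an ROI search can look like. The layout here is made up for illustration: the file name info.xml, the bbox element and its attributes, and the directory names are all hypothetical, not our actual schema.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_info(directory, xmin, ymin, xmax, ymax):
    """Write a tiny XML metadata file (hypothetical layout) into directory."""
    os.makedirs(directory, exist_ok=True)
    root = ET.Element("tile")
    ET.SubElement(root, "bbox", xmin=str(xmin), ymin=str(ymin),
                  xmax=str(xmax), ymax=str(ymax))
    ET.ElementTree(root).write(os.path.join(directory, "info.xml"))


def overlaps(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def find_tiles(repo, roi):
    """Return the directories whose info.xml bounding box intersects roi."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(repo):
        if "info.xml" not in filenames:
            continue
        bbox = ET.parse(os.path.join(dirpath, "info.xml")).getroot().find("bbox")
        box = tuple(float(bbox.get(k)) for k in ("xmin", "ymin", "xmax", "ymax"))
        if overlaps(box, roi):
            hits.append(os.path.relpath(dirpath, repo))
    return sorted(hits)


# Build a throwaway repo with three tiles across two dates, then query it.
with tempfile.TemporaryDirectory() as repo:
    write_info(os.path.join(repo, "2010-06", "tile_000_000"), 0, 0, 1000, 1000)
    write_info(os.path.join(repo, "2010-06", "tile_001_000"), 1000, 0, 2000, 1000)
    write_info(os.path.join(repo, "2010-07", "tile_000_000"), 0, 0, 1000, 1000)
    found = find_tiles(repo, roi=(100, 100, 200, 200))

print(found)
```

Only the small metadata files are read during the search; the compressed point files are not touched until you know which directories you want.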
The last thing I wanted to remember about my mum when she fought a long battle with cancer was her final days. She finally died when I was 18.
Things are not pretty in those final years. Pale, tired, sick, moody but mostly high on drugs. And that is what you are leaving as a final memory to your kids. She was not that woman.
Go through existing photos of when she was a kid and make sure the photo albums (physical or not) are well documented and chronologically ordered. What was happening at the time, who was she with, how good a time was she having and how happy was she.
Go through photos of when the two of you met and dated. Document that. The happy times and the not so happy times. The two of you should go through each photo and describe the event.
I, for some reason, have only three photos of my mum. Two when she was sick (not so fun to look at) and one just before she met my dad. I would love to know where she was, what she was doing (was it during uni on break?), etc.
It is not really that big. You should be able to process that stuff on commodity hardware easily enough. I imagine the big bottleneck will be with the IO. Pretty easy to stripe some disks.
Quad core with well written code would buzz that tower fairly easily. Only talking about a few gigs of compressed data here.
I would use directory structures with an info file in each to store this information. That would be my database.
I'm not arguing with anything you state because, frankly, I don't know better. I don't do java much any more.
One thing that did grab me with your post was
1000 cpu cycle is nothing in a java program.
How many programmers require that mindset before, as you stack layer upon layer upon layer, it all becomes slow as buggery? Maybe that is exactly the reason why I don't look to java any more?
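To put rough numbers on that: assume a 3 GHz core and that every layer adds the quoted 1000 cycles to every call. Both are simplifying assumptions of mine, but they show how the overhead stacks up.

```python
CLOCK_HZ = 3.0e9          # assumed 3 GHz core
CYCLES_PER_LAYER = 1000   # the figure quoted above


def overhead_seconds(layers, calls):
    """Seconds spent purely on per-layer overhead, under the assumptions above."""
    return layers * calls * CYCLES_PER_LAYER / CLOCK_HZ


for layers in (1, 5, 20):
    secs = overhead_seconds(layers, calls=1_000_000)
    print(f"{layers:2d} layers, 1e6 calls: {secs:.3f} s of pure overhead")
```

One layer over a million calls costs about a third of a second. Twenty layers cost several seconds before any real work is done.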
Trying to stay on topic, what about the beer can regatta at Darwin, Australia?
Regatta
Becoming quite famous here.
Exactly my first thought.
Reminiscent of UFO flicks.
America says one thing, but does others. Quite honestly, the leadership there sees themselves in a cold war with the east, and are trying to take advantage of the east's not wanting to be in one.
The problem is that America has a VERY active space weapons program and will not give it up. If you look closely at what they are working on, it should be obvious that it is not about defense, but about an offense. They are
1. working on a ground based laser designed to take out eastern sats to try and stop GPS and communications.
2. Working on interceptors designed to take out incoming missiles.
3. Building nuke-powered Boomers/attack sub at a rate of 1-2 EACH [actually these are already built in abundance].
4. Getting ready to launch multiple space stations. The first one will allow civilians on-board, but the second onwards are expected to be military only. There is ZERO need for a military to have a manned space station, EXCEPT as a way of hiding weapons as a prelude to an attack.
Dude, no need to get all hot about it. Just thought I'd put a little perspective on it for you. I'm sure that Americans believe that their motives are innocent but I'm sure that the Chinese think exactly the same. Like I said, same old shit for the rest of us.
What? And the US hasn't been giving the rest of the world the big 'FUCK YOU' for the last fifty years? Doing pretty much whatever they like.
It's the same old shit for us. Just another country compensating by trying to show how powerful they are. Actually, it's kinda refreshing that there is another 'sustainable' player.
I learned firearm safety around age 6 and have been using/carrying firearms on my own for over 30 years
Wow! What are you scared of?
I believe the numbers in that graph are phony.
There is absolutely no way almost 10% of Australian households possess a gun.
Some may have a gun licence but definitely not a gun ... they are effectively outlawed.
Not sure about homes, but there are starting to be plenty of these.
Will have a crack at streaming the 4K vids tomorrow morning.
I work in an area that is 100% R&D (alongside twenty other coders).
IT departments really do vary in size. Ours is easily over 150 people.
My boss is still a coder and occasionally puts out a paper. He reports to the director who is a people manager and not an IT expert. And he shouldn't be one. He has 6 managers under him who know their stuff to give the directions. The director makes sure that everybody talks nicely to each other.
My assistant director and director barely know how to turn on a machine.
Not that they really need to know; their job is more about people management. Making sure that departments talk to each other (and nicely) and that friendship bonds are formed and not destroyed. Conflict resolution, etc.
That is why I was surprised.
I would say you are being a little paranoid. There is such a thing as a good boss, you know. I find that these are the guys who are still heavily involved in some sort of 'research'. Which is probably what he/she is doing. Probably a smart cookie, does some coding but by no means all of it. Knows enough to recognise a good text to buy for his group so they can all learn together.
I put it to you that I'd prefer to work with this guy than with your paranoid self. Do you have meetings of the secret type?
A director that still codes? What a novel concept. Good for you.
In case you need help convincing the hierarchy and you need a little ammunition to get a decent, scalable, centralised solution, you will find allies in:
Engineering: find those who teach and apply for grants doing any kind of FEA work, and the robot people.
Physics: the medical imagers, users of geant4 and beam, biomechanical.
Comp Sci: talk to anyone related to the document searching/indexing areas, machine learning, etc.
Chemistry: search your local paper repository for those who have someone from your maths department/school as an author with a chem bod - all sorts of partnerships there.
Mathematics: find the algebra guys.
Health: you will have people trawling data on world health who use a small machine and SAS or SPSS. They will need your help.
just to name a few.
Good luck man.
I completely understand the red tape.
Our scientists have been having similar problems. I believe that the real solution here is to stop these guys from working on their local machines with the full sized datasets. We've provided a centralised HPC system that is connected via infiniband (and others) to multiple architectures of storage.
There is the standard /home which is DMF'ed with the top tier being 50T of total 650MB/s write (not sure of the read stat - I'm the software guy not the hardware guy). This is fine for a lot of people. They ssh into an interactive batch compute node for testing the datasets too big for a workstation (definitely for the Standard Operating Environment) and run their programs there. They can have access to up to 192GB of shared memory (12 nodes with that). At the moment, each researcher can have a maximum of 130 jobs running at any given time.
There is also a /scratch that is made up of multiple units, each being 32TB and individually capable of 750MB/s write and 600MB/s read (single client; 800MB/s total for multiple readers). This is for the terabyte dataset analysis - your 80GB datasets are that for now ... they will grow quickly.
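As a rough feel for what those rates mean for an 80GB dataset, here is the arithmetic. I am reading the 600MB figure as the single-client read rate, using decimal units (1 GB = 1000 MB) and assuming the rates are sustained, so treat the results as best-case estimates.

```python
def transfer_seconds(size_gb, rate_mb_s):
    """Best-case time to move size_gb at a sustained rate of rate_mb_s."""
    return size_gb * 1000 / rate_mb_s


# Rates quoted above, in MB/s.
rates = {
    "/home write": 650,
    "/scratch write": 750,
    "/scratch read, single client": 600,
    "/scratch read, multiple clients": 800,
}

for name, rate in rates.items():
    print(f"{name:32s} {transfer_seconds(80, rate):6.1f} s for 80 GB")
```

So one pass over 80GB is on the order of a couple of minutes on any of these tiers, which is why the bottleneck is the copy to and from a workstation and not the central storage.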
These disks can be mounted to multiple areas around the country. Not all, given some are behind with getting their security certificates working properly. The problem with removable storage is that the researcher copies their dataset onto the drive and then deletes the original. One copy in a volatile situation.
The trick here is to educate the users that working on the dataset on the machine in front of them is not sustainable. The data acquisition rate is growing much faster than Moore's law. The answer might be a 'cloud'-like arrangement for those who still use standard Windows programs, or a centralised pool of resources in some sort of cluster arrangement.
Male's language?
I don't really care about folding themselves. More that I want them to wash, dry and put themselves away.
Some of the examples in the CUDA SDK are phoney. The Sobel one can be made to run faster on the CPU, provided you use the Intel compilers and performance primitives and can parallelise.
It doesn't surprise me. There is an example of Sobel for FPGAs that touts much faster execution times, but when you examine the code, the FPGA version has algorithmic optimisations that were 'left out' of the CPU version. Again, it can be made to run faster on the CPU.
I'm not saying that GPUs are crap. For the right problem, they can be really good. It is just that they are nowhere near the magic bullet the NVIDIA PR machine says they are.
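To illustrate the kind of algorithmic optimisation I mean, here is one well-known example: the Sobel kernel is separable, so the 3x3 filter can be done as two 1-D passes. I am not claiming this is the specific optimisation used in the FPGA code. It is only a sketch, in plain Python, showing that the cheaper form gives identical results, so a fair comparison has to give it to both sides.

```python
def sobel_x_direct(img):
    """Horizontal Sobel gradient using the full 3x3 kernel (9 taps per pixel)."""
    k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    h, w = len(img), len(img[0])
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(h - 2):
        for x in range(w - 2):
            out[y][x] = sum(k[j][i] * img[y + j][x + i]
                            for j in range(3) for i in range(3))
    return out


def sobel_x_separable(img):
    """Same gradient as two 1-D passes: [-1 0 1] along rows, [1 2 1] down columns."""
    h, w = len(img), len(img[0])
    # Pass 1: horizontal difference on every row.
    diff = [[row[x + 2] - row[x] for x in range(w - 2)] for row in img]
    # Pass 2: vertical smoothing of the differences.
    return [[diff[y][x] + 2 * diff[y + 1][x] + diff[y + 2][x]
             for x in range(w - 2)] for y in range(h - 2)]


# A small deterministic test image; borders are dropped in both versions.
img = [[(3 * x * x + 7 * y + x * y) % 11 for x in range(6)] for y in range(5)]
print(sobel_x_direct(img) == sobel_x_separable(img))
```

The separable form reuses the row differences across three output rows instead of recomputing them, which is the sort of saving that quietly disappears from the 'reference' CPU code in these comparisons.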