pz · Slashdot Mirror

Re:Comment your data too! on Programming Things I Wish I Knew Earlier · 2010-09-06 10:30 · Score: 1

I'm in a different boat from most commenters here, I think, because I am a scientist writing simulations; some simulations run a long time and create a lot of data which would be costly to reproduce, and what I wish someone had told me early on was that I should comment my *data files*, not just my code. Each file should include the exact parameters used to create it, an explanation of what each column represents, and preferably there should be a way of knowing what version of your simulation code was used to create it. A couple of times in grad school I had to toss out months of data after I discovered a bug in my code, and didn't know when the bug showed up and which data was affected by it.
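A minimal sketch of what a commented data file could look like, assuming a tab-separated layout with `#`-prefixed header lines. The function name, parameter names, and revision id are all made up for illustration; nothing here is from the original poster's code.

```python
import io

def write_commented_data(stream, params, columns, code_version, rows):
    """Write rows preceded by '#' comment lines describing how they were made."""
    stream.write(f"# code_version: {code_version}\n")
    for name, value in params.items():
        stream.write(f"# param {name} = {value}\n")
    for index, (name, description) in enumerate(columns):
        stream.write(f"# column {index}: {name} -- {description}\n")
    for row in rows:
        stream.write("\t".join(str(v) for v in row) + "\n")

# An in-memory buffer stands in for a real file; all values are hypothetical.
buffer = io.StringIO()
write_commented_data(
    buffer,
    params={"timestep": 0.01, "seed": 42},
    columns=[("t", "simulation time in seconds"), ("x", "position in metres")],
    code_version="r1234 (hypothetical revision id)",
    rows=[(0.0, 1.0), (0.01, 0.98)],
)
text = buffer.getvalue()
print(text)
```

Most plotting and analysis tools can be told to skip lines starting with `#`, so the header travels with the numbers without getting in the way.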

(I'd welcome other advice from simulationists too; I've never had an advisor who was particularly programming-savvy, even though programming was always a large part of my research, and so I always had to make it up as I went along.)

Yes, yes, yes. This is why when I collect data (I'm an experimentalist), I save a COMPLETE copy of the code used to run the experiment along with each day's data. Hard drives (and CD/DVDs) are cheap compared to the potential time loss from not knowing exactly which data sets are subject to which bugs you will (not might, but *will*) find in your code.

I'd go one step further: unless space is a serious constraint, store your data files in ASCII, including, and especially, the associated collection parameters. Everything in my lab gets stored as ASCII except a handful of data streams that would be excessively large if not stored in binary. I use the Windows-world INI-style format, since there are many libraries available for parsing it (and it's easy to write your own, too), and it's very easy to read in an editor.
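As an illustration of the INI approach, here is a small sketch using Python's standard `configparser` module. The section and key names are hypothetical, not the ones used in my lab.

```python
import configparser
import io

# Hypothetical acquisition parameters; the section and key names are made up.
config = configparser.ConfigParser()
config["acquisition"] = {"sample_rate_hz": "30000", "channels": "16"}
config["code"] = {"version": "r1234"}

# Serialize to INI text (an in-memory buffer stands in for a file on disk).
buffer = io.StringIO()
config.write(buffer)
ini_text = buffer.getvalue()
print(ini_text)

# Read it back the way an analysis script would.
loaded = configparser.ConfigParser()
loaded.read_string(ini_text)
sample_rate = loaded.getint("acquisition", "sample_rate_hz")
```

The resulting text is plain `key = value` pairs under `[section]` headings, which is exactly why it stays readable in an editor years later.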

Re:Psychiatric consultation! on Best Way To Archive Emails For Later Searching? · 2010-09-06 06:31 · Score: 2, Interesting

When I run events, I need to be able to post-hoc review all of the correspondence for demographic analysis, often done two years after the event when the final reports are being written. Saying that this sort of behavior is odd, or not normal is either being a troll, or not understanding how the world works when you're not just a drone.

This sort of behavior is odd and not normal. If you want to keep your email, then that's fine, but thinking that it's "vitally important" is odd and I think without question points to some "OCD with some component of Asperger's". If you don't, then maybe you need to re-evaluate.

I am however interested in how you pull demographic analysis out of emails? I mean, hopefully you're not suggesting that you go and chomp on the text to pull out fields of data?

So on the one hand, you think my saving email for later access and analysis is not useful, but then, you want to know why it is useful?

I run a research laboratory where we do two things, one is work on restoring sight to the blind, the other is to organize a conference every two years. The primary demographic analysis I need to do is to analyze the country-of-origin for email traffic pertinent to the conference. This has helped to raise many tens of thousands of dollars of support for the conference by demonstrating various aspects of the global attendance to funding agencies.

Being able to access my email and locate attachments, review discussions, find references, remember addresses, etc., in other words, to recall what someone once wrote to me, has resulted in millions of dollars of grant money to fund my research. Without the ability to review email that is, at times, years old, that would not be possible. Having rich access to my email stream has allowed me to fund my lab, and therefore feed and house my family and the people who work for me, publish high-impact papers, receive numerous awards, get coverage in the international press, etc., or, put better, to run the daily business of a research lab at a high-profile university. While the tools I use are good, they leave a lot to be desired, and having a better system would make me more productive.

IMO, this is one of the best Slashdot questions ever, and I am greatly anticipating hearing some good answers, especially if they don't include suggesting GMail as a panacea,

I think that GMail could be the panacea here. I mean, if you're just trying to make sure it lasts and you can search it with ease, then GMail can do it better than you can.

I dislike GMail for my professional correspondence for a number of reasons: (1) it does not allow me to readily use my university affiliation address (and since that's a top university, that makes a difference whether people like it or not), (2) I do not have ownership of my email, (3) the lack of a good filing / archiving interface makes it hard to associate different threads together, or to limit searches (I intensely dislike the tagging feature), (4) GMail has an only rudimentary ability to edit text since it's browser-based.

I do use GMail for my personal correspondence, but that's mostly because it's the best of a bunch of poor, but free, services. It does have the best searching features, but falls down in a lot of other ways. It also would be against my employer's policies to store HIPAA-regulated email offsite. So GMail is not a panacea. Thanks for the suggestion, though.

Re:Use gmail. on Best Way To Archive Emails For Later Searching? · 2010-09-06 04:02 · Score: 1

Migrate all to gmail. With gmail you got room for your couple of GB. And the search feature works like a charm. Only thing missing is "folders" to make it act like you are used to.

Although the searching features in GMail are great, I find the interface, with its single unified sequence of mail and lack of folders (the tagging feature is far too clunky), to be a major impediment. The biggest issue, though, is that I do not own a copy of the information on my own server.

Re:Psychiatric consultation! on Best Way To Archive Emails For Later Searching? · 2010-09-06 03:58 · Score: 4, Insightful

You, sir, are a mental case! I suspect you have OCD with some component of Asperger's that is making you have this fixation on doing all this work to save ancient bits of information.

How was this modded Informative? Saving correspondence for future reference is critically important. I have many times needed to refer back to messages that are years old, in order to pull up a vital bit of information that was suddenly relevant. I have needed to pull up an attachment from an email a few months old, or view the exact wording of correspondence, check the date of a quotation, etc., more times than I can count, so searching and retrieval are both vitally important. When I run events, I need to be able to post-hoc review all of the correspondence for demographic analysis, often done two years after the event when the final reports are being written. Saying that this sort of behavior is odd, or not normal is either being a troll, or not understanding how the world works when you're not just a drone.

IMO, this is one of the best Slashdot questions ever, and I am greatly anticipating hearing some good answers, especially if they don't include suggesting GMail as a panacea, as I want to have the email text and attachments in my possession.

Re:Email is overused on GMail Introduces Priority Inbox · 2010-08-31 03:57 · Score: 1

Easier said than done. I get a daily feed of slashdot into my gmail account. I don't need it since I prefer going directly to the website. But, I can't unsubscribe, even when I follow the simple directions.

Mark it as spam. Do that a handful of times and you'll never see it again.

Re:It's absolutely ridiculous on Flight Data Recorders, Decades Out of Date · 2010-08-31 03:49 · Score: 1

Umm, no. You're almost a century out of touch with reality. What you say was true in the 1930s.

Today, when an airplane crashes, the human has failed. Pretty much always. Technical issues that lead to crashes are very, very rare. If you were to place monetary bets, a winning strategy is to bet for human failure.

I believe this assertion to be true, but do not have data to support it. I'm certain the data exist, though.

However, there have been some recent mechanical failures that were very important to understand because of the widespread safety implications. I'm thinking mostly of the problem with lubrication of the horizontal stabilizer lead screw on MD-83 aircraft from 10 years ago that resulted in at least one crash and the grounding of the fleet (and, now that I've reviewed the incident on Wikipedia, it also resulted in a revision of Alaska Airlines' maintenance schedule).

While it would seem that most errors are pilot errors, we still need to pay attention to the mechanical ones.

Re:Also on Sorting Algorithm Breaks Giga-Sort Barrier, With GPUs · 2010-08-30 05:56 · Score: 1

Those are all valid practical reasons, but theory still wins out. If the implementation has different constants for different algorithms, that's all well and good until the data set gets large enough.

O(n) will, for large enough n, always be faster than O(n^2). The range for n might be beyond your current level of interest, which is certainly something to take into consideration when selecting an algorithm, but you would be very surprised at how quickly n^2 grows, even when the leading constant is very small. An O(n^2) algorithm that works reasonably quickly for n=100, say one second, is unlikely to be useful when n grows to 100,000 and suddenly the running time is a million times longer. If you have been so foolhardy as to implement an O(n^3) algorithm, it becomes highly questionable whether there are enough computers at your disposal to have a reasonable running time. At O(n^4) there are not enough computers on the planet, even if you could cobble them all together. How easy is O(n^4)? Surprisingly so, if one part of a system has O(n^2) behavior, and it calls a second part of the system, perhaps implemented by a different programmer, that also has O(n^2) behavior.
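To put numbers on that, here is a rough sketch that scales a measured one-second run at n=100 up to n=100,000 for several orders of growth. It assumes the constants stay the same and ignores lower-order terms, so treat the output as order-of-magnitude only.

```python
def scaled_runtime(base_seconds, base_n, new_n, exponent):
    """Estimate runtime at new_n for an O(n**exponent) algorithm,
    given a measured runtime at base_n (constants assumed unchanged)."""
    return base_seconds * (new_n / base_n) ** exponent

# One second at n=100, extrapolated to n=100,000.
for exponent in (1, 2, 3, 4):
    seconds = scaled_runtime(1.0, 100, 100_000, exponent)
    print(f"O(n^{exponent}): {seconds:.0e} s, about {seconds / 86_400:.3g} days")
```

The O(n^2) case comes out at a million seconds, which is roughly eleven and a half days for something that used to take one second.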

Order of growth wins, every time, once the problem gets big enough. If the programmer is lazy, and does not pay attention to global algorithmic optimization, it wins very very quickly. Developing a faster algorithm is far, far more powerful than spending more money on hardware.

Re:Great news on It's Official — AMD Will Retire the ATI Brand · 2010-08-30 05:37 · Score: 1

So with current die-sizes of about 146 mm^2, assuming it's really square, we have a maximum length of about 1.7 cm. Sounds like we can go up to 9 GHz, at least if we are just using the speed of light in vacuum.

And that's why we've been seeing chips top out at around 3-4 GHz: on-chip signals only travel at a fraction of the speed of light.
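For what it's worth, one reading that roughly reproduces the quoted figures is to take the die diagonal as the longest path and require a signal to make a round trip within one clock period. Both of those are my assumptions, not something the parent states, and the `velocity_fraction` parameter is only a placeholder for however much slower real on-chip signals are.

```python
import math

C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum

def max_clock_hz(die_area_mm2, velocity_fraction=1.0, round_trip=True):
    """Upper bound on clock rate if a signal must cross a square die's
    diagonal (optionally there and back) within one clock period."""
    side_m = math.sqrt(die_area_mm2) * 1e-3
    diagonal_m = side_m * math.sqrt(2.0)
    distance_m = 2.0 * diagonal_m if round_trip else diagonal_m
    return velocity_fraction * C_VACUUM_M_PER_S / distance_m

diag_cm = math.sqrt(146.0) * math.sqrt(2.0) / 10.0
print(f"diagonal: {diag_cm:.2f} cm")
print(f"vacuum, round trip: {max_clock_hz(146.0) / 1e9:.1f} GHz")
```

Passing a `velocity_fraction` below 1.0 scales the bound down proportionally, which is the direction of the point above.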

Re:Also on Sorting Algorithm Breaks Giga-Sort Barrier, With GPUs · 2010-08-30 01:33 · Score: 3, Interesting

People need to better understand that it is a theoretical tool for comparing the speed factors of algorithms. That is useful, but you have to then consider the reality of the situation.

Right. And any good programmer understands that *first* you pick the right algorithm, and *then* you optimize the implementation. Working the other way around is wasting time.

But, more importantly, what the parent seems to miss is that the speed improvement from changing the order of growth of an algorithm can swamp nearly any possible improvements in hardware. Going from O(n^4) to O(n^2) when n is 10,000 is like replacing your desktop box with ten thousand supercomputers. No amount of implementation tweaking or upgrading processors and memory -- which only affects the constants in algorithm speed -- is going to give you the same level of speedup.

There is a very, very good reason to pay attention to O() analysis of algorithms, and, in particular, their implementation. You can implement bubble sort, which is O(n^2) when done correctly, in a brain-dead way that makes it O(n^3) --- if you, for example, re-extend the output array for each new element --- and the difference is huge. Extra factors of n creep in easily, and understanding them can be highly useful.
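Here is a small sketch of how that extra factor of n sneaks in. It is not a sort, just the output-building step in isolation, counting element copies when the output is reallocated for every new element versus appended in place. The function names are made up for illustration.

```python
def append_by_reallocation(items):
    """Build an output list by allocating a new list for every element.
    Returns the result and the number of element copies performed."""
    output = []
    copies = 0
    for item in items:
        copies += len(output) + 1  # old contents plus the new element
        output = output + [item]   # allocates and copies the whole list
    return output, copies

def append_in_place(items):
    """Build the same list with list.append (amortized constant time),
    counting one copy per element."""
    output = []
    copies = 0
    for item in items:
        output.append(item)
        copies += 1
    return output, copies

slow_result, slow_copies = append_by_reallocation(range(1000))
fast_result, fast_copies = append_in_place(range(1000))
print(slow_copies, fast_copies)
```

For n elements the reallocating version does n(n+1)/2 copies instead of n, so a step that should have been linear has quietly become quadratic.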

So, the parent can review real-world constraints all he wants, but in the end, order of growth is more important.

Incomplete Redaction? on 3 Prototypes From HP, In Outline · 2010-08-29 13:14 · Score: 1

In both images, there is a partial view of a white item that looks very much like a wristband device off to the right, sitting on the keyboard-ish tablet.

Perhaps this is the same item as what's been blacked out on the subject's wrist?

Re:Cpt Obvious Observation on Video Showing Half a Million Asteroid Discoveries · 2010-08-26 06:32 · Score: 1

discovered in a direction of the earth's orbit opposite the sun

Yeah, we call that "nighttime" around here.

Doesn't that mean the region within the earth's orbit will be substantially less studied than the region outside the earth's orbit? Moreover, given the confounding effect of figuratively staring in the direction of the sun, won't the same tools be far less sensitive than when looking at the night-time sky? Could it be that there are many more asteroids closer to the sun than us than we have observed?

Is there a professional astronomer here who can give an authoritative answer?

Re:Some notes From The Creator on Video Showing Half a Million Asteroid Discoveries · 2010-08-26 06:19 · Score: 1

I'd love to see more of the animation at the end, showing the dynamics of all these objects. That's fascinating!

Re:Sysadmin's take on Should Developers Have Access To Production? · 2010-08-25 06:34 · Score: 1

On the flip side, sometimes developers will just flat out need access. In this case, at least in my experience, a clone does the job just as well. You just need to have a couple servers sitting around specifically for development use, and then have a way to clone machines to this hardware in short order. In my years of experience I have yet to come across a problem that absolutely needed to be tackled on a production server.

I've run into one instance where code worked perfectly in development, perfectly in QA, but under the real-world load of production failed in interesting and hard to understand ways. It ultimately came down to a question of network delay under heavy traffic, similar to a classic race condition, but not as simple. Determining the problem and the correct solution required debugging in production, but all of the code went through all of the normal controlled, auditable deployment processes.

Re:Need some sharper glass... or better physics on Canon Unveils 120-Megapixel Camera Sensor · 2010-08-24 08:59 · Score: 1

With this new sensor, just the readout would prevent this sensor from being used in any but the most specialized of applications.

I'm thinking that you've hit the nail on the head: this is for specialized applications where things like light level are controlled and long read-out time can be tolerated. I'm familiar with some projects that are attempting to build nanometer-scale 3D reconstructions of the brain (they constructed the first gigapixel camera, although didn't announce it), and they would hugely benefit from a huge sensor like this.

I'd put good money on scientific and industrial applications.

Americans with Disabilities Act? on Portal On the Booklist At Wabash College · 2010-08-20 12:34 · Score: 2, Insightful

How will they deal with students who have physical disabilities? I'm thinking oh, paralysis, cerebral palsy, or anything else that leaves manual dexterity impaired. Or what about visually impaired or blind students? Remember this is a required course for all incoming students. Sounds like a half-baked idea from this distance, and yes, I did read the article.

Let's hope Wabash doesn't get into a heapload of trouble for not complying with the ADA, like losing any Federal grants they might have.

Re:Private technological gizmos on Building a Traffic Radar System To Catch Reckless Drivers? · 2010-08-20 10:02 · Score: 3, Informative

will never replace rule of law.

My understanding is that the story submitter is trying to provide the police / government with a means to enforce the law. You'll note the phrase, "but the motorcycle driver who was responsible fled and the police weren't equipped to catch him," implying that the police do not have sufficient means.

You'll also notice that the summary states, "build a traffic radar system able to capture a vehicle's speed," and "[t]here are laws, but not much willingness to enforce them," and hopes with the hypothetical new system that fines will be levied. This, along with the general tone and explicit suggestion of rolling fines into additional technology, would all suggest that the submitter is looking to bootstrap rule of law.

Re:Same for coax vs. optical ... on Calling Shenanigans On Super SATA's Claimed Audio Qualities · 2010-08-19 09:29 · Score: 1

We just put everything behind two UPS with an autoswitcher in the middle and never looked back.

That would be breaking the ground loops.

Re:Same for coax vs. optical ... on Calling Shenanigans On Super SATA's Claimed Audio Qualities · 2010-08-19 09:25 · Score: 1

Shouldn't any decently designed DAC have whatever technical measures are necessary to ensure that analog noise coming in on the digital line does not get passed out in the analog output of the DAC? Trying to solve the problem with the cable seems to me to be the wrong way to attack such a problem - the playback device which generates the final analog signal should take care of that problem. Of course, from that point forward (from the DAC to an amp, and from the amp to the speakers) you certainly do need to worry about those analog interference problems. But you shouldn't have to worry about it on the digital portion of the system.

This is just the sort of reasoning that, unfortunately, gets you into trouble with ground loops. We implicitly assume, in many designs, that ground is solid and unvarying. Reality is not so kind, especially when you have a large loop that can act as an antenna for mains frequencies (50 / 60 Hz). The loops get formed when you plug two or more pieces of equipment into the same circuit, each with a ground wire that ultimately leads back to a pole in the ground, and then hook up a signal cable between the two boxes. The loop formed in the ground wires (pole through wall wiring to plug to power cable to case A to signal cable shielding to case B to plug to wall wiring back to pole) can have a huge effective cross-section, and thus pick up enough voltage to interfere with the signal.

The naive attitude is that these don't matter, but, unfortunately, reality is a harsh mistress and doesn't care what one thinks should or should not be true.

Seriously, read up on ground loops.

Re:Same for coax vs. optical ... on Calling Shenanigans On Super SATA's Claimed Audio Qualities · 2010-08-19 08:28 · Score: 1

If the base signal is identical but you remove a source of mains hum by breaking a ground loop you can have a very audible improvement.

But that mains hum would have to enter *after* the digital->analog conversion, no? So the cable still wouldn't matter, unless you're saying that the cable itself is transferring hum from the dvd player to the analog amp.

Yes, the grounding shield in the cable is *exactly* what would be creating the hypothetical ground loop that would be broken by going to optical. Furthermore, the cable isn't doing so much of the transferring as it is acting as an antenna in concert with the cases and other cable grounds in the system. You might want to read up on ground loops.

Grounding equipment correctly is actually quite difficult. Books have been written on the subject (some good, some not so good). My personal favorite is "Grounding and Signalling Techniques in Instrumentation" by Morrison.

Re:Analog Computers on Chips That Flow With Probabilities, Not Bits · 2010-08-18 02:47 · Score: 2, Informative

It's not analog in the sense that we use op amps, we still use gates

What's the difference? A gate is just a high speed high gain ultra high distortion opamp.

Forgot your introductory digital design courses already?

Digital circuits are designed to reliably transmit or compute a digital value in the presence of noise. The way this is done is by excluding huge ranges of voltages and making very high gain op-amps that, while fast, do not need to be accurate. Accuracy is thrown out the window in favor of speed and noise immunity. You will (or should) never see a properly operating op-amp in a digital circuit putting out a voltage other than something in the range representing a 0 or 1 (in TTL-compatible circuits for example, 0 to 0.2 V for a 0 and 4.7 to 5.0 V for a 1 ... note that I'm quoting output ranges not input ranges). The acceptable voltage ranges were designed such that a valid 0 signal when combined with inevitable noise would still be read as a 0 at the next stage; mid-range values are not permitted. See, e.g., http://www.interfacebus.com/voltage_threshold.html .
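A sketch of the noise-margin idea, using the commonly cited worst-case limits for standard 5 V TTL (output low at most 0.4 V, output high at least 2.4 V, input low at most 0.8 V, input high at least 2.0 V). Those are guaranteed datasheet limits rather than the typical output ranges I quoted above, and the class and method names are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogicLevels:
    v_ol_max: float  # highest voltage a driver may output for a 0
    v_oh_min: float  # lowest voltage a driver may output for a 1
    v_il_max: float  # highest voltage a receiver still reads as 0
    v_ih_min: float  # lowest voltage a receiver still reads as 1

    def noise_margin_low(self):
        """Noise a valid 0 can pick up and still be read as a 0."""
        return self.v_il_max - self.v_ol_max

    def noise_margin_high(self):
        """Noise a valid 1 can lose and still be read as a 1."""
        return self.v_oh_min - self.v_ih_min

    def classify_input(self, volts):
        """Return 0, 1, or None for the forbidden mid-range."""
        if volts <= self.v_il_max:
            return 0
        if volts >= self.v_ih_min:
            return 1
        return None

# Standard 5 V TTL worst-case limits (assumed here, see lead-in).
TTL = LogicLevels(v_ol_max=0.4, v_oh_min=2.4, v_il_max=0.8, v_ih_min=2.0)
print(TTL.noise_margin_low(), TTL.noise_margin_high())
print(TTL.classify_input(0.2 + 0.3), TTL.classify_input(1.4))
```

A 0.2 V output with 0.3 V of added noise still lands in the 0 band, while 1.4 V belongs to neither band, which is the excluded range described above.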

Op-amps designed for accurate reproduction of analog values are an entirely different creature, one where accuracy is among the primary design requirements. In contrast to digital circuits, a mid-range value is not only permissible, but expected.

So while both digital and analog logic use op-amps, the design requirements and valid signal ranges are vastly different.

Re:Sounds reasonable on Ray Kurzweil Does Not Understand the Brain · 2010-08-17 05:18 · Score: 1

Myers goes off in a tangent about biochemistry which has nothing to do with the argument. I've never read anything hinting that the way to simulate a human brain would be to simulate how the molecules in the brain behave. We don't build airplanes with flapping wings either, machines can emulate the functionality of a living being without need to simulate the exact details.

I think you missed the point of the article: Kurzweil says that the genome has all the information we need (and then creates a lower not upper bound, and builds a prediction on that). The genome only contains, as far as we understand, descriptions of molecules to be constructed; the only way of getting a brain from the genome is to simulate those molecules. While many neuroscientists think that the complete wiring diagram of the brain should be sufficient to simulate the brain, the genome does not directly encode the wiring diagram. Myers gives some examples of how knowing the description of a protein is insufficient to understand the interactions that protein might have in the whole organism. By extension therefore, knowing just the genome won't allow us to predict how the organism, or the brain within the organism in particular, works.

I'm siding with Myers in this case: from the genome we conceivably might be able to simulate a whole organism, but we'd have to simulate the full developmental cycle to get a working brain. I doubt we'll be doing that in ten years for anything more complex than a single cell. A hundred to get to a full brain? Probably. Ten? Not likely, but I'd love to be proved wrong.

slightly negative on production cars on Cambered Tires Can Improve Fuel Economy · 2010-08-15 23:57 · Score: 1

Of course, there are negative effects too — namely increased tire wear and impaired ride quality — which is why production cars almost always have zero camber.

My understanding from hacking cars for a couple of decades is that auto manufacturers tend to specify slightly negative camber, and even progressive negative camber that increases with tire deflection (when the steering wheel is turned) in order to IMPROVE handling. Without negative camber, cars tend to feel squirrelly and difficult to control. With negative camber, the car tends to feel more stable, and, importantly, the steering wheel returns to center on its own.

Re:Matlab Structures on How Do You Organize Your Experimental Data? · 2010-08-15 12:09 · Score: 4, Interesting

I have to organize and analyze 100 GB of data from a single day's experiment. Raw data goes on numbered HDs that are cloned and stored in separate locations. This data is then processed and extracted into about 1 GB of info. From there the relevant bits are pulled into hierarchical matlab structures. Abstract everything that you can into a MATLAB structure array. Have all your analysis methods embedded in the classes that compose the array. Use SVN to maintain your analysis code base.

Yes, yes, yes.

I have very similar data collection requirements and strategy with one exception: the data that can be made human-readable in original format are made so. Always. Every original file that gets written has the read-only bit turned on (or writeable bit turned off, whichever floats your boat) as soon as it is closed. Original files are NEVER EVER DELETED and NEVER EVER MODIFIED. If a mistake is discovered requiring a modification to a file, a specially tagged version is created, but the original is never deleted or modified.
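A minimal sketch of the read-only step, assuming Python is doing the writing. The helper name is made up; on Windows, `os.chmod` only toggles the read-only flag, which is all that is needed here.

```python
import os
import stat
import tempfile

def make_read_only(path):
    """Clear every write-permission bit on path, leaving other bits alone."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

# Demonstration in a scratch directory so nothing real is touched.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "raw.dat")
    with open(path, "w") as handle:
        handle.write("1 2 3\n")
    make_read_only(path)  # do this as soon as the file is closed
    still_writable = bool(os.stat(path).st_mode & stat.S_IWUSR)
    # Restore write permission only so the scratch directory can be cleaned up.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)

print(still_writable)
```

This will not stop a determined user, but it does turn an accidental overwrite into an error message.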

Also, every single data file, log file, and whatever else that needs to be associated with it is named with a YYMMDD-HHMMSS- prefix and, since experiments in my world are day-based, is put into a single directory called YYMMDD. I've used this system now for nearly 20 years and not screwed up with using the wrong file, yet. Files are always named in a way that (a) doing a directory listing with alpha sort produces an ordering that makes sense and is useful, and (b) there is no doubt as to what experiment was done.
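The naming scheme is easy to generate mechanically. A sketch, with a hypothetical helper name and label:

```python
from datetime import datetime
from pathlib import PurePosixPath

def data_path(root, when, label):
    """Return root/YYMMDD/YYMMDD-HHMMSS-label for the given timestamp."""
    day = when.strftime("%y%m%d")
    stamp = when.strftime("%y%m%d-%H%M%S")
    return PurePosixPath(root) / day / f"{stamp}-{label}"

example = data_path("data", datetime(2010, 8, 15, 12, 9, 30), "spikes.txt")
print(example)
```

Because the fields run from most to least significant and are zero-padded, an alphabetical listing is also a chronological one (within a single century, given the two-digit year).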

In addition, every variable that is created in the original data files has a clear, descriptive, and somewhat verbose name that is replicated through in the MATLAB structures.

Finally, and very importantly, the code that ran on the data collection machines is archived with each day's data set so that when bugs are discovered we can know EXACTLY which data sets were affected. As a scientist, your data files are your most valuable possessions, and need to be accorded the appropriate care. If you're doing anything ad-hoc after more than one experiment, then you aren't putting enough time into devising a proper system.
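One way to do the archiving, sketched under the assumption that the acquisition code lives in a single directory tree. The function name, the `code` subdirectory, and the idea of recording a digest alongside the copy are illustrative choices, not a description of my actual setup.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def archive_code(source_dir, data_dir):
    """Copy source_dir into data_dir/code and return a SHA-256 digest
    computed over the copied files' relative paths and contents."""
    destination = Path(data_dir) / "code"
    shutil.copytree(source_dir, destination)
    digest = hashlib.sha256()
    for path in sorted(p for p in destination.rglob("*") if p.is_file()):
        digest.update(path.relative_to(destination).as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()

# Demonstration with throwaway directories standing in for the real ones.
with tempfile.TemporaryDirectory() as scratch:
    source = Path(scratch) / "acquisition"
    source.mkdir()
    (source / "run.py").write_text("print('collecting')\n")
    day_dir = Path(scratch) / "100815"
    day_dir.mkdir()
    fingerprint = archive_code(source, day_dir)
    copied = (day_dir / "code" / "run.py").read_text()

print(fingerprint)
```

Two days with the same digest ran the same code, so once a bug is traced to a particular version, finding the affected data sets is a matter of comparing fingerprints.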

(I once described my data collection strategy to a scientific instrument vendor and he offered me a job on the spot.)

I also make sure that when figures are created for my papers I've got a clear and absolutely reproducible path from the raw data to the final figures that includes ZERO manual intervention. If I connect to the appropriate directory and type "make clean ; make", it may take a few hours or days to complete, but the figures will be regenerated, down to every single arrow and label. For the aspiring scientist (and all of the people working in my lab who might be reading this), this is perhaps the most important piece of advice I can give. Six months, two years, five years from now when someone asks you about a figure and you need to understand how it was created, the *only* way of knowing that these days is having a fully scripted path from raw data to final figure. Anything that required manual intervention generally cannot be proven to have been done correctly.

Re:Who cares about lithography? on How Much Smaller Can Chips Go? · 2010-08-13 11:02 · Score: 1

The diameter of a silicon atom is roughly 0.25 nm. That means that 32nm is about 120 atoms across. A 16nm line is about 60 atoms across.

For reliable use, there is going to be an approximate minimum to the number of atoms in a line. Electron interactions among individual atoms are quantum events, so for any sort of predictability you're going to need enough atoms for the probabilities to average out enough. I don't know how many that is, but it pretty much has to be more than one.

I have a great deal of faith in the ingenuity of the companies involved, but there is a lower limit that's independent of fabrication, and we've got to be getting fairly close to it.

A single transistor channel is more than a one atom wide chain of silicon atoms N nm long. Averaging comes from not just the length but the width, too.

Re:Hold everything on Lasers Approach Their Ultimate Intensity Limit · 2010-08-12 04:51 · Score: 1

On the subject of light into matter, let me contribute a useless computation:

(Annual energy consumption of Earth population) / (Speed of light)^2 / (Mass of 1967 Volkswagen Beetle) = 6.3.

That's more than I expected...

Damn, I think that's my new favorite normalization!

More seriously, though, do you think we know the annual energy consumption of humans to within a factor of 10?

User: pz

Comments · 1,774