robbyjo · Slashdot Mirror

Re:Correlation != Causation on Blood Test of 4 Biomarkers Predicts Death Within 5 Years · 2014-02-27 16:12 · Score: 1

Oh, the classic correlation != causation meme! Read the f***ing paper first and understand the arguments!

You should understand WHICH 4 biomarkers they are testing: VLDL, albumin, citrate, and alpha-1 acid glycoprotein. If these four are out of range, chances are the metabolism behind these four indicators has been going wrong for DECADES and is hardly reversible. It makes sense, therefore, to predict 5-year mortality with these 4 biomarkers. Sure, the prediction isn't perfect, but boy, are they good indicators of someone's health, just like fasting blood glucose, blood pressure, cholesterol, and other measurements!

So, just quit this knee-jerk correlation != causation reaction already and understand the science behind it!

Re:Correlations on Spoiler Alert: Smart Kids Become Successful Adults · 2013-05-10 07:27 · Score: 5, Informative

> Open articles. Ctrl-F "Controling" No results. Close tab. Nothing of value.

It does. It is abbreviated as "RGSC" in the article. Look at Figure 2 to see the model graphically; RGSC is featured prominently at the top. Also, if you look at Table 2, the authors acknowledge the link between SES of origin AND math / reading abilities. But this paper shows that math & reading abilities at 7 years old do predict mid-life SES above AND beyond SES of origin.
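The "above AND beyond" claim is the usual incremental-prediction question: once SES of origin is in the model, does childhood ability still predict mid-life SES? The paper's own model is the path diagram in Figure 2; what follows is only a plain least-squares sketch on simulated data, where the variable names and effect sizes (0.5, 0.4, 0.3) are made-up assumptions, not numbers from the paper.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    a = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

def r_squared(X, y, beta):
    mean_y = sum(y) / len(y)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Simulated data with made-up effect sizes (not from the paper).
random.seed(1)
n = 2000
ses_origin = [random.gauss(0, 1) for _ in range(n)]
ability = [0.5 * s + random.gauss(0, 0.75 ** 0.5) for s in ses_origin]  # correlated with SES
midlife_ses = [0.4 * s + 0.3 * a + random.gauss(0, 1) for s, a in zip(ses_origin, ability)]

X_base = [[1.0, s] for s in ses_origin]                      # intercept + SES of origin
X_full = [[1.0, s, a] for s, a in zip(ses_origin, ability)]  # ... + ability at age 7
beta_base = ols(X_base, midlife_ses)
beta_full = ols(X_full, midlife_ses)
print("R^2 without ability:", r_squared(X_base, midlife_ses, beta_base))
print("R^2 with ability:   ", r_squared(X_full, midlife_ses, beta_full))
print("ability coefficient, holding SES of origin fixed:", beta_full[2])
```

Adding a regressor can never lower in-sample R^2, so the more telling number is the ability coefficient once SES of origin is already in the model.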

Major challenge: Retrieval and storage on Researchers Achieve Storage Density of 2.2 Petabytes Per Gram of DNA · 2013-01-23 09:51 · Score: 3, Interesting

Okay, storage is "solved". How about retrieval? Especially random-access retrieval that is simple and fast enough (relatively speaking) to make such a storage medium practical? Certainly not DNA sequencing, which can take weeks to complete.

The second problem: DNA degrades and fragments at room temperature. And a -80°C lab freezer for storage certainly wouldn't be practical.

Third problem: DNA secondary and tertiary structure. The coding scheme must also address DNA's tendency to form secondary structures (like hairpins) or tertiary structures (like supercoils) that can hamper reading of, or access to, the information. I think this is why the storage uses short sequences. But short DNA sequences like the ones proposed (~100 bp, judging from the figure) could still form such structures.
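One crude way a coding scheme could screen candidate oligos is to look for a short stretch whose reverse complement appears further downstream, i.e. a potential hairpin stem. The sketch below assumes a 6-base stem and a loop of at least 3 bases, both arbitrary illustrative thresholds; real secondary-structure prediction uses thermodynamic folding models, and this check says nothing about tertiary structure.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Reverse complement of a DNA string over A/C/G/T."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def find_hairpin(seq, stem=6, min_loop=3):
    """Return (i, j) if seq[i:i+stem] can pair with seq[j:j+stem] further
    downstream (j >= i + stem + min_loop), i.e. a possible hairpin stem;
    return None if no such pair exists. Thresholds are illustrative only."""
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        arm = reverse_complement(seq[i:i + stem])
        j = seq.find(arm, i + stem + min_loop)
        if j != -1:
            return i, j
    return None

# GGATCC is its own reverse complement, so two copies separated by a loop can fold back.
print(find_hairpin("GGATCCTTTTGGATCC"))
print(find_hairpin("A" * 30))
```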

Re:It's a cheat. on A Few Million Monkeys Finish Recreating Shakespeare's Works · 2011-10-09 11:13 · Score: 3, Interesting

Then it's not really monkeys. It's more like monkeys with an oracle, and that oracle makes a whole world of difference.
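A quick way to see why the oracle matters: a blind monkey must produce the whole target string in one go, while a monkey whose correct keystrokes are locked in by an oracle only has to hit each character once. The sketch assumes a 27-key typewriter (a to z plus space) and a per-character oracle; it is a simplification, not a description of how the project actually matched its output against the texts.

```python
import random
import string

ALPHABET = string.ascii_lowercase + " "  # assumed 27-key typewriter

def keystrokes_with_oracle(target, rng):
    """Type random keys; a hypothetical oracle locks in each correct
    character as soon as it appears. Returns total keystrokes used."""
    strokes = 0
    for ch in target:
        while True:
            strokes += 1
            if rng.choice(ALPHABET) == ch:
                break
    return strokes

def expected_attempts_blind(target):
    """Expected number of whole-string attempts with no oracle."""
    return len(ALPHABET) ** len(target)

rng = random.Random(42)
target = "to be or not to be"
print(keystrokes_with_oracle(target, rng))  # expected value is 27 * 18 = 486; varies by seed
print(expected_attempts_blind(target))      # 27 ** 18 attempts
```

With the oracle, the expected cost grows like 27 * n keystrokes; without it, like 27^n attempts.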

What constitutes a "real" name? on Fake Names On Social Networks, a Fake Problem · 2011-08-11 02:49 · Score: 2

What constitutes a "real" name? Take Sun Yat-Sen, for example. Which one do you think is THE real name? His original name? Baby name? Genealogical name? Courtesy name? School name? In the end, Sun Yat-Sen became known in China by the pseudonym he used in Japan. And "Yat-Sen" itself is a school name.

GUIMiner is most likely optimized for AMD cards on Bitcoin Mining Tests On 16 NVIDIA and AMD GPUs · 2011-07-13 11:29 · Score: 1

The performance of GPU-based code is highly dependent on the video card. I seriously doubt the dismal NVIDIA results. I think the authors most likely optimized the kernel code for AMD cards. This is evident when you look at the CL kernel code: there are many hardwired constants and fixed arrays (aligned to 128 ints or longs). Moreover, the GUIMiner authors don't seem to take advantage of NVIDIA's more local work threads (compared to AMD's).

I'd say that declaring AMD the victor is premature.
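The alternative to hardwired constants is to derive launch parameters from what the device reports. In OpenCL those limits are queryable (for example CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE); the hypothetical helper below only shows the selection logic in plain Python, with made-up device numbers, and is not taken from GUIMiner.

```python
def choose_local_size(global_size, max_work_group_size, preferred_multiple):
    """Hypothetical helper: pick the largest local work size that divides
    global_size, fits the device limit, and is a multiple of the device's
    preferred granularity. Falls back to the largest plain divisor."""
    best = None
    for size in range(max_work_group_size, 0, -1):
        if global_size % size == 0:
            if size % preferred_multiple == 0:
                return size
            if best is None:
                best = size  # largest divisor seen, in case no multiple fits
    return best

# Made-up device limits: one with 32-wide scheduling, one with 64-wide.
print(choose_local_size(1 << 20, 1024, 32))
print(choose_local_size(1 << 20, 256, 64))
```

NVIDIA GPUs schedule threads in warps of 32, and AMD GPUs of that era in wavefronts of 64, so a kernel tuned around one granularity can leave the other underused.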

Re:Not going to happen on Ask Slashdot: How To Encourage Better Research Software? · 2011-04-29 09:16 · Score: 1

Does "agile" software development allow scrapping 100% of the code and radically changing the spec (and thereby everything else) about every 6 months just because of a new scientific publication? It may sound extreme, but this often happens in research. If we take time to "structure" our code, before we know it, we have to redo it all over again. We do use libraries like GSL, BLAS, ATLAS, etc. to make our lives easier. These won't change, but whatever we build on top of them often gets scrapped on a regular basis. So we really don't have an incentive to "beautify" the code.

Not going to happen on Ask Slashdot: How To Encourage Better Research Software? · 2011-04-29 07:21 · Score: 5, Insightful

Not only are most researchers not proficient programmers, they also shape their code more like prototypes so that they can modify it easily as the science progresses. Conventional programmers will be frustrated with this approach since they want every single spec set in stone, which will never happen in a research setting: research progresses very rapidly and specs often change dramatically. If you can set the spec in stone, it is usually a sign that the field has matured and is transitioning to engineering-type problems. Once the transition happens, it's no longer research, it's engineering. Then you can "make the code better".

Re:Exams in other cultures on Catching Exam Cheats With a Spectrum Analyzer · 2011-01-12 17:48 · Score: 2

In ancient China, the imperial examination was literally game-changing. The stakes were high; it was virtually the only way peasants could join the ruling elite. Therefore, people did whatever it took to succeed. This system was copied and adapted to some degree in ancient Japan, Korea, and Vietnam. Hence a similar attitude pervades these countries as well.

Re:Not necessarily popular with the Chinese, eithe on Chinese Written Language To Dominate Internet · 2011-01-11 09:05 · Score: 1

I think you should learn a bit about the Chinese language and its characters to understand how indispensable the characters really are. Consider the English example of "bat", "bet", "bad", and "bed", which sound very similar. Spoken by non-native speakers with heavy accents, these words may be confused with "pat", "pet", or "pad". (Even among English speakers, some accents pronounce "pen" and "pin" identically!) A Chinese analog would disambiguate by saying "baseball bat" instead of just "bat", and so on. The problem is that such situations are much more common in Chinese than in English, and they occur even in daily use. This is why most words are written with two characters. Note that the pairing does not introduce new characters and thereby does not add to the "grinding"; it just adds complexity to the language. Reading newspapers requires only about 4,000 characters (out of about 100K total), spread over about 300 tone-syllable combinations, which leaves about 13 characters per combination to disambiguate. Knowing about 2K is enough for daily conversation. Mind you, these are all in common use, including in formal signs and speeches. This is NOT uncommon, contrary to what you claimed.

Also, in Chinese, using more refined characters shows your erudition, politeness, or even social status. Politeness can mean everything for Chinese people. So, you see, language isn't used only for conveying information. It can also convey mood, politeness, formality, etc.

Note that new words are formed by juxtaposing two or more characters in a new way. Since each character contributes its individual meaning, people can guess the meaning of the new word. If people are deprived of the characters and, say, have to read pinyin, the meaning isn't as obvious. Example: xi3 yi1 = laundry becomes xi3 yi1 ji1 = washing machine. If people don't know the characters, the meaning of xiyiji isn't immediately obvious. This makes the Chinese language very intuitive and even facilitates learning. Children in China cope with this complexity pretty well: China's literacy rate was 97% in 2010.
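The washing-machine example can be made concrete with a toy lookup. The five-character table below is a hypothetical illustration, not a dictionary: read through the characters, 洗衣机 glosses itself, while the bare syllable ji1 could be any of several characters.

```python
# Tiny illustrative table (hypothetical sample, not a real dictionary).
CHAR_GLOSS = {"洗": "wash", "衣": "clothes", "机": "machine", "鸡": "chicken", "基": "base"}
CHAR_PINYIN = {"洗": "xi3", "衣": "yi1", "机": "ji1", "鸡": "ji1", "基": "ji1"}

def gloss_characters(word):
    """Reading from characters: each character contributes its own meaning."""
    return " + ".join(CHAR_GLOSS[c] for c in word)

def gloss_pinyin_syllable(syllable):
    """Reading from pinyin alone: every character with that sound is a candidate."""
    return sorted(CHAR_GLOSS[c] for c, p in CHAR_PINYIN.items() if p == syllable)

print(gloss_characters("洗衣机"))
print(gloss_pinyin_syllable("ji1"))
```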

The barrier to entry is about as high as it is for East Asian people learning English, since Chinese and English are two completely different languages. For East Asian people, the barrier isn't as high; it is akin to English speakers learning French.

Therefore, Chinese characters are indispensable.

Re:Is C++ ever the right tool for the job? on An Interview With C++ Creator Bjarne Stroustrup · 2011-01-11 08:06 · Score: 1

Yes. C++ (and Java) are indispensable for scientific software. In scientific software, the spec is ever-changing as the science progresses, so the flexibility to morph programs as needed, and maintainability, are of paramount importance. At the same time, we need the speed.

Some of these needs can be met by wrapping ready-made C libraries and calling them from higher-level languages such as Python, R, or Matlab. However, on many occasions this luxury isn't available (e.g., Markov chain Monte Carlo simulations or custom EM algorithms).
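As an illustration of why MCMC resists the "call a ready-made library" approach: each step of a Metropolis sampler depends on the previous state, so the inner loop is custom code that must run many thousands or millions of times. This is a minimal random-walk Metropolis sketch in Python, assuming a made-up Normal(3, 1) target; in practice this is exactly the kind of loop one would move to C++ for speed.

```python
import math
import random

def log_target(x):
    """Unnormalized log-density of a made-up target: Normal(mean=3, sd=1)."""
    return -0.5 * (x - 3.0) ** 2

def metropolis(log_density, x0, n_samples, step, seed=0):
    """Random-walk Metropolis sampler; every iteration depends on the previous state."""
    rng = random.Random(seed)
    x, lp = x0, log_density(x0)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric proposal
        lp_prop = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = proposal, lp_prop
        samples.append(x)
    return samples

samples = metropolis(log_target, x0=0.0, n_samples=50000, step=1.0, seed=1)
kept = samples[5000:]  # discard burn-in
print(sum(kept) / len(kept))  # should land near 3
```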

Re:Not necessarily popular with the Chinese, eithe on Chinese Written Language To Dominate Internet · 2010-12-29 09:04 · Score: 1

If you look at the wiki URL I cited, you'll immediately notice the problem. The Chinese language IS very terse and highly economical, with many symbols, sounds, and tones. In speech, people *disambiguate* words by pairing them with "word-complements" (I don't know the proper term) to achieve the intended meaning. HOWEVER, these pairings are limited to daily use, and even then, ambiguities remain. Take, for example, the word "shishi" in pinyin. You get 23 matches. Even if you add tones, you STILL have ambiguities. If you look at the word list, they're not rare, right? If I say (in pinyin) "shi4shi4 nan2 liao4", what does it mean? Is it "affairs of the world are hard to guess"? Or "everything is hard to guess"? Or "the state of the affair is hard to guess"? Or "affairs of this world are hard to abandon"? In this situation, people disambiguate even further by adding more "word-complements". Note that the phrase is a common complaint! It is highly context-specific.
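The ambiguity can be shown with a toy lexicon. The eight words below are a small illustrative sample written with tone-numbered pinyin, not the full list of matches the comment refers to. Dropping the tones merges all of them, and even with tones, shi4shi4 still maps to three words, including 世事 and 事事, the two readings behind "affairs of the world" and "everything" in "shi4shi4 nan2 liao4".

```python
# Illustrative sample only; a real dictionary lists many more "shishi" words.
LEXICON = {
    "世事": "shi4shi4",  # affairs of the world
    "事事": "shi4shi4",  # every matter, everything
    "逝世": "shi4shi4",  # to pass away
    "时事": "shi2shi4",  # current affairs
    "实事": "shi2shi4",  # practical matters
    "事实": "shi4shi2",  # fact
    "实施": "shi2shi1",  # to implement
    "史诗": "shi3shi1",  # epic poem
}

def strip_tones(pinyin):
    """Remove tone numbers, e.g. 'shi4shi4' -> 'shishi'."""
    return "".join(ch for ch in pinyin if not ch.isdigit())

def candidates(pinyin, use_tones=True):
    """All lexicon words whose pinyin matches, with or without tones."""
    key = pinyin if use_tones else strip_tones(pinyin)
    return sorted(
        word for word, p in LEXICON.items()
        if (p if use_tones else strip_tones(p)) == key
    )

print(candidates("shishi", use_tones=False))  # every word in the sample
print(candidates("shi4shi4"))                 # still three words with tones
```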

Also, language is NOT limited to speech. How about poems? Stories? Formalities? Jokes? Puns? When words are written, especially in poems or terse narrative, characters can be paired in almost any way to create a very, very powerful poem or narrative. Or puns! Oh man! There are so many puns based on this very fact.

Now, can you still say that Chinese characters are dispensable?

Re:Not necessarily popular with the Chinese, eithe on Chinese Written Language To Dominate Internet · 2010-12-29 08:49 · Score: 1

I stand corrected, but no need to be inflammatory.

Re:Not necessarily popular with the Chinese, eithe on Chinese Written Language To Dominate Internet · 2010-12-28 08:32 · Score: 5, Informative

You seem to look at Chinese words from a Japanese perspective. Corrections:
1. Chinese characters are logograms.
2. Classical Chinese is mainly monosyllabic, while Modern Chinese is mainly disyllabic for disambiguation purposes. See: http://en.wikipedia.org/wiki/Lion-Eating_Poet_in_the_Stone_Den
3. Chinese characters *are* indispensable. Pinyin or other romanization schemes (plus tones) simply cannot convey the same meaning as the original characters, though you can guess. Remember that Chinese is tonal, and the tone of a character can change depending on the word(s) it is paired with. Even with tone marks, ambiguities still remain in the romanized version of the words. The same problems occur with other "simplifications" or "phonetic abugidas" (e.g., bopomofo). Tonality does not exist in Japanese. See the wiki URL above.
4. Since Chinese characters are indispensable, you have to sight-read them. Yes, some phonetic clues do show up, but they don't always lead you to the right reading. Also, there are false friends and alternative spellings (even worse in Japanese), and a one-dot or one-stroke difference can change the sound dramatically.
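Point 3's remark that a character's tone depends on its neighbours can be sketched with two well-known rules: third-tone sandhi (a 3rd tone before another 3rd tone is read as a 2nd, so ni3 hao3 is read ni2 hao3) and the change of 不 bu4 to bu2 before a 4th tone. The function below is a simplified sketch under stated assumptions: it looks only at the next syllable's original tone and ignores phrase grouping, so a 3-3-3 run always comes out as 2-2-3.

```python
def apply_sandhi(syllables):
    """Apply two common Mandarin tone-change rules to (character, pinyin) pairs.

    Simplified: real speech also depends on word and phrase grouping.
    1. Third-tone sandhi: a 3rd tone before another 3rd tone becomes 2nd.
    2. 不 (bu4) becomes bu2 before a 4th-tone syllable.
    """
    out = []
    for i, (char, pinyin) in enumerate(syllables):
        nxt = syllables[i + 1][1] if i + 1 < len(syllables) else None
        base, tone = pinyin[:-1], pinyin[-1]
        if tone == "3" and nxt is not None and nxt.endswith("3"):
            tone = "2"
        elif char == "不" and tone == "4" and nxt is not None and nxt.endswith("4"):
            tone = "2"
        out.append((char, base + tone))
    return out

print(apply_sandhi([("你", "ni3"), ("好", "hao3")]))
print(apply_sandhi([("不", "bu4"), ("是", "shi4")]))
```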

Re:Papers and Questions on NASA's 'Arsenic Microbe' Science Under Fire · 2010-12-08 04:58 · Score: 1

Why can't these scientists just take the samples and redo the experiments *the right way* (and defend that) to see whether it is indeed a methodological error? If it is, the result will go away. Why the whining?

Re:Papers and Questions on NASA's 'Arsenic Microbe' Science Under Fire · 2010-12-08 02:11 · Score: 2

Actually, an easy fix would be to get samples from the lake in question OR from the scientists themselves, and then redo the experiment to see whether the result can be reproduced. Why the whining, right?

Re:Marketing Gone Wrong on Being Too Clean Can Make People Sick · 2010-11-29 09:30 · Score: 2, Informative

A 2005 follow-up study on this topic (triclosan in toothpaste):
http://www.ncbi.nlm.nih.gov/pubmed/16208383

The points still stand.

Re:Marketing Gone Wrong on Being Too Clean Can Make People Sick · 2010-11-29 09:27 · Score: 1

Triclosan-coated toothbrushes? At one time, science showed that triclosan reduces plaque. And what causes plaque in the first place? Bacteria! Don't put the blame solely on marketing. Be informed!
http://www.ncbi.nlm.nih.gov/pubmed/2638181

Re:Natural Resource on US District Judge Rules Gene Patents Invalid · 2010-03-29 16:08 · Score: 1

Even for a new allele, say a SNP, each position can only be A, C, G, or T. Unless they can show that the patented modification is highly unlikely to occur naturally, any new allele should be patent-free. And heck, they can't compare against pure random chance either, since we know that mutations / gene modifications do not occur uniformly at random. Making such a claim would be very hard.
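To see why "pure random chance" is the wrong yardstick, compare a uniform model, where each of the three alternative bases is equally likely, with a Kimura two-parameter-style model in which transitions (A and G swapping, C and T swapping) are favoured over transversions, as is commonly observed. The bias factor kappa is a free, hypothetical parameter here, not an estimate for any real genome.

```python
BASES = "ACGT"
PURINES = {"A", "G"}

def is_transition(a, b):
    """For a != b: purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T)."""
    return (a in PURINES) == (b in PURINES)

def substitution_probs(base, kappa):
    """Probability of each alternative base, given that `base` mutates, when a
    transition is kappa times as likely as each transversion.
    kappa = 1 recovers uniform random chance (1/3 each)."""
    weights = {b: (kappa if is_transition(base, b) else 1.0) for b in BASES if b != base}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}

print(substitution_probs("A", 1.0))  # uniform: each alternative 1/3
print(substitution_probs("A", 2.0))  # hypothetical bias: A->G favoured
```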

New treatments may or may not be patentable either. If a treatment involves a naturally occurring sequence from other people (I'm thinking of siRNA and the like), they can't patent the sequence. They can only patent the method of synthesizing it. Even then, if the method is in fact a naturally occurring one (i.e., that's how the body of a human or other creature does it), they can't patent it either.

It's a tough situation on Science and the Shortcomings of Statistics · 2010-03-17 14:50 · Score: 1

Actually, it's a tough situation. No real-life experimental data fits the assumptions of commonly used statistical models 100%. Real-life data is messy, so some degree of simplification is always involved. In addition, the results of whiz-bang fancy methods that "fit" the real data may not be easily interpretable, and ease of interpretation is what medical scientists want. There are other issues as well, such as computing time, derivability of the equations, etc.

In addition, many, many medical scientists use statistics as a tool to filter things (e.g., candidate genes, target enzymes, treatments, etc.). In this case, 100% accuracy is not really important. Once the scientists narrow down the candidates, they can test their validity directly in either test animals or real people.
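The comment doesn't name a specific filtering procedure; one widely used choice for screening many candidate genes is the Benjamini-Hochberg step-up procedure, which controls the expected fraction of false positives among the hits rather than demanding that every call be right. A minimal sketch, with toy p-values:

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses kept by the Benjamini-Hochberg step-up procedure
    at false discovery rate q: find the largest rank k with p_(k) <= k/m * q
    and keep the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff_rank = rank  # step-up: keep the largest passing rank
    return sorted(order[:cutoff_rank])

toy_p = [0.01, 0.04, 0.03, 0.5]  # made-up p-values for four candidate genes
print(benjamini_hochberg(toy_p))  # candidates to follow up in the lab
```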

Depends on Open Source Software Meets Do-It-Yourself Biology · 2010-01-26 05:45 · Score: 2, Insightful

Many of these biology experiments require very expensive machines, such as the microarray machines mentioned in the article. I don't know if purchasing refurbished machines is a wise choice, since we don't want data quality to be compromised. Also, don't forget about service plans for when the machines break or produce inconsistent output. Not to mention the various reagents, other chemicals, and supplies such as microarray chips needed for the experiment to yield high-quality data. These easily reach hundreds of dollars apiece. Also, purchasing such chemicals will get you labeled as a terrorist.

Another issue is gathering samples. If you're collecting yeast, that's simple. For Arabidopsis, other small plants, mice, or other small animals, you probably need quite a bit of space. Humans? That won't be simple at all. You have to clear privacy issues, get the research review board to sign off, etc. Sample collection alone can cost you lots of money and time. You can always resort to publicly available data, but chances are you won't impress scientists much by going that route. Also, most of the important discoveries on that data have already been made. Most likely, all you can do is confirm existing results or provide some tangential additional info.

Re:Tax Exempt? on US Colleges Say Hiring US Students a Bad Deal · 2009-08-13 02:09 · Score: 3, Informative

No. Most international students have F-1 visas, not J-1. Most exchange students are on J-1.

Re:Sounds like a good idea. on New Super Mario Bros. Wii To Include Official "Cheat" · 2009-06-19 00:32 · Score: 0

Agreed. In many cases, especially in party games like Mario Party or Super Monkey Ball, this is very useful. I don't need to play it to completion to play the party mini games. If I bring it to my friend's house, I don't need to bring my saved games just to unlock these stupid things.

Re:Are You Serious? on Should Undergraduates Be Taught Fortran? · 2009-06-13 09:42 · Score: 1

No numerical difficulties at all. But the algorithm was implemented in R. It uses smooth.spline, which in turn is implemented in FORTRAN. The original FORTRAN code came from GCV and PPPack on Netlib. As far as I know, nobody on the net other than me has ported these routines out of FORTRAN, which is what lets me use them in a QValue routine without invoking R whatsoever.

Re:Are You Serious? on Should Undergraduates Be Taught Fortran? · 2009-06-11 02:00 · Score: 1

I think ODE solvers are among the simplest cases of numerical libraries. I can simply copy algorithms from Numerical Recipes and get away with it. But there are a LOT more libraries written exclusively in Fortran that you wouldn't touch with a 10-foot pole.
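For a sense of how small the "easy" end is: the classical fourth-order Runge-Kutta step, a textbook method covered in Numerical Recipes, fits in a few lines. This is a scalar sketch written from the standard formulas, not the Numerical Recipes code, and it has no adaptive step-size control.

```python
import math

def rk4_step(f, t, y, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(f, t0, y0, t1, n_steps):
    """Fixed-step integration from t0 to t1."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y

approx = integrate(lambda t, y: y, 0.0, 1.0, 1.0, 100)  # dy/dt = y, y(0) = 1
print(approx, math.e)
```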

For example, you should know R and why R still contains some Fortran libraries, especially BLAS and LAPACK. I think BLAS is the easiest one to translate, albeit tedious. It's simply matrix operations (add, subtract, multiply), but there are lots of them, with many special cases that make it really fast. LAPACK (and associated libraries like LINPACK, etc.) requires a more intimate knowledge of matrix theory. I have the necessary background to do, say, singular value decomposition (SVD). But to date, the only fast SVD routines I know are in Fortran or derived from Fortran, ALONG with the quirks and limitations of Fortran 77. Don't believe me? Check Jama's SVD routine and how it can't handle, in a single pass, matrices with more columns than rows. You can get around that by invoking the routine twice (and believe me, that happens in a lot of places even though one pass is possible), but that's partly due to Fortran 77's inability to allocate arrays dynamically. The routine is so tight and fast that translating it to another language depends on the luck of machine translation (and then making sense of the result and trying to clean up the mess, yada yada).
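To illustrate the "multiple cases" in BLAS: the general matrix multiply xGEMM computes C := alpha*op(A)*op(B) + beta*C, where the TRANSA/TRANSB flags select whether each operand is transposed. The pure-Python function below mirrors only that meaning, on nested lists, with no leading-dimension arguments, no conjugate-transpose option, and none of the blocking that makes tuned implementations fast; the name gemm is hypothetical.

```python
def gemm(trans_a, trans_b, alpha, a, b, beta, c):
    """Return alpha * op(A) @ op(B) + beta * C on nested lists, where
    op(X) is X transposed if the flag is "T" and X itself otherwise."""
    def op(x, trans):
        return [list(col) for col in zip(*x)] if trans == "T" else x

    A, B = op(a, trans_a), op(b, trans_b)
    m, k, n = len(A), len(B), len(B[0])
    if len(A[0]) != k or len(c) != m or len(c[0]) != n:
        raise ValueError("incompatible dimensions")
    return [
        [alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * c[i][j] for j in range(n)]
        for i in range(m)
    ]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(gemm("N", "N", 1, a, b, 0, [[0, 0], [0, 0]]))  # plain A @ B
print(gemm("T", "N", 1, a, b, 0, [[0, 0], [0, 0]]))  # A^T @ B
```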

You have no idea how many libraries from the '80s are still in Fortran 77, untouched. Go to Netlib (http://www.netlib.org/liblist.html), pick one Fortran library, and try to translate it to another language. See if you don't cry a river. Believe it or not, many of them are still in wide use, usually as part of newer algorithms. I have done this myself. For example, GCV, a shrinkage B-spline smoothing library from 1985 (downloadable from Netlib), is known to give very good results. It's not your usual (and cheap) B-spline smoothing that you can find on the net; the difference is like heaven and earth. This GCV is used in the 2003 Q-value routine to estimate false discovery rates by smoothing over the P-values of thousands of genes. Nobody has tried to translate it out of Fortran. I took the plunge and spent 200 hours successfully translating that one library to Java. If I'd had the option not to, I would rather have spent those 200 hours elsewhere and used whatever Fortran-Java glue was available. Seriously.
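For context on what the smoothing step feeds: q-values rescale the P-values by an estimate pi0 of the fraction of truly null genes. The sketch below is a simplified assumption, estimating pi0 at a single fixed lambda = 0.5 instead of smoothing pi0(lambda) over a grid of lambdas with a spline, which is the part that needs the smooth.spline / GCV code; the function name qvalues is hypothetical.

```python
def qvalues(p_values, lam=0.5):
    """Storey-style q-values with pi0 estimated at one fixed lambda.

    pi0_hat = #{p > lam} / (m * (1 - lam)), capped at 1. The spline-smoothed
    pi0 used by the Q-value routine is replaced by this fixed-lambda shortcut.
    """
    m = len(p_values)
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1 - lam)))
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pi0 * p_values[i] * m / rank)
        q[i] = running_min  # enforce monotonicity in p
    return pi0, q

pi0, q = qvalues([0.001] * 6 + [0.6, 0.9])  # made-up p-values
print(pi0, q)
```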

So, if you've never been into serious scientific library development, please don't make such an arrogant and ignorant assertion. Although you can argue that an operating system is complex, the principle behind it is simple, much simpler than a scientific formula, which requires far more math than just Calc 1-3 and DE 1-2. It's not simply a matter of translating a derivative or an integral or what have you; that's the easiest part if somebody hands it to you. The hard part is reading the scientific paper behind the Fortran code to make sense of what the code is doing. Many algorithms contain hacks that make the formula work. For example, certain algorithms define limits, magic constants, assumptions, etc. that are NOT explained anywhere in the paper AT ALL. Some can be found in the papers cited by that paper, for reasons that might be unclear to you. Now, if you translate the mathematical formula straight off the paper without reading all of that, wouldn't it be a recipe for disaster?
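A generic example of the kind of undocumented hack described above (not taken from any Netlib routine): papers write log(sum(exp(x))), but working code subtracts the maximum first so that exp() cannot overflow. Transcribing the formula literally works on small inputs and fails on large ones.

```python
import math

def log_sum_exp_naive(xs):
    """The formula as usually written on paper: log(sum(exp(x)))."""
    return math.log(sum(math.exp(x) for x in xs))

def log_sum_exp(xs):
    """Same quantity, shifted by the maximum so exp() cannot overflow:
    log(sum(exp(x))) = m + log(sum(exp(x - m))), with m = max(xs)."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

print(log_sum_exp([1000.0, 1000.0]))  # 1000 + log(2)
try:
    log_sum_exp_naive([1000.0, 1000.0])
except OverflowError:
    print("naive version overflows")
```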

Many people may be gifted at coding, but very, very few have sufficient skill to translate highly numerical algorithms. Seriously. Most coders know absolutely nothing about higher-level math. Netlib is just a start: it only contains algorithms from the '80s and early '90s (which are still widely used). Even NumPy relies on plenty of untouched Fortran code in its backend.

This ignorance of gigantic proportions needs to stop. Now. If you still cling to your assertion, start an open source project that translates gigabytes of Fortran numerical libraries into a more modern language. See if you can even get some contributors. Good luck.

User: robbyjo

Comments · 355