Slashdot Mirror


20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report (winbeta.org)

An anonymous reader writes, citing a report from WinBeta: A new report from scientists Mark Ziemann, Yotam Eren, and Assam El-Osta says that roughly 20% of genomics papers with supplementary Excel gene lists contain gene name conversion errors caused by Excel. In the abstract of the article, titled "Gene name errors are widespread in the scientific literature," the scientists explain: "The spreadsheet software Microsoft Excel, when used with default settings, is known to convert gene names to dates and floating-point numbers. A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions."

It's easy to see why Excel might have problems with certain gene names when you see the "gene symbols" that the scientists use as examples: "For example, gene symbols such as SEPT2 (Septin 2) and MARCH1 [Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase] are converted by default to '2-Sep' and '1-Mar', respectively. Furthermore, RIKEN identifiers were described to be automatically converted to floating point numbers (i.e. from accession '2310009E13' to '2.31E+13'). Since that report, we have uncovered further instances where gene symbols were converted to dates in supplementary data of recently published papers (e.g. 'SEPT2' converted to '2006/09/02'). This suggests that gene name errors continue to be a problem in supplementary files accompanying articles. Inadvertent gene symbol conversion is problematic because these supplementary files are an important resource in the genomics community that are frequently reused. Our aim here is to raise awareness of the problem."
You can view the scientific paper in its entirety here.
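
The conversions quoted above ('2-Sep', '1-Mar', '2.31E+13', '2006/09/02') are regular enough to be flagged with a few patterns. The following is a minimal sketch of such a check, not the scan the authors ran: the function names and the three regular expressions are assumptions chosen only to match the examples in the summary, and a real gene list would likely need more patterns.

```python
import re

# Month abbreviations as they appear in Excel-style dates such as "2-Sep".
_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Hypothetical patterns, modeled only on the examples quoted in the summary.
DATE_DAY_MONTH = re.compile(rf"^\d{{1,2}}-(?:{_MONTHS})$", re.IGNORECASE)  # 2-Sep, 1-Mar
DATE_YMD = re.compile(r"^\d{4}/\d{2}/\d{2}$")                              # 2006/09/02
SCI_NOTATION = re.compile(r"^\d+(?:\.\d+)?E\+\d+$")                         # 2.31E+13


def looks_mangled(cell):
    """Return True if a cell looks like a date or float rather than a gene symbol."""
    value = cell.strip()
    return any(p.match(value) for p in (DATE_DAY_MONTH, DATE_YMD, SCI_NOTATION))


def find_mangled(cells):
    """Return the cells from a gene-name column that look like conversion artifacts."""
    return [c for c in cells if looks_mangled(c)]


if __name__ == "__main__":
    column = ["SEPT2", "2-Sep", "MARCH1", "1-Mar", "2.31E+13", "2006/09/02"]
    print(find_mangled(column))
```

A check like this only detects damage after the fact; it cannot tell which original symbol a date such as '2-Sep' came from.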

7 of 349 comments

  1. Re:Wait, what? by pem · · Score: 5, Funny
    > Spreadsheets are wonderful things...

    Citation needed.

  2. Re:Excel can kiss my 5" wide anus! by FunkSoulBrother · · Score: 3, Funny

    The VisiCalc Song
     
    [à la "Let's Get Physical", made popular by Olivia Newton-John]
     
    I'm savin' all of those back issues of "Byte"
    Making the micro conversion
    I gotta handle text just right
    Ya know what I mean?
     
    I took you to a local computer store
    Then to a compu-fair shopping spree
    There's nothing left to purchase now
    'less it's, programmability...
     
    [BEGIN Chorus (invoked later)]
    Let's get VisiCalc*, VisiCalc
    I wanna get Visi-Calc, let's invoke VisiCalc
    Let me hear your modem talk, your floppies squawk
    Let me hear your I/O rock...
    [END Chorus]
     
    I've used paper, I've used wood
    Tried to keep my pen on the table
    It's getting hard, this hardware stuff
    Ya know what I mean?
     
    I'm sure you understand what eleven's* do
    You know the software intimately
    You gotta know, you're bringing out
    the VisiPlot* for me...
     
    [Invoke Chorus]

  3. Re:LaTeX by NotInHere · · Score: 4, Funny

    LaTeX is not free of problems either. They are just different. If you care, you take the time to fix them, if you don't you don't fix them. Simple as that.

    https://pbs.twimg.com/media/Ci...

  4. Re:LaTeX by GodelEscherBlecch · · Score: 3, Funny

    > still use Excel to do the crunching

    Good god, I hope not. When I saw the title for this article I thought for sure it was referring to errors caused by the aggregation of questionable digits resulting from machine precision floating point operations, not something as simple as type conversions. Excel has been the bane of my existence for years because testers keep trying to use it to verify results from a data processing framework I wrote where the operations for some use cases involve 20+ digit decimals. No matter how many times I explain to them the concepts of machine vs. arbitrary precision, decimal precision vs. accuracy, rational vs. decimal representations of numbers, etc. the spurious 'rounding error / does not match the XLS' bug reports just keep coming. Drives me nuts. The idea that scientists may be making the same mistakes with important research is kind of scary.

    Then again, I am usually shocked by the amount of error considered tolerable in the scientific / EE applications of the framework. The real anal retentives are the financial use cases, which tend to include 'penny allocation' algorithms for distributing fractions of pennies left as remainders from dollar amounts in the 10s of millions, and they absolutely will file a critical severity issue over a .00000000001 discrepancy.
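
The distinction the comment draws between machine precision and exact decimal arithmetic, and the "penny allocation" idea, can be illustrated briefly. This is a sketch under assumptions: `allocate_pennies` is a hypothetical helper showing one common way to distribute leftover cents, not the algorithm used in the commenter's framework.

```python
from decimal import Decimal

# Binary floating point cannot represent most decimal fractions exactly,
# so sums that look trivial in decimal can be slightly off.
float_sum = 0.1 + 0.2

# Decimal built from strings keeps the decimal digits exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")


def allocate_pennies(total_cents, shares):
    """Split an integer number of cents into parts that sum exactly to the total."""
    base, remainder = divmod(total_cents, shares)
    # The first `remainder` recipients each receive one extra cent.
    return [base + 1 if i < remainder else base for i in range(shares)]


if __name__ == "__main__":
    print(float_sum == 0.3)               # comparison on binary floats
    print(decimal_sum == Decimal("0.3"))  # comparison on exact decimals
    print(allocate_pennies(1000, 3))
```

Working in integer cents avoids fractional remainders entirely, which is why allocations of this kind always sum back to the original amount.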

  5. Not even in top 10 mistakes by burtosis · · Score: 3, Funny

    Just do a Google Scholar search for large hardon collider. None of them will ever live that down, doubly so when it's the title.

  6. Re:Wait, what? by Dog-Cow · · Score: 3, Funny

    > What kinds of major errands have gotten in...

    Grocery shopping, filling the gas tank and picking up the dry cleaning.

  7. Re: Including this one? by Maritz · · Score: 5, Funny

    I can see how it helps with people who don't spell very well. But for people who can, it's an outright hindrance. Also, they should grow up and add profanity. The puritan dictionary is a ducking disgrace, utterly shot.

    --
    I do not want your cheap brainburning drugs. They are useless for work. And I am a working man today.