20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report (winbeta.org)

Including this one? by Anonymous Coward · 2016-08-23 15:33 · Score: 0

Just wondering

Re:Including this one? by irving47 · 2016-08-23 17:28 · Score: 1

It was done with Open Office and NotePad

--
I had a sucky sig.
Re: Including this one? by Anonymous Coward · 2016-08-23 19:00 · Score: 1

It could be. The defective by default design of autocorrect seem to be creeping in everywhere, not just office, but sms, mobile, browser editing, file explorers. It takes in what you actually write, which is correct, and by default messes it up. Beyond catering to stupid.
Re: Including this one? by Maritz · 2016-08-23 21:11 · Score: 5, Funny

I can see how it helps with people who don't spell very well. But for people who can, it's an outright hinderance. Also, they should grow up and add profanity, The puritan dictionary is a ducking disgrace, utterly shot.

--
I do not want your cheap brainburning drugs. They are useless for work. And I am a working man today.
Re: Including this one? by Anonymous Coward · 2016-08-23 21:19 · Score: 1

There's exactly the same problem with OpenOffice spreadsheets.
Re: Including this one? by bickerdyke · 2016-08-23 21:39 · Score: 1

From today's bash.org successor:
It says a lot about you if your Autocorrect turns Voltaire into Voltaren or the other way round....

--
bickerdyke
Re: Including this one? by reboot246 · 2016-08-23 22:40 · Score: 1

I hate autocorrect with a passion and turn it off when possible. Some programs won't let you turn it off, or if they let you, they turn it back on the next time you use it.

I sure hope it's not on the voting machines this November!!
Re:Including this one? by Big+Hairy+Ian · 2016-08-23 23:27 · Score: 1

It's not just Excel. Anything using the MS ODBC CSV driver to load a csv file with the correct data will also get all of these issues enless you know how to fromat a scema.ini file.

--
Build a Man a Fire, and He'll Be Warm for a Day. Set a Man on Fire, and He'll Be Warm for the Rest of His Life.
Re: Including this one? by Nidi62 · 2016-08-24 04:56 · Score: 1

It could be. The defective by default design of autocorrect seem to be creeping in everywhere, not just office, but sms, mobile,
What annoys the crap out of me is every now and then when I am typing a text my phone decides that the first letter of every word in the message needs to be capitalized and automatically changes them.

--
The only thing necessary for evil to triumph is for it to be pitted against a slightly greater evil
Re: Including this one? by Anonymous Coward · 2016-08-24 05:00 · Score: 0

We don't speak nazi.
Re: Including this one? by RDW · 2016-08-25 04:04 · Score: 1

'Defective by design' is exactly it. On a standard Windows/Office system everything is set up to make this error the default behaviour, and the problem with gene names is so prevalent that we include a warning about it in our genomics course. The typical situation is: some upstream tool generates a very long gene list (maybe with thousands of rows) in CSV format. So far so good. But then a naive user wants to do some simple manipulation of the data and double-clicks the file to open it. Its icon includes the Excel logo and, sure enough, Excel is registered as the default application for CSV files, so it opens as a spreadsheet in Excel. Everything seems to be fine - there's a nice column of gene symbols that all seem to be correct. But hundreds or thousands of rows further down, something looks like a date and has been 'helpfully' converted into one by Excel. At this point, you can't reverse the change by changing the data type of the column - the corruption has happened silently on import and will be permanent in any saved (even CSV) version of the file. The counterintuitive but correct way to deal with a genomics CSV file (if you're mad or uninformed enough to use Excel in the first place) is to open Excel first, then run a file import with the data type specified for each column (for gene symbols, you need 'text' rather than 'general'). The answer to all this is education (avoid Excel, but if you must use it, understand the dumb way it works), but would it kill Microsoft to change the default behaviour to something more sensible (this can hadly be the only use case where this is an issue), and to include a global setting to switch it off?
Re: Including this one? by almitydave · 2016-08-25 04:42 · Score: 1

It says a lot about you if your Autocorrect turns Voltaire into Voltaren or the other way round....

...taken or applied to reduce inflammation...
That's the opposite of what Voltaire was trying to achieve, wasn't it?

--
my, your, his/her/its, our, your, their
I'm, you're, he's/she's/it's, we're, you're, they're
Re: Including this one? by Coren22 · 2016-08-25 05:34 · Score: 1

Afraid it might autocorrect Hillary to be Hitler?

--
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
Re: Including this one? by RockDoctor · 2016-08-25 12:31 · Score: 1

but would it kill Microsoft to change the default behaviour to something more sensible (this can hadly be the only use case where this is an issue)
Science users with thousands of rows of data are a negligible market compared to beancounter-wannabes with a dozen (or maybe even up to 5 dozen) rows of data. So, to answer your question, yes it would kill Microsoft to change the default from behaviour that covers up common wannabe-beancounters errors.
Oh, the fuck-wittery of working round Excel v5 bugs like this to deal with combining hundreds of thousands of lines of data throughout the 1990s. And 2000s. And 2010s. And you an guess what I anticipate fucking with in the 2020s.

--
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
Re: Including this one? by peawormsworth · 2016-08-25 12:50 · Score: 1

I think papers should use formats that are less proprietary and more community based, such as those supported by Open Office.
But why use ODS or XLS at all to share data? If data needs to be read as data, then maybe CSV would be better and allow the viewer to interpret the meaning and format. If the data is for visual appeal then maybe it should be in an entirely graphical form to ensure that data is interpreted as the authors intended. Or provide both, so there is no ambiguity.
Re:Including this one? by Zxern · 2016-08-25 13:18 · Score: 1

Or simply use the import function and set the data types manually, like you should be doing, instead of leaving it up to excel to try and figure out what the data types are supposed to be based on the data in the field.
The data looks like a date so excel sets it as date field. It's doing exactly what it is supposed to do. If you have complex data tables in a csv format you should be using import.
Re: Including this one? by chadenright · 2016-08-29 13:40 · Score: 1

Totally offtopic, Rockdoctor, but your sig has convinced me I ought to one day get a pet dinosaur. Do you know offhand which living family is closest to dromaeosauridae?
Re: Including this one? by RockDoctor · 2016-08-29 23:49 · Score: 1

(Phylo-)Genetially they're all similarly distant. Not as distant as crocodiles or tuataras (common other contenders for "surviving dinosaurs"), but they all come from a theropod group which shared ancestors with the dromaeosauridae.
In terms of behaviour and appearance, most commentators would suggest one of the ratites - ostrich, emu or ... cassowary. On the other hand, all of these are fairly likely to rip your liver out and show it to you while your vision turns red and fades out and they tuck in.
Psittaciformes (parrots) are one of the more deeply rooted families in the modern bird family tree. If you want a pet bird, a captive-bred (not, for fuck's sake, a wild-caught one) is a good place to start thinking. Since they often live into the high decades, be prepared for a lifetime commitment.

--
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"

Wait, what? by Cinnamon+Beige · 2016-08-23 15:34 · Score: 4, Insightful

I think the thing that really surprises me is that all my professors told me it was hard to get published, when failing to make sure your data was correctly entered into whatever spreadsheet program you used to for number crunching (and creating graphs) was one of the dead basics of working there. Yet 1 in 5 papers has notable failures here? And nobody noticed before publishing them? What kinds of major errands have gotten in, then, if basic spot checks are getting failed?

Re:Wait, what? by Anonymous Coward · 2016-08-23 15:36 · Score: 0

Computer literacy?
Re:Wait, what? by AJWM · 2016-08-23 15:45 · Score: 5, Insightful

I don't know how much number crunching was actually involved here. I suspect the problem comes from using a spreadsheet as a database.
Because databases are, you know, hard.
And "errands" creep in all over the place. ;)

--
-- Alastair
Re:Wait, what? by MightyMartian · 2016-08-23 15:49 · Score: 1

That's my take on it. Spreadsheets are wonderful things... to a point, and then, when their developers and maintainers cross that line and start trying to use spreadsheets as querying engines, it can all get very ugly. Even if you get it to work, I have yet to see the spreadsheet software, and I was using them as far back as Multiplan, that didn't turn into a maintenance nightmare where one false step could lead to errors, or much worse, gibberish.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:Wait, what? by Anonymous Coward · 2016-08-23 15:55 · Score: 0

" What kinds of major errands have gotten in, then, if basic spot checks are getting failed?"
Um, what?
Re:Wait, what? by dwywit · 2016-08-23 16:05 · Score: 1

Tell that to the CFO at my last employer.
Disclaimer - I recommended changes to nearly all of this:
This was in the Windows95 days. AUD$97 million budget, data fed to a master excel file from linked files in each department. Of course it was IT's fault when excel crashed and corrupted his master file, despite having told him that excel was not up to the job. He ordered a new laptop with more memory.
There was definitely a sense of schadenfraude some years later when he was "named" in an auditor's report. For those not in Oz, being named in an auditor's report is one step away from prosecution.

--
They sentenced me to twenty years of boredom
Re:Wait, what? by pem · 2016-08-23 16:09 · Score: 5, Funny

> Spreadsheets are wonderful things...
Citation needed.
Re:Wait, what? by MightyMartian · 2016-08-23 16:33 · Score: 2

I use them a lot for doing budgets and cash flow forecasts. That's the kind of job they were originally designed for, and within that milieu they can't really be beat. But going much beyond, where you have to use more and more query-like functionality is where they start to break.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:Wait, what? by DNS-and-BIND · 2016-08-23 16:49 · Score: 1

That's what they told YOU. They said that because they want to weed you out and keep you from competing with them. They've got a nice position where they can make good money with not a lot of work and sloppy data. Why would they want anyone else to get in?

--
Shutting down free speech with violence isn't fighting fascism. It IS fascism!
Re:Wait, what? by pem · 2016-08-23 17:11 · Score: 1

I suppose you're right -- it's a poor workman who blames his tools.
Re:Wait, what? by tgv · 2016-08-23 17:17 · Score: 1

I agree with your surprise, but reviewers often look how an article fits with their own pet theories, or political allegiances. They only try to destroy it contradicts them. And many magazines allow you to propose a list of reviewers. The result is that quite a few articles just slip through.
It's often easier to kill a paper because its methods are sloppy or because not all steps have been thoroughly tested than to actually verify the data and the way it's been processed. The latter can take weeks or months in the case of a complex study, and reviewers have a few hours. So even then nobody looks at the data.
Re:Wait, what? by Anonymous Coward · 2016-08-23 17:27 · Score: 0

What? For the most part, this has nothing to do with number crunching. This is just a mistake in how the names are recorded, and amounts to there being a typo in a column label in some bar graph, not a wrong value in the size of the bar or error bars that would actually alter the results. It is however a problem for someone doing meta studies or review work and not finding the papers with typos and mistakes in the name.
Re:Wait, what? by rtb61 · 2016-08-23 17:40 · Score: 2

Spreadsheets are really useful tools but have limitations. One thing I did enjoy was using that spreadsheet layout for coding, it seems much more functional. One sheet with programming cells, another bunch of cells on the same sheet for output, various continuous variable checks, distributed where ever those checkpoints are required, multiple sheets with similar outputs in similar locations and notes all over the place. You could really turn a spreadsheet into a great custom coding interface, than once it works, clear out the spawl from the production version and leave it all in the development versions.
When it comes to errors, gees, load up a custom dictionary and you are done. Switch to Libre Office for the spell check and share your new free open source custom dictionary, done and finished. Whether or not you have M$ Office installed, you should still install Libre Office for flexibility, it's free, does not take up much space and most definitely does not spy on you. Also leaves that easy option for private dual booting, Libre Office on Linux for private work away from the old digital one eye prying up where is most definitely does not belong, M$ backorifice on top of windows probe 10, ugh, the buggers don't even wear protection as their compulsory upgrades screw up people's systems, to load a new bunch of compulsory ads, wow, just wow.

--
Chaos - everything, everywhere, everywhen
Re:Wait, what? by Dog-Cow · 2016-08-23 18:19 · Score: 3, Funny

What kinds of major errands have gotten in...
Grocery shopping, filling the gas tank and picking up the dry cleaning.
Re:Wait, what? by sg_oneill · 2016-08-23 18:25 · Score: 1

I actually worked doing government auditing in a department I shall decline to name. And as much as I railed and wrote STERN LETTERS and what not, nothing would shake the bureacrats love of their bloody spreadsheets.
The fact that the *only* asset list in the whole damn department of 10K+ employees around the country was kept on a single excel file that was collated from other excel files by a vbscript written by a secretary was not seen as a problem by anyone except me and the IT dept (who where also systematically ignored) meant that pretty much all my recomendations where systematically ignored. And now I work in the private sector again....

--
Excuse the Unicode crap in my posts. That's an apostrophe, and slashdot is busted.
Re:Wait, what? by Anonymous Coward · 2016-08-23 19:01 · Score: 0

As a jaded academic let me explain how it works: if your paper is sufficiently novel and confirms the biases of your reviewer without scooping a paper they're working on then it is likely to be accepted (usually with minor revisions so they appear to be doing their job). If it fails to confirm their biases then they will do the sort of thorough review that all papers should probably have and might just pick up these sort of errors (or not: no-one is going to check every single cell in a friggin' hugs spreadsheet). If it scoops their work... forget it.
Re:Wait, what? by SharpFang · 2016-08-23 19:38 · Score: 1

. Sheer numbers. There are literally millions of them. Manual entry into a database was checked and okay. Export from the database's proprietary, obscure storage format to common as dirt Excel was already too big to spot the fault.

--
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
Re:Wait, what? by houghi · 2016-08-23 20:05 · Score: 1

Or they are used as something that is easy to make a list and some basic counting.
Name count
2-sept 10
1-mar 7
Total 17
Because that is what it is used for in other fields.

--
Don't fight for your country, if your country does not fight for you.
Re:Wait, what? by Anonymous Coward · 2016-08-23 20:26 · Score: 0

> Spreadsheets are wonderful things...
Citation needed.
=IF(VLOOKUP(1013437,A1:Y1,FALSE)="Citation Given")
Re:Wait, what? by Anonymous Coward · 2016-08-23 20:36 · Score: 0

Switch to Libre Office for the spell check and share your new free open source custom dictionary, done and finished. Whether or not you have M$ Office installed, you should still install Libre Office for flexibility, it's free, does not take up much space and most definitely does not spy on you. Also leaves that easy option for private dual booting, Libre Office on Linux for private work away from the old digital one eye prying up where is most definitely does not belong, M$ backorifice on top of windows probe 10, ugh, the buggers don't even wear protection as their compulsory upgrades screw up people's systems, to load a new bunch of compulsory ads, wow, just wow.
Or you could just turn off autocomplete.
Re:Wait, what? by jandersen · 2016-08-23 20:44 · Score: 1

I suspect the problem comes from using a spreadsheet as a database.
Or leaving the job of entering data to an assistant who doesn't quite understand the terminologies and abbreviations. Databases are not hard, as we all know, but creating a proper entry form to one is perhaps more work than seems worth it for an ad hoc job. A spreadsheet would do a better job, if it was possible to create custom formats for cells (which would then understand the data better); come to think of it, it probably is, I just haven't done it, so I don't know.
Re:Wait, what? by tijgertje · 2016-08-23 21:08 · Score: 1

We use them to display the data spit out by the CRM-system.
Re:Wait, what? by anarcobra · 2016-08-23 21:57 · Score: 1

Miss clicked during moderation.
Re:Wait, what? by bickerdyke · 2016-08-23 22:00 · Score: 1

I don't know how much number crunching was actually involved here. I suspect the problem comes from using a spreadsheet as a database.
Because databases are, you know, hard.
Yeah. Great.
But WHY are they so hard? Because usually, a "database" is something that is installed and stored on a network server and you can't mail a network server to your colleague, costumer or publisher. So even if you KNOW that a database would be the right tool for the job at hand, you end up with your data in a spreadsheet. Either to send it to someone or by receiving it from someone else.
The only alternative that would offer searching, filtering, sorting (in general: querying) features that you need to work with raw data (or even long lists) would be Access. I never understood why the wool that made using databases where it made sense to use them as easy as working with a doc file has been so frowned upon by the same "experts" that now complain that users use the next best thing instead: Excel.

--
bickerdyke
Re: Wait, what? by Anonymous Coward · 2016-08-23 22:38 · Score: 0

Assistant? Are you joking?
Re:Wait, what? by Anonymous Coward · 2016-08-23 22:48 · Score: 0

Blaming a tool that is broken by design isn't being a poor workman. Choosing Excel in the first place is, maybe, but then again, lots of people are forced to use brain-dead tools for their jobs because of terrible and overly-restrictive policies (IT department rules and clueless managers being primary causes of these).
Re:Wait, what? by AmiMoJo · 2016-08-23 23:03 · Score: 1

Maybe the original data was fine, exported from some other application and then imported into Excel for publication, at which point it was corrupted. CSV files don't have formatting or data type information, for example.
TFA seems to be saying that while the results are valid, Excel is a crappy format to distribute the data in.

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
Re:Wait, what? by Anonymous Coward · 2016-08-23 23:31 · Score: 0

Yea, but don't they read the hard science and tech news sites that tell them that spreadsheets are for playing DOOM and other science fiction games? And that was back in the early 90s of the last century! Did they soooo totally miss the Tyranny of Excel expose by these hardnosed journalists? Soon these very "scientists" will wonder where their cab driver went while Uber was taking them to their nooner. Get with the times, willyaz.
Re:Wait, what? by AmiMoJo · 2016-08-24 00:33 · Score: 2, Insightful

It's a shame that when defining open document formats they didn't bother to define an open macro format too. We have Excel using VBA macros, and OpenOffice using Python and BASIC. Google Docs uses Javascript. Wouldn't macro portability be great?

--
const int one = 65536; (Silvermoon, Texture.cs)
SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
Re:Wait, what? by Anonymous Coward · 2016-08-24 00:49 · Score: 0

In my experience, this is a non-issue. When I was publishing bioinformatics work, it was a "here is the paper, here is a dump of the data" situation. Firstly, your analysis isn't going to be re-run by reviewers (that was your job!) unless there is something glaring in the paper. Secondly, the data dump is there for replication.
In computer science, no one honestly expects to check code out from github and have it compile without making changes. One of both of you is using an obscure operating system. You don't have some dependency that the original author thinks is standard. They chose to use python 2.7.2 and not python 2.7.3. Or whatever.
In the bio world, it is the same thing. Items like "oh, the csv->excel->csv->excel->csv->conversion screwed up my gene names" is the normal part of grabbing someone else's data. Name error conversions, cloud import errors, non-default model model parameters, etc. are all part of the "doing research".
Re:Wait, what? by grasshoppa · 2016-08-24 01:02 · Score: 1

The problem is that spreadsheets and databases solve different problems, yet they are related enough that folks confuse the two. Not unlike what you did. Spreadsheets represent the full MVC concept, whereas databases are usually just the M(odel), with some (C)ontroller capabilities.
From there, the problem becomes somewhat more obvious; Because applications like Excel provide more complete functionality ( or try to at any rate ), that's naturally what anyone who needs to model data wants to use. Aside from programmers, who has time to construct a full data modeling environment using the right tools?
There's an opportunity here for MS to "fill the gap" here; provide the function-rich environment of Excel and tie it to a database backend simply. Or perhaps, considering the mess that is Access, that opportunity exists for someone other than MS.

--
Mod me down with all of your hatred and your journey towards the dark side will be complete!
Re:Wait, what? by Anonymous Coward · 2016-08-24 01:02 · Score: 0

This.
And what's worse is that I know a number of people in the scientific community (granted, mostly astronomy/physics related) that claim that they know how to code better than most profession programmers but somehow don't understand data types and/or MS Access... Me thinks these peoples' heads have gotten too big and they need to go back and address the basics of dealing with an office suite.
Re: Wait, what? by Anonymous Coward · 2016-08-24 01:09 · Score: 0

Whoosh.
Re:Wait, what? by Anonymous Coward · 2016-08-24 01:22 · Score: 0

Is she related to Miss Foreign Affairs?
Re:Wait, what? by Anonymous Coward · 2016-08-24 01:22 · Score: 0

When compared to PowerPoint.
Re:Wait, what? by jbengt · 2016-08-24 01:33 · Score: 1

Does no one do double entry bookkeeping anymore? Or do they just cut and paste the mistakes into both entries?
Re:Wait, what? by Anonymous Coward · 2016-08-24 01:54 · Score: 0

"I don't want to spend the cost and effor to write a program to to this. We will just do it in this spreadsheet." -Every boss that decided spreadsheets were better than databases, and ends up spending 10x more on maintenance.
Re:Wait, what? by Anonymous Coward · 2016-08-24 02:01 · Score: 0

Data types are very confusing to a lot of people. Excel doesn't convert anything unless the destination cell is formatted as "General," in which case it tries to guess the data type. If the destination cell's formatting is set to text nothing bad happens.
Re:Wait, what? by Anonymous Coward · 2016-08-24 02:07 · Score: 0

One of the largest government agencies in our region uses an Excel spreadsheet to manage building inventory of over 1,000 different properties. The spreadsheet is 200 megabytes. Another department in the same agency uses Excel to manage their entire project scheduling and status. We're talking about over $15 billion (yes, with a B) in projects managed in an Excel spreadsheet.
Re:Wait, what? by Anonymous Coward · 2016-08-24 02:09 · Score: 0

they are related enough that folks confuse the two
Where I work they actually refer to any workbook with more than one sheet as a database.
Drives me nuts.
captcha - grumble
Re:Wait, what? by Anonymous Coward · 2016-08-24 02:19 · Score: 0

And things are so much better in the private sector?
Re: Wait, what? by Anonymous Coward · 2016-08-24 02:22 · Score: 0

Almost certain these are large supplemental tables produced by dumping databases to delimited text.
It's when the scientist uses excel to open the file that excel corrupts the tables.
Re: Wait, what? by Anonymous Coward · 2016-08-24 02:50 · Score: 0

No kidding. Found the guy who has no clue what academia is like.
Re:Wait, what? by Anonymous Coward · 2016-08-24 02:59 · Score: 0

I think the real "errand" is in Excel.

Furthermore, RIKEN identifiers were described to be automatically converted to floating point numbers (i.e. from accession '2310009E13' to '2.31E+13').
Obviously, if you're converting 2310009E13 to a floating point number, it should be 2.310009E19 (or 2.31E19 if you're displaying it without that many decimal places).
Re:Wait, what? by Anonymous Coward · 2016-08-24 03:11 · Score: 0

All databases from servers that I know about can be exported easily to text format in several ways.
As tables (which could, if needed, be loaded into a spreadsheet):
address 1,James,5 Example Street 2,June,7 Any Road
As SQL statements (which can be loaded into another database server):
CREATE TABLE address (id PRIMARY KEY, name VARCHAR(), street VARCHAR()); INSERT INTO address VALUES (id=1, name="James",street="5 Example Street"); INSERT INTO address VALUES (id=2, name="June",street="7 Any Road");
These files can be sent easily to whoever needs them...
Re:Wait, what? by MightyMartian · 2016-08-24 03:12 · Score: 1

Of course we use an accounting system. But I've seen very few accounting systems at the lower and medium range price level that really have much in the way of decent forecasting tools. I have seen (though never used) high end accounting systems, often specialized to specific industries, that do these things, but I doubt my organization would want to pay $10,000+ with high annual support agreements just to get that functionality. What even low end accounting systems do offer is the ability to dump balance sheets, cash flow statements, income statements and the like to an Excel spreadsheet, and from their we can build forecasts.
Bookkeeping is only one part of financial management; a damned important part, but only part.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:Wait, what? by __aaclcg7560 · 2016-08-24 03:13 · Score: 1

Computer literacy?
When I worked at the Google help desk, I had to walk a new CS graduate through the process of turning on his own computer and explained to him that a cubicle farm isn't the same as a university computer lab. You're be surprised by how many computer scientists don't know squat about hardware.
Re:Wait, what? by Anonymous Coward · 2016-08-24 04:06 · Score: 0

I suppose you're right -- it's a poor workman who blames his tools.

In general I agree. However, 25+ years of using Microsoft software daily has convinced me that using them is akin to forcing Bob Vila and Norm Abrams to build a house using nothing but Fisher Price power tools.
Re:Wait, what? by Shadow+IT+Ninja · 2016-08-24 04:19 · Score: 1

Papers are mostly reviewed by senior researchers, such as the head of research labs, and they seldom have more than a basic working knowledge of bioinformatics. They are not supposed to show the papers under review to anyone because they are sensitive pre-publication materials. That precludes seeking help from their own staff expert on the topic. On top of that, there is frequently a natural language barrier as well as a technical understanding barrier between the senior researcher and the local bioinformatics expert. The result is that these analyses go largely unchallenged. It is actually true, more often than not, that tables and figures from gene expression profiling experiments, for example, are not reproducible using only the information published in the paper. In fact, the methods section related to such analyses frequently don't even make sense. You have to contact the author and talk directly to the person who did the analyses to get the straight story.
Re:Wait, what? by Shadow+IT+Ninja · 2016-08-24 04:35 · Score: 1

Yes that's exactly the problem. I am one of a small minority of computational biologists who use PostgreSQL for gene expression profiling, DNA sequencing and related work. PostgreSQL interacts very well with R, the most important tool for analyses in my field. Besides the data conversion issue with Excel, it also introduces the issue of "misaligning" data. You frequently have to put together information from multiple sources, especially public databases, and merge them into one table. A lot of people take these data from different sources sort them in Excel and then cut and paste them together. This is referred to as "aligning data." This is prone to error "misaligning data." The thing is that there is almost always a common value, such as an official gene symbol or an official protein identifier which are primary keys from their source databases. So the better way to do this operation is with a database join. SQL does, in fact, make many processes in bioinformatics both easier to do and more accurate at the same time. By the way, foreign key constraints have saved me from major blunders in data processing on several occasions. Imagine that.
Re:Wait, what? by Anonymous Coward · 2016-08-24 05:00 · Score: 0

1 in 5 papers **That have supplementary data in excel format**
That's waaaaaaaaay less than 1 in 5 papers as most researchers supply csv files and not excel.
Re:Wait, what? by Anonymous Coward · 2016-08-24 05:21 · Score: 0

Spreadsheets may be o.k. for number management, but databases are for **information** management
Re:Wait, what? by JesseMcDonald · 2016-08-24 06:52 · Score: 1

The only alternative that would offer searching, filtering, sorting (in general: querying) features that you need to work with raw data (or even long lists) would be Access.
Or one of the perfectly adequate free alternatives, like LibreOffice Base.

--
"The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
Re:Wait, what? by Obfuscant · 2016-08-24 07:21 · Score: 1

The problem is that spreadsheets and databases solve different problems, yet they are related enough that folks confuse the two.

That's today. The cause of today's problem is "creeping featuritis". Spreadsheets used to be spreadsheets. Now they plot data, do complicated calculations, and searches, because someone said "hey, we've got this spreadsheet, wouldn't it be great if we made it act like a database, too?"

Aside from programmers, who has time to construct a full data modeling environment using the right tools?
People who care if their data gets mangled because it looks like a number when it is really text, or looks like a date, or any of the other heuristic operations that Excel does to keep people from having to understand what they are doing.

Because applications like Excel provide more complete functionality ( or try to at any rate ), that's naturally what anyone who needs to model data wants to use.
The reason is much simpler: it's the tool that is already installed because it came as part of MS Office and it looks simple enough that it fools people into thinking they are experts when they aren't. The creeping featuritis issue has made the tool look like the right one when it truly isn't, but the basic problem is that "it's here, it's free, we might as well use it".
One of my earliest experiences with this kind of problem is when the sales department where I worked insisted on using a word processor to maintain customer records instead of a real database. Real database was too hard. Free format text in a document was much easier.
Re:Wait, what? by Bert64 · 2016-08-24 07:41 · Score: 1

Most organisations don't provide anything else, or actively refuse to... Most users have never used anything else because it's all they've ever been provided with...
In most cases it's a poor tool for the job, and something else would work much better.

--
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
Re:Wait, what? by Bert64 · 2016-08-24 07:43 · Score: 1

libreoffice supports vba and javascript too...

--
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
Re: Wait, what? by N!k0N · 2016-08-24 07:52 · Score: 1

I'd take the fisher price tools over excel any day.
Re: Wait, what? by Anonymous Coward · 2016-08-24 08:11 · Score: 0

That's not even approximately correctly written.
Re:Wait, what? by scdeimos · 2016-08-24 09:26 · Score: 1

I have a similar story from when I went back to uni to do a B.IT. The first week or two I was having to show multiple students where the power switches were on the desktop PCs in the labs - this was mid-1990's so being able to miss the massive round button on the front panel with a power icon on it was inexcusable. At the other end of the course... sure, these people might be able to write computer software in one or two languages but they still have no appreciation of what's going on in the hardware.
Re:Wait, what? by John+Allsup · 2016-08-24 09:34 · Score: 1

What is worse, is that those who work in health research often do not grasp what happens to structured logical reasoning when even one 'falsehood' creeps in. All papers whose conclusions depend upon papers with these errors must be considered suspect until checked, for example. Otherwise you are simply gambling that no errors are present. This is playing Russian Roulette with patients' lives when done in the context of the health system.

--
John_Chalisque
Re:Wait, what? by Anonymous Coward · 2016-08-24 10:44 · Score: 0

Yep. Spreadsheets are great for presentation. Graphs and pivot tables rock. They can also be useful for calculation, to a point.
But if you're using a spreadsheet to store your data, you're doing it wrong.
Re: Wait, what? by Anonymous Coward · 2016-08-24 11:12 · Score: 0

Yeah, it's called a form for data input. You can also set a custom format for cell data. This lets you not store unwanted data (eg., The dates in social security numbers and phone numbers) but keeps it looking right.
Re: Wait, what? by Anonymous Coward · 2016-08-24 11:19 · Score: 0

Dammit, the "dashes", not the "dates". Sigh.
Re: Wait, what? by Anonymous Coward · 2016-08-24 15:12 · Score: 0

I am posting as AC. I have no idea how to use a spreadsheet and I have never needed to learn. I am now in retirement but I worked in tech for years. You might even say that I forwarded an entire sector in tech.
I am not sure if my lack of use is a good or bad thing? I certainly have no idea how to use Excel.
Re: Wait, what? by Anonymous Coward · 2016-08-24 16:22 · Score: 0

dunno.
we were taught in highschool in our office tools class to use access for databases, like inventories etc.. not a spreadsheet. you would use a spreadsheet perhaps to calculate something out of said db. and this was in 1997(and office here refers to using computers in an office, not ms office in particular since other suites have equivalent tools).
i guess these researchers didnt take their automatic data processing classes in the first place though and went with bio because its a "soft" science(its not, if course).
okay okay, the real problem is that they are not scientists to begin and instead are just doing data transfer from one machine to another in a very poor manner and if nobody noticed the data wasnt in an usable form NOBODY was using it either! gene research is at a funny phase where you can basically publish bullshit and whats not bullshit is also basically just repeating the same process just for a different cell with nobody checking the papers even because its the same paper with different meaningless data nobody cares about and if the cell had something special the paper authors would not notice anyways! (and the real research thats not bs is private at monsanto etc.)
the papers should have been rejected. and now the people who checkes them should be penalized(but are not)
Re:Wait, what? by michael_wojcik · 2016-08-25 04:48 · Score: 1

Poor workers are those who don't blame the tools, when the tools are at fault.
This maxim (apparently originally a Vietnamese saying, according to Wikiquote) is one of the dumber pieces of commonplace folk wisdom floating around the IT industry. It's precisely the sort of thing that makes it difficult to diagnose and resolve real problems.
Those fond of this saying need to learn a bit about the theory of failure. I don't care how you go about it - maybe try Schulz's Being Wrong - but learn a bit about failure modes and factors that contribute to them.
Re:Wait, what? by Anonymous Coward · 2016-08-25 09:53 · Score: 0

This issue is separate from "science" errors. Going through a list of genes (or in this case, SNPs and their enclosing genes) is tedious as bloody hell, and when you have a 1000 of them (usually x3 sets) you are likely to do only an automated check (which would also occur prior to putting it all in an excel sheet for publications/supplementary).
In fact, here is an excerpt from such a list.
10,63805617,C,rs4948496, 4,Direct hit,Systemic lupus erythematosus, ARID5B|RTKN2|ZNF365, ARID5B|TMEM26|RHOBTB1, -, chr10:63655000-63660000|chr10:63805000-63810000, -,
12,56435412,C,rs705704, 1f,rs1701704,Type 1 diabetes autoantibodies,RPS26|ERBB3|PA2G4,IKZF4|SUOX|RAB5B,DHS,chr12:56390384-56390586|chr12:56389865-56390264,chr12:56137320-56137481,-,
12,56435412,C,rs705704, 1f,rs2292239,Type 1 diabetes autoantibodies,RPS26|ERBB3|PA2G4,IKZF4|SUOX|RAB5B,DHS,chr12:56390384-56390586|chr12:56389865-56390264,chr12:56137320-56137481,-,
12,56435412,C,rs705704, 1f,rs2456973,Vitiligo,RPS26|ERBB3|PA2G4,IKZF4|SUOX|RAB5B,DHS,chr12:56390384-56390586|chr12:56389865-56390264,chr12:56137320-56137481,-,
12,56435504,C,rs705705, 2a,rs1701704,Type 1 diabetes autoantibodies,RPS26|ERBB3|PA2G4,IKZF4|SUOX|RAB5B,DHS,chr12:56390384-56390586|chr12:56389865-56390264,chr12:56137320-56137481,Hit,
12,56435504,C,rs705705, 2a,rs2292239,Type 1 diabetes autoantibodies,RPS26|ERBB3|PA2G4,IKZF4|SUOX|RAB5B,DHS,chr12:56390384-56390586|chr12:56389865-56390264,chr12:56137320-56137481,Hit,
12,56435504,C,rs705705, 2a,rs2456973,Vitiligo,RPS26|ERBB3|PA2G4,IKZF4|SUOX|RAB5B,DHS,chr12:56390384-56390586|chr12:56389865-56390264,chr12:56137320-56137481,Hit,
Perhaps you can see how errors might be missed by human eyes after 1,000 of these? It's not an excuse in any way but I hope you understand that it's not a simple case of "doublecheck and the problem is solved. This is a clear type of "eyes are bleeding I'm gonna go ahead and assume everything is fine here" miss.
Re: Wait, what? by Zxern · 2016-08-25 13:22 · Score: 1

Again the problem isn't with excel. When you simply open a csv file, you're asking excel to try and guess what data type each field is. And when a field has data that looks like a date, surprise surprise excel sets it as a date field.
This is why you use import.
Re:Wait, what? by eric_harris_76 · 2016-08-27 13:42 · Score: 1

And now I work in the private sector again....
Because they went bust and quit government-ing, so you had to? ("Went out of business" just sounded wrong.)
Ha ha. Just kidding. That almost never happens.
Well, a few cities in California.
And Ferguson, MO was headed that way, when traffic ticket revenue took a nosedive for entirely non-mysterious reasons. Fortunately for Mayor Knowles and his buddies, the idiot voters recently out-voted the non-idiot voters, and some new taxes were imposed.

--
There's no time like the present. Well, the past used to be.

LaTeX by 0100010001010011 · 2016-08-23 15:36 · Score: 4, Insightful

Why in God's name are you using a Microsoft product for scientific documents?

Re:LaTeX by Anonymous Coward · 2016-08-23 15:50 · Score: 2, Insightful

LaTex and/or TeX take time to learn. Time that can be used to play with genes.
Excel is a GUI, crunches numbers, makes pretty graphs and keeps all your data in nice grids and is easy to learn (until you hit the details like these format errors). (La)Tex is great if you want to write scientific papers that are nicely laid out and can typeset mathematical formulas (Knuth's itch). However, you'll most likely still use Excel to do the crunching/graphing/saving.
How many excel sheets have you seen where it's only purpose is to keep lists/whatever in grid format?
Re:LaTeX by Anonymous Coward · 2016-08-23 16:24 · Score: 0

Troll much? The problem isn't Microsoft. The problem is that some people don't know how to correctly use their software.
Re:LaTeX by hcs_$reboot · 2016-08-23 16:32 · Score: 1

Just add a single quote ' at the beginning of the text and Excel considers it to be a string, doesn't perform conversions.

--
Slashdot, fix the reply notifications... You won't get away with it...
Re:LaTeX by hcs_$reboot · 2016-08-23 16:34 · Score: 1

+1. Not using Excel (I'm on Linux / LibreOffice), but, seriously, add that quote to tell Excel not to convert the text.

--
Slashdot, fix the reply notifications... You won't get away with it...
Re:LaTeX by Anonymous Coward · 2016-08-23 16:45 · Score: 0

I agree. Excel is NOT for science. It is for quick and dirty calculations and finance (and for even that there are better options).
I got my girlfriend using Matlab for some data analysis and graphing, and she found it way easier to deal with and learn than Excel, and she is not a programmer type person.
Re:LaTeX by NotInHere · 2016-08-23 16:52 · Score: 4, Funny

LaTeX is not free of problems either. They are just different. If you care, you take the time to fix them, if you don't you don't fix them. Simple as that.
https://pbs.twimg.com/media/Ci...
Re:LaTeX by GodelEscherBlecch · 2016-08-23 17:03 · Score: 3, Funny

still use Excel to do the crunching
Good god, I hope not. When I saw the title for this article I thought for sure it was referring to errors caused by the aggregation of questionable digits resulting from machine precision floating point operations, not something as simple as type conversions. Excel has been the bane of my existence for years because testers keep trying to use it to verify results from a data processing framework I wrote where the operations for some use cases involve 20+ digit decimals. No matter how many times I explain to them the concepts of machine vs. arbitrary precision, decimal precision vs. accuracy, rational vs. decimal representations of numbers, etc. the spurious 'rounding error / does not match the XLS' bug reports just keep coming. Drives me nuts. The idea that scientists may be making the same mistakes with important research is kind of scary.
Then again, I am usually shocked by the amount of error considered tolerable in the scientific / EE applications of the framework. The real anal retentives are the financial use cases, which tend to include 'penny allocation' algorithms for distributing fractions of pennies left as remainders from dollar amounts in the 10s of millions, and they absolutely will file a critical severity issue over a .00000000001 discrepancy.
Re:LaTeX by Anonymous Coward · 2016-08-23 17:31 · Score: 0

How capable is LibreOffice as a PowerPoint replacement? The only thing left keeping me on Mac is the Microsoft product PowerPoint ;)
I especially like dual screen and presenter view.
Thank you
Re:LaTeX by teakillsnoopy · 2016-08-23 17:45 · Score: 2

As someone who proofreads engineering and medical papers for a living, MS Word is unfortunately the standard format (authors insert little images of complex typography into the main text; it's as bad as it sounds). Some equation-heavy papers use LaTeX, but they account for maybe 1% of the total.
Re:LaTeX by dcollins · 2016-08-23 17:50 · Score: 2

Non sequitor. Calculations need to be made somewhere in the workflow process, and LaTeX does not help with that.

--
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
Re:LaTeX by dcollins · 2016-08-23 17:52 · Score: 1

Probably not a data-entry problem. More like a file conversion problem (importing text/CSV to the spreadsheet application).

--
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
Re:LaTeX by dcollins · 2016-08-23 17:53 · Score: 2

As I said upthread, it's probably not a data-entry problem. More like a file conversion problem (importing text/CSV to the spreadsheet application).

--
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
Re: LaTeX by Anonymous Coward · 2016-08-23 17:55 · Score: 0

Mate, I've seen the dumbest phd student learn latex from scratch... And matlab to do the crunching and plotting. But... Who the heck thing of excel as 'data mining' tool??? Are we crazy??? Not surprised that biosciences and medical science has a HUGE problem of reproductibility... Just just professional* tools for god sake.
* matlab, python, r... Anything but a spreadsheet!
Re:LaTeX by sexconker · 2016-08-23 18:25 · Score: 1

In finance, you're not supposed to use any floating point math.
Re:LaTeX by blindseer · 2016-08-23 18:28 · Score: 1

That's great for people that have access to Matlab from the college they attend or from their employer.
If you can prove you are a student the $500 price tag is a bit much. I don't recall how much I spent on the student version of Matlab I bought many years ago but being a student version there were imposed limits on the size of arrays it would hold. As such the one time I needed it for a project it proved worthless since it could not hold my data set. Guess what I used instead? Excel.
The "pro" version of Matlab costs over $2000 while Excel is effectively free. Microsoft has been so successful in marketing it's office products that few even think about how much that software costs. I have to wonder how many people even use Excel for it's intended purpose. People buy Office to get Word and Powerpoint, the fact it comes with Excel makes it nearly impossible for anyone to compete with that even in cases where people should know better to use a more appropriate tool.
We've seen the US government, and other governments, crack down on Microsoft for their near monopoly and largely fail. I don't think trying that again is going to break this trend. Even if somehow Microsoft is forced to break up the Office bundle there is still the matter of the inertia of the files floating around that people need to use. There is the matter of corporate culture where "no one got fired for buying Microsoft". Then there must be a product that competes with it. If the product is free (as in open source) there is an implied lack of value and quality due to no price tag on it. Someone might try to package FOSS applications as an alternative using that money to lure people from the way to get the same thing for free by promising additional proprietary features and capability. This packaging must then strike the balance between price, features, and packaging to give the right impression that people would want to buy it.
After going to college years ago for computer engineering I'm taking courses now on "big data". I have access to SAS, Matlab, Mathematica, and more that I cannot recall as I use them so irregularly. What I find myself doing is use Excel to format the data, produce some pretty graphs, and then paste them into a Word document for my final reports. This is because the tools cannot make things look as pretty, or it's just too hard to figure out when the assignment is due at midnight and being late can cost a 10% reduction in the grade.
How can this problem be solved? I don't know but I imagine it means the people that sell Matlab will have to gamble on selling their product at 1/10th the current price in the hopes they get 10X the number of buyers. That's just a minimum since the increase in support costs for the new influx of novice users will need to be covered too.

--
I am armed because I am free. I am free because I am armed.
Re: LaTeX by GodelEscherBlecch · 2016-08-23 18:39 · Score: 2

No shit. I don't. What do you think 'arbitrary precision' means?
Re:LaTeX by blindseer · 2016-08-23 18:55 · Score: 1

Because a proper tool costs money and learning to use it takes time.
Research costs money and the people that can do the research with less costs get to do more research. Microsoft Office is "free" to most because it effectively comes with the computer. Getting the right software can costs thousands per seat.

--
I am armed because I am free. I am free because I am armed.
Re:LaTeX by Trongy · 2016-08-23 18:56 · Score: 1

It could be either.
If the cell is General format rather than Text, excel can interpert what is typed in as a date or numeric format it it matches certain patterns.
When using the import wizard, to import text files, you can chose how Excel interprets each column (General, Text, Date or Skip) on the third step of the wizard. If you click Finish on the second step excel will default to general and perform conversions.
Re:LaTeX by dbIII · 2016-08-23 19:08 · Score: 1

Web pages or PDFs are a powerpoint replacement. That shit should just die because if the content is any good it's going to have to be converted into something else later anyway.
So while "simpress" does the powerpoint slide construction task you are probably better off doing a "save as PDF" from it instead of using it as a presentation tool. If you want video effects make a video file with something purpose made - powerpoint and simpress suck at both video style presentations and static documents paged through as slides. They are a "view once" format as distinct from being able to present and then give people who want another look the PDF that will work on their phones/tablets/everything.
Re:LaTeX by Anonymous Coward · 2016-08-23 19:12 · Score: 0

I had assumed that excel was being used for calculations, not writeup. Also much as I prefer LaTeX for writing papers some journals actually require word documents, so compromise is occasionally needed.
Re:LaTeX by Anonymous Coward · 2016-08-23 19:43 · Score: 0

...easy to learn (until you hit the details like these format errors).
That makes Excel a tool that appears to be easy to learn but is not. I think I prefer tools that make it obvious there are things you should learn right from the start.
Re:LaTeX by Anonymous Coward · 2016-08-23 19:50 · Score: 0

We're talking about supplementary data files, rather than scientific documents. But that's still no excuse for using a proprietary format. In my field (astronomy), data are either rendered down to simple text, or put in open, domain-specific formats (like FITS) which can be read and written by any of several open-source programs.
Re:LaTeX by Anonymous Coward · 2016-08-23 20:39 · Score: 0

A decent boss will look at the rounding errors in a billion dollar contract and write them off. It cost more to track down $1000 than is does to find it. Another department can find the embezzler and promote them.
Re:LaTeX by Anonymous Coward · 2016-08-23 20:42 · Score: 1

That depends actually. If you want to calculate what something is worth you tend to transcendental functions. So you could just as well calculate those with floats and doubles. If it is about profit your margin is specified in percentage, so precision of the price value is floating.
Not even the tax-man cares as he ignores cents. Which is anoying because their checking algorithm expects the result to be exact so you need to fudge the rounding on your balance and profit sheet to match their check.
Re:LaTeX by GuB-42 · 2016-08-23 21:19 · Score: 1

In accounting, when you add up the numbers, the result must be exactly zero, not 0.0000000001.
It is not for saving pennies, it is for verification purposes. If there is a penny missing, it means that there is a mistake somewhere. It may be a small rounding error, but it may also be several million dollar mistakes (or fraud...) adding up to one penny.
Re:LaTeX by Anonymous Coward · 2016-08-23 21:36 · Score: 0

You don't need to go to matlab to circumvent that problem. R is completely free, and so is pythonic's alternative to matlab python+matplotlib+scipy (not to mention julia, but this is still in development). Not a penny.
Even octave can be alternative if you don't need specific toolboxes (just dump the data, do some basic `crunching' and nice plots). There is no explanation other than laziness to use spreadsheet software to process data. And if that happens with the analysis of the data, God only knows what happens in the laboratory...
Re:LaTeX by thegarbz · 2016-08-23 21:52 · Score: 1

Why in God's name are you using a Microsoft product for scientific documents?
Because there's nothing wrong with doing so providing you know it's limits and it's faults.
And because these problems of user error transcend all software including those specifically designed for scientific purposes.
And because writing a document in LaTeX has little to do with crunching numbers and doing lookup tables in a spreadsheet. But hey you got your anti-Microsoft rant of the day out so success right?
Re:LaTeX by Anonymous Coward · 2016-08-23 22:15 · Score: 0

Why? Select your formatting type (or make your own) for cells. Done forever.
Re:LaTeX by Anonymous Coward · 2016-08-23 23:45 · Score: 0

Edgy edgepost, edgelord.
Re:LaTeX by Antique+Geekmeister · 2016-08-23 23:52 · Score: 1

> LaTex and/or TeX take time to learn.
I'm afraid that they're also quite useless for cut & paste transfer of data among documents. Their GUI tools are also _extremely_ limited.
Re:LaTeX by Antique+Geekmeister · 2016-08-23 23:55 · Score: 1

> Probably not a data-entry problem. More like a file conversion problem (importing text/CSV to the spreadsheet application).
Especially via cut & paste.
Re:LaTeX by Gilgaron · 2016-08-24 00:44 · Score: 1

There's not really anything in biology that can be measured with enough accuracy to have 20 significant digits... the 'rounding errors' there are smaller than the inherent noise in the data.
Re: LaTeX by Anonymous Coward · 2016-08-24 01:08 · Score: 0

To be fair, precision and IEE 374 style floating point representation aren't orthogonal issues. In general, fixed point is the opposite of floating point and is what you are using to defeat rounding errors. Arbitrary precision only means that you can change how many bytes back your fixed point number.
Re:LaTeX by Anonymous Coward · 2016-08-24 01:09 · Score: 0

How capable is LibreOffice as a PowerPoint replacement? The only thing left keeping me on Mac is the Microsoft product PowerPoint ;)
I especially like dual screen and presenter view.
Thank you
I use it exclusively. Works fine. Only problem is when people see it they get excited, then disappointed when it isn't "real" PowerPoint.
Re:LaTeX by Anonymous Coward · 2016-08-24 01:32 · Score: 0

TeX gives you a Turing complete programming environment. That is all anyone could possibly want.
Re:LaTeX by Anonymous Coward · 2016-08-24 03:00 · Score: 0

Because LaTeX has limited math functions compare to Excel? I dunno... just my guess.
Tell us all about your LaTeX experience, n00b.
Re: LaTeX by GodelEscherBlecch · 2016-08-24 03:01 · Score: 1

Uh, OK. Fixed-length buffers and overrun faults aren't the same thing either, but generally people don't feel the need to point that out.
Re: LaTeX by GodelEscherBlecch · 2016-08-24 03:08 · Score: 1

These are not rounding errors, which I already stated are not acceptable. They are mathematically correct remainders resulting from pre-realtime planning numbers being slightly off from realtime equipment telemetry. The non-profit nature of the entity doing these calculations demands that they rectify this somehow, hence the penny allocation.
Re: LaTeX by GodelEscherBlecch · 2016-08-24 03:11 · Score: 1

That makes sense, I was thinking more of physics applications.
Re: LaTeX by GodelEscherBlecch · 2016-08-24 03:14 · Score: 1

Thanks! I can always use another auto-fail interview question.
Re:LaTeX by Anonymous Coward · 2016-08-24 03:17 · Score: 0

Excel is effectively free.
As free as herpes:
Re:LaTeX by Anonymous Coward · 2016-08-24 03:46 · Score: 0

Problem: your text cannot possibly fit into a box with the font size you have selected.
LaTeX: This cannot possibly work, fix the mess you left in line x if you want something sane.
Word: It looked right in Word 95, lets fix it with an additional line break and shift the cover image to the next page. Nobody will notice.
Re:LaTeX by Anonymous Coward · 2016-08-24 04:16 · Score: 0

GNU Octave is free and open source and for most use-cases is compatible with Matlab.
https://www.gnu.org/software/octave/
Re:LaTeX by petermgreen · 2016-08-24 04:27 · Score: 1

What happens if you just load a CSV into excel? (which doesn't result in a wizard at all IIRC)

--
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
Re: LaTeX by Anonymous Coward · 2016-08-24 04:39 · Score: 0

In finance it's all integers (decimals are shifted integers), no floating point of any kind allowed.
Re:LaTeX by Shadow+IT+Ninja · 2016-08-24 04:39 · Score: 1

Most of us, who are actually experts in the field, use R for number crunching. Bench scientists, who know enough about bioinformatics to be dangerous, crunch their numbers in Excel.
Re: LaTeX by GodelEscherBlecch · 2016-08-24 05:03 · Score: 1

Uh, yeah. That's how arbitrary precision works - exponents and rationals, at least in any sane implementation I've ever seen. Any evaluation to a decimal form is deferred until the very last minute as a human presentation step.
Re:LaTeX by BitterOak · 2016-08-24 08:55 · Score: 1

LaTeX is a document preparation system, not a database or spreadsheet. It would make perfect sense to say that people should be using LaTeX instead of Microsoft Word, but it makes no sense to say people should be using LaTeX instead of Excel. Perhaps those using Excel should be using PostgreSQL for database type work, and perhaps they should be using R for data analysis, but it makes no sense to use LaTeX as a substitute.

--
If I can be modded down for being a troll, can I be modded up for being an orc, or a balrog?
Re:LaTeX by John+Allsup · 2016-08-24 09:35 · Score: 1

Because, being medicine people, they believe the are God, and hence exempt from being subject to making errors with Excel.

--
John_Chalisque
Re:LaTeX by Anonymous Coward · 2016-08-24 10:00 · Score: 0

Well technically it can, just like it can be used directly to graph functions etc. It's just that no sane person would do that.
Re:LaTeX by Anonymous Coward · 2016-08-24 18:02 · Score: 0

There's no way that is 9,895 pt too wide. Closer to 98,95 pt too wide.
Re:LaTeX by cwsumner · 2016-08-26 04:10 · Score: 1

Fixed-point math has rounding errors, too. You just don't notice them because they truncate.
P.S. Don't believe what people tell you, half are wrong and half are lying. Except a few like me... 8-)
Re: LaTeX by cwsumner · 2016-08-26 04:11 · Score: 1

Uh, yeah. That's how arbitrary precision works - exponents and rationals, at least in any sane implementation I've ever seen. Any evaluation to a decimal form is deferred until the very last minute as a human presentation step.
That actually sounds correct, to me. 8-)
Re: LaTeX by GodelEscherBlecch · 2016-08-26 05:43 · Score: 1

Yeah that's the user education battle I've been fighting. With arbitrary precision it is up to the configuring user to select the desired accuracy length and they are responsible for ensuring that it is sufficiently far out so as to make any rounding errors that occur insignificant to the result. This is another source of headaches from excel users - they always want it to 'just work like excel' even when I demonstrate cases where excel gives the wrong answer. The only other option is for me to iteravely perform the calculation, check the result for inaccurate rounding, extend the accuracy and repeat, which would obviously tank performance.
Re: LaTeX by cwsumner · 2016-08-26 09:50 · Score: 1

That sounds like something similar to the warehouse system I did a few years back. It used floating point to store quantities, and rounded to the user selection after doing any calculations or conversions. The display format was set to the same user selection, of how many decimal points. Each material type could have a different number of decimal places, in the same table column.
Only disadvantage is that, a dump to a different system requires my export app, so it can look up the settings.
The users set it to match their bank or supplier. Of course if excel hit a bug, it didn't recreate that. But I think they ended up doing work-arounds in their excel entry coding.
Still working fine...

due to the EULA MS owns your genes now! by Joe_Dragon · 2016-08-23 15:39 · Score: 0

due to the EULA MS owns your genes now!

Incentives by Anonymous Coward · 2016-08-23 15:42 · Score: 0

Scientists are rewarded by "impact", so there's an incentive to publish early, publish often. Failing to do so leads to career death.

The "impact" does not count integrity, conservancy, or maintainability of data/code/other digital products. You don't get credit for your time and labor for improving the maintainability of code or the type-correctness of data fields in a table.

No wonder scientists often publish shit code and generate shit data.

Re:Incentives by MightyMartian · 2016-08-23 16:02 · Score: 0

It's almost like science needs some sort of process where research is reviewed by other scientists in the same field, as some sort of control. I sure hope they develop peer review some day!

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:Incentives by MightyMartian · 2016-08-24 04:01 · Score: 1

Apparently the science hating mods either don't get irony, or don't like how it demonstrates the parent's point is utter crapola.

--
The world's burning. Moped Jesus spotted on I50. Details at 11.
Re:Incentives by PatientZero · 2016-08-25 12:13 · Score: 1

I sure hope they develop peer review some day!
That sounds like a fantastic PhD research topic.

--
Freedom to fear. Freedom from thought. Freedom to kill.
I guess the War on Terror really is about freedom!

Not strictly Excel's fault by Todd+Knarr · 2016-08-23 15:57 · Score: 5, Informative

Those conversions look like cases where the column type during import was left at "General" instead of being set to "Text" as it should have been, telling Excel to try and infer the actual type from the format of the column's contents. It's an awkward situation where the user should be telling Excel what the data type for each column is, but it's not strictly Excel's fault for doing what the user told it to do. IMO Excel should be either changed to not have a default type and to not allow an import until the user's selected a type for each column, or it should throw up an error if it infers different data types for a column for different rows.

Re:Not strictly Excel's fault by Streetlight · 2016-08-23 16:24 · Score: 1

But don't people inputting data in a spreadsheet read what's entered? It's pretty simple to format cells, a range of cells, blocks of cells to be text, numerical with a format attached, etc. Copy and pasting or moving data around in a spreadsheet can be a bit of problem if you're not paying attention, but as the data is moved around, take a peek at what's shown.

--
In a time of universal deceit, telling the truth is a revolutionary act. George Orwell
Re:Not strictly Excel's fault by Anonymous Coward · 2016-08-23 16:27 · Score: 2, Interesting

Just because the stupid default behavior can be changed doesn't excuse it from being stupid in the first place.
(See also: ads built into Windows 10)
Re:Not strictly Excel's fault by AvitarX · 2016-08-23 16:36 · Score: 1

If it's imports from text files (where I've seen what you describe) what I'd like it them to all be treated as text, I can change the column when/if I need to, there's no reason to assume it's not text initially (as in I can't see a reason how that'd damaging).

--
Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
Re:Not strictly Excel's fault by chipschap · 2016-08-23 16:44 · Score: 1

While I'm no Microsoft fan, and the Excel default behavior could certainly be better, I can't see my way to making Microsoft take the hit for this. (LibreOffice can be similarly abused, for that matter.)
The issue is multifold:
1) Using a tool not intended for the purpose; spreadsheets may be easy and convenient but everything is not a nail for the spreadsheet hammer.
2) Lack of understanding about how the chosen tool works.
3) Failure to do simple proofing and verification.
This one is down to the researchers, not the software maker.
Re:Not strictly Excel's fault by Anonymous Coward · 2016-08-23 17:38 · Score: 0

Typically the data is output from a sequencer program of some sort as a csv according to a programmed plaintext format. The problem comes in when people import it afterwards but the genes you're interested in aren't ones that suffer from the conversion error so you don't even think about looking for it.
Re:Not strictly Excel's fault by Anonymous Coward · 2016-08-23 17:41 · Score: 0

Yeah... I'm sure everyone working in transcriptomics loves going through 50000+ lines spreadsheets, trying to manually find mistakes and typos... seems "pretty simple".
Re:Not strictly Excel's fault by dcollins · 2016-08-23 17:58 · Score: 1

"I can change the column when/if I need to"
Actually, I have major problems trying to convert text to a number format once it's in the spreadsheet program. (I use LibreOffice, but I assume Excel functionality is the same.) If I take a column that's text-formatted, and click Format > Cell > Number, then all of the numeric stuff gets a single quote prefixed, sabotaging the attempt to treat it as a number. Your proposal would generate a massive user outcry for that and other reasons.

--
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
Re:Not strictly Excel's fault by sexconker · 2016-08-23 18:30 · Score: 2

Select top left cell, format as table, make sure "my data has headers" is checked.
Click just below the header row to select and entire column of data (excluding the header row). Format as you please, validate as you please. You can even right click, insert a new column, then create a formula that verifies the data and outputs nothing when it's good and "OHSHIT" when it fails verification. Then you can non-destructively filter your table on "OHSHIT".
If you're using Excel at least learn to use it. This shit is up there with people who hit Enter/Return twice to generate a blank line in Word.
Re:Not strictly Excel's fault by blindseer · 2016-08-23 18:46 · Score: 2

That's easy to say but people need to understand what they are looking for. As a Specialist in the Army I happened to be in the ops office while one of the sergeants was working on a spreadsheet that handled some sort of inventory. He knew I knew something about computers and so he asked me to come over to look at the funny formatting that Excel was doing on him. What I saw was a number that could have been a date, price, part number, or something else. I started asking the sergeant what he was trying to do with the number and what kind of a number it was supposed to be. It didn't take long before he became frustrated with me and I was told I had somewhere else I needed to be.
This was probably something better managed by a database but databases are hard, and does Office even include a database product any more?

--
I am armed because I am free. I am free because I am armed.
Re:Not strictly Excel's fault by l0n3s0m3phr34k · 2016-08-23 19:19 · Score: 1

Pretty much any suggestion from IT that involves users doing ANYTHING "new" or "different" causes a "massive user outcry." It's something you get used to, and eventually laugh about the end-users behind their backs. Like today, corporate said the end users complain if we move their office and it takes too long to move their voip phones. My response was "well, we fixed that. We never allow our users to move out of their original offices."
Re: Not strictly Excel's fault by Anonymous Coward · 2016-08-23 19:20 · Score: 0

This is spot on. The problem is defective by default design, like autocorrect messing with your typing, unsafe type assumptions, bad defaults, etc. Typically there are no simple workaround that works in all cases for the principle of avoiding surprises. The peeson to fix such issues could be the next Steve Jobs, not random know betters.
Re:Not strictly Excel's fault by AvitarX · 2016-08-23 19:51 · Score: 1

I just tested this,
A CSV file didn't let me adjust import types in Excel (2007), but I did a tab delimited text and imported a column of numbers as text, then changed it to numbers, and the behavior was not as in Libre Office. I tested in Libre, and it was indeed a huge pain.

--
Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
Re:Not strictly Excel's fault by Anonymous Coward · 2016-08-23 21:08 · Score: 0

New column. =(cell reference that happens to be a number) * 1. Job done.
Re:Not strictly Excel's fault by bickerdyke · 2016-08-23 22:30 · Score: 1

1b) Lack of a appropriate tool to wrap up stuff that should be in a database (or has been extracted from a database) for display and distribution.
There is a reason why people the spreadsheet-hammer - it is still better suited than the word processor screwdriver or the plain text chainsaw., or the pdf/png belt sander.

--
bickerdyke
Re:Not strictly Excel's fault by bickerdyke · 2016-08-23 22:34 · Score: 1

I think that feature is build into excel since the Office of 95. And it still is somewhere.
And each time I need it I am not able to find it for the life of me.

--
bickerdyke
Re:Not strictly Excel's fault by Anonymous Coward · 2016-08-23 23:23 · Score: 0

Not necessarily to the researchers.
As pointed out several times, importing data into excel doesn't necessarily mean it leaves things alone.
I've seen perfectly correct inventories totally screwed by excel.
In this case, it could be the publishers screwing the form. The original data could have been correct. In fact, I would expect the original data WAS correct as the researchers would not have been able to even use the bad data...
Re:Not strictly Excel's fault by Anonymous Coward · 2016-08-24 01:07 · Score: 0

I'd say it's 100% Excel's fault. Every cell should be formatted as "text" unless otherwise specified. There's nothing more annoying than trying to copy in data that starts with a "-" and having Excel take a shit because for some reason it thinks that you're trying to put in some formula (despite the entire line being a string)
Re:Not strictly Excel's fault by dcollins · 2016-08-24 01:56 · Score: 1

Repeat for every numeric column.

--
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes
Re:Not strictly Excel's fault by Anonymous Coward · 2016-08-24 03:00 · Score: 0

Which works brilliantly until you try to sum a column of "text" consisting of "125.12", "879.25", and "916.08".
So you either force the users to pre-format their fields, or you let them enter things immediately and try to guess what they want.
Even if you choose the latter, the user is still able to format the fields before or after entering data. So this is entirely a user issue.
Re:Not strictly Excel's fault by Anonymous Coward · 2016-08-24 03:26 · Score: 0

That's because Microsoft cleverly relocates all the useful tools with every release. Otherwise people might get the idea that, *gasp*, they don't really need to upgrade.
Re:Not strictly Excel's fault by ooloorie · 2016-08-24 04:43 · Score: 1

Which works brilliantly until you try to sum a column of "text" consisting of "125.12", "879.25", and "916.08". So you either force the users to pre-format their fields, or you let them enter things immediately and try to guess what they want.
Not at all. Arithmetic operators could simply work on text-formatted fields, like they do in many scripting languages. Optionally, when you apply an arithmetic operator to a text-formatted field, it could offer to change the format.
Excel's type system and formatting system is simply poorly designed.
Re:Not strictly Excel's fault by John+Allsup · 2016-08-24 09:36 · Score: 1

Or researchers should simply be banned from using Excel. There is enough money in the area to customize Libreoffice to avoid these kinds of errors.

--
John_Chalisque
Re:Not strictly Excel's fault by Anonymous Coward · 2016-08-24 14:55 · Score: 0

Exactly. Even if Excel can be blamed for doing "stupid stuff" these people are supposed to be "our best and brightest" and they can't properly learn and use a tool that's what, 20 years old? What does this say about other aspects of academic due diligence?
Re:Not strictly Excel's fault by Zxern · 2016-08-25 13:47 · Score: 1

The problem is they aren't importing at all. They're simply opening the csv in excel and getting upset when it doesn't guess the data type correctly.
User error, simple as that.

Example by Anonymous Coward · 2016-08-23 16:02 · Score: 0

http://www.treehugger.com/clean-technology/cigarette-butts-make-better-bricks.html

"There are about 6 trillion cigarette butts produced every year, creating about 1.2 trillion tons of cigarette butt waste. "

So 5 butts weigh 1 ton?

Headline: Typical example of /. fall from grace by psinet · 2016-08-23 16:05 · Score: 0

Selectively omitted non-factual click-bait Headline: "20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report"

Proper headline: "20% of Scientific Papers On Genes Contain NAME Conversion Errors Caused By Excel, Says Report" /. - we made you famous by coming here. There was a good reason for that. You are clearly getting worse. We don't need - or want - glitter and sparkles.

Further proof by ArchieBunker · 2016-08-23 16:20 · Score: 2

That nobody reads all that shit. The only people reading scientific papers are the people writing scientific papers.

--
Only the State obtains its revenue by coercion. - Murray Rothbard

Re:Further proof by Tablizer · 2016-08-23 17:30 · Score: 1

Further proof that nobody reads all that shit.

Just tell Republicans that Hillary wrote it and a warehouse full of lawyers and clerks will analyze every character.

--
Table-ized A.I.
Re:Further proof by Anonymous Coward · 2016-08-24 05:03 · Score: 0

What, you think they're supposed to clearly convey a scientific concept so that it can be used by non-researchers? Nahhhhh, it's all about being published and getting more grants.
Re:Further proof by thegarbz · 2016-08-24 07:14 · Score: 1

That nobody reads all that shit. The only people reading scientific papers are the people writing scientific papers.
We should give that a catchy name, like "peer review".

Excel is (almost) a great grid oriented editor by Anonymous Coward · 2016-08-23 16:21 · Score: 2, Informative

Many, many people use Excel as a grid oriented editor to capture a mix of text and numeric data. It's massive overkill, and the helpful features like automatic date conversion DO get in the way.

But it's ubiquitously available (either in MS form or various and sundry clones)...so it gets used.

what idiot uses Excel for anything of value? by wardk · 2016-08-23 16:37 · Score: 1

so basically any gene researching using shit tools like excel is wrong.

nice.

Re:Excel can kiss my 5" wide anus! by FunkSoulBrother · 2016-08-23 16:49 · Score: 3, Funny

The VisiCalc Song [ala' "Let's Get Physical", made popular by Olivia Newton-John] I'm savin' all of those back issues of "Byte" Making the micro conversion I gotta handle text just right Ya know what I mean? I took you to a local computer store Then to a compu-fair shopping spree There's nothing left to purchase now 'less it's, programmability... [BEGIN Chorus (invoked later)] Let's get VisiCalc*, VisiCalc I wanna get Visi-Calc, let's invoke VisiCalc Let me hear your modem talk, your floppies squawk Let me hear your I/O rock... [END Chorus] I've used paper, I've used wood Tried to keep my pen on the table It's getting hard, this hardware stuff Ya know what I mean? I'm sure you understand what eleven's* do You know the software intimately You gotta know, you're bringing out the VisiPlot* for me... [Invoke Chorus]

No surprise - same erorrs in finance & ops by RichPowers · 2016-08-23 16:49 · Score: 2

In the year 2016, a disturbing amount of human activity is run through Excel instead of proper databases.

A similar study from 2009 tested for errors in various operational spreadsheets and concluded, "Our results confirm the general belief among those who have studied spreadsheets that errors are commonplace." The Financial Times commented on the prevalence of spreadsheet errors in business, saying it's probably a function of training and organizational culture.

I've heard from a few salespeople in the software industry that their biggest competitor in the SMB space isn't $BigCRMCorp, but Excel spreadsheets that have acreted over the years.

Re:No surprise - same erorrs in finance & ops by Anonymous Coward · 2016-08-24 04:48 · Score: 0

I used to work at a well-known national bank which shall remain nameless in the credit risk group. This involved tracking all the loans we made, looking at late payment rates, default rates, and so on broken down by various factors, and tuning statistical models to predict such things for new loans, which went into the decisions on what interest rates to offer (and whether to offer a loan in the first place).
One of my responsibilities when I was first there was building a set of reports every month, for which the process was as follows:
1. Just before lunch, open a workbook in Excel.
2. Go to lunch, because it would take Excel 45 minutes to finish opening the workbook and become responsive again.
3. Upon returning from lunch, Copy and paste--literally Ctrl-C, Ctrl-V--the latest month's data from SAS into one of the sheets of the Excel workbook.
4. Click button to run a shitload of macros
5. Go for a walk, read a book, etc., because it would take another 1-1.5 hours for the macros to run.
6. Take the new graphs that were generated and put them in a powerpoint.
7. Update commentary from last month's report as necessary.
8. Send powerpoint to boss, who would then present it to her bosses.
After the second time doing this I wrote a SAS script that replaced everything up through step 6. Took me a couple of days but once it was done it did in under 10 seconds what took an entire afternoon with Excel.
Re:No surprise - same erorrs in finance & ops by Voyager529 · 2016-08-24 05:11 · Score: 1

In the year 2016, a disturbing amount of human activity is run through Excel instead of proper databases.
A similar study from 2009 tested for errors in various operational spreadsheets and concluded, "Our results confirm the general belief among those who have studied spreadsheets that errors are commonplace." The Financial Times commented on the prevalence of spreadsheet errors in business, saying it's probably a function of training and organizational culture.
I've heard from a few salespeople in the software industry that their biggest competitor in the SMB space isn't $BigCRMCorp, but Excel spreadsheets that have acreted over the years.
This absolutely doesn't surprise me. The concept of thinking about where one's data lives is nearly extinct outside of technical circles, and even Access is seen as "too complicated" by a lot of people. The utility of third normal form is obvious to us, but lots of people are perfectly served with pivot tables. How many people receive formal training in any form of database anymore? Even lots of web designers who use MySQL on the back end of their CMS software don't do a whole lot in PHPMyAdmin unless they have to.
Excel is very simple, ubiquitous, and has a low ceiling of functionality. It's the lowest common denominator, and unfortunately, it's "good enough" for lots of people.

In other news... by acoustix · 2016-08-23 16:49 · Score: 1

...scientists are too fucking stupid to use Excel properly. Format your cells correctly. It's not Excel's fault. It's no different than any other occupation that needs to use the tool correctly.

--
"A plan fiendishly clever in its intricacies"- Homer Simpson

Re:In other news... by vtcodger · 2016-08-23 18:30 · Score: 1

"...scientists are too fucking stupid to use Excel properly."
So, who, other than you of course, is Excel's target audience?
I can tell you, it's not me.
I'm only willing to fight with Excel/Open Office when I wand to do plots. And that's only because gnuplot is even more obtuse than the spreadsheets.

--
You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey
Re:In other news... by Anonymous Coward · 2016-08-23 19:42 · Score: 1

Biologists and medical doctors are not scientists. Therein lies the problem.
Re:In other news... by Anonymous Coward · 2016-08-23 22:27 · Score: 0

So, who, other than you of course, is Excel's target audience?
Office workers, thus why Excel is part of the Microsoft Office suite.
Re:In other news... by Anonymous Coward · 2016-08-23 22:53 · Score: 0

Format your cells correctly. It's not Excel's fault.
If you first enter the data and then format, the corruption already happened and does not get fixed by formatting changes.
If you first format, you need to know how many cells you need, or use an ugly workaround of formatting a whole column/row.
The correct answer is to use the paste wizard to select correct format while entering the data. At least in Libreoffice Calc 5.0.6 you need to know you can actually right click the columns in the wizard and find the option there. I'm not sure if it's any better in Excel. An user interface problem is the program's fault, not the user's.
Re:In other news... by thegarbz · 2016-08-23 22:55 · Score: 1

So, who, other than you of course, is Excel's target audience?
Excel's target audience is people who use Excel. Whether those people know how to use it is the issue. Word edits documents just fine too, but that doesn't stop you getting someone manually typing a list of numbers, putting new-lines in instead of paragraph separators, and then everyone wondering why the hell Word's formatting systems proceed to screw up the entire document.
We in the world put a large number of people in front of a large number of systems without providing any of the basic training required to use it. The above examples I get in my workplace from a consultant, someone who's paid to write documents and yet has never done a course in Word. In his opinion Word was broken because it messed up his documents. In everyone else's opinion the guy should have done some basic training before starting his job.
Re:In other news... by Anonymous Coward · 2016-08-23 23:02 · Score: 0

...scientists are too fucking stupid to use Excel.
FTFY
Re:In other news... by Anonymous Coward · 2016-08-23 23:59 · Score: 0

I tried to use a screwdriver to hammer in a nail. The project was not a success let me tell you. Damn that screwdriver.
Re:In other news... by vtcodger · 2016-08-24 01:19 · Score: 1

Sorry, I didn't make my point well. If people smart enough to decode genomes are too dumb to use Excel "properly", what hope do ordinary users have?

--
You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey
Re:In other news... by BradleyUffner · 2016-08-24 01:48 · Score: 1

Sorry, I didn't make my point well. If people smart enough to decode genomes are too dumb to use Excel "properly", what hope do ordinary users have?
I wouldn't call them "dumb", but it could easily fall under "Lazy", or "Not Trained Properly" when it comes to the software.
Re:In other news... by david_thornley · 2016-08-25 07:27 · Score: 1

I'm too fucking stupid to use Excel properly for any serious scientific analysis. I'm also too fucking stupid to figure out how to drive a screw with a ball-peen hammer.

--
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes

It was user error, not a spreadsheet problem ... by perpenso · 2016-08-23 16:51 · Score: 2

I don't know how much number crunching was actually involved here. I suspect the problem comes from using a spreadsheet as a database. Because databases are, you know, hard.

It was user error, not a spreadsheet problem. Prefacing the names with a ' would have identified them as a string not to be interpreted as numeric or date. If the researchers couldn't manage this do you really think they could have used a database?

That said, yes spreadsheet are overused and abused.

Wide range of quality for publications available by Anonymous Coward · 2016-08-23 16:56 · Score: 0

I think the thing that really surprises me is that all my professors told me it was hard to get published, ...

In well regarded publications. However at the other end of the spectrum there are publications that will print just about anything. How else do all the professors keep their jobs in a publish or perish world? There are not enough slots at the well regarded end of the spectrum to keep everyone employed.

Absolutely not limited to scientific publications by Sax+Russell+5449D29A · 2016-08-23 17:02 · Score: 1

Spreadsheets should be used to present data in the form of a view. Use databases to store data and spreadsheets to create views and statistics of it.

--
-SR

Lazy by SeattleLawGuy · 2016-08-23 17:06 · Score: 1

I think the thing that really surprises me is that all my professors told me it was hard to get published, when failing to make sure your data was correctly entered into whatever spreadsheet program you used to for number crunching (and creating graphs) was one of the dead basics of working there. Yet 1 in 5 papers has notable failures here? And nobody noticed before publishing them? What kinds of major errands have gotten in, then, if basic spot checks are getting failed?

It is hard-ish, at least in a number of fields, unless you are either a big name or doing something of popular interest at the moment.

Ultimately, though, this kind of error comes down to laziness more than vetting for quality--because a very minor editing mistake should not affect paper publication, but should be caught before publication. All it takes to spot the error is a careful reading of the paper by one of the professor, the grad student, a reviewer, an assistant editor, or an editor. And while hopefully they will add automated checks after this, the fact that that read is not taking place is discouraging. Scientists in particular should be reading every image or graph in their paper closely and asking "what exactly does this mean." The level of precision that goes into editing legal academic articles is insane (italicizing something wrong is a stoning offense), but you don't have to get to nearly that level of detail before at least three different people would spot an error like this.

--
Real lawyers write in C++

Re:It was user error, not a spreadsheet problem .. by 93+Escort+Wagon · 2016-08-23 17:08 · Score: 1

I think the real source of the problem is many scientists' insistence on naming everything using (what they think is) a clever acronym. it's become a virtual plague in most technical disciplines over the past two or three decades. I think it's directly related to the desire by scientists to somehow become famous beyond their little niches.

It's gotten so bad that, much of the time, the spelled-out names don't really even make sense.

--
#DeleteChrome

Blame it on the tool ... by Anonymous Coward · 2016-08-23 17:16 · Score: 0

... instead of accepting that you are too incompetent to verify the results.

Don't knock it by Tablizer · 2016-08-23 17:18 · Score: 1

...mutations propelled life.

--
Table-ized A.I.

Not even in top 10 mistakes by burtosis · 2016-08-23 17:19 · Score: 3, Funny

Just do a google scholar search for large hardon collider. None of them will ever live that down, doubly so when it's the title

Re:Not even in top 10 mistakes by Anonymous Coward · 2016-08-23 23:28 · Score: 0

Ah - spellcheck in action... :-)
Re:Not even in top 10 mistakes by Anonymous Coward · 2016-08-24 00:01 · Score: 0

Just do a google scholar search for large hardon collider. None of them will ever live that down, doubly so when it's the title

Oh, don't be so hard on them.
Re:Not even in top 10 mistakes by judoguy · 2016-08-24 01:50 · Score: 1

Obligatory Seinfeld reference: "Not that there is anything wrong with hardon collisions."

--
Peace is easy to achieve, just surrender. Liberty is much harder get/keep.

Im sure MS would make them a custom app by JosephDoeden · 2016-08-23 17:31 · Score: 1

The project managers who choose excel to enter the data are at fault here. They picked the app AND they did not do the proper quality control. How else can you really see it? It would be obvious at 20%, if that is true, that there were significant errors and the process itself needed to be changed. Instead of you have untold tens of thousands of lost labor hours of data collection for unreliable data in an industry where you certainly cannot settle for data variation. Blaming Excel is entirely lame and it's the same BS lack of accountability that everyone has these days, the same they complain about in their politicians and corporations, while offering none themselves. I see TONS of people use spreadsheets because it formats the data for them. They use it as Word Processor template more than a spreadsheet. This may not be the case here, but the point is you have to use the app within the limits of the app, not expect your needs to magically define the app without an investment of developer time and money to make that happen. I'm sure MS would be happy to work with the needs of an industry that large, wealthy and holding so much potential. Staring the conversation out by blaming their product seems lame. This is what happens when news is all for profit... the comment become the only thing worth read.. and then you realize half of them are just witty bots..........crapception complete

It's just a tool... by ndykman · 2016-08-23 17:35 · Score: 1

And is limited to how well you use it. As the article noted, OpenOffice and LibreOffice will do they same thing as well. They noted Sheets doesn't, but I can't get Google Sheets to handle dates consistently at all sometimes.

This is just an excellent example of what works for a large population of users can be a bad thing for a small set of users. After this paper, I expect the error rate to drop dramatically, given how easy the fix is.

TL;DR by Anonymous Coward · 2016-08-23 17:42 · Score: 0

Some scientists are fucking idiots.

Re: TL;DR by Anonymous Coward · 2016-08-23 18:01 · Score: 0

Otherwise we wouldn't do science, to be honest.

Re:It was user error, not a spreadsheet problem .. by dcollins · 2016-08-23 17:43 · Score: 5, Insightful

It could rather be a conversion error. For example, if you have the original in a CSV file (possibly output from one program) where strings have no lead colon, and then load into Excel or LibreOffice, it will (by default) turn everything it can into a numeric format. One needs to be aware of that and ask that the column be converted to text -- which is easy to overlook if you have a column that's mostly non-ambiguous, but somewhere far below is a single date-like name.

I've gotten hit by this many times with CSVs coming out of our school's learning management system, with long numeric student IDs that get turned into scientific notation in the spreadsheet application. In some sense that's easier to catch, because it will hit a whole column of data at once; but even so it's distressing how often I need to backtrack to resolve that.

--
We know where leadership by an anti-intellectual "strongman" who scapegoats minorities and likes boisterous rallies goes

That makes two things excel is not good at by lusid1 · 2016-08-23 17:49 · Score: 1

Its never been good at math, now we know its not good at presenting data sets either.

Re: It was user error, not a spreadsheet problem . by Anonymous Coward · 2016-08-23 17:58 · Score: 0

Yes, it's quite annoying that acronyms are used, especially in an age where both random access memory and hard disk drive space is virtually unlimited. Even my solid state disk had many gigabytes more space than I need. Furthermore, bandwidth is not at a premium; my internet connection provides over 20 megabits per second through my asynchronous digital subscriber line modulator/demodulator.

So basically by Kokuyo · 2016-08-23 18:05 · Score: 1

That means genetic scientists can't be arsed to format the relevant columns as text?

Re: It was user error, not a spreadsheet problem . by Dog-Cow · 2016-08-23 18:24 · Score: 1

I think you somehow managed to bolster the GP's point through some weirdly-inverted way. Acronyms are a form of compression to make communication easier. Cutesy acronyms whose long forms are twisted to fit the acronym do not ease communication.

Shitty autocorrect in shitty programm ... by Qbertino · 2016-08-23 18:34 · Score: 1

.. fucks up data handled in program. Film at eleven.

Honestly, this isn't really news. I consider quite a few pieces of contemporary software on the brink of unusable.
That also includes modern Word processors, Excel and the Facebook UI. (I use facebook sporadically with a spoof account)

A software that tries to think for me without communicating this, that tries to babysit me (remember Clippy?) is bound to be somewhere between extremely annoying and dangerous, depending on the situation you want to use it in.

The first thing you do in Excel is turn off autocorrect entirely. I don't know if that works, but it should, otherwise Excel is more than worthless.
And, of course, in a spreadsheet autocorrect should be disabled by default. In fact, autocorrect should only run when users specifically tell it to, wether in a word processor, spreadsheet or whatnot - that's my opinion anyway.

This is not the first time that a modern Spreadsheet - or shall we just call it by it's name? - Excel has screwed up scientific data or other more-or-less critical data big time.

Excel is a powerful tool often used incorrectly or in situations where it simply doesn't fit. It's a highly complex tool and if it's got rounding errors or similar problems and doesn't tell you how to handle them when they arise with big numbers then somebody has built a shit product. Which with MS would not really a surprise to be honest.
I'd say a modern spreadsheet should warn you if you're using numbers that are to large for the programms math engine to handle without rounding errors or such. And you probably agree.

Conclusion:
I do see fault in the scientists. After all, they should know the tools that they are using and not mess around like some n00b. Although it was probably some unpaid and overworked doctorate who entered the data and there's little to blaim those. ... But I do also see the fault with MS and their mediocre software that has only been banking on it's quasi-monopoly for the last 18 years.

--
We suffer more in our imagination than in reality. - Seneca

Re:Shitty autocorrect in shitty programm ... by bickerdyke · 2016-08-23 22:57 · Score: 1

A software that tries to think for me without communicating this, that tries to babysit me (remember Clippy?) is bound to be somewhere between extremely annoying and dangerous, depending on the situation you want to use it in.
"Hello! It seems you just imported a long list of names, but some dates and a few numbers slipped into what otherwise seems to be text." Would haven be the kind of "babysitting" that told the user that some values showed an "anomaly" (compared to the others) and would either prevent mistakes (table header slipped into data rows) or reminded the user to set the data type for a column.

--
bickerdyke

One of the most annoying features in a great tool by LostMonk · 2016-08-23 18:42 · Score: 1

Excel's auto-formatting causes me no-end of grief.
I receive technical sheets of tools from clients, and if you don't know how to process them, Excel will trim leading zeros from manufacturer part numbers and abbreviate long numbers to scientific notation and will convert sizes marked in fractions into dates . . . and in general turn data into meaningless mess.
And before you jump and yell at me that Excel isn't the right tool for data management - I know . . . There's no avoiding it, everyone out there uses excel.

I know all the moves and switches and tricks to avoid that, but for the life of me, I don't understand why isn't there a simple switch in the settings that will disable all that auto-formatting crap once and for all.

If you're using Excel you're doing it wrong by Anonymous Coward · 2016-08-23 18:42 · Score: 0

My favorite saying at work: If you're using Excel you're doing it wrong!

It's a gross generalization, but 98% of the Excel usage I've seen in in appropriate. I've been in a number of work places where XL has been used for stage gating designs, keeping track of stock, staff time sheets etc. Everyone of those usages are better served by a tool built for the job.

The other 2% has been used for plotting/analysing recorded data. I'll accept that Excel is probably the best/easiest tool for that!

Re:If you're using Excel you're doing it wrong by bickerdyke · 2016-08-23 23:08 · Score: 1

98% of the Excel usage I've seen in in appropriate.
But for 98% of THAT, the "appropriate" tool would have been a database.
Which usually requires a dedicated database server (we DO want to do it right after all this time), a DBMS including team for operating and maintanance, complete knowledge of the database design beforehand (in research?) and admin resources to set up the database.
So we blew out thousands of $ and haven't stored a single line of data yet. But at least we did it right.
And we still don't have a useable frontend for data entry and reporting, and can't send out the data to a reviewer in a different organisation in a file they can open as they would an Excel file
All in all, doing things the the "appropriate" way is wishful thinking. So people are going for 2nd best alternative to a proper database, but Access is not installed for various reasons. And that's why they end up with Excel. sad, but nothing you can blame the user for.

--
bickerdyke
Re:If you're using Excel you're doing it wrong by Anonymous Coward · 2016-08-24 01:17 · Score: 0

Which usually requires a dedicated database server (we DO want to do it right after all this time), a DBMS including team for operating and maintanance, complete knowledge of the database design beforehand (in research?) and admin resources to set up the database.

Um, why? If Excel can handle the data and if you feel your data is safe enough using the normal methods of file storage using Excel then you may as well be using Access just the same and it comes with the standard Office license.

There isn't a single reason to use Excel over Access for this kind of thing aside from not knowing how to use Access. I can understand why someone would scoff at Access but the reasons don't apply if you're trying to use Excel as a DBMS. And if someone came to me with data that they wanted brought over into a professional DBMS I'd much rather they hand me a mdb file instead of a xls.
Re:If you're using Excel you're doing it wrong by Anonymous Coward · 2016-08-24 03:48 · Score: 0

98% of the stuff people think they need a database server for could be done with mysql, and 98% of that could be done with sqlite. There's a reason those things in databases are called "tables".
The problem with "simple" database front ends like Access or Base is that they're plagued with creeping (nay, galloping!) featurism, because acquisition decisions tend to be based on how many "ooh, shiny!" features a thing has rather than how easy it is to use day in and day out. (Just remember, every feature is one more thing that can break -- confuse the user, introduce bugs, and/or increase the attack surface.)

No problem by Anonymous Coward · 2016-08-23 18:46 · Score: 1

Nobody reads them anyway

Software trying to be too smart by WaffleMonster · 2016-08-23 18:53 · Score: 4, Insightful

Type inference in excel has wasted countless hours of my time trying to make sense of corruption caused by third parties using excel. Has gotten to the point where we actively recommend people avoid excel when handling any data they care about. I do fault excel itself because these errors are pervasive. They could have better structured the data imports or made them less creative or asked users for more feedback or have the import do a pass over the entire datasets checking for outliers that may suggest a different type.

When a critical mass is "doing it wrong" becomes pointless and counterproductive in the real world to continue to point fingers at users. Tools are supposed to be useful and if they tend not to be then that's on them.

Re:Software trying to be too smart by drinkypoo · 2016-08-24 00:17 · Score: 1

Type inference in excel has wasted countless hours of my time
No, no it hasn't.

trying to make sense of corruption caused by third parties using excel.
So close and yet so far: your own comment explains that it was caused by third parties using excel... badly.

I do fault excel itself because these errors are pervasive.
No, it's because you're ignorant. Every spreadsheet has exactly the same problem. They are designed to work on numbers, first and foremost. If you want to work on text in them, you have to take additional steps.

They could have better structured the data imports or made them less creative or asked users for more feedback or have the import do a pass over the entire datasets checking for outliers that may suggest a different type.
They could have, but then they could have easily got it wrong and made people angry in that way. Instead, they are being consistent, and always doing it the same way.

When a critical mass is "doing it wrong" becomes pointless and counterproductive in the real world to continue to point fingers at users. Tools are supposed to be useful

Spreadsheets are very useful for doing the job they are meant to do. The problem here is users doing something stupid with a spreadsheet. They should be using a database. You can whip up a web database with CSV import and export from a CMS in a few minutes' time just by clicking. It will enforce data types and it won't mangle anything. They're not even doing spreadsheet-type manipulations, so there is no value to using a spreadsheet. You don't redesign screwdriver handles because some people are using them as hammers.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Software trying to be too smart by Anonymous Coward · 2016-08-24 00:55 · Score: 0

Even tools require training to use. Or at least some basic instruction, which most Office users I see in the workplace don't get.
Does it come up in college programs these days? The workplace doesn't train either, so people just have Excel and Word and Outlook dumped on them and everyone assumes that it's fine. But that's how I end up getting requests to redo a SQL query because someone wanted the data in a different order in the spreadsheet that I submitted.
Re:Software trying to be too smart by WaffleMonster · 2016-08-24 05:27 · Score: 1

No, no it hasn't
Yes it has.

So close and yet so far: your own comment explains that it was caused by third parties using excel... badly
I only care about results not excuses. If I see a pattern of bad data due to avoidable fast and loose type conversation when people use a certain tool I am going to recommend people not use that tool. I don't give a fuck what they did wrong or how someone chooses to characterize the mistake. The only thing I care about is outcomes.

No, it's because you're ignorant.
Your right everyone who makes these mistakes is ignorant. So now that we've established pervasive ignorance and blamed the user what good has come from this exercise?

Every spreadsheet has exactly the same problem. They are designed to work on numbers, first and foremost. If you want to work on text in them, you have to take additional steps.
My comments are not about spreadsheets themselves they are explicitly about interface between spreadsheets and external data. This is not about the value proposition of loose typing within a spreadsheet.
A frequent problem is people directly opening up CSV files and the like or cutting and pasting to excel rather than using the import tool. When you do this there are no questions asked and excel assumes whatever it feels like subject to it's own whacky interpretations. This could be avoided up front with better UX.

They could have, but then they could have easily got it wrong and made people angry in that way. Instead, they are being consistent, and always doing it the same way.
Statements that cannot be falsified convey no useful information.

Spreadsheets are very useful for doing the job they are meant to do. The problem here is users doing something stupid with a spreadsheet. They should be using a database.
Where are you getting information errors expressed by TFA stem from using spreadsheets for purposes they are not suited?

Re:It was user error, not a spreadsheet problem .. by dbIII · 2016-08-23 18:56 · Score: 0

You again? Well I suppose after the "beige box is a hard drive" type definition of an operating system you'd be here to spread other bits of weirdness normally dispelled by computing 101.

Here is why for those without the background.
Typed variables in a database completely eliminate this issue.

If the users had used a database, even one of the MS ones (one of which newbies can deal with within a week), that would have solved it which was the poster above's point that you were unable to grasp.

If the researchers couldn't manage this do you really think they could have used a database

Of course they fucking could - how condescending can you get? These are scientists not the "beige box is a hard drive" crowd.

Re:It was user error, not a spreadsheet problem .. by Anonymous Coward · 2016-08-23 19:08 · Score: 0

That was the best post of the thread! Combined appropriate experience and practical advice on an often overlooked element of data management.

Re:It was user error, not a spreadsheet problem .. by l0n3s0m3phr34k · 2016-08-23 19:11 · Score: 1

Or maybe ctrl-A, right-click, Format Cells, "text". Then Excel won't change anything around, and will treat all cells as plain "text" and won't convert anything. Takes maybe 5-10 seconds.

A minor ephiphany by vtcodger · 2016-08-23 19:50 · Score: 5, Insightful

Without diminishing the other comments, it crossed my mind that the issue here is probably not whether Excel was used in the research. It's one of getting backup (supplementary) data into publishable form. That'd occur after the authors(s) had written their paper using their normal toolset for their work and gotten the paper through review. At this point, they are supposed to package up their data in some format dictated by the journal they are publishing in. Apparently .xls is an acceptable format -- which is not irrational. The format is documented and widely supported.

Anyway, the authors are just cleaning up and getting on with their lives -- cleaning the glassware (if any), paying any bills, archiving their data and scripts, returning borrowed equipment, etc. They are going to convert their data to .xls using whatever quick and dirty tool they can find. I doubt they are going to type tens of thousands of genome codes in manually. They'll use some tool they got from a buddy or write something themselves in Perl or Python or whatever scripting language they know. And they'll check the output to make sure that Excel loads it and that it's about the right length and that the first page or so and the last page look reasonable. And off it goes.

I don't think most folks outside of IT (and probably most in IT) are all that aware of Excel's flaky and sometimes bizarre data conversions. And, assuming that there's an unambiguous one to one translation between gene codes and excel mangled gene codes, this probably isn't a big deal. Anyone using the archived data will scratch their heads, maybe ask around, figure out what's happened, fix the data, and get on with THEIR research.

--
You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey

Re:A minor ephiphany by Anonymous Coward · 2016-08-23 22:45 · Score: 1

What scientist in their right mind would ever use Microsoft Excel for data analysis? If the journals require a lingua franca data format for the data from the study use comma-separated-values (csv), not a spreadsheet. There is enough fraud in scientific and academic research these days, we should not accept inferior and blatantly inept tools as part of the process.
Re:A minor ephiphany by Anonymous Coward · 2016-08-23 23:01 · Score: 0

Anyone that has worked with an inventory would know.
Specially when items disappear due to the changes in serial/model numbers.
Re:A minor ephiphany by staalmannen · 2016-08-23 23:15 · Score: 2

Since I am in biomedical sciences I probably can answer this : most are pretty computer illiterate and actually use MS Office for everything. Entering large data-sets in the spreadsheet is done by copy-paste and not by fancy scripts. In my lab, I am the only linux user, which sort of makes me have to resort to LibreOffice/Zotero for collaborative writing instead of doing LaTeX.
Re:A minor ephiphany by Gilgaron · 2016-08-24 00:28 · Score: 1

Excel is in the standard system image provided by IT, and laboratory software is, as a rule, pretty terrible and can have annoying licenses. I'd much rather use Excel than the software that comes with the equipment. If we do anything often enough for it to be worthwhile, we're lucky enough to have software developers that will code it up in C#, but for piddly stuff Excel is more flexible and easier to use than dealing with procuring, licensing, and training everyone on anything else.
Re:A minor ephiphany by umafuckit · 2016-08-24 00:55 · Score: 1

Apparently .xls is an acceptable format -- which is not irrational.

Yes it is. Why not CSV? These are just data tables.

--
soylentnews.org
Re:A minor ephiphany by Applehu+Akbar · 2016-08-24 01:32 · Score: 2

What's wrong with simply preformatting the Name column as text?
Re:A minor ephiphany by vtcodger · 2016-08-24 01:33 · Score: 1

Well, I'd probably use csv. But few folks outside of IT have any idea what it is. And, in fairness xls does allow formulas and allow plots to be specified. If your journal publishes other kinds of papers besides those dealing with lengthy lists of gene codes, maybe xls is a suitable one-size-fits-all data archive format ... maybe ... I guess ...

--
You can't see ANYTHING from a car, You've got to get out of the goddamned contraption and walk...Edward Abbey
Re:A minor ephiphany by Lord+Crc · 2016-08-24 01:46 · Score: 1

Why not CSV? These are just data tables.
We get Excel files, we ask for CVS files because of the same issues mentioned in the article, we get Excel files saved as CVS files... with of course the issues mentioned in the article.
Re:A minor ephiphany by neilo_1701D · 2016-08-24 01:49 · Score: 1

What's wrong with simply preformatting the Name column as text?
You would be shocked at the number of people I encounter who have no idea that is even possible...
Re:A minor ephiphany by MrLint · 2016-08-24 02:47 · Score: 1

Ummm so a CSV would be find for actual data storage, but you still need something to actually *do* the analysis. You have conflated a tool with the data storage format.
Re:A minor ephiphany by rbrander · 2016-08-24 05:54 · Score: 1

Thanks for that. The damning statement is how all these people that the rest of us regard actually more highly than rocket scientists - who haven't put anybody on the moon lately, and biomedical scientists could save our lives - are "computer illiterate".
There was this time when the excuse for being computer illiterate was age; the dang things just came up on business too fast. But now I'm the retired one, explaining simple Excel things to people 20 years younger. These "biomedical researchers" are mainly under 45, that is, had computers since Jr. High and Windows since college; they've had Excel to study for 20 years, all their careers.
I saw it with engineering - I was the formal IT guy for 7 years, then switched to become one of the engineers, albeit the local power-user and covert developer. I had expected to become obsolete as I aged, overrun by the superior expertise of people who grew up with computers, programming in elementary school. And there was ONE hacker, 20 years my junior, who could outstrip me on complex bits of configuration and development - and oddly enough, he had become a techie while a biomedical technician, writing Perl scripts to parse endlessly long DNA strings. But then there were nearly 100 engineers in the same company that would make the most eye-rolling mistakes and never even try to learn any underlying understanding of why the spreadsheet does certain things.
Over and over and over, I would correct something and try to teach some basics, but be put off with a request to just fix that exact problem, they were in a hurry. Not infrequently, they would be back in six months, asking me to do it again, "I forgot, I'm sorry, what was that again?" The uptake on a little bit of real instruction on the 2nd go-round was better, but still not 50%.
Poor understanding of how to use computer applications is still the greatest barrier to using computers to improve productivity.
Re:A minor ephiphany by oh_my_080980980 · 2016-08-24 06:21 · Score: 1

You do realize this is Excel's default behavior and not something you can disable. IT'S THE FAULT OF EXCEL FOR DOING IT IN THE FIRST PLACE!

Jesus people stop dick sucking Microsoft. No one asked for Excel to do this. Only ass-holes in their infinite wisdom thought this would be a good idea.
Re:A minor ephiphany by thegarbz · 2016-08-24 07:04 · Score: 1

we should not accept inferior and blatantly inept tools as part of the process.
Inferior is in the eye of the beholder. Just like I fire up calc and not matlab when I want to add two numbers there's a hell of a lot of data analysis you can do in excel easily and faster than any other tool that isn't a standard part of every computer within arms reach.
Re:A minor ephiphany by thegarbz · 2016-08-24 07:09 · Score: 1

What has data storage got to do with the tool used to analyse the data?
Sub Question: If someone doesn't understand how to take care of data-types in excel, what makes you think they will understand how those data types are changed when you save the file as CSV, especially since you will get a warning message saying "features" used in the document won't be saved.
If you use excel the only sane thing to do is save as an XLS file. The only sane way to share data is in this form. Converting to CSV introduces exactly the problem that the summary is talking about, conversions.
Re: A minor ephiphany by Anonymous Coward · 2016-08-24 08:28 · Score: 0

When you say .xls, you probably mean .xlsm, which are xml files, and far more robust than csv. So what's the problem?
Re: A minor ephiphany by Anonymous Coward · 2016-08-24 15:26 · Score: 0

what? understanding the tools of your trade? thats so 1800's.
the new norm is that management doesnt understand people, devs dont understand build tools, and scientists just enter data from one machine to another.
they should have checked if there is any correlation between shitty provided data and research quality.
Re:A minor ephiphany by Coren22 · 2016-08-25 05:40 · Score: 1

This error actually highlights this problem. This isn't a problem with Excel, it is a problem with the keyboard chair interface (user error for non IT people). If you change the data type of the field to Text, it will correct the entire issue. Excel is merely helping you with data entry, it can't help it that the short name of certain genes look like scientific notation or dates, it is a function of the program.
Funny, I went to test this out, and copied and pasted 2310009E13, which for some reason, Chrome (or Excel on the backend?) converted to scientific notation on copy. I had to do Paste Special in Excel to paste that entry into a Text field. The other issue I saw is that if the field was General when you typed it, it is already too late, it has to be Text when you paste the data or it converts it to date notation in text as well.

--
APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
Re:A minor ephiphany by crispin_bollocks · 2016-08-25 11:08 · Score: 1

Presumably the cells contain the original information. If so, it's just a matter of formatting the columns to display properly. First encountered this 20-plus years ago when Excel was popping up dates for my numerical data. That's when I learned about Julioan dates!
Re:A minor ephiphany by Zxern · 2016-08-25 13:14 · Score: 1

Actually many people asked for this functionality.
Just because you don't know the proper way to load data into excel (import) not simply open, is not the fault of the program, but the user.
Re:A minor ephiphany by cwsumner · 2016-08-26 03:41 · Score: 1

There are Engineers, and then there are engineers. I am one type, but I have met plenty of the other.
It's probably true in many fields of endeavor...
Re:A minor ephiphany by nobodie · 2016-08-28 12:34 · Score: 1

I am doing some numbers for a small piece of research and using SPSS Statistics. Once I have finished what I need to do it's time to export and.... the only choice is Excel, not because of the software, but because the receiver doesn't know how to use anything else. People at work think I am old (correct) cranky (not really, just opinionated) and crazy (well, ok) because I keep telling them the same thing: FOSS or give up control of your information. Once it is in MS formats it belongs to MS and they can do whatever they like with it. To keep control over it you need free formats, it really is that simple.

--
Subversion of spatial scale luxury decoration ideas.
Re:A minor ephiphany by Nunya666 · 2016-08-31 02:14 · Score: 1

What's wrong with simply preformatting the Name column as text?
You would be shocked at the number of people I encounter who have no idea that is even possible...
Same here, and I work with accountants every day. Most of them know how to use Excel, but they only use what others have built. Most don't understand formulas, and nested formulas absolutely baffle them.

I am still shocked at how often accountants ask me for help with an Excel problem. Questions about complex problems make sense, but I'm talking about simple formulas, or just getting data from one tab/sheet to another tab/sheet.

Their lack of knowledge is pretty good job security for me, but wow.

Enough /. for a while. Back to VBA and automating another XL workbook.

Buy Microsoft Problem for every solution by Anonymous Coward · 2016-08-23 20:01 · Score: 0

But really, if you've been using Excel that extensively, haven't you learned its quirks by now?

1. If entering text data, select the whole column and set its type. Or type a single quote (') before each text value.

2. If importing a CSV file, just make sure each text value in the file is ended with a tab (ASCII 9) character.

Re:It was user error, not a spreadsheet problem .. by Phillip2 · 2016-08-23 20:10 · Score: 2

The problem is that the not all tools require a ' to stop this behaviour. And, in fact, adding this may well break these tools.

Ultimately, this is not a new problem. It was first noted about a decade ago in yeast (which uses a lot of very date like gene names). It's a bit depressing it's still happening.

I'm unconvinced that this can be classed as a user error, though. Excel is using a heuristic to determine the data type of a field (probably on a per cell, not per column basis). And that heuristic is failing.

Re: It was user error, not a spreadsheet problem . by Phillip2 · 2016-08-23 20:13 · Score: 2, Interesting

No, this would be a disaster. The acronyms have the advantage that they are, relatively, semantics-free., If we turned them into long hand, then they would describe the gene in some way. Which means that the descriptions in the knowledge would go out of date, or would have to be changed. It's a recipe for instability.

It is very easy to laugh at biologists and think that you know how to manage data better than they do. In some cases, you may be right, but in this case it is not so. Identifiers are there to identify and not describe. This is something we learned with Linneaus, and have stuck with since.

Have no fear!!! by Vinegar+Joe · 2016-08-23 20:25 · Score: 1

Climatologists don't use Excel.

--
"The average reporter we talk to is 27 years old......They literally know nothing." - Ben Rhodes

Re:Have no fear!!! by Vinegar+Joe · 2016-08-23 20:28 · Score: 1

My mistake! They *do* use Excel.
http://regclim.coas.oregonstat...

--
"The average reporter we talk to is 27 years old......They literally know nothing." - Ben Rhodes

Financial 'Industry' and generally too. by hughbar · 2016-08-23 20:54 · Score: 2

See http://www.forbes.com/sites/sa... for example. It's endemic. They'll probably derail civilisation as we know it eventually, bridges falling, weapons launched. Then, when we're back to the caves/trees and eating nuts and berries (not a bad life outside the cubicles, really), we will curse the evil god Ex-cel and provide blood sacrifices on altars (inscribed with A1, C2 etc.) to keep him away?

--
On y va, qui mal y pense!

Re:It was user error, not a spreadsheet problem .. by Anonymous Coward · 2016-08-23 21:04 · Score: 0

You'd be surprised how much overlap there is. Expertise in one domain is absolutely not correlated with expertise in another domain.

See also: (Medical) doctors. I'm sure just about everyone has been to a doctor that is gallingly close to computer illiterate.

Re:It was user error, not a spreadsheet problem .. by bickerdyke · 2016-08-23 22:04 · Score: 2

Which is too late if the damage has already been done. And if the first 2000 lines look ok, no one is going to do that because, hey, the auto import worked!

--
bickerdyke

Re:It was user error, not a spreadsheet problem .. by OneSmartFellow · 2016-08-23 22:06 · Score: 5, Insightful

Oh, yeah, the single quote - so obvious that the solution to the problem of Excel's stupid, stupid, stupid default behavior of *silently modifying your data* is to put a freaking Single Quote character in front of every cell data element.

Why can't the default behavior be that the data is just not modified, at all, unless I tell Excel I want it to be.

goddamn clippy by Anonymous Coward · 2016-08-23 22:06 · Score: 0

This is all your fault clippy!

Re:It was user error, not a spreadsheet problem .. by bickerdyke · 2016-08-23 22:10 · Score: 1

Here is why for those without the background.
Typed variables in a database completely eliminate this issue.

If the users had used a database, even one of the MS ones (one of which newbies can deal with within a week), that would have solved it which was the poster above's point that you were unable to grasp

And how would that helped when publishers requested their supplementary data in Excel format?

They are scientists and more likely than not the did use some tools for scientific number crunching ("R") or similar that kept and processed data in typed variables or even kept the raw data in an actual database. But you can't ship your Oracle server to reviewers by email and in lack of a document-like database format, peope are turning to Excel.

--
bickerdyke

Re:It was user error, not a spreadsheet problem .. by dbIII · 2016-08-23 22:35 · Score: 1

And how would that helped when publishers requested their supplementary data in Excel format?

Did they do that? Obviously you do not know and are making things up. Why bother to lie over something so trivial?

Unsurprising... by Anonymous Coward · 2016-08-23 22:35 · Score: 0

Given that Microsoft's target audience in every case is composed of business fucks, the issue is unsurprising.

Re:Absolutely not limited to scientific publicatio by bickerdyke · 2016-08-23 22:41 · Score: 1

then you only need a way to attach that database to that email with the excel file, too....

But yes, that would be the proper way. But we would still need a lightweight database-as-simple-document format. Heck, that even COULD BE Excel with a special type of data-worksheets in a document that enforce data structure and do not allow formatting. For all teh use cases where you don't need the performance or multi-user or transactions/data integrity/replication of an actual database server.

The task of mailing out a file with database-data should be much simpler.

--
bickerdyke

Introduction of random gene mutations? by Anonymous Coward · 2016-08-23 22:58 · Score: 0

It sounds like Excel is introducing random gene mutations into the genome. It would be interesting to see how these mutations are manifested and passed along from generation to generation.

Oooh! I think I may have the topic of my next research project!!!!

Re:It was user error, not a spreadsheet problem .. by Anonymous Coward · 2016-08-23 23:11 · Score: 0

It is well known for FAR longer than 10 years ago.

Back in 1998/99 inventory lists were corrupted the SAME WAY.

Misusing a tool is an old failure. This one may not be the researchers - though they are supposed to review the pre-publication documents, that still doesn't mean the publishers didn't screw it up for the final print.

Re:It was user error, not a spreadsheet problem .. by Christopher+Fritz · 2016-08-23 23:35 · Score: 1

The single quote won't help when pasting in multiple cells of unformatted data, however. One should always change the column's data type from something other than "General", such as to "Text", before pasting in unformatted data.

Test-driven Excel by trawg · 2016-08-23 23:35 · Score: 1

I'm a relative newbie to Excel but the first thing I learned was never to trust any of the cells where any calculations are performed.

As my spreadsheets got more and more complex I quickly realised small errors in one worksheet could manifest themselves in really ugly - but very subtle - ways. One simple-looking calculation on one worksheet could blow out an entire model if it there was even a small typo.

I suspect many people using Excel haven't learned this lesson yet. I was lucky that I noticed it myself before learning the hard way.

My solution was to have entirely separate worksheets where I would basically apply rough/simple TDD principles - have some known quantities and results in any complex calculation sections and make sure they were clearly visible at all time. That way as the spreadsheet evolves it can help you catch small errors before they ruin their day.

I'm sure pro Excel people have many more useful tricks.

So 1/4 of all papers, huh? by Anonymous Coward · 2016-08-23 23:51 · Score: 0

So a fourth of all the papers, huh?

(I used Excel to convert to a fraction, for those people who don't understand percentages.)

Re:It was user error, not a spreadsheet problem .. by Big+Hairy+Ian · 2016-08-23 23:55 · Score: 1

Proper use of Schema.ini or renaming the file to .txt importing it and manually specifying which columns are text, date & numeric. Basically as far as the csv imports concerned it's the ODBC driver that's being too helpful

--

Build a Man a Fire, and He'll Be Warm for a Day. Set a Man on Fire, and He'll Be Warm for the Rest of His Life.

Submitted without comment by ThatsNotPudding · 2016-08-24 00:01 · Score: 1

What kinds of major errands have gotten in, then, if basic spot checks are getting failed?

Re:Submitted without comment by Anonymous Coward · 2016-08-24 03:27 · Score: 0

I applauded that one in my head.

Mod Parent UP! by Anonymous Coward · 2016-08-24 00:07 · Score: 0

This is user error. The failure to format a cell or column is not a failure of the application, it is a failure of the user.

The exact same issue occurs if the data is entered into Open/Libre Office Calc. If you don't format the column and tell the application what type of data it is, then the application tries to be helpful and formats what are most often date abbreviations as dates.

SUrely, on a site full of techies and programmers, the concept of type declaration isn't so surprising,

Re:It was user error, not a spreadsheet problem .. by drinkypoo · 2016-08-24 00:09 · Score: 1

One needs to be aware of that and ask that the column be converted to text -- which is easy to overlook if you have a column that's mostly non-ambiguous, but somewhere far below is a single date-like name.

My process for importing into a spreadsheet includes always making sure to check the data type for each column. If you do that, then it is very difficult to overlook.

The basic problem here is not that the tools function as designed, which is actually very useful. I use spreadsheets fairly regularly for mundane list-mangling tasks. The problem is people not knowing how to use the tools. You see this everywhere you go in academia. Literally every position on a college campus typically now requires familiarity with Microsoft Office, and just about every college campus has got a class in Office, yet an IT pro working at a college will spend a significant amount of time answering stupid questions about Office that anyone with even passing familiarity could answer, and which can trivially be answered with the help system. And you can identify the problem, and communicate it to their superior, and they still won't wind up enrolled in the Office class. They got the job on a fraudulent basis (claiming knowledge they didn't have) and the school would rather not educate instructors to bring them up to the requirements for some reason, probably because they'd have to pay them to attend the class which would qualify them to have the job they've already been given.

Perhaps before you're allowed to make a scientific paper, you should be forced to learn how to use a computer.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

So....biased as usual... by Anonymous Coward · 2016-08-24 00:11 · Score: 0

...because it's not really Excel that is the problem. It's the user. They don't format the cell properly to accommodate the data.

Re:It was user error, not a spreadsheet problem .. by drinkypoo · 2016-08-24 00:11 · Score: 1

Why can't the default behavior be that the data is just not modified, at all, unless I tell Excel I want it to be.

The default behavior is to treat the field as whatever you've told the spreadsheet that it is. By default, every cell is set up for numeric data types. The same is true on a CSV import. In either case there is a simple way to prevent it happening; select a text format for cells in the sheet, or select a text type for the column while importing. The problem is misuse of tools, not a problem with the tool. You wouldn't complain that a band saw is capable of removing fingers, would you?

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Re:It was user error, not a spreadsheet problem .. by Anonymous Coward · 2016-08-24 00:30 · Score: 0

The problem is people not knowing how to use the tools.

I disagree. The tool in question, Excel, used to be a tool for accounting, but it is not so anymore, since at least 2007.
The defaults "smart" conversions are harmful to general usage. Watch how regular users go around MS word auto-formatting and you'll see what I mean.
I personally turn off all of the automatic features since they ruin my flow and hamper my productivity. What I see from people who never bother to go to the preferences, is that they just OCD themselves out of the features, by already expecting to delete the automatic text that comes next...

Anyway: wrong tool for the job.

Re:It was user error, not a spreadsheet problem .. by BringsApples · 2016-08-24 00:35 · Score: 1

With things in the IT world ever-changing, it's difficult even for IT professionals to keep up sometimes. I imagine that scientists are only using the tools available to them in the best way they know how. I seriously doubt that much of their time is spent keeping up with whatever new shit excel implements. In fact, when I think of who gets shit on the most by the constant changes to the programs that we all use, it's scientists. We all know that probably most scientists have the brain-power to understand/adjust to changes in the programs that they use, but they probably don't have the time.

--
Politics; n. : A religion whereby man is god.

Re:It was user error, not a spreadsheet problem .. by Gilgaron · 2016-08-24 00:39 · Score: 1

You aren't going to send a publisher your raw data in its original format... Excel is a super common way to transmit a subset of data from a database. There are new standards coming out next year for file formats for some sorts of studies. It should alleviate the issue some.

Mod parent up. Sloppy editing of science articles. by Futurepower(R) · 2016-08-24 00:39 · Score: 1

Wow! I'm amazed that an organization could make the mistake of calling the hadron collider a "hardon collider".

? Why Excel? by Anonymous Coward · 2016-08-24 00:41 · Score: 0

And why aren't the graduate students ruining their eyesight checking the data attachments for errors? That's their job.
Calculator ( a program ) has normal, programming, and scientific modes.... why can't Excel?

Makea for great click-bait headline by Anonymous Coward · 2016-08-24 00:48 · Score: 0

"Excel Is Mutating Our Genes!"

Re:It was user error, not a spreadsheet problem .. by jbmartin6 · 2016-08-24 00:50 · Score: 1

Yes! My thought on the article was, what is the problem? It's not like Excel alters the underlying data, all you have to do is correctly change the column type. This is just a display problem, not even a conversion problem. If someone is taking the data into some other tool it will be correct.

--
This posting is provided 'AS IS' without warranty of any kind, implied or otherwise.

Who cares about HUMANS? by Anonymous Coward · 2016-08-24 00:54 · Score: 0

Machines will rule the world, resistance is futile.

Auto conversion is the bane scientists by Bohnanza · 2016-08-24 00:59 · Score: 2

Once we ran all our British chemical patent info through software to convert it to good old-fashioned American English. All instances of Tungsten Carbide (WC) were converted to "Toilet".

--

-----

Sorry, I'm only a 1336 h4x0r.

Re:Auto conversion is the bane scientists by Anonymous Coward · 2016-08-24 09:25 · Score: 0

During my PhD, forced to use Word by coworkers who could not learn Latex, Word regularly changed the name of my co-author from 'Shi' to Shit.
Re:Auto conversion is the bane scientists by Anonymous Coward · 2016-08-25 09:02 · Score: 0

Next time, use Wolfram Carbide. Even a non-English speaking English translator software can understand that.

Re:It was user error, not a spreadsheet problem .. by OneSmartFellow · 2016-08-24 01:05 · Score: 1

If I open a new spreadsheet (without a template) the default behavior is to convert data types to randomness. Some numbers are interpreted as dates, and so on.
Not only that, but frequently, having converted it, I am unable to reset. Instead Excel will display #ERROR or some such nonsense, even though clicking on the cell will show the original data in the toolbar area.
Please tell me you understand why this is a design flaw. The default behavior for an application like Excel should be to preserve data, not convert it. Just as the default behavior for a Word Processor should not include making decisions about how I want text formatted without me explicitly formatting it - changing font, indentation, line spacing, etc. Yet, Microsoft Word does this too, and then doesn't even include an option to show me what formatting codes are responsible, so that I can remove them, and restore the text to un-formatted.

This is simply crap programming from MS, as per usual.

For heaven's sake by dhaen · 2016-08-24 01:13 · Score: 1

Since the beginning of Lotus 123 I've been entering "just text" into cells, either by formatting or originally by adding an apostrophe in front of the text. Just use the tools you've been given, properly.

Re:It was user error, not a spreadsheet problem .. by allcoolnameswheretak · 2016-08-24 01:23 · Score: 1

It's not a conversion or a user error. The Excel sheet is applying random mutations on the data. It's called evolution. Look it up, buddy.

It's actually 19.4% by radish · 2016-08-24 01:32 · Score: 2

But they used the wrong rounding mode.

--

---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

Excel - Where Good Data Goes to Die by Anonymous Coward · 2016-08-24 01:40 · Score: 0

While this is not the best example, for a long time my colleagues in computational biology and bioinformatics have found researchers that use spreadsheets as databases, or as a sort of hybrid database-data exploration environment, the bane of our existence. There's not just issues going from CSV to Excel and back again, but auto-corrections that make subtle changes, peculiar date behavior, not to mention what becomes awkward habits of structuring information. The fact that so much information is locked up in a crude assembly of semi-formed spreadsheets distributed across hundreds of desktops with cryptic titles and versions is a nightmare.

Excel is where good data goes to die.

wot, me troll much? by Anonymous Coward · 2016-08-24 01:41 · Score: 0

What almost 200 comments and not a single :

grabble grabble climate change grabble rabble corrupt datasets fnerrr, I told ya it wasn't a real effect

Re:wot, me troll much? by Anonymous Coward · 2016-08-24 02:22 · Score: 0

The deniers are down to a very small, very insane group. It was always driven by cult-like belief in far-right-wing con-men, but the con is a lot harder to maintain as ice continues to melt.

Re:It was user error, not a spreadsheet problem .. by jbengt · 2016-08-24 01:44 · Score: 2

This is just a display problem, not even a conversion problem.

No, it is a conversion problem.
True, if you make the column text in the import .csv "wizard" , it won't convert.
But if Excel does convert text to a date or to a number, you lose the original text and have to import again to get it back.
Numbers displayed in scientific notation (or any other format) and dates displayed however Excel is set up to display them, are stored internally as just numbers, and the original text is not saved anywhere.

we're onto you, auto-correct! by Anonymous Coward · 2016-08-24 01:45 · Score: 0

Auto-correct is a vast conspiracy fueled by the Illuminati to repress scientific advancement (or any other productivity) by the common folk. Hopefully this is the beginning of the end for auto-correct! Let's build a wall around auto-correct, and make typing great again.

The worst one I had (partly due to auto-correct, partly due to other issues) was when I tried to text my wife "Me and the children are fine. My phone is almost dead so I'll call you later" and what she got was "the children are dead"

Ever hear of proof reading by Anonymous Coward · 2016-08-24 01:45 · Score: 0

So SCIENTISTS don't proof read their scientific findings before publishing? Don't blame the software, where's the due diligence to make sure you know wtf you are publishing?! And no one noticed this before a study?? Is anyone even reading these things? I have to assume there isn't both a
2310009E13 and 2.31E+13 RIKEN identifier. No one said "wow, this doesn't even make sense, there is no 2.31E+13!

Sounds like a bunch of idiots just throw up on paper and no one (writers or readers) really thinks about it and now they want blame software for their mistakes.

Not the first time Office led to published errors by Scoth · 2016-08-24 01:50 · Score: 1

Back in the day, old versions of Word only had "co-operation" in its dictionary, and would autocorrect "cooperation" to "Cupertino". This sounded like one of those urban legend things until I searched Google Books for some common constructs. You'll find hundreds of examples like the cupertino of and Cupertino Between (has some real uses but most aren't)

This still? by thrig · 2016-08-24 01:50 · Score: 1

You should have seen the researcher's face when I told them Excel was (and had been) corrupting his data input (of yes genes). I forget what year this was, somewhere in the 1999-2002 range.

Re:It was user error, not a spreadsheet problem .. by jbmartin6 · 2016-08-24 01:58 · Score: 2

You are right, I stand corrected. I thought that the converted cell would revert back to the raw data if the column type was changed, or if one exported as CSV. A little testing shows I was wrong.

--
This posting is provided 'AS IS' without warranty of any kind, implied or otherwise.

ISBNs, too by Anonymous Coward · 2016-08-24 02:01 · Score: 0

I work in libraries and this is a common problem with tables of books along with their corresponding ISBNs.

HAHA by Anonymous Coward · 2016-08-24 02:05 · Score: 0

Should've learned TeX, bitches!

Re:It was user error, not a spreadsheet problem .. by Yunzil · 2016-08-24 02:14 · Score: 1

Why can't the default behavior be that the data is just not modified, at all, unless I tell Excel I want it to be.
Flag as Inappropriate

Because 99% of the time that's not what everybody else wants.

Re:It was user error, not a spreadsheet problem .. by EndlessNameless · 2016-08-24 02:20 · Score: 1

I'm unconvinced that this can be classed as a user error, though. Excel is using a heuristic to determine the data type of a field

I disagree.

If you're using typed data, you have to cast it as something when it's stored. As far as I can tell, the Excel heuristic hasn't changed much since the 1990s.

With a 20+ year history, it's user error if they do not account for it.

It is dead simple to select a column and set its data type to Text, which is appropriate for gene names and will prevent any alteration of the input.

If there is an option that accomplishes exactly what the user wants and the user does not select that option, well, that's basically the definition of user error.

--

---
According to the latest ruleset, this post should be modded as Vorpal Flamebait +5.

Re:It was user error, not a spreadsheet problem .. by Yunzil · 2016-08-24 02:23 · Score: 1

Please tell me you understand why this is a design flaw.

No, because it isn't.

This is simply crap programming from MS, as per usual.

Nope, it's pretty good programming. They're just not programming for you.

According to my calculations... by Anonymous Coward · 2016-08-24 02:23 · Score: 1

20% is a full 1 in 7 (via Excel 2016)...

Fun with Excel by rickb928 · 2016-08-24 02:24 · Score: 1

Try entering your credit card number into an Excel sheet. Yup, it's unexpected. You Excel mavens know this already, so move on. This is Excel goofiness for dummies.

First, you'll see it in scientific notation. Go ahead, set it Number format, and remove the decimals.

Now, see what ya got, the whole number, right?

No. Your credit card doesn't end in a zero, does it?

Oh, re-enter it, since the zero is there forever. I double-dog-dare you to get that last digit back without re-entering it.

Try a custom format: Make Type 0000000000000000 (that's 16 zeros for those of you in Rio Linda).

Yeah, that fixed it, right? No?

Ok, last time, re-enter the card again, but put an apostrophe in first...

Success?

Microsoft can't yet imagine why anyone would need 16 significant digits. Or more likely the binary needed to do that is more than 8 bits, and so there we are, do we go all 16-bit and drive storage? Or something.

Oh, and that apostrophe? Save the sheet as CSV. Open it in a text editor, replace '4 or '5 with just 4 or 5. Hope you didn't have a string '4 or '5 for some other purpose. Enjoy.

Researchers probably don't proof their spreadsheets for publication.

--
deleting the extra space after periods so i can stay relevant, yeah.

Re:Fun with Excel by BundesSheep · 2016-08-24 03:51 · Score: 1

I wonder why they don't just leave the original contents, as entered or imported, alone and just convert when viewed. Wouldn't that solve the problem?

I wish Excel had custom data types by swb · 2016-08-24 02:25 · Score: 1

And not just data formatting.

It would be nice to be able to define a data type and some rules and limits of progression.

I could see the value in defining an arbitrary data type that was comprised of a fixed set ("Apples", "Pears", "Oranges", "Bananas") with no progression (ie, no set member has precedence or rank) or perhaps some with progression or rank (fetus, infant, toddler, child, adolescent, adult, senior). Cells formatted as belonging to a data type would only accept those values as valid entries, and sorting would apply the set's rules of simple progression if there were any.

It might help for other numeric-based data types, such as IP addresses, where it would be helpful to define rules of progression around some kind of delimiter. If they could only add one new data type, I wish it was IP addresses.

There's probably complex ways of doing this with macro/scripting, but, they end up being complex and one of the main reasons so many people use Excel because it makes it trivial to manage lists. Trivial tasks that get made complex end up being done sloppy.

Hype by MobyDisk · 2016-08-24 02:32 · Score: 1

Is this problem need for alarm by Microsoft or is this a case where more scientists should be double-checking their work?

No.

The data isn't lost. Excel just displays it in an inconvenient way. Highlight column - right click - Format cell - Text - OK. Done. Sheesh.

Re:Hype by Dwedit · 2016-08-24 05:58 · Score: 1

The data IS lost if it was copy-pasted as text from such a formatted sheet into another formatted sheet.

Not entirely a user problem. by moeinvt · 2016-08-24 02:32 · Score: 1

A 20% error rate on 35,000 files isn't entirely a user problem. Yes, the user ultimately has complete control and the issue could have been corrected if the user had carefully verified the data. In that sense, it's a user problem. However, if the tool is so counter-intuitive that roughly 20% of a large sample people make the same mistake, it's Excel's problem too.

I wonder if an aluminum extension ladder analogy is a first on /. ?

Consider an aluminum extension ladder... :-)

Suppose the locking mechanism on the ladder worked properly when the user carefully verified that it was engaged. The user has complete control. If the mechanism was so counter-intuitive that 20% of the users ended up making the same error and falling off, it wouldn't be brushed off as a problem with stupid or careless users. There's no question whatsoever that the manufacturer would be held partially responsible. Hell, if they sold 35,000 ladders and found out that there had been 100 accidents because of confusion about the lock, they'd yank the product off the market immediately and probably face lawsuits.

Re:It was user error, not a spreadsheet problem .. by Anonymous Coward · 2016-08-24 02:42 · Score: 0

Exactly. Why not assume all cells are text and never change anything user entered, unless requested by user?

Cell type could be only evaluated at runtime, for example when there is a formula that assumes a cell to be numeric,
then if user enters letters and presses Enter or clicks to another cell
then an error popup could tell that there is a formula that assumes that cell to be numeric etc.

This is classic MS. Add a feature that is marginally useful to few people but screws up millions and don't make it easy to turn off.
I think even if you find the setting to turn it off it only applies to current Excel sheet and not saved as a general preference.

The never-ending saga of No Editor present by gordguide · 2016-08-24 03:07 · Score: 1

This is just another example of the seemingly complete absence of any Editor at any publication in the modern world. It is the Editor's job to read and select submissions, and have said submissions vetted for a laundry list of errors. The more obvious ones, such as spelling and basic grammar, seem to be absent from a huge number of publishers (online and in print).

If you are a Scientific or Medical Journal, whose business it is to publish Papers and collect advertising dollars or subscription fees (or both), then it's part of the Editor's job to vet errors that are common in Scientific or Medical Papers. Such as this one identified by the article.

If there is but one Editor, then that person is required to complete all the tasks expected of an Editor. If there is money for more staff, then it's the job of the Editor to assign duties to junior employees to do such grunt work, and then it's the Editor's job to monitor these employees and their work. Again, the ultimate responsibility is the Editor.

If you publish online or in print, you need someone, maybe it's you, maybe it's an employee for hire, but someone none the less, to perform this task. It is not optional. Yet for some reason various "publishers" show their lack of skill at their chosen profession.

There has always been bad magazines, journals, and newspapers. Generally they went broke because people would punish this behaviour by not buying whatever it was they were selling (advertising, library subscriptions, individual subscriptions, etc) and they would mercifully dissapear.

Today we can add blogs and online news or science sites to the list. The average internet user has shown themselves to be less discerning than the print reading public of the past, when the dull and stupid simply didn't read anything.

Today everyone from moron to genius reads online, and there is money to be made from serving ads to the idiots of the world, as they can be reached at an economically viable number.

Maybe we will someday return to a publishing world where the poor examples of the art flounder and die. Regardless, if you don't have an Editor or someone performing an Editor's function, you are putting yourself firmly in the lot of the un-dererving of respect. Journals who would like to instead be respected should be vetting submissions for these well-known (amongst actual Science and Journal Editors of competence) errors and correcting them before accepting the paper for publication.

It really is that simple.

Re:It was user error, not a spreadsheet problem .. by Anonymous Coward · 2016-08-24 03:11 · Score: 0

You have hit it spot-on. I'm a grad student in immunology and I see this happening a lot.

"Big Data" experiments such as RNA-seq or proteomic mass spec typically result in raw data in a .csv or tab-delimited format. Importing into Excel under default settings will convert some gene symbols into dates or numbers. It's easy to miss this happening, because only a handful out of hundreds or thousands of gene symbols are affected.

Stupid problem that could be fixed with a little awareness. Data management training is unfortunately not a usual part of academic biology curricula.

Here's a place where the phrase by Anonymous Coward · 2016-08-24 03:27 · Score: 0

"you are using it wrong" is applicable

I mean, really, defined the damn column as TEXT when creating the spreadsheet.

Re:It was user error, not a spreadsheet problem .. by Anonymous Coward · 2016-08-24 03:44 · Score: 0

The default behavior is to treat the field as whatever you've told the spreadsheet that it is.

This is false. The default behaviour is what happens when you haven't specified anything else. To say "the default behaviour is whatever non-default you've set it to" is nonsensical.

By default, every cell is set up for numeric data types.

This is true, but incomplete. The complete statement is: By default, every cell is set up for numeric data types, which means that entered data may be silently modified.

editors by ooloorie · 2016-08-24 04:31 · Score: 1

This is the job of editors and peer review. In fact, it is their primary job.

A peer reviewed paper need not be factually true (editors and peer reviewers can't determine that), but it should be largely free of obvious formal errors, like using a date instead of a technical term.

strictly Excel's fault by ooloorie · 2016-08-24 04:34 · Score: 1

Automatic formatting of spreadsheet data was a mistake. Furthermore, it was a mistake Microsoft could have fixed (and still can fix) by flagging and highlighting autoformatted cells that seem inconsistent with the cells around them.

Re: It was user error, not a spreadsheet problem . by Anonymous Coward · 2016-08-24 04:47 · Score: 0

Data> import text> select csv, choose columns to import as text. Voila no conversion bs. It's ridiculous how many of these "problems" have clearly documented solutions and that a two second search on Google could have easily remedied user ignorance. Jesus Christmas this whole article is basically 20 percent of the scientific community is fucking retarded and can't be bothered to spend a minute learning the software they use. It's not about tech savvy it's just laziness. If you have a problem, do you just say oh well, that's the way it is? Or does the goddamn scientific method just mystically not apply when it involves a mouse and keyboard? I could understand this article applying to government workers at the social security office, but gene researchers? What. The. Fuck.

sloth is eternal by epine · 2016-08-24 04:56 · Score: 1

The default behavior is to treat the field as whatever you've told the spreadsheet that it is. By default, every cell is set up for numeric data types. ... The problem is misuse of tools, not a problem with the tool.

A process of "five whys" applied to the present discussion immediately reveals "default numeric" as bad policy in academic research.

A sane default would be "untyped" or "exactly as entered" which shifts sins of omission into sins of commission, this being far more compatible with the culture and standards of scientific journal publication than what Microsoft originally chose, mainly for the convenience of boutique-reseller power demos. Also, the more collaborative the environment, the more important it becomes to enforce a strong-typed, sin-of-commission data model.

This is all covered in the first week of Graybeard 101 as taught with slate tablets back in the stone age. I was there in 1985. Microsoft has had wool in its ears since forever. Still doesn't make it right, does it?

Furthermore, anyone who really cares about data pipeline integrity writes an export function from the derived format back to the raw input format, until they come out exact, or every difference is adjudicated and signed off, which is incorporated into an automatic validation task which can be repeated at any point in time for the life of the project.

CRAN Task View: Reproducible Research

LaTeX was originally written in the early 1980s by Leslie Lamport at SRI International.

Leslie Lamport won the Turing Award in 2013 for his uber graybeard rectitude, if anyone cares to notice. Douglas McIlroy made his seminal contributions in 1968 (Bill Gates was thirteen, but perhaps he was already set in his ways). John Backus delivered his Turing Award lecture "Can programming be liberated from the von Neumann style?" in 1977, which inaugurated the modern tradition in functional languages (Bill Gates was then twenty-three).

Competence is hard. Sloth is eternal. We continue to seek a third way.

self-response addendum by epine · 2016-08-24 05:29 · Score: 2

Penn Jillette on Donald Trump, Hillary Clinton, And Why He's All in on Gary Johnson — 2 August 2016

I watched this video yesterday. There a fabulous exchange 24:30–30:00 on truth and naievity.

You go through a period when you're sixteen, seventeen, eighteen when truth really obsesses everybody. And then I think you're supposed to kind of sort of grow out of it. And I didn't. It really remains of complete interest to me. ... I'm not bothered at all by people being wrong. ... I have such a naive point of view, to almost not believing it, that people can have information and represent the opposite of that. I just find that so appalling and, in a certain way, fascinating.

Once upon a time I would have ventured that most Slashdot readers would want to view this. It had me thinking about my own life 1985–1995 where I watched the software industry turning into a train wreck, where every seventh train car is painted bright orange and lettered in an ominous Area 51 black stencil font "patch Tuesday", with sparks flying off wheels seized (and reseized) for so long they resemble lopsided pentagons.

I used to think to myself "surely these are just temporary conditions due to the extreme rate of expansion of the software industry, and it will all settle back down to sanity as we crest the exponential growth phase". But no. Like Jillette, I was a die-hard naievitarian. Lesson learned.

Don't use Excel for CSV files! by gosand · 2016-08-24 05:32 · Score: 1

It's not like Excel alters the underlying data, all you have to do is correctly change the column type.

Oh! but it does - once you save it.

If you open a CSV with Excel by default, it will simply read in the values and format it how it sees fit.
Then if you save it, even as a csv, it will give you a warning saying something like "some of the features are not compatible with this format type"
If you proceed, your file is now changed. I have seen scientific notation changed like this. Many columns and rows, you may miss a malformatting and save it as csv. Boom, your data is now toast.

It is why I always look at my CSV files with a text editor first, and only open copies in Excel.
And if you use a real editor like vi, even opening files with millions of rows isn't an issue.

--

My beliefs do not require that you agree with them.

Re: Don't use Excel for CSV files! by skipandmary1017 · 2016-08-24 07:17 · Score: 1

I use .csv file often, without issues. HOWEVER, I NEVER open a .csv file with Excel. Rather I IMPORT the data into Excel via Data > Get External Data ... I control the data type of each column in this way, and columns that I do not want converted, I assign as TEXT, and columns I want converted as dates, I assign as MDY, DMY, YMD as required.
Re: Don't use Excel for CSV files! by gosand · 2016-08-25 05:06 · Score: 1

Good point, I will have to remember that.
However, it's all too tempting to double-click that csv file that is associated with Excel. How could it not work - THERE IS AN EXCEL ICON RIGHT THERE ! :)

--

My beliefs do not require that you agree with them.

Re:Excel can kiss my 5" wide anus! by Anonymous Coward · 2016-08-24 05:38 · Score: 0

Awesome.
I remember those apps from the 1980's.
The best part of working in a computer store at that time was constantly explaining that there was no printer on the planet that could output anything readable from VisiPlot. It was an amazing software product that was better than the resolution of the monitors or the printers.

Sorry to hear about your anus though. Must be hard to find comfortable chairs.

Latter day Computer Science research.... by Anonymous Coward · 2016-08-24 05:49 · Score: 0

The weird thing is why the geneticitsts tried to put genes into Excel rather than into test tubes or cell nuclei. It seems a rather demented kind of CS research to just take scientific data and put into whatever computer program is avaialable and see which one produces the most interesting garbage.

When you already know that every model looks best when pimped up with Photoshop....

Re:It was user error, not a spreadsheet problem .. by Anonymous Coward · 2016-08-24 06:04 · Score: 0

Yip, its pretty dumb. If you have a CSV file with phone numbers formatted as +1-234-567-8901 (not uncommon if you're an international company and you want the numbers stored in the global address book to be directly dial-able from phones anywhere!!).

Excel makes a right mess of that. And so far there is no way to stop it that I can find.

Re: It was user error, not a spreadsheet problem . by 93+Escort+Wagon · 2016-08-24 06:04 · Score: 1

The acronyms have the advantage that they are, relatively, semantics-free., If we turned them into long hand....

This is a false choice. Just because shorthand names are preferable doesn't mean they have to be acronyms, whose tortured derivations often distance the "shorthand name" from its derivation.

Disease researchers, for example, seemingly don't fall into the trap - H1N1 is nice and short, is descriptive of the virus, easy to remember AND is not an acronym.

--
#DeleteChrome

User Error by Anonymous Coward · 2016-08-24 06:08 · Score: 0

Failing to format the cells (naming irony) is really the problem. Excel just makes assumptions about the data when the cell is undefined.

Real scientists use R and S and C on Linux by WillAffleckUW · 2016-08-24 06:26 · Score: 1

Seriously, we have Terabytes of data storage, even in collapsed form, So we use shell scripts written by code in Perl and CGI to run against massive data files spit out by large databases.

Just do the Math.

Now, if you want to say small labs doing biological research have occasional false hits, due to open access searches against common repositories, I might believe you. To be frank, though, you should have read all the notes yourself.

I'm far more worried about a tired research assistant or grad student doing stuff like this and it not being caught before publication.

--
-- Tigger warning: This post may contain tiggers! --

Unwanted Spreadsheet Conversions by Anonymous Coward · 2016-08-24 06:57 · Score: 0

Spreadsheet novices often get bit. I am an Excel user and have often seen these kinds of 'errors'.
Excel, for instance, converts almost every piece of data you enter. From your keyboard you enter 10, which is literally the CHARACTERS ONE AND ZERO. Excel analyses the characters you've entered, determined that since you entered only numeric characters you intend the entry to be the numeric value 10, which is binary 1010, which is a different binary value than the characters 10. So in storage is 1010, but what is displayed is the characters 10, since 1010 is a non-display value.
But that conversion is what we expect and want. The examples in this article are unexpected and unwanted.
If you were to enter 1/2 in a cell, Excel assumes that you're entering a DATE. Excel parses the entry, assuming month 1, day 2, year current year and converts characters 1/2 to number 42371, and changes the Number Format to Date d-mmm.
If you were to enter 6: in a cell, Excel assumes that you're entering a TIME. Excel parses the entry assuming hour 6, minute 0, second 0, and converts characters 6: to number 0.25, and changes the Number Format to Time h:mm.
If you were to enter 171e2 in a cell, Excel assumes that you're entering a NUMBER in scientific notation. Excel parses the entry assuming coefficient 171, exponent 2, and converts the characters 171e2 to number 17100, and changes the Number Format to Scientific 2 decimal places.
So what a savvy Spreadsheet user does is FIRST, before any entry is made, change the Number Format to TEXT, or prefix the entry with an APOSTROPHE, as '1/2, or '6: or '171e2 if you intend to prevent any conversion of the data you enter. It's that simple!

Re:It was user error, not a spreadsheet problem .. by Anonymous Coward · 2016-08-24 07:13 · Score: 0

This is why I use PowerQuery to import CSV files or anything really into Excel. It's easier to make sure your data is typed correctly on the way in.

Proof Reading by theshowmecanuck · 2016-08-24 07:20 · Score: 1

I guess proof reading papers is now a lost art.

--
-- I ignore anonymous replies to my comments and postings.

Re:It was user error, not a spreadsheet problem .. by Obfuscant · 2016-08-24 07:26 · Score: 1

Usually we butt heads on this forum, but here I basically agree. It's a corollary to "you can't make things foolproof because fools are too ingenious".

"If you make a tool foolproof then fools will use it." And the remainder of that sentence could be "and then complain when they get the wrong answer."

Try this by Anonymous Coward · 2016-08-24 07:39 · Score: 0

=-2^2

Re:Absolutely not limited to scientific publicatio by Sax+Russell+5449D29A · 2016-08-24 07:49 · Score: 1

then you only need a way to attach that database to that email with the excel file, too....

Well, not really. If you consider a spreadsheet as a view of a normal relational database, unlike database views, a spreadsheet is a permanent representation of the original data and can be for example emailed and viewed easily. In most circumstances the recipient would likely not need the database nor know how to use it.

This way the data is strongly typed in the database and data operations are a lot safer. Software often used in scientific studies, such as IBM's SPSS, are pretty much a database anyway, so it's not a big leap to use an actual relational database.

But I don't see this happening. Ever. So back to broken Exceling. :-(

--
-SR

Re: It was user error, not a spreadsheet problem . by Anonymous Coward · 2016-08-24 11:08 · Score: 0

Format the column as text and it will disable calculation in that column. You can do the same thing for individual cells, but phone numbers tend to be stored in columns, so you probably just want to do the entire column at one time.
If you are importing a file, read the inport text wizard dialogue box that opens and set the offending column to text then.

CSV by dbIII · 2016-08-24 13:16 · Score: 1

I call bullshit and point you at CSV if they are going to use MS Excel.

Re:CSV by Gilgaron · 2016-08-25 01:56 · Score: 1

And most people interact with CSVs with Excel, botch it all up, and paste it into their Word document for their report.

Re: It was user error, not a spreadsheet problem . by Anonymous Coward · 2016-08-24 17:42 · Score: 0

just use the import tool for importing csv.

bettet yet don't use csv! provide a .txt. if they care about the data, they can parse it back.

just noticed? by DriveDog · 2016-08-26 01:24 · Score: 1

No surprise. Most MS stuff tries to guess what you want and gets it wrong more often than right. Excel's guessing at data types is horrible and causes a lot more problems than just this. What's more, the guessing at what you want repeated is sometimes unreliable and unpredictable, leaving people with a repeated date when they wanted an incremented one or more often the other way around. While I'm at it... Excel made spreadsheet output "pretty", but still doesn't offer a true third dimension (that can be easily referenced within formulae).

Re:It was user error, not a spreadsheet problem .. by cwsumner · 2016-08-26 03:58 · Score: 1

This is a consequence of trying to make things easy enough for anyone to use. The usual attempt is to make it automatic, but making the computer smarter than the user is to lose control over what is happening.

"Make a machine even a fool can use, and only a fool would want to use it."

But computers are too new, for the actual knowledge to have propogated through humanity, yet...
Maybe another hundred years?

Slashdot Mirror

20% of Scientific Papers On Genes Contain Conversion Errors Caused By Excel, Says Report (winbeta.org)

349 comments