Spotify Is Writing Massive Amounts of Junk Data To Storage Drives (arstechnica.com)
An anonymous reader quotes a report from Ars Technica: For almost five months -- possibly longer -- the Spotify music streaming app has been assaulting users' storage devices with enough data to potentially take years off their expected lifespans. Reports of tens or in some cases hundreds of gigabytes being written in an hour aren't uncommon, and occasionally the recorded amounts are measured in terabytes. The overload happens even when Spotify is idle and isn't storing any songs locally. The behavior poses an unnecessary burden on users' storage devices, particularly solid state drives, which come with a finite amount of write capacity. Continuously writing hundreds of gigabytes of needless data to a drive every day for months or years on end has the potential to cause an SSD to die years earlier than it otherwise would. And yet, Spotify apps for Windows, Mac, and Linux have engaged in this data assault since at least the middle of June, when multiple users reported the problem in the company's official support forum. Three Ars reporters who ran Spotify on Macs and PCs had no trouble reproducing the problem reported not only in the above-mentioned Spotify forum but also on Reddit, Hacker News, and elsewhere. Typically, the app wrote from 5 to 10 GB of data in less than an hour on Ars reporters' machines, even when the app was idle. Leaving Spotify running for periods longer than a day resulted in amounts as high as 700 GB. According to comments left in the Spotify forum in the past 24 hours, the bug has been fixed in version 1.0.42, which is in the process of being rolled out.
Bandwidth, memory, clock cycles....don't matter. Use more shitty layers of abstraction over layers built into high level languages, then kick it out the door.
Nah, then you'd see an increased network usage. This is probably just Firefox's fsync bug repeated: in order to ensure data integrity, SQLite has a mode that fsyncs on commit. (After all, if the data isn't written to storage, it isn't really committed.) If you combine that with autocommit after every minor transaction, you get a ton of fsyncs and massive data usage.
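To make the fsync-per-commit point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `events` and the helper names `write_rows` and `compare` are made up for illustration, and nothing here is Spotify's or Firefox's actual code. It only counts commits; with `PRAGMA synchronous = FULL`, each commit is a point where SQLite asks the OS to flush to disk.

```python
import os
import sqlite3
import tempfile


def write_rows(path, rows, batch):
    """Insert rows, committing once per row (batch=False) or once in total (batch=True).

    Returns (number_of_commits, rows_stored).
    """
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA synchronous = FULL")  # sync to disk at each commit
    conn.execute("CREATE TABLE IF NOT EXISTS events (ts INTEGER, note TEXT)")
    conn.commit()

    commits = 0
    if batch:
        # One transaction for the whole batch: a single commit at the end.
        with conn:
            conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        commits = 1
    else:
        # Commit after every tiny change: one sync point per row.
        for row in rows:
            conn.execute("INSERT INTO events VALUES (?, ?)", row)
            conn.commit()
            commits += 1

    stored = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    conn.close()
    return commits, stored


def compare(n=50):
    """Run both strategies on throwaway database files (hypothetical demo)."""
    rows = [(i, "idle") for i in range(n)]
    with tempfile.TemporaryDirectory() as tmp:
        per_row = write_rows(os.path.join(tmp, "a.db"), rows, batch=False)
        batched = write_rows(os.path.join(tmp, "b.db"), rows, batch=True)
    return per_row, batched
```

Both strategies store the same rows; the difference is how many times the database is forced out to the drive along the way.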
I'm not sure it's a problem solved by that.
I think the gist of it is that for every small change to the data they store on your device, they are re-writing the entirety of the dataset they are keeping. So for instance they are logging a record that says "didn't play music this minute" but are re-writing the entire multi-year log.
I blame XML and other formats that are used for the stupid reason that "we already have XML routines so let's use it for everything"
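A rough back-of-the-envelope sketch of why that pattern hurts, assuming a hypothetical log of fixed-size records (the 100-byte record size and one-record-per-minute rate are invented for illustration, not measured from Spotify):

```python
def bytes_written_rewrite(n_records, record_size):
    """Total bytes written if the whole log is rewritten after each new record."""
    # After the k-th record the file holds k records, and all k are written again.
    return record_size * n_records * (n_records + 1) // 2


def bytes_written_append(n_records, record_size):
    """Total bytes written if each new record is simply appended."""
    return record_size * n_records


# One 100-byte record per minute for a day (1440 records):
rewrite_total = bytes_written_rewrite(1440, 100)  # 103752000 bytes, about 104 MB
append_total = bytes_written_append(1440, 100)    # 144000 bytes, about 144 KB
```

The rewrite cost grows with the square of the log length, so a multi-year log rewritten on every change gets expensive very quickly.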
"His name was James Damore."
Why the hell would you put a pagefile in a ramdisk? "Yo dawg, I heard you love pages?"
What part of "shall not be infringed" is so hard to understand?
If you're writing enough to pagefiles, you need more RAM anyway.
If you're writing a lot to temporary areas, you need to stop doing so.
That said, I'm on an SSD machine at the moment that has been running for 6 months, with absolutely no special treatment, imaged from a years-old working PC without changing anything, and it's written 1.5TB. 1TB of that was the initial imaging process.
It's the main workhorse in an IT Office in a school, used for 10+ hours every single day for everything imaginable. Client machines rarely use much.
It has a write-life of 100TB. If it dies, I just hit F12 and re-image cleanly.
At current usage (not including the initial image), I count that as 1TB of writes a year, which gives a lifespan longer than the expected lifetime of the PC itself, however far out I am.
There's no need for special treatment, no need to use special SSD transfer software, no need to over-provision, or increase RAM cache or anything else. Just have a PC that isn't slogging itself to death, and slap an SSD in.
Don't expect it to last forever, but you shouldn't need to adjust ANYTHING at all.
And I've done this on all the staff work machines earlier this year - zero failures so far and it has made much more of a performance difference than doubling the amount of RAM. In fact, where machines had motherboards that were limited in RAM, we SSD'd and saw HUGE performance increases better than those clients whose RAM we doubled but are running on traditional hard disks.
At home I have a 1TB EVO 850 and that's the same. Literally imaged byte-for-byte, and is stupendously fast and no need for any software changes whatsoever, and the write numbers are predicting 20+ years of life despite a similar 10+ hours a day of usage.
Don't RELY on it never failing. But they are going to be in warranty (whether that's by number of years, or data written) for the life of your machine, under even heavy usage, unless you're doing something incredibly stupid (like use in NVR, RAID, or similar without buying a high-write-endurance model).
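For anyone who wants to redo the arithmetic, a small sketch using the figures quoted in this thread and in the summary (100TB write life, 1.5TB already written, about 1TB a year of normal use, and roughly 10 GB an hour from the buggy app). It assumes 1 TB = 1000 GB and, for the Spotify case, that the app runs around the clock, which is my assumption rather than anything measured:

```python
def years_until_worn(endurance_tb, tb_per_year, already_written_tb=0.0):
    """Years of writes left before the rated endurance is used up."""
    return (endurance_tb - already_written_tb) / tb_per_year


def tb_per_year_from_gb_per_hour(gb_per_hour, hours_per_day=24.0):
    """Convert an hourly write rate into TB per year (1 TB = 1000 GB)."""
    return gb_per_hour * hours_per_day * 365 / 1000


# Office workhorse: 100 TB rated, 1.5 TB written, about 1 TB per year.
normal_years = years_until_worn(100, 1.0, already_written_tb=1.5)  # 98.5

# Same drive with an app writing 10 GB per hour, 24 hours a day.
spotify_rate = tb_per_year_from_gb_per_hour(10)        # 87.6 TB per year
spotify_years = years_until_worn(100, spotify_rate)    # about 1.14
```

Rated endurance is a warranty figure rather than a hard failure point, so treat the results as orders of magnitude, not predictions.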
From the comments on Ars, it seems pretty clear that there is a bug in the app causing it to repeatedly compact the sqlite database it uses. I'm sure we all know that that is something which should be done only when actually needed, so that's clearly a bug, not inefficiency.
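A sketch of what "only when actually needed" could look like, using Python's built-in sqlite3 module. The 25% threshold and the function names are arbitrary choices for illustration; this is not how Spotify decides when to compact, which isn't public in this thread:

```python
import sqlite3


def _pragma_int(conn, name):
    """Read a single-integer PRAGMA (name is a fixed string, never user input)."""
    cur = conn.execute(f"PRAGMA {name}")
    value = cur.fetchone()[0]
    cur.close()  # make sure the statement is finished before any VACUUM
    return value


def vacuum_if_fragmented(conn, threshold=0.25):
    """Run VACUUM only if at least `threshold` of the pages are free.

    The caller must not have an open transaction, because SQLite refuses
    to VACUUM inside one. Returns True if a VACUUM was performed.
    """
    free_pages = _pragma_int(conn, "freelist_count")
    total_pages = _pragma_int(conn, "page_count")
    if total_pages and free_pages / total_pages >= threshold:
        conn.execute("VACUUM")  # rewrites the whole database file
        return True
    return False
```

VACUUM copies the entire database into a new file, so calling it on a timer regardless of fragmentation would produce exactly the kind of pointless write traffic described in the summary.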
Pagefiles I don't put on software ramdisk (had to clarify that), but on HDD instead
So you put the thing that benefits the most from fast I/O on your slowest storage device instead of your SSD? Why not put it on a floppy drive, or a mounted network share connected to a VPS hosted on the other side of the country if you like to slow things down?
Or maybe you just love that spinning HDD sound.
lucm, indeed.
Should I also move my HOSTS file to a ram disk?
This sounds like some smart software architect took the abstraction of the persistence/storage layer of the Spotify stack too far whilst at the same time storing too many minuscule data points in Spotify's objects. Because once abstracted properly, adding attributes to your objects and the entire stack is trivial.
Think of it:
If your stack's ORM neatly abstracts everything concerning persistence and on the back side syncs neatly whenever it has the opportunity, all you need is app-side developers and software designers storing every little piece of data they can find that changes every millisecond, and then you have your bandwidth/load disaster as described.
If something like this is the case with Spotify, which I do strongly suspect, it is a good example that goes to show that you can take clean-room design too far. And that a haphazard duct-tape and chicken-wire approach to product development can have significant advantages, as you build around unforeseen roadblocks on a daily basis and only add the features really needed.
I see an example of this every day, as I am currently doing WordPress development and building a WordPress pipeline for an agency. Large parts of the WP legacy architecture are an abysmally convoluted mess built by people who shouldn't have been let near a keyboard 15 years ago. But having a non-developer build a production capable demo of a website in WP is significantly faster than starting with an actual UX prototype, which quickly leads our team into real-world problems that we often haven't suspected. And suddenly a proper ORM and cleanroom design would cause hassle at one end or the other.
My two eurocents.
We suffer more in our imagination than in reality. - Seneca
Generally there is no reason to do that, but there are some poorly coded applications that will page memory to disk, even when they don't need to.
Is it just my observation, or are there way too many stupid people in the world?
Rust spinners wear out too. This can be a particular problem if it's constantly bringing the drive out of power-down.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
I use Google Play Music. Not only can it cache songs, you can also upload your own collection. And now that Google has acquired and integrated Songza, their playlists are awesome.
Problem solved.
This is a non issue on desktops, really.
It takes a pretty small worldview to not be able to imagine people on limited bandwidth / unreliable internet connections.
I blame XML and other formats that are used for the stupid reason that "we already have XML routines so let's use it for everything"
XML is like violence - if it doesn't solve your problem, you aren't using enough of it.
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
This isn't a bandwidth issue; nothing is being downloaded. It takes a pretty dense worldview not to read the article you are posting on.
Sorry, teleporters just kill you and then make a copy. A perfect, soul-less copy.
Here is a possibly related complaint from almost three years ago.
If you're using XML to solve a problem, you actually have two problems.
That is all.
If you leave browsers up all the time, they have the same problem. Firefox and Chrome. https://www.grc.com/sn/sn-580....
You can lose something that is loose, so tighten the loose item so you don't lose it.
Wait, there's articles now?
You must be new here
(notices UID) err, now I'm just confused...
"Government is like fire; a handy servant, but a dangerous master." -- George Washington
It seems developers don't consider optimizing disk I/O. Recently I watched a live event streamed in Firefox (from a not-so-great website; I guess it uses Flash) and it kept my disk at 100% the whole time. Why should a streaming service write all that video data to disk? Can't it just cache in RAM, display, and forget the bits?
Such unnecessary disk I/O wears my disk down, increases power use (if I'm on battery on my laptop, say) and of course creates a kind of internal DoS as it hogs the disk I/O, so the rest of the processes can't get disk I/O or get delayed, resulting in sluggish OS response even in, say, file explorers. I.e., a well-behaved app should not hog any shared hardware resource (like disk I/O) to the point of system instability.
Apps should be benchmarked not only on their memory footprint or CPU usage (algorithmic complexity, big-O) but also on their external data traffic, such as disk and network I/O.
I regularly watch my disk i/o usage by processes and get rid of any if I suspect they are hitting it unnecessarily hard.