Backblaze Hard Drive Stats for 2017 (backblaze.com)
Backblaze is back with its hard drive reliability report. From the blog post: Beginning in April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. As of the end of 2017, there are about 88 million entries totaling 23 GB of data. At the end of 2017 we had 93,240 spinning hard drives. Of that number, there were 1,935 boot drives and 91,305 data drives. This post looks at the hard drive statistics of the data drives we monitor. We'll review the stats for Q4 2017, all of 2017, and the lifetime statistics for all of the drives Backblaze has used in our cloud storage data centers since we started keeping track.
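To make the shape of those daily entries concrete, here is a minimal sketch in Python that tallies entries and failed entries per model. The column headers, vendor names, models, and serial numbers in the sample are made up for illustration; the post lists the fields but not their exact names, and SMART attributes are left out.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: the headers and values are assumptions,
# not the actual layout of the published files.
SAMPLE_CSV = """date,manufacturer,model,serial_number,status
2017-12-30,ExampleCo,EX-4TB,SN001,operational
2017-12-30,ExampleCo,EX-4TB,SN002,operational
2017-12-31,ExampleCo,EX-4TB,SN001,operational
2017-12-31,ExampleCo,EX-4TB,SN002,failed
2017-12-31,OtherCo,OT-8TB,SN100,operational
"""


def summarize(rows):
    """Count daily entries and failed entries for each drive model."""
    totals = defaultdict(lambda: {"entries": 0, "failures": 0})
    for row in rows:
        counts = totals[row["model"]]
        counts["entries"] += 1  # one row = one drive on one day
        if row["status"] == "failed":
            counts["failures"] += 1
    return dict(totals)


summary = summarize(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for model, counts in sorted(summary.items()):
    print(model, counts)
```

Since each row is one drive on one day, the per-model entry count doubles as a measure of how long that model has been in service across the fleet.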
Hi, I’m Bilal, cofounder of ClearBrain (https://clearbrain.com). ClearBrain automatically builds predictive models of which of your users are most likely to convert or churn in your app. Think Amazon ML for marketing analysts.
Our founding team comes from Optimizely and Google, where we built similar predictive tools for our marketing teams. At each company, we kept building the same components of a predictive pipeline: JavaScript snippets to collect data, ETL jobs to transform that data, and cron jobs to run a regression. We were spending hours a week maintaining these pipelines, but the time-consuming part wasn’t the algorithms (they’re open source); it was the transformations.
So with ClearBrain we decided to automate the data transformation steps. We built our system in Spark ML (Scala), Data Pipeline, and Go. Instead of instrumenting yet another JavaScript snippet, we use existing data in Segment (YC S11) and Heap (YC W13) through standard integrations. And because every Segment/Heap dataset has the same schema, our system can apply the same transformations to each one to produce a machine-readable feature matrix. When a customer selects a user action tracked in Segment/Heap to predict, we run the transformed matrix through a logistic regression in Spark ML, which outputs, for each user, a probability of performing that action, based on users who performed it in the past.
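For readers who want to see the general idea end to end, here is a minimal sketch in plain Python. It is not ClearBrain’s pipeline: hand-rolled gradient descent stands in for Spark ML, and the event log, event names, and feature list are hypothetical and do not follow the Segment/Heap schema.

```python
import math

# Hypothetical event log of (user_id, event_name) pairs.
EVENTS = [
    ("u1", "page_view"), ("u1", "add_to_cart"), ("u1", "purchase"),
    ("u2", "page_view"),
    ("u3", "page_view"), ("u3", "add_to_cart"), ("u3", "add_to_cart"),
    ("u3", "purchase"),
    ("u4", "page_view"), ("u4", "page_view"),
]
FEATURES = ["page_view", "add_to_cart"]  # assumed feature columns
TARGET = "purchase"                      # the action to predict


def build_matrix(events):
    """Turn raw events into per-user count features and 0/1 labels."""
    users = sorted({user for user, _ in events})
    counts = {u: {f: 0 for f in FEATURES} for u in users}
    labels = {u: 0 for u in users}
    for user, name in events:
        if name == TARGET:
            labels[user] = 1
        elif name in counts[user]:
            counts[user][name] += 1
    X = [[counts[u][f] for f in FEATURES] for u in users]
    y = [labels[u] for u in users]
    return users, X, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, row):
    """Predicted probability that a user with these features converts."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def fit(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression with batch gradient descent."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for row, label in zip(X, y):
            error = score(weights, bias, row) - label
            grad_b += error
            for j, x in enumerate(row):
                grad_w[j] += error * x
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


users, X, y = build_matrix(EVENTS)
weights, bias = fit(X, y)

# Score two hypothetical users who have not purchased yet.
browsing_only = score(weights, bias, [1, 0])  # one page view, no cart adds
added_to_cart = score(weights, bias, [1, 1])  # one page view, one cart add
print(f"browsing only: {browsing_only:.2f}, added to cart: {added_to_cart:.2f}")
```

The matrix-building step is the part that benefits from a shared schema: with fixed event fields, the same counting logic can be reused across datasets, and only the choice of target action changes.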
This distills the predictive modeling process into a simple UI for identifying high-probability users in minutes. We’ve built the tool with marketers in mind, to help them identify which users may convert or churn, and export those users to marketing tools like Facebook Ads, HubSpot, etc. We’ve also found good reception from startups that have marketing objectives but lack the resources to deploy ML-driven campaigns themselves.
We look forward to feedback from the /. community! :)
Bilal