Slashdot Mirror


And the Pulitzer Prize For SQL Reporting Goes To... (padjo.org)

theodp writes: Over at the Stanford Computational Journalism Lab, Dan Nguyen's Exploring the Wall Street Journal's Pulitzer-Winning Medicare Investigation with SQL is a pretty epic post on how one can use SQL to learn about Medicare data and controversial practices in Medicare billing, giving the reader a better appreciation for what was involved in the WSJ's Medicare Unmasked data investigation. So, how long until a journalist wins a Pulitzer for SQL reporting? And for all you amateur and professional Data Scientists, what data would you want to SELECT if you were a Pulitzer-seeking reporter?

27 comments

  1. And the award goes to... by Anonymous Coward · · Score: 5, Funny

    Little Bobby Tables!

    Wait, where did the award go...?

    1. Re:And the award goes to... by tommeke100 · · Score: 1

      He went to play with Drop, Drop Schema.

  2. Family talent show by Anonymous Coward · · Score: 0

    Look, Johnny will now replicate the results written up in a WSJ article using SQL he wrote all by himself!

  3. Huh? by Anonymous Coward · · Score: 0

    WTF is a "journalism lab"? Is everything outside of a normal lecture class now considered a "lab"? Whoever can answer this question is welcome on my lawn.

  4. If we can choose.... by hawguy · · Score: 2

    And for all you amateur and professional Data Scientists, what data would you want to SELECT if you were a Pulitzer-seeking reporter?

    SELECT convert_style(story, MY_WRITING_STYLE) FROM all_the_stories WHERE interest_score >= PULITZER_LEVEL;

    Though I'd probably put a LIMIT on there so I don't publish too many Pulitzer winning stories at once.

  5. SQL earns more money than winning a Pulitzer by Anonymous Coward · · Score: 0
  6. Trend Analysis by Inzkeeper · · Score: 5, Insightful

    I can certainly appreciate a well-written, complex piece of SQL. Writing major summary reports in SQL can be unbelievably complex. However, it doesn't need to be complex in order to impress me. It just has to answer the correct question. This is particularly true when querying a data warehouse: it is all about getting sums and averages over time periods, right? Now you take those results, throw them into a crosstab engine, and start spitting out charts and looking for trends. Then you can start to see the anomalous trends.

    An award winning SELECT statement, in my opinion, would simply be one that asks an insightful question.
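    A minimal sketch of that workflow, assuming a made-up `billing` table and invented numbers (none of this comes from the WSJ data): sums per period pivoted into a crosstab, then a query that flags months well above a provider's own average. The 1.5x threshold is an arbitrary assumption, and SQLite stands in for a real warehouse.

```python
import sqlite3

# Hypothetical toy data: (provider, billing_month, billed_amount).
ROWS = [
    ("A", "2015-01", 100.0), ("A", "2015-02", 110.0), ("A", "2015-03", 105.0),
    ("B", "2015-01", 100.0), ("B", "2015-02", 120.0), ("B", "2015-03", 400.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE billing (provider TEXT, billing_month TEXT, billed_amount REAL)"
)
conn.executemany("INSERT INTO billing VALUES (?, ?, ?)", ROWS)

# Sums per provider, pivoted into one column per month (a hand-rolled crosstab).
CROSSTAB_SQL = """
SELECT provider,
       SUM(CASE WHEN billing_month = '2015-01' THEN billed_amount ELSE 0 END) AS jan,
       SUM(CASE WHEN billing_month = '2015-02' THEN billed_amount ELSE 0 END) AS feb,
       SUM(CASE WHEN billing_month = '2015-03' THEN billed_amount ELSE 0 END) AS mar
FROM billing
GROUP BY provider
ORDER BY provider
"""
crosstab = conn.execute(CROSSTAB_SQL).fetchall()

# Flag provider-months billed at more than 1.5x that provider's own average.
ANOMALY_SQL = """
SELECT b.provider, b.billing_month, b.billed_amount
FROM billing AS b
JOIN (SELECT provider, AVG(billed_amount) AS avg_amount
      FROM billing
      GROUP BY provider) AS p ON p.provider = b.provider
WHERE b.billed_amount > 1.5 * p.avg_amount
ORDER BY b.provider, b.billing_month
"""
anomalies = conn.execute(ANOMALY_SQL).fetchall()

for row in crosstab:
    print(row)
print("anomalies:", anomalies)
```

    The SQL itself stays simple; the insight is in picking what to compare against. Here each provider is compared with its own history, but comparing against peers in the same specialty would be an equally plausible choice.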

  7. That's not what a Pulitzer Prize is for by DNS-and-BIND · · Score: 1

    I think people don't understand what the Pulitzer Prize is for these days. Once upon a time, it was given to inspire journalist excellence, but these days it is just a back-patting that says, "we endorse and agree with your leftist views." Just look at the prizes for Investigative Reporting - seven years into Obama's term and not a single Pulitzer has been awarded for investigating corruption and criminal behavior in his administration. Not one. And this from a scandal-ridden Presidency that is ripe for old-fashioned, shoe-leather, crusading journalists to expose to the light of day.

    I don't see anything condemning any leftists at all. You have to go back 20 years ago to find a Prize awarded for an investigation of the Nation of Islam's questionable business dealings. Can you imagine this sort of thing going on today? A journalist would never receive a Pulitzer for doing this sort of story, in fact they'd probably be publicly shamed, fired, and rendered unemployable. So, let's all remember what the Pulitzer Prize actually is and what it stands for.

    --
    Shutting down free speech with violence isn't fighting fascism. It IS fascism!
    1. Re:That's not what a Pulitzer Prize is for by Anonymous Coward · · Score: 0

      Just look at the prizes for Investigative Reporting - seven years into Obama's term and not a single Pulitzer has been awarded for investigating corruption and criminal behavior in his administration. Not one.

      Really? That's absolutely outrageous. We heard about all kinds of corrupt practices in the financial sector when Bush was President. Where's the even handed reporting?

    2. Re:That's not what a Pulitzer Prize is for by wonkey_monkey · · Score: 1

      Just look at the prizes for Investigative Reporting - seven years into Obama's term and not a single Pulitzer has been awarded for investigating corruption and criminal behavior in his administration. [pulitzer.org] Not one.

      How do you define "in his (Obama's) administration?" And versus how many during the Bush administration? I'm not suggesting there weren't any; I just don't know the numbers. I assume you do, since your statement implies, at the very least, that the answer is "more than one," and presumably also that the number is high enough to be statistically significant when it comes to exposing the bias you propose exists (and which, let me be clear, I have no reason to actively doubt).

      --
      systemd is Roko's Basilisk.
    3. Re:That's not what a Pulitzer Prize is for by Anonymous Coward · · Score: 0

      >B-B-But Bush also did it...
      Obama bombed a school full of children to kill 2 terrorists who weren't even there that day. Did the NY Times or Washington Post run stories on it? No. It was big news in Pakistan. Think if China bombed a middle school in Indiana because Falun Gong members were going to give a lecture. Wouldn't that be a big deal?

      http://www.washingtonsblog.com/2015/01/american-drones-killed-civilians-bombing-cambodia-vietnam-war-died-911.html
      http://www.washingtonsblog.com/2011/03/is-nobel-peace-prize-winner-obama-more-brutal-than-bush.html

      Here's the closest story to it, and it OPENS with a defense for Obama:
      http://www.nytimes.com/2015/04/24/world/asia/drone-strikes-reveal-uncomfortable-truth-us-is-often-unsure-about-who-will-die.html

      You have to look at FAR LEFT (salon article from ALTERNET) news sites to find any mention of his bombing campaign:
      http://www.salon.com/2014/01/28/the_terrible_human_price_of_obamas_drone_war_partner/

  8. Here ya go: by Anonymous Coward · · Score: 1

    SELECT a.Headline, AVG(r.Rating) as 'AvgRating', COUNT(DISTINCT v.IPAddress) as 'Views'
    FROM `slashdot`.`articles` a
    JOIN `slashdot`.`sources` s ON (s.ID = a.SourceID)
    LEFT JOIN `slashdot`.`ratings` r ON(r.ArticleID = a.ID)
    LEFT JOIN `slashdot`.`articleviews` v ON(v.ArticleID = a.ID)
    WHERE s.Name LIKE '%dice%' OR (s.Name LIKE '%cowboy%' AND s.Name LIKE '%neal%')
    GROUP BY a.ID
    ORDER BY AVG(r.Rating) DESC, COUNT(DISTINCT v.IPAddress) DESC;

  9. Re:SQL is for cows. by khelms · · Score: 1

    The duck goes quack. The pig goes oink. The sheep goes baaaa.

    Yeah, we've all seen a See 'N Say, but few of us are still playing with one.

  10. interest_score by Anonymous Coward · · Score: 0

    As Anonymous Coward was perhaps alluding to, the value of the interest_score field may be insufficient to accurately qualify 'story'.

  11. Inflation of Medicare billing by Anonymous Coward · · Score: 0

    SELECT billing_year, billing_month, AVG(billed_amount) FROM Billed_to_medicare GROUP BY billing_year, billing_month ORDER BY billing_year DESC, billing_month DESC;

  12. select future where freedom isGreaterThan techne by Anonymous Coward · · Score: 0

    A
      B
      C
      Techne

  13. Reminds me of a prior job by Tablizer · · Score: 1

    I used to do similar stuff for marketing research: How many customers fitting a certain profile purchased product X and also product Y, with and without promotion Z.

    The hard part was that the tables and product codes were messy. Historical baggage plagued the design. It would have been a relatively simple job with "clean" databases. (A lot of orgs have messy databases, by the way.)

    I pushed the idea of views or re-constituted table copies for marketing queries, but the DBA was too booked on other projects and they didn't want me to roll my own. I'd put myself out of a job anyhow if someone succeeded at simplifying things, for 2/3 of queries could be put into web forms so the marketers could run their own. (I eventually left anyhow.)
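    A rough sketch of the view idea, assuming a hypothetical `raw_purchases` table whose messy product codes differ only in case and stray whitespace (real historical baggage is usually worse). The view normalizes the codes once, and the marketing question (who bought both X and Y, with and without promotion Z) then becomes a short query. All table, column, and code names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_purchases (customer_id INTEGER, product_code TEXT, promo_code TEXT);
INSERT INTO raw_purchases VALUES
    (1, 'X',  'Z'),  (1, 'y ', 'Z'),
    (2, 'x',  NULL), (2, 'Y',  NULL),
    (3, 'X',  'Z'),
    (4, ' Y', NULL);

-- The view hides the messy codes so marketing queries stay simple.
CREATE VIEW purchases AS
SELECT customer_id,
       UPPER(TRIM(product_code)) AS product,
       COALESCE(promo_code, '')  AS promo
FROM raw_purchases;
""")

# Customers who bought both X and Y, split by whether any of those
# purchases used promotion Z (SQLite evaluates promo = 'Z' to 0 or 1).
BOTH_PRODUCTS_SQL = """
SELECT used_promo_z, COUNT(*) AS customers
FROM (
    SELECT customer_id,
           MAX(promo = 'Z') AS used_promo_z
    FROM purchases
    WHERE product IN ('X', 'Y')
    GROUP BY customer_id
    HAVING COUNT(DISTINCT product) = 2
) AS buyers
GROUP BY used_promo_z
ORDER BY used_promo_z
"""
result = conn.execute(BOTH_PRODUCTS_SQL).fetchall()
print(result)
```

    Once the cleanup lives in the view, the remaining query has only three moving parts (the two products and the promotion), which is roughly what a web form for marketers would need to expose.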

  14. By Neruos by Anonymous Coward · · Score: 0

    If you are designing reports directly via SQL, then you are doing something wrong. Time to upgrade to SaaS or BI solutions like Business Objects. SQL in reporting should be a limited if not break/fix/hack scenario, with a product doing 95% of the job via a wizard.

    Move with the times or get left behind.

  15. Anything but lecture by tepples · · Score: 1

    Yes. When I went to college, "lab" was the name for any section that students in a particular class were required to take in addition to a lecture.

  16. Sadly promotes the Flaws in Today's Reporting by fygment · · Score: 2

    The SQL tutorial looks at the numbers but doesn't emphasize two kind of glaring omissions in the WSJ article:

    a) Dr Weaver is charging for a procedure _labeled_ 'cardiac', but there is no mention of what the procedure is, its relevance to cardiology (if the label is accurate), or its relevance to internal medicine (Dr Weaver's _labeled_ current specialty). For all we know, Dr Weaver is an ex-cardiologist, now practicing internal medicine for which he has found this procedure to be extremely useful in the patients he treats. For all we know, the procedure was mislabeled (esp. since it is pointed out that the data is noisy incl. spelling errors, multiple labels for same thing, etc.)

    b) At one point, Dr. Weaver's _statistical_ use of the procedure (99.5%) is compared to a raw numerical value (6) by Cleveland Clinic cardiologists. For all we know, the clinic cardiologists only saw 6 patients for whom the procedure was relevant, or they never use the procedure because they have other more relevant/current techniques, or patients who are seen by the clinic are at a point where the procedure isn't required.

    While the SQL tutorial is an interesting look at how to verify the accuracy of the statistics in an article, it tacitly provided validation for what is still poor reporting, i.e., the statistics need explanation and validation beyond simple numbers.

    If you assume that most people are pretty honest (statistically they are), then the SQL queries are a neat way to highlight that the billing system (not the practitioners) is in need of a second or third look.

    --
    "Consensus" in science is _always_ a political construct.
  17. Hmm, any select? by Karem+Lore · · Score: 1

    Select * from NSA.listeningDB union select * from GCHQ.listeningDB;

    select * from GOV.lobbyists where GOV.lobbyists.funded > 0 AND GOV.lobbyists.friend in (select name from GOV.inpower);

    --
    When all is said and done, nothing changes...
  18. And in other off topic but slightly related news by Anonymous Coward · · Score: 0

    I think people don't understand what the Pulitzer Prize is for these days. Once upon a time, it was given to inspire journalist excellence, but these days it is just a back-patting that says, "we endorse and agree with your leftist views." Just look at the prizes for Investigative Reporting - seven years into Obama's term and not a single Pulitzer has been awarded for investigating corruption and criminal behavior in his administration. Not one. And this from a scandal-ridden Presidency that is ripe for old-fashioned, shoe-leather, crusading journalists to expose to the light of day.

    I don't see anything condemning any leftists at all. You have to go back 20 years ago to find a Prize awarded for an investigation of the Nation of Islam's questionable business dealings. Can you imagine this sort of thing going on today? A journalist would never receive a Pulitzer for doing this sort of story, in fact they'd probably be publicly shamed, fired, and rendered unemployable. So, let's all remember what the Pulitzer Prize actually is and what it stands for.

    It's nice to see that the Wall Street Journal is working on raising the quality level of articles they publish.

  19. clearly SQL is still immature by Anonymous Coward · · Score: 0

    The SQL tutorial looks at the numbers but doesn't emphasize two kind of glaring omissions in the WSJ article:

    a) Dr Weaver is charging for a procedure _labeled_ 'cardiac', but there is no mention of what the procedure is,

    page 8 'the procedure, which is called "enhanced external counterpulsation" or EECP'

    its relevance to cardiology (if the label is accurate),

    page 8 'Steven Nissen, chairman of cardiovascular medicine at the Cleveland Clinic, characterizes EECP as “a treatment that is, and should be, rarely used”'

    or its relevance to internal medicine (Dr Weaver's _labeled_ current specialty). For all we know, Dr Weaver is an ex-cardiologist, now practicing internal medicine for which he has found this procedure to be extremely useful in the patients he treats.

    page 8 'Dr Weaver, ... acknowledged having no specialized training in cardiology ... By his own account, he doesn’t see patients himself but employs two to three cardiologists for that purpose.'

    For all we know, the procedure was mislabeled (esp. since it is pointed out that the data is noisy incl. spelling errors, multiple labels for same thing, etc.)

    b) At one point, Dr. Weaver's _statistical_ use of the procedure (99.5%) is compared to a raw numerical value (6) by Cleveland Clinic cardiologists. For all we know, the clinic cardiologists only saw 6 patients for whom the procedure was relevant, or they never use the procedure because they have other more relevant/current techniques, or patients who are seen by the clinic are at a point where the procedure isn't required.

    Page 8 'Medicare covers EECP only for patients who have “disabling” angina, a kind of persistent and extreme chest pain, and who can’t have surgery to treat it.'

    While the SQL tutorial is an interesting look at how to verify the accuracy of the statistics in an article, it tacitly provided validation for what is still poor reporting, i.e., the statistics need explanation and validation beyond simple numbers.

    This statement is in fact quite entertaining.

    If you assume that most people are pretty honest (statistically they are), then the SQL queries are a neat way to highlight that the billing system (not the practitioners) is in need of a second or third look.

    Source
    Please note that the linked pdf hosted on pulitzer.org includes an additional 2 pages at the beginning, which are omitted from the screenshot provided in the padjo.org reference.