Domain: processing.org
Stories and comments across the archive that link to processing.org.
Stories · 4
Book Review: The Nature of Code
eldavojohn writes "I backed Daniel Shiffman's Kickstarter project to write a book on what (at the time) seemed to be a very large knowledge space. The result is a good book (amazing by CC-BY-NC standards), available in both PDF and HTML versions. In addition to the book, he maintains the source code for producing the book and, of course, the book's examples. The Nature of Code starts off swimmingly but remains front-heavy, with a mere thirty-five pages devoted to the final chapter on neural networks. This is an excellent book for Java and Processing developers who want to break into simulation and modeling of, well, anything. It probably isn't a must-have title for very seasoned developers (unless you've never done simulation and modeling), but at zero cost, why not?" Read below for the rest of eldavojohn's review.
The Nature of Code: Simulating Natural Systems with Processing
author: Daniel Shiffman
pages: 520
publisher: The Nature of Code
rating: 9/10
reviewer: eldavojohn
ISBN: 978-0985930806
summary: A book concentrating on the simulation of natural elements through both basic and advanced programming concepts in Processing.
First off, I feel that defining the audience of this book is very important to avoid disappointment. This book is not for someone who has already developed games, modeled highway traffic or built their own physics engine. No, this book is geared at people who are familiar with one language (preferably Java or Processing) and want to get a taste of all of the above. It may also suit someone new to programming who is willing to put in the extra effort of coming up to speed on Processing in tandem with the text. After all, Processing is a comparatively forgiving language with a dead simple API for interacting with the mouse and drawing and animating objects.
I'd also like to address the "exercises" found throughout and at the end of each chapter in this book. They are excellent. I picked a couple, invested real time in fleshing them out, and I feel Shiffman succeeded in covering a wide range of difficulty. Within each chapter, the exercises build on one another so that each new one is easy to complete in turn, while the end-of-chapter exercises are genuine stretches. In addition, applicable chapters urge the reader to apply newly learned concepts to what Shiffman calls "The Ecosystem Project," in which the reader defines an ecosystem and continually adds new animals, new movement patterns, new behaviors like predation and, finally, artificial intelligence.
Lastly, this book can be found in many formats, and I read the first half as HTML with animated diagrams. While the animated diagrams were awesome and added greatly to the text, I still found myself enjoying the dead tree book much more. I know I will soon be a dinosaur with shelves of needless weight that people will mock, but I cannot make the jump to reading on a screen. The book's binding and paper quality are average, as it appears to be printed through Amazon's CreateSpace. In print, diagrams that would animate are shown as progressively darkening traces of each object's path, which makes the movement fairly easy to envision. I did love the HTML version's moving examples, though!
The introduction of this book covers a few fundamental concepts of randomness, like random walks and Perlin noise, as well as a bit of statistics. Although it is labeled "Introduction," this chapter is fundamentally important, and these concepts are referred back to throughout the rest of the book. The book immediately dives into very simple code snippets that are easy to run and understand. Great detail and careful explanation are found throughout these opening chapters, and the reader is given informational boxes that go into more depth on certain concepts. This was done really well in the first five or so chapters and was rare, if present at all, in the final chapters.
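A random walk like the one the Introduction opens with needs only a position and a random choice of step. The following is a minimal Python sketch, not the book's Processing code; the `Walker` class, the four-direction step rule and the fixed seed are illustrative assumptions.

```python
import random


class Walker:
    """A point that takes one unit step up, down, left or right per frame."""

    def __init__(self, x=0, y=0, seed=None):
        self.x, self.y = x, y
        self.rng = random.Random(seed)  # seeded so a walk can be reproduced

    def step(self):
        # Each of the four directions is equally likely.
        dx, dy = self.rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        self.x += dx
        self.y += dy


walker = Walker(seed=42)
path = []
for _ in range(10):
    walker.step()
    path.append((walker.x, walker.y))
```

Every consecutive pair of points in `path` differs by exactly one unit step, which is what makes the trail look like a jittery, wandering line when drawn.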
The first chapter is devoted to vectors. It does an excellent job of explaining why they are so important, as well as defining and coding the mathematical operations on vectors. A great aspect of this chapter is that the author fleshes out PVector functionality before your eyes, which helps you better understand both Processing and object-oriented programming. Ways of representing and implementing velocity and acceleration with vectors (new to beginners) are explored at their most basic levels.
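To make the vector discussion concrete, here is a minimal Python sketch of the location/velocity/acceleration update the chapter builds toward. The `Vec` class only loosely mirrors a few of Processing's PVector operations (`add`, `mult`, `mag`, `limit`); the names, the constant acceleration and the top speed are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Vec:
    """Tiny 2D vector; like PVector's instance methods, these mutate in place."""
    x: float = 0.0
    y: float = 0.0

    def add(self, other):
        self.x += other.x
        self.y += other.y

    def mult(self, n):
        self.x *= n
        self.y *= n

    def mag(self):
        return math.hypot(self.x, self.y)

    def limit(self, max_mag):
        # Rescale to max_mag only when the vector is longer than that.
        m = self.mag()
        if m > max_mag:
            self.mult(max_mag / m)


class Mover:
    def __init__(self, top_speed=10.0):
        self.location = Vec(0.0, 0.0)
        self.velocity = Vec(0.0, 0.0)
        self.acceleration = Vec(1.0, 0.0)  # assumed constant push to the right
        self.top_speed = top_speed

    def update(self):
        # Acceleration changes velocity; velocity changes location.
        self.velocity.add(self.acceleration)
        self.velocity.limit(self.top_speed)
        self.location.add(self.velocity)


m = Mover(top_speed=3.0)
for _ in range(5):
    m.update()
```

After five frames the velocity has climbed 1, 2, 3 and then stayed capped at 3, so the mover sits at x = 12.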
The second chapter moves naturally enough to forces on objects and begins to delve into basic physics formulas. Newton's laws are modeled, as are friction, aerodynamics, fluid dynamics and gravity. Shiffman does a great job of explaining these unruly topics in easy-to-understand language while still presenting the scary-looking formulas. He even inserts an informational box imploring the reader not to be afraid of scary-looking formulas, using friction as the example and breaking it down. I feel one of the strengths of this book is showing how a complex-looking formula can be deconstructed into plain English and then roughly implemented as a model in Processing. While this modeling is by no means completely accurate or state of the art, it is a good introduction and would likely suffice for simple games and web design.
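In its usual textbook form, the friction formula is a force opposing the unit velocity vector, scaled by the coefficient of friction μ and the normal force N (friction = -μ · N · v̂), applied through Newton's second law as a = F / m. A minimal Python sketch, assuming the common simplification of a constant normal force (N = 1 by default); the function names are illustrative, not the book's code.

```python
import math


def friction_force(velocity, mu, normal=1.0):
    """Friction opposes motion: F = -mu * N * v_hat (zero when not moving)."""
    vx, vy = velocity
    speed = math.hypot(vx, vy)
    if speed == 0:
        return (0.0, 0.0)
    # Dividing by speed normalizes the velocity; the minus sign reverses it.
    scale = -mu * normal / speed
    return (vx * scale, vy * scale)


def apply_force(acceleration, force, mass):
    """Newton's second law rearranged: accumulate a += F / m."""
    ax, ay = acceleration
    fx, fy = force
    return (ax + fx / mass, ay + fy / mass)
```

For a velocity of (3, 4) and μ = 0.1, the friction force is roughly (-0.06, -0.08): magnitude 0.1, pointing straight back along the direction of motion.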
The third chapter brings angles into the mix by concentrating on oscillations. While it does a great job of covering the important aspects of trigonometry, the text doesn't really follow through on recalling these concepts. For instance, the mnemonic SOHCAHTOA from geometry class is briefly explained and subsequently dropped. It is used in later chapters, but only implicitly, so readers who are not intimately familiar with it may find it difficult to see the trigonometric reductions employed to simplify the code for the visualizations. Shiffman does an excellent job of starting with something that looks like a complex system, breaking it down into its component vectors and showing incremental changes to the code that iteratively improve the visualization at hand. In doing so, he shows how a modeling programmer should think and work through known physical behavior to derive something that works visually in Processing.
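Most of the implicit SOHCAHTOA reductions come down to two moves: converting polar coordinates to Cartesian ones, and expressing oscillation as the sine of elapsed time. A minimal Python sketch; the function names and the frame-based period are illustrative assumptions.

```python
import math


def polar_to_cartesian(r, theta):
    # SOHCAHTOA: cos = adjacent / hypotenuse and sin = opposite / hypotenuse,
    # so with hypotenuse r the adjacent side is r*cos(theta), the opposite r*sin(theta).
    return r * math.cos(theta), r * math.sin(theta)


def oscillate(amplitude, period, frame):
    """Simple harmonic motion: one full cycle every `period` frames."""
    return amplitude * math.sin(2 * math.pi * frame / period)
```

With a period of 120 frames, the value starts at 0, reaches the full amplitude a quarter of the way through (frame 30) and returns to 0 at frame 60.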
Next up is particle systems. The reader is introduced to simpler ways of maintaining a set of particles as the focus shifts to multiple particles with complex interactions. Shiffman opts to keep it simple and shies away from coding aspects like ArrayList versus LinkedList versus HashMap. Instead, minimal space is spent on side ventures, and the particle systems are surprisingly easy to get off the ground. The reader is introduced to polymorphism, inheritance and more advanced class constructs to reduce the amount of code required to activate, handle and delete heterogeneous groups of particles. For a beginning developer, this chapter is great at walking through these more advanced concepts and showing their direct benefit to the code.
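A minimal Python sketch of the particle-system pattern described above: a base class, a subclass that overrides one method, and a system that stores both kinds in a single list and removes dead particles. The class names, lifespan values and velocity ranges are illustrative assumptions, not the book's code.

```python
import random


class Particle:
    """Base particle: drifts by its velocity and fades out over its lifespan."""

    def __init__(self, x, y, rng):
        self.x, self.y = x, y
        self.vx = rng.uniform(-1, 1)
        self.vy = rng.uniform(-2, 0)
        self.lifespan = 255  # counts down each frame, like an alpha value

    def update(self):
        self.x += self.vx
        self.y += self.vy
        self.lifespan -= 5

    def is_dead(self):
        return self.lifespan <= 0

    def shape(self):
        return "circle"


class Confetti(Particle):
    """Inherits all behavior and only overrides how it is drawn."""

    def shape(self):
        return "square"


class ParticleSystem:
    def __init__(self, x, y, seed=0):
        self.origin = (x, y)
        self.rng = random.Random(seed)
        self.particles = []

    def add_particle(self):
        # Polymorphism: both kinds live in the same list and share one interface.
        cls = self.rng.choice([Particle, Confetti])
        self.particles.append(cls(*self.origin, self.rng))

    def run(self):
        for p in self.particles:
            p.update()
        # Rebuild the list rather than deleting entries while iterating over it.
        self.particles = [p for p in self.particles if not p.is_dead()]
```

Each particle survives 51 updates (255 / 5), so adding one particle per frame settles into a steady population of about 50.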
So far, from the Introduction through Chapter Four, everything has been great. Shiffman points out that there are a plethora of physics libraries out there, in any imaginable language and of any imaginable quality. Consequently, it's unlikely you're going to forge forward with the aforementioned concepts and find yourself writing the engine for the next blockbuster space shooter. As a result, Chapter Five is an overview of how to interact with physics libraries and use your Processing sketch as a facade that simply queries the library for positions. Box2D is the first library he tackles, and with good cause: it's the same engine used by Angry Birds. That's great, because it is certainly empowering to know that if you can put a skin on a simple game that adds a few game rules to physics, you can make a billion dollars. I learned a lot from this. I had never interacted with a physics library like this before, and it was easy to produce fluid and impressive results. But it felt like glue code, and it also felt like this text could be made obsolete by a large update to Box2D (or its Java and Processing equivalents). This really is a necessary and helpful chapter for this book, but I felt sad that we had so quickly given up on rolling our own physics library. After Box2D, Shiffman presents VerletPhysics and provides helpful guidance on when to use one over the other. The two libraries also use slightly different terms for the same concepts (side note: I wasn't a big fan of the convoluted names these two libraries use to designate objects and object types).
Chapter Six shows the reader how to emulate an autonomous agent by giving each object a "desired" vector. In this case, it is a race car trying to reach a target. As the object moves, the desired vector is updated. Code examples show the object overshooting its target, and Shiffman then improves on this incrementally by scaling down the desired vector's magnitude as the object approaches the target. The object's behavior becomes more complex when a flow field is introduced to drive it. The author explores path following and how to introduce a bit of wandering around a straight line, like an ant following a pheromone trail or a person walking along a wall. Simple examples of group behavior follow, such as even spacing in a crowded group and flocking in a sparse population within a large space. Lastly, this chapter covers a very important aspect of code: performance. By now the reader has seen many examples where code can run slowly, and this chapter's continual pairwise updating of all objects on the screen (checking every pair of n objects takes on the order of n² operations) brings up Big O notation. I wish Shiffman had spent more time on this, or at least provided a separate box with more technical information on it, as he did with other concepts.
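The "slow down on approach" fix described above is usually called the arrive behavior, in the style of Craig Reynolds' steering behaviors: steering = desired velocity minus current velocity, with the desired speed shrinking inside a slowing radius. A minimal Python sketch; the parameter values (`max_speed`, `max_force`, `slow_radius`) are assumptions, and the loop at the end is a hypothetical driver that applies the force once per frame.

```python
import math


def arrive(location, velocity, target, max_speed=4.0, max_force=0.1, slow_radius=100.0):
    """Steering force toward `target` that decelerates inside `slow_radius`."""
    dx, dy = target[0] - location[0], target[1] - location[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        desired = (0.0, 0.0)
    else:
        # Full speed far away; speed falls linearly to zero near the target.
        speed = max_speed * dist / slow_radius if dist < slow_radius else max_speed
        desired = (dx / dist * speed, dy / dist * speed)
    steer = (desired[0] - velocity[0], desired[1] - velocity[1])
    # Cap the steering force so the agent turns and brakes gradually.
    mag = math.hypot(*steer)
    if mag > max_force:
        steer = (steer[0] / mag * max_force, steer[1] / mag * max_force)
    return steer


loc, vel = [0.0, 0.0], [0.0, 0.0]
for _ in range(2000):
    s = arrive(loc, vel, (300.0, 0.0))
    vel = [vel[0] + s[0], vel[1] + s[1]]
    speed = math.hypot(*vel)
    if speed > 4.0:  # keep the agent under its top speed
        vel = [vel[0] / speed * 4.0, vel[1] / speed * 4.0]
    loc = [loc[0] + vel[0], loc[1] + vel[1]]
```

Without the slowing radius the desired speed stays at `max_speed` right up to the target, which produces the overshooting the chapter starts from.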
The seventh chapter takes an interesting turn into cellular automata. While it is an interesting chapter and an interesting concept, it feels a bit disjointed from the rest of the text, although there is a way to tie it back into the long-running ecosystem project. The most important aspect of cellular automata is that they make fun visualizations, whereas other programming concepts that revolve around mutating state (like finite state machines or Markov models) might not be as readily visible. This is the first chapter that feels a little rushed, more like a brief foray into a potentially deep field. The Game of Life is covered, but only in its simplest aspects, and I feel this chapter could be better.
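The Game of Life rules fit in one function. A minimal Python sketch, assuming a fixed-size grid where cells beyond the edge count as dead (other implementations wrap around instead).

```python
def life_step(grid):
    """Return the next generation of Conway's Game of Life (1 = alive, 0 = dead)."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the live cells among the up-to-eight neighbors.
            neighbors = sum(
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols
            )
            if grid[r][c] == 1:
                new[r][c] = 1 if neighbors in (2, 3) else 0  # survival
            else:
                new[r][c] = 1 if neighbors == 3 else 0  # birth
    return new
```

Writing into a fresh grid rather than updating `grid` in place matters: every cell's next state must be computed from the same generation. A horizontal "blinker" of three cells flips to vertical and back on successive steps.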
Chapter Eight dives into fractals. Like the last chapter, it is a bit short, but I enjoyed it. Fractals are a great visual way to introduce newcomers to recursion and get them excited about it. On top of that, Shiffman shows how fractals appear in nature, with Koch curves and Sierpinski triangles as the fractal visualizations. Shiffman has a great informational box discussing the "monster" curve and tantalizes the reader with the paradox that infinite recursion of the Koch curve results in an infinitely long line within a finite area of paper (each iteration replaces every segment with four segments one-third as long, multiplying the total length by 4/3). This sort of thing is what makes reading a book like this enjoyable and drives people to delve deeper into these concepts; I only wish the book had more of it. Also crucial to recursion in this chapter is a Processing feature that was new to me: pushMatrix() and popMatrix(). As these are used to build out trees, the author moves on to L-systems as devised by Aristid Lindenmayer. It's amazing how such a simple grammar can simulate the growth of algae.
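Lindenmayer's original algae system (axiom `A`, rules A → AB and B → A) shows how little grammar an L-system needs: every symbol is rewritten in parallel, once per generation. A minimal Python sketch of that rewriting step; the function name is illustrative.

```python
def lsystem(axiom, rules, generations):
    """Rewrite every symbol in parallel; symbols without a rule are copied unchanged."""
    current = axiom
    for _ in range(generations):
        current = "".join(rules.get(ch, ch) for ch in current)
    return current


ALGAE = {"A": "AB", "B": "A"}
```

Successive generations are `A`, `AB`, `ABA`, `ABAAB`, `ABAABABA`, and their lengths follow the Fibonacci numbers (1, 2, 3, 5, 8, 13, ...).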
The ninth chapter guides the reader through a high-level overview of genetic algorithms. I think one thing this book lacks is a caution against jumping into concepts just because they sound cool. While genetic algorithms sound cool and futuristic, I have rarely found them to be at all useful on a professional level. Shiffman does a great job of explaining precisely how selection is determined by defining the constraints of the environment as well as the evaluation (fitness) function. Unfortunately, I find that these things are often hard to define, and it's warnings like these that the text lacks. Nevertheless, a few good examples are picked out for coding; unsurprisingly, they use the laws of physics we just discussed and a number of computable variables for evaluation. The best example is the rocket ship, which is introduced after the standard example of monkeys at typewriters trying to type the works of Shakespeare. Shiffman does a great job of explaining genetic algorithms, and it's certainly a neat topic that's fun to think about, but I'm not sure it's a fundamental aspect of practical coding. It definitely works for the simulation side of coding, so it should stay in the book, but again it feels rushed, with much of the simulation application left to the reader in the ecosystem project. I think a much longer chapter that models predation, like wolves and rabbits, might work a lot better. You could even tie in a little bit of math and show situations where too few mutations cause the hunter or prey to settle on local maxima.
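A minimal Python sketch of the typing-monkey experiment as a genetic algorithm: fitness is the fraction of correct characters, parents are picked with probability weighted by fitness, and children come from midpoint crossover plus occasional mutation. The character set, population size, mutation rate, squared-fitness weighting and seed are all assumptions, not the book's settings.

```python
import random
import string

CHARSET = string.ascii_lowercase + " "


def fitness(candidate, target):
    """Fraction of positions where the candidate already matches the target."""
    return sum(c == t for c, t in zip(candidate, target)) / len(target)


def crossover(a, b, rng):
    """Midpoint crossover: a prefix from one parent, the remainder from the other."""
    mid = rng.randrange(len(a) + 1)
    return a[:mid] + b[mid:]


def mutate(s, rate, rng):
    """Each character is replaced by a random one with probability `rate`."""
    return "".join(rng.choice(CHARSET) if rng.random() < rate else ch for ch in s)


def evolve(target, pop_size=200, mutation_rate=0.01, max_generations=1000, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice(CHARSET) for _ in target) for _ in range(pop_size)]
    for generation in range(max_generations):
        scores = [fitness(p, target) for p in population]
        best = population[scores.index(max(scores))]
        if best == target:
            return generation, best
        # Fitness-proportionate selection; squaring favors fitter parents and
        # the small constant keeps every weight positive when all scores are 0.
        weights = [s * s + 1e-3 for s in scores]
        parents = rng.choices(population, weights=weights, k=2 * pop_size)
        population = [
            mutate(crossover(parents[2 * i], parents[2 * i + 1], rng), mutation_rate, rng)
            for i in range(pop_size)
        ]
    scores = [fitness(p, target) for p in population]
    return max_generations, population[scores.index(max(scores))]
```

The hard part the review warns about lives entirely in `fitness`: for a phrase it is trivial to write, but for most real problems defining it is the whole challenge.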
The tenth and final chapter briefly covers neural networks. Again, this chapter felt rushed and was missing many of the great explanations present in the first half of the book. Its scant thirty-five pages cover perceptrons, neural networks, training vehicles with them and even backpropagation in multilayered neural networks to handle more complex classification demands. To give this chapter some fun visualizations, the last thing Shiffman covers is animating the operation of a neural network. I'm intimately familiar with all these topics, but the pace at which this chapter moves might be too much for a starting developer. I feel there's a huge opportunity in this chapter to explain neural networks more thoroughly and to get readers more excited about classification systems in code.
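A minimal Python sketch of a single perceptron and its training rule, in which each weight moves by learning rate × error × input. The sign activation, constant bias input, learning rate and AND-gate training data are illustrative assumptions, not the book's example.

```python
def activate(total):
    """Sign activation: +1 above zero, otherwise -1."""
    return 1 if total > 0 else -1


class Perceptron:
    def __init__(self, n_inputs, learning_rate=0.5):
        # One extra weight for the bias input, which is always 1.
        self.weights = [0.0] * (n_inputs + 1)
        self.lr = learning_rate

    def guess(self, inputs):
        xs = [1.0, *inputs]
        return activate(sum(w * x for w, x in zip(self.weights, xs)))

    def train(self, inputs, target):
        # Perceptron rule: error is target - guess, i.e. -2, 0 or +2.
        error = target - self.guess(inputs)
        xs = [1.0, *inputs]
        self.weights = [w + self.lr * error * x for w, x in zip(self.weights, xs)]


# Logical AND is linearly separable, so training is guaranteed to converge.
DATA = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]

p = Perceptron(2)
for _ in range(20):
    for inputs, target in DATA:
        p.train(inputs, target)
```

A single perceptron can only draw one straight dividing line; problems like XOR need the multilayer networks and backpropagation the chapter moves on to.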
All in all, the book was thoroughly enjoyable, and I really like that it is a Creative Commons work with both the source code and the raw book available on GitHub. Although the latter chapters could use a lot of additional work, this book is a great beginning tool for people who wish to start modeling nature in visualizations quickly and easily.
You can purchase a physical copy of The Nature of Code from amazon.com, or you can name your price for the PDF directly. Slashdot welcomes readers' book reviews; to see your own review here, read the book review guidelines, then visit the submission page.
Teaching Game Development To Fine Arts Students?
jkavalier writes "I've been asked to prepare a short course (50 hours) on video game development for Fine Arts students. That means people with little-to-no technical skills and, hopefully, highly creative individuals. By the end of it, I would like to have finished 1-3 very basic minigames. I'm considering Unity 3D, Processing, and even Scratch. How would you approach teaching such a course? What do you think is the best tool/engine/environment for such a task?"
Beautiful Data
eldavojohn writes "Beautiful Data: The Stories Behind Elegant Data Solutions joins six or so other books in the 'Beautiful' series that O'Reilly has put out. It is not a comprehensive guide to data but rather a glimpse into success stories about twenty different projects that succeeded in displaying data, oftentimes in areas where others have failed. While this makes for mostly disjointed stories, it is a very readable book compared to most technical books. Beautiful Data proves to be quite the cover-to-cover page turner for anyone involved in building interfaces for data, or for the statistician at a loss for the best way to intuitively and effectively relay knowledge when given voluminous amounts of raw data. That said, it took me almost two months to make it through this book, as each chapter revealed a data repository or tool I had no idea existed. I felt like a child with attention deficit disorder, trying my hand at nearly everything. While the book isn't designed to relay complete theory on data (like Tufte), it is a great series of short success stories revolving around the entire real-world practice of consuming, aggregating, realizing and making beautiful data." Keep reading for the rest of eldavojohn's review.
Beautiful Data: The Stories Behind Elegant Data Solutions
author: Edited by Toby Segaran and Jeff Hammerbacher
pages: 384
publisher: O'Reilly Media, Inc.
rating: 9/10
reviewer: eldavojohn
ISBN: 978-0-596-15711-1
summary: A collection of twenty essays and chronicles from the implementers of successful projects revolving around real-world data processing and display.
Since the individual articles in this book are essentially a series of what to do and what not to do, this review is more like a list of notes that were my personal rewards from each chapter.
Given my background, these notes will be very specific to my interests and responsibilities in web development, whereas a statistician, academic or researcher might pull a completely different set from the book. The book also has a nice color insert that gives the reader a better sense of the interfaces discussed throughout. One potential problem with these "case studies" is that they will almost certainly become dated, and in our world that happens quite quickly. It's very easy for me to think that specific information about colocation facility usage by social networking sites (Chapter Four) will always be useful and relevant. The sad fact of the matter is that, because of unforeseen hardware advancements and language evolution, many of these stories could become irrelevant blasts from the past in one or two decades. I think the audience that stands to benefit most from this book is low-level managers and people in charge of large amounts of data that they don't know what to do with. The reason is that while a few chapters deal with low-level implementation details, the book mostly consists of overviews of popular and successful mentalities surrounding data. Another audience that might be a target for this book is young college students with interests in math, statistics or computer science. Had I picked this book up as a college freshman, no doubt the number of projects I worked on late into the night would have multiplied, as would my understanding of how the real world works.
Chapter One deals with two projects done by grad students: Personal Environmental Impact Report (PEIR) and your.flowingdata (YFD). The chapter starts out slowly, describing how the system harnesses personal GPS devices, a common trend in phone development these days. After clearing the basics, it reveals a lot about the iterative development the author went through to select and include a map interface that effectively and quickly displays several routes a user has driven, with intuitive visual cues indicating which was the most environmentally expensive. Sticking with the green-means-good, red-means-bad convention proved difficult, so they employed an inverted map of mostly shades of gray to avoid clashing with the natural colors of a regular map. The final part of the PEIR discussion covers a Facebook application that simply paired you up against friends also using PEIR. This gave users a relative basis for interpreting otherwise incomprehensible numbers about their environmental impact. YFD focuses more on an interface for collecting a user's Twitter data to help them track sleep and weight loss.
The second chapter deals entirely with constructing a very simple survey whose length varies depending on the answer you give to an earlier question. While this seems to be a very simple task, the chapter does a great job of explaining how you can make it better and why doing so makes it better. A great quote from this chapter is "The key method for collecting data from people online is, of course, through the use of the dreaded form. There is no artifact potentially more valuable to a business, or more boring and tedious to a participant." The chapter points out that every action you require of the user is another point at which the user may decide the survey is not worth their time. Yes, clicking "Next" on a multi-page form only gives the user another chance to decide it isn't worth it. Furthermore, many pages might leave the user unsure of the survey's real length. So the designers decided against this and instead made the survey branch from a single page that grows a little larger depending on how you answer the questions. Because the survey's target audience was older, a large font was mandatory, as 72% of Americans report vision impairment by the time they are age 45. This chapter deals more with collecting the data, respecting the source of the data and building trust with participants than with displaying the data they provided.
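One way to model a single-page survey that grows as questions are answered is to attach a visibility rule to each question and re-render the visible list after every answer. This Python sketch is hypothetical: the `Question` class and the commuting questions are invented for illustration and do not come from the chapter.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Question:
    key: str
    text: str
    # Hypothetical rule: show only if this predicate accepts the answers so far.
    show_if: Optional[Callable[[dict], bool]] = None


SURVEY = [
    Question("drives", "Do you drive to work?"),
    Question("miles", "How many miles is your commute?",
             show_if=lambda a: a.get("drives") == "yes"),
    Question("transit", "Which transit do you use instead?",
             show_if=lambda a: a.get("drives") == "no"),
]


def visible_questions(answers):
    """Keys of the questions to render on the single, growing page."""
    return [q.key for q in SURVEY if q.show_if is None or q.show_if(answers)]
```

Because the page only ever grows by the questions a participant actually needs, there is no "Next" button to abandon and no hidden page count to worry about.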
Chapter Three deals with the recently disabled Phoenix lander on Mars and precisely how its image collection was done. While it might seem like the wrong place to do it, some pre-processing and compression were actually done on board the lander before transmission to Earth. This chapter tackles issues long thought to be extinct in computer science: resources are constrained, and the need to withstand radiation bombardment keeps the CPU modest compared to your average desktop. Do you process the image in place in memory, or make a copy so that the original image is retained during processing? These are familiar issues to embedded developers but stuff I haven't touched since college. While the author details the situation on all fronts, down to the cameras being used, it's largely a blast from the past as far as resource-aware computing is concerned. Then again, I doubt any of my code will ever be flight certified by NASA.
Chapter Four has a very interesting analysis and description of Yahoo!'s PNUTS system for serving data in complex environments, such as tackling worldwide latency issues in social networking. The chapter does a decent job of explaining the resolution strategy used when replicated servers across the United States fall out of sync. It ends on an even more interesting note, explaining why Yahoo! deviated from Google's BigTable, Amazon's Dynamo, Microsoft's Azure and other existing implementations. This tale of well-thought-out design is a stark contrast to Chapter Five, which centers on a Facebook 'data scientist' who, instead of presenting the solution as a well-planned, finalized implementation, tells of the trial-and-error approach of a very small team of developers wading into unknown waters with data sets of Sisyphean proportions. It was tempting to read this chapter and chastise the author for not foreseeing the numbers that could come with making it big in social networking. But the chapter has a lot of value as a set of lessons learned. It may even prepare those of you writing web applications with a potentially explosive or viral user base. While it's popular to hate Facebook and transfer that hate to its developers, no one can argue that it isn't one of the most successful social networking sites, and any information about its (sometimes flawed) operations is certainly interesting.
Chapter Six was completely unengaging for me. The chapter covers geographing, specifically the effort to photograph Britain and Ireland and map and display the pictures geographically. The images aim to cover a large area, and users can then tag them with what they see (tree, road, hill, etc.). Unfortunately, it never really registered with me why someone would want to do this or what end goal they were aiming for. Instead, they managed to produce some pretty heinous, very difficult-to-digest heat maps, or "spatial treemaps." By embedding coloration and lines into the treemaps, the authors hoped to convey intuitive information to the reader. Instead, my eyes often glazed over, and sometimes I flat out disagreed with their assertion that this is how to display data beautifully. You're welcome to try to convince me that geographing has some merit beyond producing pretty mosaics of large image sets, but it took a lot of effort for me to keep reading at points in this chapter.
Chapter Seven sets the book back on track with "Data Finds Data," in which the writers cover important problems surrounding federated search and instead propose directories enriched with semantic metadata or relationship data, which make keyword searching possible over billions of documents. For anyone dealing with large volumes of data, this chapter is a great start to understanding your options: processing your data once, when you first receive it, versus searching for that data just in time and paying for it in delay. While the former incurs much more disk space cost, Google has shown that this paradigm shift definitely has merit.
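The trade-off described above, paying once at ingestion versus paying at every query, can be illustrated with a toy inverted index next to a brute-force scan. This Python sketch is an assumption-laden illustration, not the system from the chapter; whitespace tokenization and lower-casing stand in for real text processing.

```python
from collections import defaultdict


class InvertedIndex:
    """Pay the processing cost once, at ingestion, so each query is a lookup."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of documents containing it

    def add(self, doc_id, text):
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def search(self, *words):
        """Ids of documents that contain every query word."""
        sets = [self.postings.get(w.lower(), set()) for w in words]
        return set.intersection(*sets) if sets else set()


def scan_search(docs, *words):
    """The just-in-time alternative: rescan every document on every query."""
    wanted = {w.lower() for w in words}
    return {doc_id for doc_id, text in docs.items() if wanted <= set(text.lower().split())}
```

The index spends extra storage on its postings, which is the space cost the review mentions; in exchange, a query touches only the postings for its own words instead of every document.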
Chapter Eight is about social data APIs and heavily pushes Gnip as the de facto social endpoint aggregator for programmers. The chapter mentions WebHooks as an up-and-coming project for transmitting events via HTTP POST but doesn't offer much more than a wake-up call for programmers. Traditional polling has dominated web APIs and has led to fragile points of failure. This chapter is a much-needed call for sanity in the insane world of HTTP transactional polling. Unfortunately, the community seems to be so in love with the simplicity of polling that it uses polling for everything, even when a slightly more complicated eventing model would save a large percentage of transactions.
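A back-of-the-envelope comparison shows where polling's wasted transactions come from. The poll interval and event rate in this Python sketch are hypothetical inputs, and push is assumed to cost exactly one HTTP POST per event.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_requests(poll_interval_s, events_per_day):
    """Requests per client per day: (polling, push, minimum empty polls)."""
    polling = SECONDS_PER_DAY // poll_interval_s
    push = events_per_day  # one POST per event, and nothing otherwise
    # At most one poll per event can carry news, so at least this many come back empty.
    empty_polls = max(polling - events_per_day, 0)
    return polling, push, empty_polls
```

For example, polling once a minute against a feed that changes 20 times a day means 1,440 requests, of which at least 1,420 return nothing new, versus 20 requests with push.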
Chapter Nine is a tutorial on harvesting data from the deep web. What they mean is that, given proper permission, one can exploit forms on websites to access database contents and index them, instead of being limited to static HTML pages. In my opinion, this is a fragile and often frowned-upon approach to data collection, but as this chapter (and many others) illustrates, sometimes data is locked up because there aren't the resources to expose it. This means that if a repository of information is meant to be available to you through a simple submission form, you can tease that information out of "the deep web" and into your system with the tricks mentioned in this chapter.
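Surfacing deep-web content starts by turning a form into concrete queries. This Python sketch only enumerates candidate GET URLs; the action URL and field values are hypothetical, it assumes the form submits via GET, and it deliberately fetches nothing (any real crawl should have the permission the review mentions).

```python
from itertools import product
from urllib.parse import urlencode


def form_submission_urls(action_url, field_values):
    """Yield a GET URL for every combination of candidate form inputs."""
    names = sorted(field_values)  # fixed field order for stable URLs
    for combo in product(*(field_values[n] for n in names)):
        yield f"{action_url}?{urlencode(dict(zip(names, combo)))}"


urls = list(form_submission_urls(
    "https://example.com/search",  # hypothetical form action
    {"state": ["CA", "NY"], "category": ["books"]},
))
```

The number of URLs is the product of the candidate counts per field, which is why choosing a small, informative set of input values matters more than brute-forcing every option.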
Chapter Ten is the story of Radiohead's open-sourced "data" music video for "House of Cards," covering the collection process from the kinds of devices used, to the methodology of collecting the data, to the attitude they took when treating it. This chapter is a sort of key to understanding what data Radiohead's release actually gives you, and I heavily recommend it for anyone interested in taking a stab at this video. The most interesting things I found were their collection method and, more importantly, their decision to actually degrade the data and not to texture Thom Yorke's face when displaying it, citing artistic choice. This chapter introduced me to an amazing display tool that I am embarrassed to admit I had no knowledge of prior to this book: Processing.
Chapter Eleven is the story of a few people who chose to do something about serious crime problems in Oakland. The city compiled crime reports weekly but did not open up the data; you could run a search and get a very minimal map display of crimes that had happened. This led to Oakland Crimespotting. At first, its creators were forced to work around problems, graphically scraping and estimating crime locations so that their own system could present the data to citizens in more intuitive and useful ways that let them take action. In the end, the city government came to its senses and began offering them the data in a far more open format. Browsing the site now gives you an idea of the tale this chapter tells, and the chapter chronicles the evolution of that end product.
Chapter Twelve centers on sense.us, a potentially powerful product that aims to let users analyze and annotate graphs that might reveal correlations between factors in US Census data. The only disappointment with this chapter is that sense.us isn't live for us to use. The tool shows powerful collaborative abilities for analyzing census data, but it is also a double-edged sword: nothing stops it from being used for political and monetary ends instead of purely academic revelations. They used tools like ColorBrewer and prefuse to dynamically generate graphs and charts that were pleasing to the eye. Then they used 'geometric annotation' (a vector graphics approach to recording users' doodles and annotations) to facilitate collaboration. The notes the researchers took on the collaboration between their pilot users are probably more intriguing than their actual approach to displaying good graphics. Each user seemed to progress naturally from annotation producer to annotation crawler, then bounce between the two as other users' annotations gave them ideas for new annotations. While not exactly ideal collaboration, it's interesting to hear what users do in the wild when left to their own devices.
Chapter Thirteen, "What Data Doesn't Do," is a very short chapter with a set of ten or so rules intended to remind you that data doesn't predict, more data isn't always better or easier, probabilities do not explain, data doesn't stand alone, etc. This chapter felt like a pause-and-remember waypoint in the book: just when you've gone through these great success stories, it reels you back into reality. Other chapters remind you to avoid pitfalls like the narrative fallacy, but this chapter reminds you quite literally what data doesn't do for you automatically. It signals that, when you present data, you need to shore up the things data doesn't magically do on its own.
Chapter Fourteen is Peter Norvig's "Natural Language Corpus Data," and it does not disappoint. Once the reader is armed with the code and the data in this chapter, it almost seems like one could solve several problems using n-grams, Bayes' theorem and natural language analysis. Norvig lays out how to tackle several problems with ease: decoding encryption up to WWII levels, spelling correction, machine translation and even spam detection. In just 23 pages, Norvig conveys a tiny bit of the power of a corpus of documents coupled with a willingness to be a little dirty (total probabilities summing to more than one, dropping n-grams below a threshold, etc.). It's clear why he's employed at Google.
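Word segmentation is a good example of what a corpus buys you: choose the split whose words have the highest product of unigram probabilities, memoizing so each suffix is solved once. The counts below are made-up toy values and the unknown-word penalty is one simple choice; this Python sketch is in the spirit of the chapter, not Norvig's code.

```python
import math
from functools import lru_cache

# Hypothetical toy unigram counts; a real model would come from a large corpus.
COUNTS = {"choose": 80, "chooses": 5, "spain": 40, "pain": 60, "the": 500, "a": 300}
TOTAL = sum(COUNTS.values())


def log_pword(word):
    """log10 probability of a word; unseen words are penalized by their length."""
    if word in COUNTS:
        return math.log10(COUNTS[word] / TOTAL)
    return math.log10(10 / (TOTAL * 10 ** len(word)))


@lru_cache(maxsize=None)
def segment(text):
    """Most probable split of `text` into words, as a tuple."""
    if not text:
        return ()
    candidates = [(text[:i],) + segment(text[i:]) for i in range(1, len(text) + 1)]
    # Summing logs is the same as multiplying probabilities, without underflow.
    return max(candidates, key=lambda words: sum(log_pword(w) for w in words))
```

With these counts, "choosespain" splits as ("choose", "spain") because 80 × 40 outweighs 5 × 60 for ("chooses", "pain").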
Chapter Fifteen takes a drastic turn into one of Earth's oldest data stores: DNA. As the chapter so coyly notes, programmers can view DNA as a simple string: char(3*10^9) human_genome; The chapter gives a brief glimpse of DNA analysis but focuses more on the data storage involved at facilities currently working to harvest data from many subjects. As of the writing of this chapter, one facility was generating 75 terabits of raw data per week. Most interesting to me in this chapter was Ensembl (ensembl.org), a site for finding DNA and genome data and for collaborating with other researchers on annotating and commenting on particular parts and regions of DNA.
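Because DNA has only four bases, storing it one byte per character wastes three quarters of the space. A minimal Python sketch of 2-bit packing, four bases per byte; the bit assignment is an arbitrary assumption, and real sequence data also contains symbols such as N for unknown bases, which this sketch does not handle.

```python
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = "ACGT"


def pack(seq):
    """Pack a string of A/C/G/T into bytes, four bases per byte (low bits first)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= BASE_TO_BITS[base] << (2 * (i % 4))
    return bytes(out)


def unpack(data, length):
    """Inverse of pack; `length` is needed because the last byte may be partial."""
    return "".join(
        BITS_TO_BASE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )
```

"GATTACA" (7 bases) packs into 2 bytes instead of 7, and unpacking with the original length restores it exactly.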
Like the previous chapter, Chapter Sixteen turns to the sciences, focusing briefly on chemistry and describing how data was collected "to predict the solubility of a wide range of chemicals in non-aqueous solvents such as ethanol, methanol, etc." Since I have a very minimal chemistry background, the purpose of this data collection was never really clear to me, but the chapter nonetheless explains many challenges in this environment that are similar to those in other chapters. The interesting aspect of this chapter is that the team used open notebook science to collect the data and therefore faced the challenge of cleaning crowd-sourced data. A constantly recurring problem in these chapters is how one represents data, and chemistry apparently has many standards, some more open than others. The book makes a very good argument for open standards, and for selecting them, when one witnesses the screen scraping, licensing issues and costs researchers face in unifying data even for something as old as the representation of chemicals.
Chapter Seventeen is the case study of FaceStat, a statistically more ambitious Hot-or-Not effort from researchers. The site allowed anyone to upload a photo of a person and then let users rate and tag them. After collecting this data, the researchers used the ubiquitous R statistical language to do some feature extraction. Of course, the chapter first deals with cleaning the data and catching bad user input. While this sounds like vanilla, run-of-the-mill feature extraction, the chapter also includes some interesting display examples as well as a very interesting yet controversial stereotype analysis, ranging from taboo topics like fitting a line to attractiveness versus age, to the sexism of tags, to using k-means to establish stereotype clusters in the data. While other chapters risked offense through possible privacy concerns, this chapter reveals more about the callow stereotypes that internet users inflict upon each other.
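The chapter does its clustering in R; for consistency with the other sketches here, the following is a plain Python version of k-means (Lloyd's algorithm). The 2-D points and starting centroids are hypothetical stand-ins for whatever features the researchers extracted; real use would pick starting centroids randomly or with k-means++ and compare several restarts.

```python
import math


def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for each point."""
    return [min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i])) for p in points]


def kmeans(points, centroids, iterations=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean updates."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            # An empty cluster keeps its previous centroid.
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old)
        if updated == centroids:
            break  # Converged: no centroid moved.
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical 2-D feature vectors forming two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The number of clusters is fixed by the starting centroids, which is exactly the judgment call that makes "stereotype clusters" a matter of interpretation.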
Chapter Eighteen looks at the San Francisco Bay Area housing market over a very interesting selection of recent years. What differentiates this chapter from so many of the others (we collect, clean and process the data) is that it needed to break the data down by neighborhood to find the really interesting features. The neighborhoods could then be sorted into six groups according to how much their house prices rose and then declined. Only one group, containing a single neighborhood (Mountain View), showed no decline. Unfortunately, by the time the reader arrives at this chapter and the next one, they read as straightforward replications of ideas from earlier chapters. Chapter Nineteen is a brief chapter on statistics in politics. Aside from five or six interesting voting correlations revealed through data, this chapter merely relays what we already know: politicians use statistics to a sometimes harmful degree (gerrymandering).
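Breaking the housing data down by neighborhood means computing per-neighborhood features before any grouping. As a minimal sketch, assuming each neighborhood has a chronological series of (hypothetical) median sale prices, the two features below measure the run-up to the peak and the drop since the peak. This is an illustration of the idea, not the chapter's actual method; neighborhoods could then be grouped on these two numbers, and a decline of zero flags a Mountain View-style exception.

```python
def price_trend(prices):
    """(rise, decline) for a chronological price series, as fractions.

    rise:    gain from the first price to the peak, relative to the first price.
    decline: drop from the peak to the latest price, relative to the peak.
    """
    peak = max(prices)
    rise = (peak - prices[0]) / prices[0]
    decline = (peak - prices[-1]) / peak
    return rise, decline


# Hypothetical median sale prices (in $1,000s) per neighborhood, oldest first.
series = {
    "Neighborhood A": [500, 650, 800, 600],
    "Neighborhood B": [700, 800, 900, 950],
}
trends = {name: price_trend(prices) for name, prices in series.items()}
print(trends["Neighborhood A"])  # (0.6, 0.25)
print([name for name, (_, decline) in trends.items() if decline == 0])  # ['Neighborhood B']
```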
The last chapter is, appropriately, about the many sources of data exposed on the internet and the problems everyone faces in matching entities from one data source to another. The idea of using a URI to describe a movie doesn't seem to have caught on. And as if that weren't enough, even a word like "location" used to describe a column could mean drastically different things for houses and for genomes. The chapter lists a number of sources where data is available to download and tinker with (most already listed in the book) and proceeds to analyze an algorithmic approach (collective reconciliation) that lets a system differentiate between two movies with the same name. Naturally, the author of this chapter worked on Freebase, which was recently (and predictably) acquired by Google. Although short, the chapter speaks to problems that all online data communities face and to what keeps mashups from automagically happening between two disparate data sources holding data that is actually related.
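The following toy Python sketch illustrates the idea behind collective reconciliation: a match already made for one entity (here, a director) becomes evidence for disambiguating a related one (a movie). It is a simplified illustration, not Freebase's actual algorithm; the `Entity` class, the ids and the scoring rule (count of links to already-resolved entities) are all hypothetical. The two films titled Solaris (1972, directed by Andrei Tarkovsky; 2002, directed by Steven Soderbergh) serve as the same-name example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A catalog entry: a unique id, a display name and links to related entity ids."""
    id: str
    name: str
    related: set[str] = field(default_factory=set)


def reconcile(name, related_names, catalog, resolved):
    """Return the id of the catalog entity matching `name`, or None if still ambiguous.

    `related_names` are names attached to the incoming record (e.g. its director);
    `resolved` maps names reconciled earlier to catalog ids, so earlier decisions
    feed into this one.
    """
    candidates = [e for e in catalog if e.name == name]
    if not candidates:
        return None
    related_ids = {resolved[n] for n in related_names if n in resolved}
    # Score each candidate by how many of its links point at already-resolved entities.
    scored = sorted(((len(e.related & related_ids), e.id) for e in candidates), reverse=True)
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None  # No evidence separates the top candidates.
    return scored[0][1]


catalog = [
    Entity("m1", "Solaris", {"d1"}),
    Entity("m2", "Solaris", {"d2"}),
    Entity("d1", "Andrei Tarkovsky"),
    Entity("d2", "Steven Soderbergh"),
]
resolved = {}
resolved["Steven Soderbergh"] = reconcile("Steven Soderbergh", [], catalog, resolved)
print(reconcile("Solaris", ["Steven Soderbergh"], catalog, resolved))  # m2
print(reconcile("Solaris", [], catalog, resolved))  # None
```

Without the director link the two films stay indistinguishable, which is the point: reconciling entities one at a time throws away evidence that reconciling them together can use.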
With the exception of Chapter Six, every chapter offered me something that I won't forget. More importantly, most chapters offered a data source or data processing tool that expanded my toolbox of things to use when programming. The only reasons this book misses a perfect 10/10 from me are Chapter Six and a couple of later chapters that feel like weaker ideas from earlier chapters rehashed in a different domain. A worthwhile book if you work with data, whether as a consumer or a producer.
You can purchase Beautiful Data: The Stories Behind Elegant Data Solutions from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
Processing Visualization Language Ported To Javascript
Manfre writes "On his birthday, John Resig (creator of jQuery) has given a present to developers by releasing Processing.js. This is a Javascript port of the Processing Visualization Language and a first step towards Javascript being a rival to Flash for online graphics content. His blog post contains an excellent writeup with many demos."