Choose a Better Train With Web Scraping (hackaday.com)
szczys writes: Tired of his trains being constantly late, Eric Evenchick headed to the Via Rail (Canada's commuter train service) website to find which trains had a better on-time rate. Unfortunately, the site only offers three days' worth of data through its dropdown selections, but a bit of investigating showed that the underlying GET requests would return data for about the last six months. Evenchick built a web scraper in Python, along with a web interface that queries the resulting SQL database. The harvested data shows system-wide delays that average more than twelve minutes (mostly due to commercial rail having the right-of-way). The good that comes of this? You can now choose your train based on the smallest likelihood of delay.
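For anyone curious what the storage and query side of a project like this could look like, here is a minimal sketch. It is not Evenchick's code. The URL pattern, the `arrivals` table layout, and every function name are assumptions made up for illustration, since the summary gives neither the real endpoint nor the schema. Fetching and HTML parsing are left out on purpose because they depend entirely on the site's markup.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical URL pattern: the real endpoint is not given in the summary.
BASE_URL = "https://example.com/train-status?train={train}&date={day}"


def build_urls(train, end, days):
    """Return one URL per day, walking back `days` days starting at `end`."""
    return [
        BASE_URL.format(train=train, day=(end - timedelta(days=i)).isoformat())
        for i in range(days)
    ]


def store(conn, rows):
    """Insert (train, day, delay_min) tuples into an assumed `arrivals` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS arrivals "
        "(train TEXT, day TEXT, delay_min INTEGER)"
    )
    conn.executemany("INSERT INTO arrivals VALUES (?, ?, ?)", rows)
    conn.commit()


def average_delay_by_train(conn):
    """Return (train, average delay in minutes) pairs, least delayed first."""
    cur = conn.execute(
        "SELECT train, AVG(delay_min) AS avg_delay "
        "FROM arrivals GROUP BY train ORDER BY avg_delay"
    )
    return cur.fetchall()
```

Something like `build_urls("50", date.today(), 180)` would list roughly six months of requests for one train; the first row returned by `average_delay_by_train` is then the train with the lowest average delay in whatever data was stored.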
Check the site's terms of service; scraping site contents may be in violation of the ToS.
I wrote a similar app about 15 years ago to scrape the Edmonton Transit System's route schedules (conveniently posted in generally well-structured HTML at the time) so I could build a relational system and try to sort out predictive routes / times. Then I found out that what I was doing was in violation of their ToS, so I stopped my scraping service immediately (before getting called on it).
So a guy wrote a script. Good for him, I guess, but why is this on /.?