Slashdot Mirror


Choose a Better Train With Web Scraping (hackaday.com)

szczys writes: Tired of his trains being constantly late, Eric Evenchick headed to the Via Rail (Canada's communter train service) website to find which trains had a better on-time rate. Unfortunately, the site only offers three days' worth of data through the dropdown selections, but a bit of investigating showed that the underlying GET requests would return data from about the last six months. Evenchick built a web scraper in Python, along with a web interface that queries the resulting SQL database. The harvested data shows system-wide delays averaging more than twelve minutes (mostly because commercial rail has the right-of-way). The good that comes of this? You can now choose your train based on the smallest likelihood of delay.
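For readers curious what the "scrape into SQL, then query" step might look like, here is a minimal sketch using Python's standard-library sqlite3 module. The table layout, the 'HH:MM' time format, and the train labels are hypothetical assumptions, not Evenchick's actual schema or code, and times are assumed to fall on the same day (no midnight crossing).

```python
import sqlite3
from datetime import datetime

# Hypothetical schema: one row per (train, date) with its arrival delay in minutes.
SCHEMA = """
CREATE TABLE IF NOT EXISTS arrivals (
    train_no  TEXT    NOT NULL,
    run_date  TEXT    NOT NULL,   -- ISO date, e.g. '2015-06-01'
    delay_min INTEGER NOT NULL,   -- minutes late (negative if early)
    PRIMARY KEY (train_no, run_date)
)
"""


def delay_minutes(scheduled: str, actual: str) -> int:
    """Minutes between scheduled and actual 'HH:MM' times; assumes the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(actual, fmt) - datetime.strptime(scheduled, fmt)
    return int(delta.total_seconds() // 60)


def store(conn: sqlite3.Connection, records) -> None:
    """Insert scraped (train_no, run_date, scheduled, actual) tuples.

    INSERT OR REPLACE lets a later re-scrape of the same day overwrite the old row.
    """
    rows = [(t, d, delay_minutes(s, a)) for t, d, s, a in records]
    conn.executemany("INSERT OR REPLACE INTO arrivals VALUES (?, ?, ?)", rows)
    conn.commit()


def average_delays(conn: sqlite3.Connection):
    """Return [(train_no, avg_delay_min, runs)], most punctual train first."""
    return conn.execute(
        "SELECT train_no, AVG(delay_min), COUNT(*) FROM arrivals "
        "GROUP BY train_no ORDER BY AVG(delay_min)"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Made-up sample records for illustration only.
    store(conn, [
        ("A", "2015-06-01", "08:00", "08:05"),
        ("A", "2015-06-02", "08:00", "08:15"),
        ("B", "2015-06-01", "09:30", "09:30"),
        ("B", "2015-06-02", "09:30", "09:34"),
    ])
    for train, avg, runs in average_delays(conn):
        print(f"train {train}: {avg:.1f} min average delay over {runs} runs")
```

Storing the computed delay rather than raw times keeps the ranking query a plain GROUP BY; the primary key means re-scraping a day you already have simply replaces that row instead of double-counting it.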

50 comments

  1. Canada's communter train service by xxxJonBoyxxx · · Score: 3, Informative

    >> Canada's communter train service

    But do they have anything for commuters?

    1. Re:Canada's communter train service by davester666 · · Score: 1

      A communter is a commuter going to a commune. So, yes, they do have something for that specific subset of all commuters.

      --
      Sleep your way to a whiter smile...date a dentist!
  2. See Via Rail limiting the GET requests in... by Ecuador · · Score: 3, Funny

    See Via Rail limiting the GET requests in 3... 2... 1...
    Well, OK, there's the weekend ahead, perhaps Monday? :)
    In any case it does look like commuter rail is a 2nd class citizen in Canada.

    --
    Violence is the last refuge of the incompetent. Polar Scope Align for iOS
    1. Re:See Via Rail limiting the GET requests in... by cdrudge · · Score: 2

      See Via Rail initiating lawsuit against Eric Evenchick in 3... 2... 1...

      FTFY

    2. Re:See Via Rail limiting the GET requests in... by tlhIngan · · Score: 1

      See Via Rail limiting the GET requests in 3... 2... 1...

      Or fixing their database to delete rows older than 3 days.

      Then again, sometimes the right thing does happen: the company involved makes the data available and everyone is happy. I mean, if the train is delayed because of other rail traffic, then if the government comes asking why on-time rates are so poor, they can show them the data.

    3. Re:See Via Rail limiting the GET requests in... by Kergan · · Score: 1

      There are many ways to work around that, e.g. crawlera.com (disclaimer: working there)

  3. Violating ToS? by Anonymous Coward · · Score: 3, Insightful

    Check the site's terms of service; scraping site contents may violate the ToS.

    I wrote a similar app about 15 years ago to scrape the Edmonton Transit System's route schedules (conveniently posted in generally well-structured HTML at the time) so I could build a relational system and try to sort out predictive routes/times. Then I found out what I was doing was in violation of their ToS, I stopped my scraping service immediately (before getting called on it).

    1. Re:Violating ToS? by Anonymous Coward · · Score: 1

      I'm waiting to see a court tell someone they can't use software except a federally approved browser to retrieve data from a web URL.

    2. Re: Violating ToS? by Anonymous Coward · · Score: 1

      Learn how proxies work you milquetoast pussy.

    3. Re: Violating ToS? by Anonymous Coward · · Score: 0

      Ever heard of Chromedriver?

    4. Re:Violating ToS? by Anonymous Coward · · Score: 1

      If we're now in a world where one can be bound by terms one never agreed to, then my terms of service to Rail Canada read as follows:

      "By returning data to my browser's HTTP request, you hereby agree that you owe me one million dollars. If you do not agree with these terms, you may not return data to my computer."

      What's that? They will add me to a block list? Sorry, too late already. The debt is already incurred, when they first agreed to my terms by returning said data.

    5. Re: Violating ToS? by Anonymous Coward · · Score: 1

      Yes, my point is there will always be tools like that. They are no different conceptually from a browser, unless you want to start enforcing mandatory adherence to HTML rendering specifications in which case ALL the major browser companies are going to be terrified.

      Now, the last few years have definitely taught me our legal system never backs down from a challenge to make horrible decisions, so I am sure eventually this could be legally problematic, but for now, the point is, as long as you are using the system yourself and not wholesaling or mass-distributing the data you've scraped, there is no legal problem as I see it.

    6. Re:Violating ToS? by Anonymous Coward · · Score: 0

      You sir are thicker than a whale omelet.

    7. Re:Violating ToS? by Anonymous Coward · · Score: 0

      Go to North Korea and you can witness it firsthand.

    8. Re: Violating ToS? by Anonymous Coward · · Score: 0

      Mmm... milquetoast pussy.

    9. Re:Violating ToS? by Lunix+Nutcase · · Score: 1

      Using there site is an agreement to the ToS. Are you dense or really that stupid?

    10. Re:Violating ToS? by Lunix+Nutcase · · Score: 1

      And yes I typo'd their. Bite my ass.

    11. Re: Violating ToS? by Anonymous Coward · · Score: 0

      Using their site to view the only available copy of the TOS is an agreement to their TOS.

    12. Re:Violating ToS? by Anonymous Coward · · Score: 0

      And sending data in response to my HTTP request is an agreement to MY ToS. It's right there in the request I send over the wire. If they do not like this, they are welcome not to reply to the request.

    13. Re:Violating ToS? by Anonymous Coward · · Score: 0

      Like that guy, weev, who accessed iPad users' data on the AT&T website by typing in the URL?

    14. Re:Violating ToS? by Anonymous Coward · · Score: 0

      Then I found out what I was doing was in violation of their ToS, I stopped my scraping service immediately (before getting called on it).

      Have you ever heard the saying, it's easier to ask for forgiveness than permission? Take the example of Thomas Peterffy, the founder of Interactive Brokers and an early innovator in electronic trading and disruptive finance technologies. An engineer by training, he once cut the wires off the back of a Quotron device, hooked them up to an oscilloscope and reverse engineered the data stream so that he could feed it into his trading software. Do you think he asked Quotron for permission to do this? Would they have given it if he had? Of course not. Did it violate the "Terms of Service" or similar agreement? Almost certainly. Another time he again reverse engineered a data stream and used software to put trades into the NASDAQ platform without using the keyboard/terminal. When the NASDAQ folks found out about this, they tried to shut him down. Undeterred, Mr. Peterffy built a robot to type the trades into the terminal keyboard instead, sidestepping their silly rules. Progress comes from people who challenge the status quo and shake things up, not from those who follow the "Terms of Service" and do only what someone else gives them permission to. We need more risk takers and fewer rule followers. At least in this, Silicon Valley is right.

    15. Re: Violating ToS? by Anonymous Coward · · Score: 0

      Mmm... milquetoast pussy.

      A pussy that actually tastes good? Now that's a real innovation!

    16. Re:Violating ToS? by Anonymous Coward · · Score: 0

      your an moron.

    17. Re:Violating ToS? by Anonymous Coward · · Score: 0

      These days, assume anything useful involving an electronic device is illegal and take measures to protect/anonymize yourself before you do it anyway. This is an essential life skill which should be taught in schools, like road safety or sex ed.

    18. Re:Violating ToS? by Anonymous Coward · · Score: 0

      Does it make you read their ToS in full and agree before you can use the site? If that were the case, I imagine the GET requests wouldn't be getting through.

  4. Good luck with that by rsborg · · Score: 1

    Clearly the website is based on a loophole, which can/will be closed at any time. Given the litigious nature of most corporations (and in this case, possibly a government agency), I wouldn't be surprised if the author doesn't get a cease & desist and/or lawsuit coming his way.

    Other than that, this is pretty awesome and a hacker-worthy effort.

    --
    Make sure everyone's vote counts: Verified Voting
    1. Re:Good luck with that by Anonymous Coward · · Score: 0

      I wouldn't be surprised if the author doesn't get a cease & desist and/or lawsuit coming his way.

      That's why you scrape through a randomized rotating proxy server list with a random retry interval.

  5. I wonder if by Anonymous Coward · · Score: 0

    they include data on crew/dispatchers. That might also affect timeliness

  6. It's nice to see something good for a change by Opportunist · · Score: 1

    It's not often that sloppy security on commercial sites works in favor of their customers.

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
  7. Neither commuter nor "communter" by Roadmaster · · Score: 5, Informative

    VIA Rail is NOT a commuter train service. It offers "intercity passenger rail services", not commuter service, which Wikipedia defines better than I can: "Commuter rail, also called suburban rail, is a passenger rail transport service that primarily operates between a city centre, and the middle to outer suburbs...". Again, not what VIA Rail primarily does.

    Examples of agencies which offer commuter rail service in Canada include Greater Toronto's GO Transit trains and Montreal's AMT. These do, indeed, offer service between communities forming part of a greater metropolitan area and said area's city centre. At least in Montreal, the AMT has some exclusive tracks and agreements on shared tracks which prioritize commuter trains over other scheduled trains at rush hour.

  8. Nah by zAPPzAPP · · Score: 2

    I'd rather choose my train based on where it's headed.

    Being on time at the wrong destination is kinda useless.

    1. Re:Nah by Anonymous Coward · · Score: 0

      Indeed.

    2. Re: Nah by Anonymous Coward · · Score: 0

      Yes, the script is useless. I tried it and you also can't get the average delay for next week trains, only for trips from the past. But I don't want to take the train last week, I want to take the train next week!

  9. Choose when you want to travel by Anonymous Coward · · Score: 0

    If you have to be somewhere at a certain time, your choice of trains is limited.

  10. Guy Writes Script by Anonymous Coward · · Score: 2, Insightful

    So a guy wrote a script. Good for him, I guess, but why is this on /.?

    1. Re:Guy Writes Script by Anonymous Coward · · Score: 0

      In Canada, it's no big deal... but in the U.S., he would've been charged with hacking and unauthorized use of a computer, and whatever other bullshit charges a prosecutor looking for their 15 minutes could think of... People have been charged with, and convicted of, much less.

    2. Re:Guy Writes Script by Lunix+Nutcase · · Score: 1

      By Slashdot's standards today that makes you a programming wizard.

    3. Re:Guy Writes Script by h33t+l4x0r · · Score: 1

      Not just any script, a Python script. That means he probably read a tutorial too.

  11. why didn't I think of that ? by Anonymous Coward · · Score: 0

    Choosing my train based on the likelihood of delays instead of on where I am, where I want to go, and what time it is. Brilliant, wish I'd thought of it.

    1. Re:why didn't I think of that ? by jonwil · · Score: 1

      The guy who wrote the script is probably checking all the different services from where he is to where he wants to go to figure out which time of the day he should travel (and on which service) to have the greatest chance of avoiding delays.

  12. Some train companies have an API for that... by Anonymous Coward · · Score: 4, Informative

    See the National Rail Enquiries APIs. Loads of information on train timetables, delays, maintenance schedules, and almost all for free.
    http://www.programmableweb.com/api/national-rail-enquiries

  13. he's toast by Anonymous Coward · · Score: 0

    yes, they indeed have their own private police force.
    with apologies to Johnny Fever, train cops are NOT a myth (part of the Railway Act in 1923, iirc).

    mark my words, if he's not rich and white, and maybe even if he is, he's going to go right to jail for this sort of planned terrorist activity.
    not only for plotting such harm against the nation's critical infrastructure, but actually carrying it out, too!
    and civilly, they're going to seize all of his assets.

    there's a reason it's called being railroaded.
    they can't actually kill him while the cameras are looking, but by the time they're done, he'll sure wish they had.

    1. Re:he's toast by Anonymous Coward · · Score: 0

      RTFA.
      It's in Canada not USA.
      He'll get a free double double, a complimentary Hudson's Bay toque, and an apology.
      (sorry!)

  14. I did something similar way back in 1999. by Anonymous Coward · · Score: 0

    Back then, Canadian National Railway (the transcontinental freight RR whose track the vast majority of Via Rail passenger trains operate on) had a railcar tracking webpage: totally open, no login needed. One could enter, ten at a time, the fleet numbers of Via coaches, sleepers, dining cars, etc. (it didn't work for locomotives), and receive formatted text of each car's last known location and train number from across Canada, based on CN's data gleaned from trackside RFID scanners. I had a script for several hundred cars, polling the site on a daily basis. I would make monthly reports of Via Rail train timekeeping and car utilization and movements to post to passenger rail discussion boards.

    Alas, soon after 2001/09/11, CN cut off general public access to this data and made it a customer-only, login-required site, for obvious paranoid security reasons. It was fun while it lasted, even when the service was abused by railfans.

  15. Philadelphia by fulldecent1341 · · Score: 1

    For Philadelphia in the US, please see TrainView and http://phor.net/apps/septa/, which includes live data and an 8-year history of train on-time performance, with analysis of lateness.

  16. Not for long by nospam007 · · Score: 1

    "The good that comes of this? You can now choose your train based on smallest likelihood of delay."

    Do it quickly, because, like always in these cases, the guy will be sued for data theft in 3, 2, 1, ...

  17. wrong solution by Tablizer · · Score: 1

    Tired of his trains being constantly late,

    Get up earlier instead of staying up too late writing silly scripts.

    1. Re:wrong solution by tlhIngan · · Score: 1

      Tired of his trains being constantly late,

      Get up earlier instead of staying up too late writing silly scripts.

      How does getting up earlier deal with issues of the train being late? If the 7:00 AM train consistently comes at 7:10AM, waking up 10 minutes earlier does nothing.

      Nor does it help if the train usually comes in at 7:00AM, but sometimes comes in at 7:10AM.

      Neither does waking up early help if the train (let's say it departs at 7:00AM and arrives at 8:00AM) consistently comes at 7:00AM and routinely gets delayed to 8:10AM.

      The only time it might help is if there's a 6:00 AM train and an 8:00 AM train, so he could choose between two or three of them. Then he'd need to figure out which one is least likely to be late, which means he'd still need the data to figure out which one is consistently more on time.
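      The choice described above can be sketched in a few lines of Python: for each candidate departure, count how often past runs would have arrived by the time you need to be there, then take the most reliable option. The Departure type, the minutes-after-midnight representation, the made-up delay histories, and the tie-break (prefer the later arrival, i.e. more sleep) are all illustrative assumptions, not anything from the article.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Departure:
    """Hypothetical record for one scheduled train option."""
    name: str                 # label, e.g. "7:00 AM"
    scheduled_arrival: int    # minutes after midnight
    past_delays: list[int]    # observed arrival delays in minutes


def on_time_probability(dep: Departure, deadline: int) -> float:
    """Fraction of past runs that arrived at or before `deadline` (minutes after midnight)."""
    if not dep.past_delays:
        return 0.0  # no history: treat as unreliable rather than guess
    hits = sum(1 for d in dep.past_delays if dep.scheduled_arrival + d <= deadline)
    return hits / len(dep.past_delays)


def best_departure(options: list[Departure], deadline: int) -> Departure:
    """Most likely to arrive by the deadline; ties go to the later scheduled arrival."""
    return max(
        options,
        key=lambda dep: (on_time_probability(dep, deadline), dep.scheduled_arrival),
    )


if __name__ == "__main__":
    deadline = 9 * 60  # must arrive by 9:00 AM
    options = [
        Departure("7:00 AM", 8 * 60, [0, 10, 70]),          # made-up history
        Departure("7:45 AM", 8 * 60 + 45, [0, 5, 10, 12]),  # made-up history
    ]
    choice = best_departure(options, deadline)
    print(choice.name, on_time_probability(choice, deadline))
```

      Note that this ranks by the chance of making a deadline rather than by average delay: an earlier train with one huge delay can still be the worse bet even if it is usually on time.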

  18. How about for Amtrak? by Anonymous Coward · · Score: 0

    Damn, someone should do this for Amtrak. Their on-time performance is _HORRIBLE_