A First Look at WMATA's New GTFS Data
There were long delays, petition drives, and some final technical hiccups, but WMATA has finally released its schedule data in the Google Transit Feed Specification format. What does that mean? Well, most obviously it means that Google Transit will soon be adding D.C. to its list of supported cities (UPDATE: or perhaps not — see below for a comment from Michael Perkins of GGW, who explains that there are lingering complications surrounding WMATA's legal relationship with Google). But far more exciting is the opportunity this dataset represents to third-party developers. You can bet that geeks across the region were feverishly importing schedule data into databases last night (I certainly was).
So what's in a GTFS file, anyway? You can read the full spec here if you'd like, but the short version is pretty simple: a bunch of text files are zipped up into a single archive, which can be downloaded from the transit agency's website. In WMATA's case, the file clocks in around 20 megabytes. These comma-separated text files have names like routes.txt, stops.txt and stop_times.txt, and they can be opened in a text editor or spreadsheet program. The layout is easy to follow: stops.txt, for example, lists bus and rail stops, with information like name, latitude and longitude, and assigns each one an ID. stop_times.txt, on the other hand, has a bunch of entries that assign arrival and departure times to individual trips (each trip, in turn, belongs to a route), linking back to the stop information via each stop's ID.
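To make that ID linkage concrete, here is a minimal Python sketch that opens a GTFS archive and groups arrival times by stop name. The column names (stop_id, stop_name, arrival_time) come from the GTFS spec; the function names are made up for illustration, and this is just one plausible way to do the join, not anything WMATA or Google ships.

```python
import csv
import io
import zipfile
from collections import defaultdict


def read_table(feed, name):
    """Yield the rows of one GTFS text file as dicts keyed by column name."""
    with feed.open(name) as raw:
        # utf-8-sig silently drops a byte-order mark if the file has one.
        text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
        yield from csv.DictReader(text)


def times_by_stop(feed_path):
    """Map each stop name to the arrival times recorded for that stop.

    feed_path may be a filename or a file-like object holding the zip.
    """
    with zipfile.ZipFile(feed_path) as feed:
        # stops.txt: one row per stop, keyed here by its stop_id.
        stops = {row["stop_id"]: row for row in read_table(feed, "stops.txt")}
        arrivals = defaultdict(list)
        # stop_times.txt: each row points back at a stop via stop_id.
        for row in read_table(feed, "stop_times.txt"):
            stop_name = stops[row["stop_id"]]["stop_name"]
            arrivals[stop_name].append(row["arrival_time"])
    return dict(arrivals)
```

Calling times_by_stop("wmata_gtfs.zip") (a hypothetical filename; use whatever you saved the download as) should hand back a dictionary of stop names and their scheduled arrivals. Rows are streamed rather than loaded all at once, since stop_times.txt is by far the largest file in the archive.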
You don't have to understand the setup to start playing with the data, though. If you're on Windows (or handy with installing Python applications) you can check out Google's Schedule Viewer; the download is here, and runs about 5MB. You'll need to download the WMATA data too, of course, and unzip both archives to locations you can find later. Then run schedule_viewer.exe and, if all goes well, you'll be prompted for the location of the GTFS files you just uncompressed. It'll chew them over, then give you a URL that you can use to browse the dataset on your local machine.
Frankly, though, the results are a little funky: the WMATA dataset is apparently so large that the Schedule Viewer displays only part of it at a time, and not in a particularly useful way. You can find Google's instructions here; if you have better luck than we did, let us know in the comments.
There are other minor points worth griping about, too. Right now, WMATA makes users agree to licensing terms each time they download the GTFS dataset. We're less concerned about those terms than GreaterGreaterWashington is, but the captcha on that page is a significant problem. By including it, WMATA is demanding human intervention in order for developers to update their datasets, which means they're likely to sync up less often and less quickly. Given how much carping the transit agency did about Google's inability to keep its data as up-to-date as WMATA would like, this seems like an odd choice.
But of course the data has only been online for one day; we're sure this sort of glitch can be worked out as we move past the tip of the GTFS technical iceberg. If the Schedule Viewer whetted your appetite, you may also want to check out the free Quantum GIS viewer and feed it some of the DC GIS shapefiles — combined with the stop information from the GTFS dataset, you ought to be able to make some plots of the District's transit stops that are considerably nicer than the one attached to this post. And from there, the world of technical transit geekery begins to expand — GraphServer! Mapnik! iPhone libraries and visualization frameworks and... well, the point is that this GTFS business has given local developers a lot to chew over. So fess up, fellow nerds: what are you working on?
