Tuesday, January 13

rtd / bus schedules / transit data

I’m taking the bus today, so I got to thinking about bus schedules. I use Google Calendar a little bit (out of habit and convenience more than any particular love), and I was thinking “why doesn’t my calendar just know the times of transit routes I use?”

I thought maybe there’d be, say, iCal (CalDAV? What is actually the thing?) data somewhere for a given RTD schedule, or failing that, maybe JSON or TSV or something. A cursory search doesn’t turn up much, but I did find these:

  • http://www.rtd-denver.com/Developer.shtml
  • https://developers.google.com/transit/gtfs/reference?csw=1
  • http://www.rtd-denver.com/GoogleFeeder/
  • http://www.rtd-denver.com/GoogleFeeder/google_transit_Jan15_Runboard.zip

I grabbed that last one.

brennen@desiderata 16:16:43 /home/brennen ★  mkdir rtd && mv google_transit_Jan15_Runboard.zip rtd 
brennen@desiderata 16:16:51 /home/brennen ★  cd rtd
brennen@desiderata 16:16:53 /home/brennen/rtd ★  unzip google_transit_Jan15_Runboard.zip 
Archive:  google_transit_Jan15_Runboard.zip
  inflating: calendar.txt            
  inflating: calendar_dates.txt      
  inflating: agency.txt              
  inflating: shapes.txt              
  inflating: stop_times.txt          
  inflating: trips.txt               
  inflating: stops.txt               
  inflating: routes.txt              

Ok, so this is pretty minimalist CSV stuff from the look of most of it.

brennen@desiderata 16:22:12 /home/brennen/rtd ★  grep Lyons stops.txt
20921,Lyons PnR,Vehicles Travelling East, 40.223979,-105.270174,,,0

So it looks like stops have an individual id?

brennen@desiderata 16:24:41 /home/brennen/rtd ★  grep '20921' ./*.txt | wc -l
87

A lot of this is noise, but:

brennen@desiderata 16:26:23 /home/brennen/rtd ★  grep 20921 ./stop_times.txt 
8711507,12:52:00,12:52:00,20921,43,,1,0,
8711508,11:32:00,11:32:00,20921,43,,1,0,
8711509,07:55:00,07:55:00,20921,43,,1,0,
8711512,16:41:00,16:41:00,20921,43,,1,0,
8711519,05:37:00,05:37:00,20921,3,,0,1,
8711517,16:47:00,16:47:00,20921,1,,0,1,
8711511,17:58:00,17:58:00,20921,43,,1,0,
8711514,13:02:00,13:02:00,20921,1,,0,1,
8711516,07:59:00,07:59:00,20921,1,,0,1,
8711515,11:42:00,11:42:00,20921,1,,0,1,
8711510,19:10:00,19:10:00,20921,43,,1,0,
8711513,18:05:00,18:05:00,20921,1,,0,1,
8711518,06:47:00,06:47:00,20921,1,,0,1,
brennen@desiderata 16:26:57 /home/brennen/rtd ★  head -1 stop_times.txt
trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled

So:

brennen@desiderata 16:41:47 /home/brennen/code/rtd-tools (master) ★  grep ',20921,' ./stop_times.txt | cut -d, -f1,3 | sort -n
8711507,12:52:00
8711508,11:32:00
8711509,07:55:00
8711510,19:10:00
8711511,17:58:00
8711512,16:41:00
8711513,18:05:00
8711514,13:02:00
8711515,11:42:00
8711516,07:59:00
8711517,16:47:00
8711518,06:47:00
8711519,05:37:00

That first number is a trip_id, the second one departure time. Trips are provided in trips.txt:

brennen@desiderata 16:54:56 /home/brennen/code/rtd-tools (master) ★  head -2 trips.txt
route_id,service_id,trip_id,trip_headsign,direction_id,block_id,shape_id
0,SA,8690507,Union Station,0,   0  2,793219

I don’t usually use join very much, but this seems like a logical place for it. It turns out that join wants its input sorted on the join field, so I do this:

brennen@desiderata 16:54:38 /home/brennen/code/rtd-tools (master) ★  sort -t, -k1 stop_times.txt > stop_times.sorted.txt
brennen@desiderata 16:54:38 /home/brennen/code/rtd-tools (master) ★  sort -t, -k3 trips.txt > trips.sorted.txt 

And then:

brennen@desiderata 16:51:07 /home/brennen/code/rtd-tools (master) ★  join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921
,Y,WK,Lyons PnR,0,   Y 16,79481043,,1,0,
,Y,WK,Lyons PnR,0,   Y 16,79481043,,1,0,
,Y,WK,Lyons PnR,0,   Y 15,79481043,,1,0,
,Y,WK,Lyons PnR,0,   Y 41,79480943,,1,0,
,Y,WK,Lyons PnR,0,   Y 41,79481043,,1,0,
,Y,WK,Lyons PnR,0,   Y 41,79481043,,1,0,
,Y,WK,Boulder Transit Center,1,   Y 41,794814
,Y,WK,Boulder Transit Center,1,   Y 16,794812
,Y,WK,Boulder Transit Center,1,   Y 16,794814
,Y,WK,Boulder Transit Center,1,   Y 15,794812
,Y,WK,Boulder Transit Center,1,   Y 41,794813
,Y,WK,Boulder Transit Center,1,   Y 15,794813
,Y,WK,Boulder Transit Center,1, 206  1,794816

Ok, waitasec. What the fuck is going on here? The string 20921 appears nowhere in these lines. It takes me too long to figure out that the text files have CRLF line-endings and this is messing with something in the chain (probably just output from grep, since it’s obviously finding the string). So:

brennen@desiderata 16:59:35 /home/brennen/code/rtd-tools (master) ★  dos2unix *.sorted.txt    
dos2unix: converting file stop_times.sorted.txt to Unix format ...
dos2unix: converting file trips.sorted.txt to Unix format ...

Why does dos2unix operate in-place on files instead of printing to STDOUT? It beats me, but I sure am glad I didn’t run it on anything especially breakable. It does do what you’d expect when piped to, anyway, which is probably what I should have done.

So this seems to work:

brennen@desiderata 17:04:45 /home/brennen/code/rtd-tools (master) ★  join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921
8711507,12:52:00,12:52:00,20921,43,,1,0,,Y,WK,Lyons PnR,0,   Y 16,794810
8711508,11:32:00,11:32:00,20921,43,,1,0,,Y,WK,Lyons PnR,0,   Y 16,794810
8711509,07:55:00,07:55:00,20921,43,,1,0,,Y,WK,Lyons PnR,0,   Y 15,794810
8711510,19:10:00,19:10:00,20921,43,,1,0,,Y,WK,Lyons PnR,0,   Y 41,794809
8711511,17:58:00,17:58:00,20921,43,,1,0,,Y,WK,Lyons PnR,0,   Y 41,794810
8711512,16:41:00,16:41:00,20921,43,,1,0,,Y,WK,Lyons PnR,0,   Y 41,794810
8711513,18:05:00,18:05:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1,   Y 41,794814
8711514,13:02:00,13:02:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1,   Y 16,794812
8711515,11:42:00,11:42:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1,   Y 16,794814
8711516,07:59:00,07:59:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1,   Y 15,794812
8711517,16:47:00,16:47:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1,   Y 41,794813
8711518,06:47:00,06:47:00,20921,1,,0,1,,Y,WK,Boulder Transit Center,1,   Y 15,794813
8711519,05:37:00,05:37:00,20921,3,,0,1,,Y,WK,Boulder Transit Center,1, 206  1,794816

Which seems kind of right for the South & Northbound schedules, but they’re weirdly intermingled. I think this pulls departure time and a direction_id field:

brennen@desiderata 17:15:12 /home/brennen/code/rtd-tools (master) ★  join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921 | cut -d, -f3,13 | sort -n
05:37:00,1
06:47:00,1
07:55:00,0
07:59:00,1
11:32:00,0
11:42:00,1
12:52:00,0
13:02:00,1
16:41:00,0
16:47:00,1
17:58:00,0
18:05:00,1
19:10:00,0

So southbound, I guess:

brennen@desiderata 17:15:59 /home/brennen/code/rtd-tools (master) ★  join -t, -1 1 -2 3 ./stop_times.sorted.txt ./trips.sorted.txt | grep 20921 | cut -d, -f3,13 | grep ',1' | sort -n
05:37:00,1
06:47:00,1
07:59:00,1
11:42:00,1
13:02:00,1
16:47:00,1
18:05:00,1

This should probably be where I think oh, right, this is a Google spec—maybe there’s already some tooling. Failing that, slurping them into SQLite or something would be a lot less painful. Or at least using csvkit.