GTFS from a technological point of view

GTFS stands for General Transit Feed Specification (in Spanish, Especificación General de Feeds de Transporte Público). Its objective is to define a common format for public transport timetables and the geographic information associated with them.

Thanks to this standard, transport operators can offer journey information to their travellers.

A little history

The story goes that it all began at Google back in 2005. The company has a policy known as “20% time”, which allows employees to devote 20% of their time to reading about all kinds of things, thinking about ideas and running experiments. Chris Harrelson wanted to incorporate travel information into Google Maps when he met Tim and Bibiana McHugh, a couple who worked at the transport company TriMet in Portland, Oregon. At that time, the most popular map services offered driving directions, while finding public transport information in an unfamiliar city was still a distant dream. The Google employee and the couple from TriMet then began to exchange timetable information in CSV format.

This resulted in Google Transit Trip Planner, and Portland was the first city included in the project, with information about its metro system. Soon more cities in the United States were added and, as its popularity grew, it gradually spread around the world.

Overview of a GTFS feed

A GTFS file is a ZIP archive that contains several text files in CSV format. It is normally referred to as a GTFS feed.
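As a minimal sketch of what this means in practice, the following Python snippet opens a feed and reads one of its CSV files using only the standard library. The helper names `read_gtfs_table` and `build_sample_feed`, and the two sample stops, are made up for illustration; a real feed would be opened from a path on disk instead of being built in memory.

```python
import csv
import io
import zipfile


def read_gtfs_table(feed, filename):
    """Return the rows of one CSV file inside a GTFS ZIP as a list of dicts."""
    with zipfile.ZipFile(feed) as archive:
        with archive.open(filename) as raw:
            # utf-8-sig tolerates the optional byte-order mark some exporters add
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


def build_sample_feed():
    """Build a tiny, made-up feed in memory so the example is self-contained."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\n"
            "S1,Central Station,43.3170,-1.9770\n"
            "S2,Old Town,43.3240,-1.9850\n",
        )
    buffer.seek(0)
    return buffer


stops = read_gtfs_table(build_sample_feed(), "stops.txt")
for stop in stops:
    print(stop["stop_id"], stop["stop_name"])
```

Each row comes back as a dictionary keyed by the column names in the first line of the file, which is usually enough for a first exploration of a feed.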

Each text file in the feed models a particular aspect of the travel information. The files are listed below.

  • agency.txt

Required. It defines the one or more public transport companies that provide the data for this feed.

  • stops.txt

Required. It specifies the stops that the services call at.

  • routes.txt

Required. It defines the public transport routes. A route is a group of trips that is displayed to passengers as a single service.

  • trips.txt

Required. It lists the trips for each route. A trip is a sequence of two or more stops that takes place at a particular time.

  • stop_times.txt

Required. For each trip, it specifies the times at which a vehicle arrives at and departs from each stop.

  • calendar.txt

Conditionally required. It defines the weekly service patterns that the company operates, for example every day of the week, only at weekends or Monday to Wednesday. It may be omitted only if calendar_dates.txt contains every service date.

  • calendar_dates.txt

Optional. It lists the exceptions to the services defined in calendar.txt, although it can replace that file entirely if it contains all the service dates.

  • fare_attributes.txt

Optional. It defines the fares for the routes.

  • fare_rules.txt

Optional. It defines the rules for applying the fares to routes.

  • shapes.txt

Optional. It defines the paths used to draw the lines on a map. If this file is not provided, the routes are drawn as straight lines.

  • frequencies.txt

Optional. It defines the time between trips for routes whose service frequency is variable.

  • transfers.txt

Optional. It specifies the rules for making connections at the transfer points between routes.

  • feed_info.txt

Optional. It contains information about the feed itself: the publisher, the version and the expiry date of the feed.
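A quick way to put the list above to use is to check which of the files a feed always needs are actually present in the ZIP. The sketch below is illustrative only: the function and constant names are hypothetical, the sample feed holds empty placeholder files, and the calendar rule follows the note above that calendar_dates.txt can stand in for calendar.txt.

```python
import io
import zipfile

# Files from the list above that a feed always needs.
ALWAYS_NEEDED = {"agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt"}
# At least one of these must be present, since calendar_dates.txt
# can replace calendar.txt when it holds every service date.
CALENDAR_FILES = {"calendar.txt", "calendar_dates.txt"}


def missing_files(feed):
    """Return a list describing the files the feed lacks."""
    with zipfile.ZipFile(feed) as archive:
        present = set(archive.namelist())
    missing = sorted(ALWAYS_NEEDED - present)
    if not (CALENDAR_FILES & present):
        missing.append("calendar.txt or calendar_dates.txt")
    return missing


def build_sample_feed(filenames):
    """Build an in-memory ZIP with empty placeholder files (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in filenames:
            archive.writestr(name, "")
    buffer.seek(0)
    return buffer


sample = build_sample_feed(
    ["agency.txt", "stops.txt", "routes.txt", "calendar_dates.txt"]
)
missing = missing_files(sample)
print(missing)  # ['stop_times.txt', 'trips.txt']
```

This only checks that the files exist; it says nothing about whether their contents are valid.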

Generation of feeds

Normally, the data source is an Operations Support System (OSS), which has to be connected to and studied in order to find what the feed needs. In these cases, the hard work consists of exploring its databases in search of the tables that provide the necessary data. In practice, this type of connection is simply a database from which you select precisely what is needed.

However, the objective is always the same. The first step is to identify the operators that make up the feed, of which there is normally only one. The next is to find the most static data, such as stops, routes and calendars. From there on, the most difficult tasks are finding the trips for each route and generating the times per stop. Since routes do not generally run in a straight line, the shapes.txt file is provided so that the lines drawn on maps follow the trips along the appropriate paths (roads, train tracks or metro…). Unfortunately, information such as fares, frequencies and transfers is rarely included.
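Once the data has been located, the final step of writing it out can be sketched as follows. This is a simplified illustration in Python, not the way any particular tool does it: the helper names `table_to_csv` and `write_feed` are hypothetical, the rows are invented, and only three of the files are shown, so the result is not yet a complete feed. The remaining files would be added to `tables` in the same way.

```python
import csv
import io
import zipfile


def table_to_csv(fieldnames, rows):
    """Serialise a list of dict rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def write_feed(target, tables):
    """Write each table as one .txt file inside a ZIP archive."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for filename, (fieldnames, rows) in tables.items():
            archive.writestr(filename, table_to_csv(fieldnames, rows))


# Invented sample data standing in for what would be read from the source system.
tables = {
    "routes.txt": (
        ["route_id", "route_short_name", "route_type"],
        [{"route_id": "R1", "route_short_name": "1", "route_type": 3}],
    ),
    "trips.txt": (
        ["route_id", "service_id", "trip_id"],
        [{"route_id": "R1", "service_id": "WEEKDAY", "trip_id": "T1"}],
    ),
    "stop_times.txt": (
        ["trip_id", "arrival_time", "departure_time", "stop_id", "stop_sequence"],
        [
            {"trip_id": "T1", "arrival_time": "08:00:00",
             "departure_time": "08:00:00", "stop_id": "S1", "stop_sequence": 1},
            {"trip_id": "T1", "arrival_time": "08:10:00",
             "departure_time": "08:10:00", "stop_id": "S2", "stop_sequence": 2},
        ],
    ),
}

feed_bytes = io.BytesIO()  # a file path would work here as well
write_feed(feed_bytes, tables)
```

Keeping each table as a pair of column names and rows keeps the column order explicit, which makes the generated files easier to compare between runs.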

Ideally, the source information is already supplied in a form that matches the GTFS standard and is left in repositories such as FTP servers or Amazon S3, so that it can be processed quickly and the feed does not take long to create.

Apart from OSSs and these repositories, many data sources are also available through web services, whether SOAP or REST.

Technologies at our service

At Ingartek we are committed to and promote the use of Free Software. Our developments for handling GTFS feeds use Java technologies: they are built with the Spring Boot framework and the tools created by Conveyal, OneBusAway and OpenTripPlanner, as well as those that Google provides.

Validation of feeds

Once a GTFS feed has been generated, it must be checked for errors. For that purpose, Google created FeedValidator, a tool that analyses GTFS feeds and generates a web report showing errors, warnings and recommendations.

This tool requires some familiarity with terminals and the command line: the command points to the GTFS feed and, with different customisation parameters, generates the validation.
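To give an idea of the kind of error a validation looks for, the sketch below checks that every stop referenced in stop_times.txt exists in stops.txt. It is a simplified Python illustration and not FeedValidator's actual implementation; the function name `dangling_references` and the sample data are made up.

```python
import csv
import io


def dangling_references(child_csv, parent_csv, key):
    """Return (line number, value) pairs for child rows whose key is unknown."""
    parent_ids = {row[key] for row in csv.DictReader(io.StringIO(parent_csv))}
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(child_csv)), start=2):
        if row[key] not in parent_ids:
            problems.append((line_no, row[key]))
    return problems


# Invented sample content: stop S3 is used by a trip but never defined.
stops_txt = "stop_id,stop_name\nS1,Central Station\nS2,Old Town\n"
stop_times_txt = (
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "T1,08:00:00,08:00:00,S1,1\n"
    "T1,08:10:00,08:10:00,S3,2\n"
)

errors = dangling_references(stop_times_txt, stops_txt, "stop_id")
for line_no, stop_id in errors:
    print(f"stop_times.txt line {line_no}: unknown stop_id {stop_id}")
```

The same function could be reused for other references between files, for example the route_id values in trips.txt against routes.txt.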

Once this initial check has been carried out, the GTFS feed must be uploaded to Google's Partner Dash platform. There, a second and final validation is carried out before the information is made available to the public.

Conclusions

Ultimately, implementing the GTFS standard requires specific knowledge, both technical knowledge of new technologies and knowledge inherent to transport. A feed also has to pass very strict filters that correspond to standards defined by Google, which demands mastery of a series of tools.