Data sources

Site: Mobility Academy
Course: Using market research to help optimise your public transport system
Book: Data sources


This section describes the various sources of data available to you and when each is most appropriate to use.

1. Internal sources of secondary data

Depending on the scope of the tasks and your organisational structure, many internal data sources can be relevant to public transport. They may include:

  • the accounting department (e.g. costs and revenues)
  • timetables and their execution (e.g. punctuality, completion of particular departures/arrivals, frequency, average speed, travel times, volume of vehicle-kilometres and vehicle-hours)
  • the controlling unit (cleanliness, general aesthetics of rolling stock and bus stops, drivers, availability of passenger information)
  • sales (volume of sales, their structure, number of reduced-fare tickets, etc.)
  • fare control (number of passengers checked, number of "free riders", etc.)
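
As an illustration of how the timetable indicators listed above might be derived from operational records, here is a minimal Python sketch. The record layout (`Departure`), the function name `timetable_indicators` and the on-time tolerances (up to 1 minute early, up to 3 minutes late) are assumptions made for this example, not definitions from the course; your organisation will have its own data format and punctuality standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Departure:
    """One scheduled departure (hypothetical record layout)."""
    scheduled_min: int            # scheduled time, minutes after midnight
    actual_min: Optional[int]     # actual time, or None if the trip was not run
    length_km: float              # length of the trip in kilometres


def timetable_indicators(departures, early_tol=1, late_tol=3):
    """Return completion rate, punctuality and vehicle-kilometres.

    early_tol and late_tol (minutes) are assumed tolerances for
    counting a departure as on time.
    """
    if not departures:
        return {"completion_rate": 0.0, "punctuality": 0.0, "vehicle_km": 0.0}

    # Departures that were actually operated
    operated = [d for d in departures if d.actual_min is not None]

    # Operated departures whose deviation lies within the tolerance window
    on_time = [
        d for d in operated
        if -early_tol <= d.actual_min - d.scheduled_min <= late_tol
    ]

    return {
        "completion_rate": len(operated) / len(departures),
        "punctuality": len(on_time) / len(operated) if operated else 0.0,
        "vehicle_km": sum(d.length_km for d in operated),
    }


# Invented sample data, for illustration only
sample = [
    Departure(480, 481, 12.5),   # 1 minute late: on time
    Departure(495, 500, 12.5),   # 5 minutes late: not on time
    Departure(510, None, 12.5),  # not operated
    Departure(525, 524, 12.5),   # 1 minute early: on time
]
result = timetable_indicators(sample)
print(result)
```

With the invented sample, three of four departures were operated and two of those three fall within the assumed tolerance window.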

There is also a lot of useful data in other departments of the city administration, especially the spatial planning department, the traffic and road department, the department responsible for vehicle registration and the department responsible for citizen registration.

The results of previous marketing research can also offer interesting data, especially as a starting point and/or as a source of comparative data.

2. External sources of secondary data

External sources are also important, and their range is larger than you may imagine. Some possibilities include:

  • data from other companies, stakeholders and operators (as a benchmark) such as annual reports, other financial reports, adverts, tendering announcements. This type of data is the most difficult to collect; very often it is not publicly available.
  • specialised journals, especially publications of chambers of public transport (if they exist), scientific journals, statistical bulletins
  • case studies
  • deliverables from transport/mobility-focused projects co-financed by the EU; many of these are described in the case study database on the EU's Eltis website
  • general statistical publications (e.g. data on individual motorisation, consumer price index, GDP, general wealth, demographic data and other parameters of the local, regional and national population)
  • syndicated service data, which are provided by companies that collect data in a standard format and make them available to subscribers. Such data may range from very general to highly specialised and are generally not available to the public
  • external databases

Further sources exist beyond this list, and depending on what you want to research, you may have other ideas yourself. It pays to keep an open mind with regard to the possibilities.

3. Evaluation of secondary data

It is very rare for the secondary data available to exactly match your research problem. Although secondary data offer a lot of advantages (time and cost savings, enriching primary data), there are some problems associated with them. To evaluate their usefulness, you should ask yourself some questions, such as:

  • What was the purpose of the study for which the data were collected?
  • Who collected the data and published the information?
  • What information was collected?
  • How was the information obtained?
  • How consistent is the information across sources?
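
The last question, consistency across sources, can be checked in a simple way when several sources report the same indicator. The sketch below is one possible approach, not a method prescribed by the course: it computes the relative spread, (max - min) / mean, of the reported values and flags indicators whose spread exceeds a threshold. The function names, the 10% threshold, the source labels and all figures are assumptions invented for illustration.

```python
def relative_spread(values):
    """Return (max - min) / mean for a list of reported values."""
    mean = sum(values) / len(values)
    if mean == 0:
        return 0.0
    return (max(values) - min(values)) / mean


def flag_inconsistent(estimates, threshold=0.10):
    """Return indicators whose sources disagree by more than `threshold`.

    `estimates` maps indicator name -> {source name: reported value}.
    The default threshold of 10% is an arbitrary assumption.
    """
    flagged = {}
    for indicator, by_source in estimates.items():
        values = list(by_source.values())
        if len(values) < 2:
            continue  # nothing to compare against
        spread = relative_spread(values)
        if spread > threshold:
            flagged[indicator] = round(spread, 3)
    return flagged


# Invented figures, for illustration only
estimates = {
    "annual_passengers_million": {
        "operator_report": 41.2,
        "city_statistics": 40.8,
    },
    "cars_per_1000_inhabitants": {
        "registration_office": 520,
        "national_statistics": 610,
    },
}
flagged = flag_inconsistent(estimates)
print(flagged)
```

A flagged indicator does not say which source is wrong; it only signals that the questions above (purpose, collector, method) deserve a closer look before the figure is used.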

4. Primary data

Primary data are information collected by the researcher specifically for the research project at hand. Different research problems may require different primary data collection methods, including surveys, observation, registration and experiments.

One of the most popular methods is the survey, which is used in different forms including:

  • person-administered surveys
  • computer-administered surveys
  • self-administered surveys
  • hybrid surveys, which use multiple data collection methods

Surveys are further classified by the form of contact with the respondent, e.g. in person, by phone, by traditional mail or in electronic form.

[Figure: Modes of data collection]