The raw data

The survey was conducted on two separate dates at several points on the roads around the campus. The vehicle counts can be summarised as the total number of each vehicle type counted on each day and on each type of road (A road vs non-A road).

The survey was actually conducted as a series of ten-minute monitoring periods. This design could, in some circumstances, be used to produce inferential statistics such as confidence intervals for means and formal tests of hypotheses. However, applying these methods correctly requires additional knowledge of the assumptions behind the procedures and of how to interpret the results.

So, for this exercise you should concentrate on summing up the counts of observations rather than calculating means and medians.

Visualising the data

The patterns in the data can best be seen by producing bar-charts. A good rule when producing figures is to think about how the viewer can use them to detect contrasts.

How many contrasts can be spotted in this figure?

Aggregating the data

Most of the key patterns can be spotted if the raw data are presented in a well-designed series of bar-plots. However, it can be useful to aggregate the data in order to produce clearer summaries. For example, you might have summarised the data by grouping at the level of road type and date of survey.
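As a minimal sketch of this kind of grouping, the Python below sums counts over each combination of road type and survey date. The records, the field names and the `aggregate` helper are all hypothetical and invented for illustration; they are not the survey data.

```python
from collections import defaultdict

# Hypothetical records, NOT the survey data: one row per vehicle type
# seen in a ten-minute monitoring period.
records = [
    {"date": "day 1", "road": "A road", "vehicle": "car", "count": 40},
    {"date": "day 1", "road": "A road", "vehicle": "bus", "count": 3},
    {"date": "day 1", "road": "non-A road", "vehicle": "car", "count": 12},
    {"date": "day 2", "road": "A road", "vehicle": "car", "count": 35},
    {"date": "day 2", "road": "non-A road", "vehicle": "car", "count": 15},
    {"date": "day 2", "road": "non-A road", "vehicle": "bus", "count": 1},
]


def aggregate(rows, keys):
    """Sum the 'count' field over every combination of the given keys."""
    totals = defaultdict(int)
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group] += row["count"]
    return dict(totals)


# Group at the level of road type and date of survey.
by_road_and_date = aggregate(records, ["road", "date"])
for group, total in sorted(by_road_and_date.items()):
    print(group, total)
```

Changing the list of keys, for example to `["vehicle", "date"]`, gives a different summary of the same records, so it is worth deciding which contrast you want the reader to see before aggregating.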

Visualising summarised data

These data might be visualised using stacked bar-charts. It can be helpful to display the numbers on the charts. This allows the reader to quote the actual data without referring to a data table in addition to the figure. In effect the figure below is a visual version of the data table above.

Summarising as percentages

Many of the reports summarised the data in the form of percentages. This can be useful, but it is important to ensure that the denominator used in the calculation makes sense. A percentage (or proportion) involves dividing each value by an overall total, so you need to be clear which total that is: for example, all the vehicles counted, or only the vehicles counted on one type of road.
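To illustrate how much the choice of denominator matters, here is a short Python sketch that calculates percentages of the grand total and percentages within each road type from the same counts. The totals and the `percentages_within` helper are hypothetical, made up for this example, and are not the survey results.

```python
# Hypothetical totals, NOT the survey data: (road type, vehicle type) -> count.
totals = {
    ("A road", "car"): 75,
    ("A road", "bus"): 5,
    ("non-A road", "car"): 18,
    ("non-A road", "bus"): 2,
}


def percentages_within(counts, position):
    """Percentage of each count relative to the total of its own group.

    `position` selects which element of the key defines the group:
    0 for road type, 1 for vehicle type.
    """
    group_totals = {}
    for key, n in counts.items():
        group_totals[key[position]] = group_totals.get(key[position], 0) + n
    return {key: 100 * n / group_totals[key[position]] for key, n in counts.items()}


# Denominator 1: every vehicle counted in the whole survey.
grand_total = sum(totals.values())
pct_of_all = {key: 100 * n / grand_total for key, n in totals.items()}

# Denominator 2: the vehicles counted on the same type of road.
pct_within_road = percentages_within(totals, 0)

for key in totals:
    print(key, round(pct_of_all[key], 1), round(pct_within_road[key], 1))
```

With these made-up numbers, the percentages of the grand total mostly reflect how much busier the A road is, while the percentages within each road type show the mix of vehicles on each road. Which one makes sense depends on the contrast you are trying to describe.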

The relative frequencies can be seen much more easily as a labelled bar-chart.

Why pie charts should not be used

Pie charts are generally considered to be poor figures. Although it is possible to interpret a very simple pie chart with two or three slices, as soon as a pie chart becomes more detailed, visual interpretation becomes more or less impossible.

What is wrong with confidence intervals?

Many of you showed plots with confidence intervals. It is quite difficult to explain why this is wrong, as in some respects it is not wrong at all. Using confidence intervals to express uncertainty about the true mean is extremely good practice, so I would not mark you down for this. However, confidence intervals are usually calculated using data in which each element in the sample represents a measurement on the same kind of experimental unit. For example, if we measured a set of oak leaves, some from the shady side of a tree and some from the sunny side, then each measurement is on an oak leaf, and we can group the leaves into two classes (you may do this experiment next year).

We could try to calculate confidence intervals based on the data shown below.

This would “work” in the sense that the calculations can be done. However, because we have pooled diverse observations on different vehicle types, the result would not be strictly valid.

Confidence intervals are used for testing statistical hypotheses. Our observations on the traffic flow form a sample drawn from all the observations that might possibly have been made. Each observation varies. If this variation is attributed to random “noise”, then any two samples of observations will differ in some random manner. Correctly calculated 95% confidence intervals allow us to visualise the potential for variability between similar studies. If the study were repeated many times, intervals calculated in this way would contain the true mean value 95% of the time (i.e. there is a 1 in 20 chance that an interval misses it).

If we filter the data to keep just the cars, we could potentially try this to see if there is any genuine difference between the observations made on each day.
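As a rough sketch of what such a calculation involves, the Python below computes a 95% confidence interval for the mean number of cars per ten-minute period on each day, using Student's t distribution. The counts and the `mean_ci_95` helper are hypothetical and are not the survey data. The sketch also assumes that the monitoring periods on each day are comparable (for example, all on the same type of road) and that the counts behave like independent draws, which would need checking in a real analysis.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values of Student's t, indexed by degrees of freedom.
T_CRIT_95 = {4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def mean_ci_95(values):
    """Return (lower, upper) limits of a 95% confidence interval for the mean."""
    n = len(values)
    centre = mean(values)
    standard_error = stdev(values) / sqrt(n)  # sample standard deviation / sqrt(n)
    half_width = T_CRIT_95[n - 1] * standard_error
    return centre - half_width, centre + half_width


# Hypothetical car counts per ten-minute period, NOT the survey data.
cars = {
    "day 1": [38, 42, 35, 40, 45, 40],
    "day 2": [33, 36, 30, 39, 35, 37],
}

for day, counts in cars.items():
    low, high = mean_ci_95(counts)
    print(f"{day}: mean {mean(counts):.1f}, 95% CI {low:.1f} to {high:.1f}")
```

Comparing the two intervals by eye gives only an informal impression of whether the days differ. A formal comparison would need a proper test of the difference between the two means.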