Get ready, get (large data) set, go!

Paul Fannon

Will the addition of authentic data somehow make statistics more interesting or more relevant? That was probably the idea behind the requirement to get all the exam boards to provide and assess a large dataset. However, in trying to come up with data relevant to everybody there is a danger that the result is, largely, interesting to nobody.

So how do you get your students interested in a dataset?

The “easy” way to work with the large dataset is simply to get students to calculate lots of averages and standard deviations in a spreadsheet. Although this is a good first step, if it is where the engagement with the data ends, it will have been a huge wasted opportunity. Equally, in my experience, asking students to “investigate” large datasets results in banal studies of the correlation between height and weight or the like. These tend to have a negative effect, with students learning that statistics is about data mining and answering boring questions.

What’s needed is a middle way – a series of questions that encourages exploration of the interesting features of the data, ultimately leading to the students themselves coming up with their own observations.

What type of questions might these be? Here are the sort of things that I have found useful in the past:

Deliberate misrepresentations

Get students to take on the roles of groups with prior agendas. Explore how the same data can be used, with correct statistics, to support different arguments. If a percentage goes from 10% to 15%, is that a 5% increase or a 50% increase? How does the choice of group boundaries affect the message of a histogram? Which average could you use to support your argument? Should you use data from 2001 or start at 2010?

As well as developing critical thinking skills, this allows great cross-curricular opportunities looking also at the use of language and the ethics of summarising data.
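To make the percentage and "which average" questions concrete, here is a minimal sketch in Python. The salary figures are invented purely for illustration and do not come from any exam board dataset; `describe_change` is a hypothetical helper name.

```python
from statistics import mean, median


def describe_change(before_pct, after_pct):
    """Return the change in percentage points and as a relative percentage."""
    points = after_pct - before_pct
    relative = (after_pct - before_pct) / before_pct * 100
    return points, relative


# The same move from 10% to 15% can be reported in two correct ways.
points, relative = describe_change(10, 15)
print(f"{points} percentage points, or a {relative:.0f}% relative increase")

# Hypothetical salaries (in thousands) with one very high earner:
# the mean and the median tell quite different stories.
salaries = [18, 20, 21, 22, 24, 25, 110]
print(f"mean = {mean(salaries):.1f}, median = {median(salaries)}")
```

Students arguing opposite sides can each pick the figure that suits them, and then discuss which choice is the more honest summary.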

Finding errors

Feel free to adapt the large datasets by adding a few errors, and then get students to find them. You could encourage them to do this graphically or with statistical tests, or you could ask more advanced students to automate the process.
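One way a more advanced student might automate the search is sketched below in Python. It assumes the common "1.5 × interquartile range" rule for flagging outliers, which is only one of several possible choices, and the temperatures are made up, with one planted typing error (215.0 in place of 21.5).

```python
from statistics import quantiles


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # lower quartile, median, upper quartile
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical daily maximum temperatures in degrees Celsius,
# with one deliberately planted error.
temps = [19.8, 20.4, 21.1, 21.5, 22.0, 22.3, 215.0, 23.1, 20.9, 21.7]
print(flag_outliers(temps))  # flags only the planted 215.0
```

A rule like this catches gross errors but not subtle ones, such as two swapped rows, which is itself a useful point for discussion.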

Research the data

It is very easy for students to see the large dataset as the starting point of the statistical process, but it is actually in the middle: it is the result of much experimental design, sampling and pre-processing. All the datasets provided by the exam boards are taken from professional research studies. It is very instructive to get the students to research the original methodology and try to recreate it themselves in their own school. This is an opportunity to get them to think about what the data is actually a sample of and why sampling is needed, and to experience for themselves why creating a truly random sample is difficult.
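If students recreate a study in school, the difference between a random sample and a convenient one can be sketched as follows. This is only an illustration: the sampling frame of 600 pupil IDs, the sample size of 30 and the function name `simple_random_sample` are all assumptions, not details from any exam board's methodology.

```python
import random


def simple_random_sample(population, size, seed=None):
    """Draw `size` distinct members, each subset being equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable in class
    return rng.sample(population, size)


# Hypothetical sampling frame: pupil IDs 1 to 600.
pupils = list(range(1, 601))

random_sample = simple_random_sample(pupils, 30, seed=1)

# A convenience sample: the first 30 pupils on the list,
# which might all come from the same year group.
convenience_sample = pupils[:30]

print(sorted(random_sample))
print(convenience_sample)
```

The code makes the random draw look easy. The hard part in practice is obtaining a complete sampling frame and then actually reaching every pupil selected.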

Work backwards

Rather than asking students to create their own graphs and statistics, a great way to get them to engage is to provide your own analysis and ask them which part of the dataset it came from.
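Students could do this matching by eye, but the idea can also be sketched in a few lines of Python. The station names and rainfall figures below are invented, and `closest_match` is a hypothetical helper that simply picks the group whose mean and standard deviation are nearest to the teacher's summary.

```python
from statistics import mean, stdev

# Hypothetical extract: daily rainfall (mm) at three invented stations.
data = {
    "Station A": [0.0, 1.2, 3.4, 0.0, 5.6, 2.2],
    "Station B": [4.1, 6.3, 5.8, 7.0, 4.9, 6.5],
    "Station C": [0.0, 0.0, 12.4, 0.2, 0.0, 9.8],
}


def closest_match(groups, target_mean, target_sd):
    """Return the name of the group whose summary is nearest the targets."""
    def distance(values):
        return abs(mean(values) - target_mean) + abs(stdev(values) - target_sd)
    return min(groups, key=lambda name: distance(groups[name]))


# The teacher announces: "mean about 5.8 mm, standard deviation about 1.1 mm".
print(closest_match(data, 5.8, 1.1))
```

With a real dataset the interesting cases are the near misses, where two groups have similar summaries and students have to look at the shape of the data to decide.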

Be deliberately vague

In the real world, there is a gap between what you want to measure and what you can measure. Examinations often shy away from this, but teaching with the large dataset can embrace it. Ask: which area of the country is fittest? Where would you most like to live? What has been the biggest change since 2000? Although students at first struggle with the open-ended nature of such questions, this is what makes for authentic statistics and can actually help embed the techniques in a more meaningful way.

The hardest year with large datasets is always the first year. To make the best use of the dataset there is no substitute for playing with it yourself and trying to find interesting or unusual features. Our textbooks provide a starting point for the type of question, which can get discussions going. However, your best resource is often your students. If you can get them intrigued by the data, you will find that they will explore and discover many interesting things that you can store and use in future years.

For more information, view our A/AS Level Mathematics and A/AS Level Further Mathematics series.


Paul now teaches at Cambridge University after recently teaching mathematics at the Stephen Perse Foundation, Cambridge. He has 15 years’ experience of teaching A Level Mathematics and Further Mathematics and the International Baccalaureate. He also has A Level examining experience and is currently active in educational research, specifically working towards a doctoral thesis with the Cambridge Faculty of Education looking at the development of students’ thinking skills.