30 July 2014

Scatter Graphs

This blog is becoming a bit statistics-heavy, which is ironic given that I don’t particularly like teaching statistics at Key Stage 3 and 4.  I much prefer teaching algebra. I’m also quite keen on surds and indices. And I like trigonometry. In fact, apart from the odd annoying-to-teach topic like loci and constructions, I’d put most data handling topics right at the bottom of my ‘things I enjoy teaching’ list. It never fails to amaze me how long some teenagers take to plot a graph. What a faff.

Anyway, this post is all about ideas for teaching scatter graphs. And I intend to follow it up with a post looking at applications of box and whisker plots outside the maths classroom.  But then I’ll stop with all the statistics.

Practical activities
Unusually for a maths lesson, there are opportunities for some practical activities when we introduce scatter graphs. The obvious idea is to have students plot height or hand span against shoe size. If you want to get even more practical, try a human scatter graph in the classroom or in the playground (using chalk axes).

I like to use the two scatter graphs shown below (stolen from a Ten Ticks worksheet) in a class discussion about outliers.  Ask your class what could have caused these extreme values and you'll probably get some really creative suggestions!

Line of Best Fit
I teach my students to draw a line of best fit by trying to keep the points as close to the line as possible overall. That’s roughly what a least squares regression does: it minimises the sum of the squared vertical distances between the points and the line. Contrary to popular belief, there doesn't have to be the same number of points above and below the line. Nor does the line have to go through any of the points. And the line certainly doesn't have to go through the origin. Beware of dodgy methods taught in other subjects. This misconceptions activity by frickard on TES is helpful - I've used this in a 'hands on head if you think it's true, hands on hips if you think it's false' activity.
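For anyone who wants to see what a least squares line actually calculates, here's a minimal Python sketch. It isn't taken from the TES activity or any other resource mentioned here. The five data points are made up, chosen so that one point sits well above the general trend, and `least_squares_line` is a hypothetical helper written just for this example. It fits y = mx + c by minimising the sum of squared vertical distances, then counts how many points end up above, below and on the line.

```python
def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least squares line y = m*x + c.

    Hypothetical helper for illustration: minimises the sum of the squared
    *vertical* distances between each point and the line.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    c = y_mean - m * x_mean  # forces the line through the mean point (x_mean, y_mean)
    return m, c


# Made-up data: one point (3, 11) sits well above the general trend.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 11, 8, 10]

m, c = least_squares_line(xs, ys)

# Residual = actual y minus the y predicted by the line.
residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
above = sum(r > 0 for r in residuals)
below = sum(r < 0 for r in residuals)
on_line = sum(r == 0 for r in residuals)

print(f"y = {m}x + {c}")
print("above:", above, "below:", below, "on the line:", on_line)
```

With this data the line comes out as y = 2x + 1, with one point above it, four below it and none on it. It does pass through the mean point (3, 7), which is always true for a least squares line, even though (3, 7) isn't one of the data points.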

Impossible Predictions
'Extrapolation' from regression lines is something that comes up at A level, but the idea can make for interesting discussions at Key Stage 3 or GCSE. The key point is that even if we observe a linear relationship in the data we're examining, we can't assume that relationship extends beyond the range of the data. Here's a nice example for class discussion:

The graph below shows a negative correlation between year and time taken by athletes to run 100m. It suggests that the speed of 100m runners has increased over time. A line of best fit has been drawn. Why do you think 100m runners are getting faster? Do you think this trend will continue?

Possible reasons for the increase in speed include: advances in technology (ie lighter or more streamlined clothing and trainers), research into optimal diets and training plans, sponsorship meaning athletes have more time to train, the introduction of starting blocks in the 1940s and more accurate race timing.

If the trend continues indefinitely (ie if we draw the line of best fit into the future) then at some point people would be able to run 100m in zero seconds, which is impossible. There must be a point at which the time will plateau, due to limits in humans' physical capabilities. Although there is a linear correlation, we must be careful about using the line to make predictions for years outside the range of the data, whether in the past or the future, because the linear relationship has only been observed over a limited time period.

The graph comes from this Minitab Blog post and the idea was put into my head by King's College London PGCE tutor Chris Olley.
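To make the 'zero seconds' argument concrete, here's a quick Python sketch of the extrapolation. The straight line is hypothetical: a predicted time of 10.8 seconds in 1900, falling by 0.01 seconds per year. These numbers are assumptions for illustration and are not read from the Minitab graph. Near the middle of the time range the predictions look sensible, but push the line far enough and it predicts zero seconds, then negative times.

```python
# Hypothetical line of best fit, NOT fitted to real race data:
# predicted time (seconds) = 10.8 - 0.01 * (year - 1900)
INTERCEPT_1900 = 10.8  # assumed predicted time in 1900, in seconds
GRADIENT = -0.01       # assumed change in predicted time per year, in seconds


def predicted_time(year):
    """Predicted 100m time (s) from the hypothetical straight line."""
    return INTERCEPT_1900 + GRADIENT * (year - 1900)


def year_time_reaches_zero():
    """Solve 10.8 - 0.01 * (year - 1900) = 0 for year."""
    return 1900 - INTERCEPT_1900 / GRADIENT


for year in (1950, 2000, 2500, 3000):
    print(year, round(predicted_time(year), 2))

print("Predicted time hits zero in about", round(year_time_reaches_zero()))
```

With these made-up numbers the line reaches zero seconds around the year 2980, and by 3000 it predicts a negative time, which is clearly nonsense. The straight line is only a reasonable model over the range of years it was drawn for.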

"Wind is caused by the rotation of windmills"
I love the causation vs correlation discussion.  It's a very important learning point for students who will come across all sorts of dodgy statistics in everyday life.


A typical example is shoe size and reading age. The two variables are positively correlated, but obviously the size of someone's feet has no impact on their reading ability. For more ideas check out this spurious correlations website.
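The usual explanation for the shoe size and reading age link is a lurking variable: both tend to increase with age. Here's a small Python sketch that computes Pearson's correlation coefficient on invented data for seven hypothetical children to show this. Foot length in cm is used instead of shoe size to keep the numbers simple, every value is made up, and `pearson_r` is a helper written just for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented data for seven hypothetical children.
# Both foot length and reading age were chosen to rise with age.
ages = [6, 7, 8, 9, 10, 11, 12]
foot_cm = [17, 18, 19.5, 20, 21.5, 22, 23.5]
reading_age = [6.5, 7, 8.5, 9, 9.5, 11.5, 12]

for label, xs, ys in [
    ("age vs foot length", ages, foot_cm),
    ("age vs reading age", ages, reading_age),
    ("foot length vs reading age", foot_cm, reading_age),
]:
    print(f"{label}: r = {pearson_r(xs, ys):.2f}")
```

All three coefficients come out above 0.9. The strong correlation between foot length and reading age is genuinely there in the data, but it comes from both variables growing with age, not from one causing the other.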

If you're planning a class discussion on causation then have a quick read of 'Correlation does not imply causation' on Wikipedia (not always the best source, but it gives some good examples).

Correlation does not imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing "look over there". – xkcd 

Finally, here are a few resources that you could use in your lessons:
  • I love this Bee Aware activity from Don Steward. It could easily be made into a mini-project/display work about the cause and impact of the demise of honey bees.
  • The excellent Mathematics Assessment Project gives us two tasks - Scatter Diagram and Birds' Eggs
  • There's no shortage of real-life applications of scatter graphs. This worksheet 'Shake It Up with Scatterplots' is one of many examples (here are the solutions).
  • Nrich gives us this nice David and Goliath activity.
  • 'Foldables' seem to be a big thing in US math lessons, but I've never seen them in the UK. I think my students would quite like them. Here's an idea for correlation foldables from the blog 'Everybody is a Genius'.
  • I like this online tool for checking the accuracy of lines of best fit from Mark Ritchings.
  • Here's an interesting group project. I like the idea of giving each pair/group two options so they can choose the variables they prefer, and that everyone in the class is working on something different. Also, the 'assigned tasks' for each group member (see the last page) are a way of organising group work that I haven't seen before. These particular tasks may not be suitable for a UK lesson because we don't expect students to calculate the equations of lines of best fit, but the idea could be adapted.

I hope this post has been helpful.  Please comment if you have any ideas to share. 

By the way, this cool moving scatter graph is worth a look.



  1. Great post, it seems we have similar feelings on teaching data topics at KS3/4 as well! The pedant in me feels the need to point out that you have used the term "outlier" interchangeably with "anomaly", but they're not the same thing.

    1. You know, as I typed the word anomaly, I had a feeling it was wrong. I think I once read in a mark scheme that I could accept 'extreme value' but not 'anomaly' but I didn't give any thought to why. I've just looked it up and found - correct me if I'm wrong - 'An anomaly should be omitted from a data set for being impossible (eg getting 120% on a test) whereas an outlier is an extreme piece of legitimate data eg getting 100% on a test where everyone else got 40-50%'. So I've corrected it in my post! Thanks! I've learnt something new.

  2. Brilliant online tool for practising your line of best fit skills! http://burymathstutor.co.uk/BestLine.html