Category Archives: statistics

SEDA Fellowship report 2012-2013

Introduction

For the second year running I have opted to put my SEDA Fellowship report on my website (last year's here). Although I am currently working at the LLAS Centre for Languages, Linguistics and Area Studies at the University of Southampton, I will be joining the Centre for Learning and Teaching at the University of Brighton in September. I was offered the Brighton job back in May so I am very much in a transition frame of mind at present.

Career development

After ten years at the LLAS Centre (counting the centre in its LTSN and HEA Subject Centre forms) I felt it was time to move on, and undertook a UK-wide job search. The end result was an offer from the Centre for Learning and Teaching at the University of Brighton. Last week I visited Brighton for the university's internal teaching and learning conference and heard about some of the interesting things going on there. It was also nice to spend time getting to know some of my new colleagues.

Statistics for Humanities

Tweet referring to the Statistics for Humanities book.

This past year has been mostly project based. My Statistics for Humanities student 'textbook' is available in draft form and I am awaiting comments from the reviewers nominated by the British Academy. The British Academy also agreed that I could put a draft online for a crowd-sourced review. This has led to many helpful comments, and one academic in particular has provided some very extensive feedback. I have long been dissatisfied with introductory statistics textbooks, and I hope that mine will reach out to students (and academics) who have struggled in the past. The examples in the book come from the humanities, and I have attempted to take a verbal reasoning-based approach which should resonate better with humanities students than some other texts do.

EU Quality Assurance project

University of Aveiro, Portugal. Venue for our third project meeting in December 2012.

We are coming to the end of the second year of this two-year EU-funded project, Sharing Practice in Assuring and Enhancing Quality (SPEAQ), which follows on from LANQUA (the Language Network for Quality Assurance). I didn't work on LANQUA and hadn't worked on an EU project before, so I was quite apprehensive about being involved: I had seen colleagues undergoing the stresses of running a project which involves administrative complications (e.g. currency conversions and daily rates) as well as working alongside colleagues in other countries who operate in very different pedagogic, policy and quality environments. Fortunately our assistant director (and my line manager) Alison Dickens is an experienced director of EU projects and our senior administrator Sue Nash has worked on them before, so I have been able to concentrate mostly on content issues.

Delegates at the SPEAQ workshop at the European Quality Assurance Forum, Tallinn

In the first year of the project we developed a workshop in which staff, students and quality managers can participate together. I played a big role in this aspect of the project, producing a dialogue sheet and writing facilitator instructions. Along with our Danish colleague Ole Helmersen from Copenhagen Business School, I attended the European Quality Assurance Forum (EQAF) in Tallinn, Estonia, where we tried out the workshop on a large group of quality professionals from a range of European countries.

As well as being a chance to run the workshop, the EQAF conference was a great staff development opportunity for me. As a quality enhancement (QE) person rather than a quality assurance (QA) person, it was interesting to meet people who operate in very different QA systems. The UK seems to sit somewhere in the middle, between countries in which QA is very highly centralised and regulated and countries where QA is virtually non-existent, at least as I understand it. If there is one thing that all countries seem to have in common, it is that QA appears very separate from teaching. As one person I met pointed out, a poor teacher is not a quality issue as far as most university structures are concerned. Even at senior manager level there is often a separation of roles between the person in charge of QA and the person in charge of teaching and the curriculum.

For the second part of the project each partner is carrying out their own small-scale project to meet a particular institutional need. At Southampton we decided to do a project on feedback, called "Getting the Most Out of Feedback" (GMOOF). The core principle of GMOOF is that everybody, whether a member of teaching staff, a student or a quality manager, is both a provider and a recipient of feedback. The principles of good feedback, that it should be relevant, timely, meaningful and offer suggestions for improvement (see Race online), apply to all feedback: not just from teacher to student, but also student to teacher, student to student, teacher to teacher, teacher to quality manager, teacher to professional body and so on. GMOOF is a website which focuses on giving good feedback and making the most of feedback from others, rather than on different job roles. (The website is under development at present.)

A workshop based on the project is being developed and will be piloted in Southampton in September; I'll be in Brighton by then so will not be leading it(!) Additional material for the website includes a card sort (built using the free software nanDECK), a series of feedback videos with reflective questions (built in Xtranormal and put on YouTube), videos of interviews about feedback with the project team and other colleagues at Southampton, and online quizzes for staff and students. There is also a section specifically on how we at Southampton work to enhance the quality of teaching across the university.

Teaching

My teaching this year has focused on two main areas. I have been contributing to the interdisciplinary Curriculum Innovation module "Sustainability in the Local and Global Environment". 2012-13 was the first time this module had run and I benefited greatly from working with National Teaching Fellow Simon Kemp. It has been some years since I taught undergraduates; the module made extensive use of technology (including Twitter, Panopto and Blackboard) and had a variety of assessments, including a presentation, a conference paper and a group film project.

My other teaching responsibility has been teaching research skills to (mostly humanities) doctoral students. I have run numerous sessions on everything from putting the thesis together and preparing for the viva to ethnographic methods, critical thinking and applying for funding. Most of my materials are available in the HumBox under a Creative Commons licence. Students produce critical reflections on the sessions, which also provide me with feedback.

Other work

I continue to undertake evaluation for the Routes into Languages programme, which is funded to increase the uptake of languages in schools. I was recently a keynote speaker at the conference Innovative Language Teaching and Learning at University: Enhancing the Learning Experience through Student Engagement, held at the University of Manchester.

I also presented at the LLAS e-learning symposium about YazikOpen, my website of open-access language teaching research. I have also been preparing materials for the LLAS annual Heads of Department workshop, entitled "Thriving for the Public Good".

Future

At Brighton I am expecting to be involved in a variety of academic development activities, including working with teaching staff applying for HEA Fellowships, supporting blended learning and undertaking research. I will also be going to Plymouth in November to undertake PASS (Peer Assisted Study Session) Supervisor Training.


The order of learning, or does it matter in what order you learn things?

As a newbie to textbook authoring I am confronting the issue of the order of learning. Should you learn about a Student's t-test before an F-test? Pearson's product moment correlation coefficient before simple linear regression? Perhaps it doesn't matter in these cases, but I am currently dealing with contradictory pieces of feedback regarding the chapter on the presentation of data.

Initially I put it early in the book. I reasoned that things like bar graphs and scatterplots would be pretty familiar to students, and that the main aim of the chapter would be choosing appropriate ways to display data. After all, my seven-year-old knows about bar graphs. However, the risk of producing confusing, misleading or inappropriate graphs and charts is a real one. Moreover, bad graphics are highly entertaining.

Two people who saw early drafts felt that chapter 3 was too early for this material. After all, how can you present statistical data if you don't understand the statistics being referenced?

Having moved it towards the end of the book, the latest feedback has been to move it back to about chapter 3. In defence of early inclusion, most of the material does not reference any particular statistical test, but focuses on the principles of presenting data as clearly and unambiguously as possible. Items such as boxplots, which use medians and quartiles, can be forward referenced.

Of course the learner can read the chapters in any order they like (or the teacher can assign chapter 20 to be read before chapter 3). Yet at the same time I like the idea of a book where the chapters build on each other. This is my 'baby' and I'm pretty precious about it all. "Could students go straight into the chapter on the t-test without having to read the other stuff?", I have been asked. I suppose they might be able to, but it depends on what they already know and what they actually expect to learn from the process. None of the language learning books I have come across open with "Lesson 1: Forming the subjunctive."

One of my aspirations for the book is to write something that fits together well as a learning journey. The journey is not just about being able to 'do' something, but hopefully about understanding and appreciating why you are doing it.

But back to the subject of order: not everybody learns the same things in the same order. I'm not a child development expert, but I know from my own boys (n=2) (and their friends) that they do not reach the same milestones in the same order. On the other hand, when the time comes for them to learn to drive (still a long way off…) I'm pretty confident that they will learn to drive forwards before they learn to drive backwards.

Either way, the question of whether this particular chapter should come near the beginning or near the end (or indeed in the middle) is one I wish to ponder further.

 


New resource: an Excel spreadsheet to help with statistics involving pre-decimal UK currency

Until 1971 the UK used pounds (£), shillings (s) and pence (d): there were 12 pence in a shilling, 20 shillings in a pound, and therefore 240 pence in a pound.

More about decimalisation can be found on Wikipedia.

If basic arithmetic wasn't enough of a problem, statistics must have been a nightmare. This Excel spreadsheet enables the input of up to 300 prices in pounds, shillings and pence and calculates:

  1. Total sum
  2. Mean average
  3. Standard deviation
  4. Maximum
  5. Minimum
  6. Upper quartile
  7. Lower quartile
  8. Median

It also converts all the pounds, shillings and pence data into (today’s) decimal currency. This makes pre-1971 and post-1971 comparisons possible. (Full details of the mathematics behind this are in the Wikipedia article cited above).
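
For reference, the arithmetic behind the conversion simply expresses shillings and pence as fractions of a pound (a sketch of the idea rather than the spreadsheet's exact formulas):

\[
  D = P + \frac{S}{20} + \frac{d}{240}
\]

where D is the decimal value, P the number of pounds, S the number of shillings and d the number of old pence. For example, £2 13s 6d works out as 2 + 13/20 + 6/240 = £2.675.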

Download the spreadsheet (Excel 2010) predecimal excel

The Excel side of things is complicated and probably far more complex than it needs to be. Any improvements are welcome.

John Canning, 2013

This is a Creative Commons resource, so improvements etc. are welcome.


Latest version of Statistics for Humanities available for comments

Statistics for Humanities has been extensively revised and is now available for further comments. I am particularly keen to receive feedback from undergraduate and postgraduate students, especially those in the humanities.

There are still quite a few formatting issues, and a couple of the sections need some work, e.g. the Gini coefficient section and parts of the presenting data chapter (there are some graphs without any explanation at present).


The Joy of LaTeX

Over the past year I discovered LaTeX.

As in latex gloves?

No. It's pronounced "Lah-tech" or "Lay-tech". For the uninitiated, LaTeX is a document preparation system built on the TeX programming language. It can be used for books, articles, posters, presentations and many more things.

So a bit like Word then?

Nothing like Word or any other word processing programme. Word processing programmes are great when you have nothing but text. However, I'm sure everyone has experienced the annoyance of putting an image into a Word document and then finding that it has disappeared onto another page or behind the text. Word and similar programmes are "what you see is what you get" (WYSIWYG). LaTeX is "what you mean is what you get".

Anscombe's quartet: output in LaTeX

What do you mean?

Clip from the LaTeX file for my book.

If in Word I want to put an image 2.54cm from the right-hand edge of the paper and 5.53cm from the top, I may succeed to start with. However, once I add another image or some text there is no knowing whether it will stay there or not. In LaTeX it will stay exactly where I told it to.
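
For anyone curious, here is a minimal sketch of how an image can be pinned to an exact spot on the page. It assumes the textpos and graphicx packages and an A4 page; the file name and measurements are placeholders for illustration only.

\documentclass{article}
\usepackage{graphicx}
\usepackage[absolute]{textpos} % position blocks absolutely on the page
\begin{document}
Some body text that can reflow however it likes.

% Place a 6cm-wide image with its right-hand edge 2.54cm from the right of an
% A4 page (21cm wide) and its top edge 5.53cm from the top of the page.
% Left edge = 21 - 2.54 - 6 = 12.46cm from the left.
\begin{textblock*}{6cm}(12.46cm, 5.53cm)
  \includegraphics[width=6cm]{example-image} % placeholder image shipped with most TeX distributions
\end{textblock*}
\end{document}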

Who should use LaTeX?

The great thing about LaTeX is that you can add packages to the basic installation. There are packages for mathematics, graphs, posters, defining colours, making books and much more. If you use equations, graphs etc. you may find it worthwhile. It is also great for phonetics, ancient languages and languages written in less commonly used alphabets.
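
To give a flavour, a preamble might pull in packages like these (an illustrative sketch rather than a recommended setup):

\documentclass{book}  % classes also exist for articles, posters, slides...
\usepackage{amsmath}  % mathematical typesetting
\usepackage{pgfplots} % graphs and charts (used for the figures elsewhere on this blog)
\usepackage{xcolor}   % defining and mixing colours
\usepackage{tipa}     % IPA phonetic symbols
\begin{document}
Chapter text goes here.
\end{document}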

Sounds a bit complicated...

Yes, it is. I had a few false starts. There is quite a good introduction on Wikibooks. At some point I plan to write a very basic introduction myself. The LaTeX project page is also a good place to start.

So everyone loves it then?

No, but I think it's beautiful.

Show me an example.

See my online statistics book (preview).

How much does it cost?

Nothing


British Academy launches leaflet for students on the value of quantitative skills

Stand out and be counted: a guide to maximising your prospects.

The British Academy has published a booklet of case studies from humanities and social science graduates who use quantitative skills in their everyday work. On the subject of statistics, a draft of my Statistics for Humanities book is now with the British Academy for review. I hope to have more news soon.


Why you should graph data

Amended 10 March 2016 (corrections and updates made)

Anscombe's Quartet (click to enlarge)

I came across Anscombe’s Quartet on Wikipedia recently. I must confess to not having seen it before and don’t recall seeing it in any introductory statistics books.

The Anscombe’s Quartet is a conceptually and graphically clear way of showing the importance of graphs in statistical analysis. Each of the 11 pairs of observations have the same, x mean, y mean, x variance, y variance, correlation co-efficient and regression equation, though each have very different distributions. They clearly demonstrate the impact of outliers and how non-linear relationships can be identified.

Citation:

F. J. Anscombe (1973) 'Graphs in Statistical Analysis', The American Statistician, Vol. 27, No. 1, pp. 17-21.

Article Stable URL: http://www.jstor.org/stable/2682899 (Not open access)

 

LaTeX code below.

\documentclass{article}
\usepackage{pgfplots}
\usepackage{pgfplotstable}
\pgfplotsset{compat=1.7}
\usepackage{amssymb, amsmath}
\usepackage{subcaption}
\begin{document}
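% Four subfigures, one per data set in Anscombe's quartet: the crosses show the
% eleven data points and the final two-point \addplot in each panel draws the
% shared regression line.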
\begin{figure}
\caption{Anscombe's quartet is a good demonstration of why a scatterplot is so valuable before calculating regression equations and correlation coefficients. In all four cases the $x$ values have a mean of 9 and a variance of 11, and the $y$ values have a mean of 7.5 and a variance of 4.125. The correlation coefficient of each data set is 0.816 and the linear regression line is $y = 3 + 0.5x$.}
\begin{subfigure}{.45 \textwidth}
\centering
\caption{Normal linear relationship}
\begin{tikzpicture}
\begin{axis} [width=5cm, height=5cm, xlabel=X1, ylabel=Y1]
\addplot[scatter, only marks, mark=x, mark size=4pt]
coordinates
{
(10, 8.04)
(8.0, 6.95)
(13, 7.58)
(9, 8.81)
(11, 8.33)
(14, 9.96)
(6, 7.24)
(4, 4.26)
(12, 10.84)
(7, 4.82)
(5, 5.68)
};
% Common regression line y = 3 + 0.5x
\addplot[mark=none]
coordinates
{
(0, 3)
(20, 13)
};
\end{axis}
\end{tikzpicture}
\end{subfigure}
\begin{subfigure}{.45 \textwidth}
\centering
\caption{Relationship clear, but not linear}
\begin{tikzpicture}
\begin{axis}[width=5cm, height=5cm, xlabel=X2, ylabel=Y2]
\addplot[scatter, only marks, mark=x, mark size=4pt]
coordinates
{
(10, 9.14)
(8.0, 8.14)
(13, 8.74)
(9, 8.77)
(11, 9.26)
(14, 8.10)
(6, 6.13)
(4, 3.1)
(12, 9.13)
(7, 7.26)
(5, 4.74)
};
% Common regression line y = 3 + 0.5x
\addplot[mark=none]
coordinates
{
(0, 3)
(20, 13)
};
\end{axis}
\end{tikzpicture}
\end{subfigure}
\
\begin{subfigure}{.45 \textwidth}
\centering
\caption{Clear linear relationship, but one outlier offsets the regression line}
\begin{tikzpicture}
\begin{axis} [width=5cm, height=5cm, xlabel=X3, ylabel=Y3]
\addplot[scatter, only marks, mark=x, mark size=4pt]
coordinates
{
(10, 7.46)
(8.0, 6.77)
(13, 12.74)
(9, 7.11)
(11, 7.81)
(14, 8.84)
(6, 6.08)
(4, 5.39)
(12, 8.15)
(7, 6.42)
(5, 5.73)
};
% Common regression line y = 3 + 0.5x
\addplot[mark=none]
coordinates
{
(0, 3)
(20, 13)
};
\end{axis}
\end{tikzpicture}
\end{subfigure}
\begin{subfigure}{.45 \textwidth}
\centering
\caption{Clear relationship, but one outlier puts the regression line at 45 degrees to the other 10 observations}
\begin{tikzpicture}
\begin{axis} [width=5cm, height=5cm, xlabel=X4, ylabel=Y4]
\addplot[scatter, only marks, mark=x, mark size=4pt]
coordinates
{
(8, 6.58)
(8, 5.76)
(8, 7.71)
(8, 8.84)
(8, 8.47)
(8, 7.04)
(8, 5.25)
(19, 12.50)
(8, 5.56)
(8, 7.91)
(8, 6.89)
};
% Common regression line y = 3 + 0.5x
\addplot[mark=none]
coordinates
{
(0, 3)
(20, 13)
};
\end{axis}
\end{tikzpicture}
\end{subfigure}
\end{figure}
\end{document}


Normal distribution curve in LaTeX

Amended 22 February 2016: There were a couple of errors in the code which I have now fixed. The previous code omitted the xcolor package and some symbols (notably '\') were missing.
I have been searching the internet for how to produce a normal distribution curve in LaTeX with the standard deviations marked. I wasn't able to find exactly what I wanted, but I got some good clues here. This code uses the pgfplots package and should work 'as is'.

Normal distribution diagram in LaTeX (using the pgfplots package)


\documentclass{article}
\usepackage{pgfplots}
\usepackage{amssymb, amsmath}
\usepackage{tikz}
\usepackage{xcolor}
\pgfplotsset{compat=1.7}
\begin{document}
\pgfmathdeclarefunction{gauss}{2}{\pgfmathparse{1/(#2*sqrt(2*pi))*exp(-((x-#1)^2)/(2*#2^2))}%
}
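% gauss(mean, sd): probability density function of the normal distribution,
% evaluated at the current value of x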
\begin{tikzpicture}
\begin{axis}[no markers, domain=0:10, samples=100,
axis lines*=left, xlabel=Standard deviations, ylabel=Frequency,
height=6cm, width=10cm,
xtick={-3, -2, -1, 0, 1, 2, 3}, ytick=\empty,
enlargelimits=false, clip=false, axis on top,
grid = major]
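% Shade the area under the curve band by band: cyan remains visible within one
% standard deviation of the mean, blue between one and two, orange between two and three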
\addplot [fill=cyan!20, draw=none, domain=-3:3] {gauss(0,1)} \closedcycle;
\addplot [fill=orange!20, draw=none, domain=-3:-2] {gauss(0,1)} \closedcycle;
\addplot [fill=orange!20, draw=none, domain=2:3] {gauss(0,1)} \closedcycle;
\addplot [fill=blue!20, draw=none, domain=-2:-1] {gauss(0,1)} \closedcycle;
\addplot [fill=blue!20, draw=none, domain=1:2] {gauss(0,1)} \closedcycle;
\addplot[] coordinates {(-1,0.4) (1,0.4)};
\addplot[] coordinates {(-2,0.3) (2,0.3)};
\addplot[] coordinates {(-3,0.2) (3,0.2)};
\node[coordinate, pin={68.2\%}] at (axis cs: 0, 0.4){};
\node[coordinate, pin={95\%}] at (axis cs: 0, 0.3){};
\node[coordinate, pin={99.7\%}] at (axis cs: 0, 0.2){};
\node[coordinate, pin={34.1\%}] at (axis cs: -0.5, 0){};
\node[coordinate, pin={34.1\%}] at (axis cs: 0.5, 0){};
\node[coordinate, pin={13.6\%}] at (axis cs: 1.5, 0){};
\node[coordinate, pin={13.6\%}] at (axis cs: -1.5, 0){};
\node[coordinate, pin={2.1\%}] at (axis cs: 2.5, 0){};
\node[coordinate, pin={2.1\%}] at (axis cs: -2.5, 0){};
\end{axis}
\end{tikzpicture}
\end{document}
