Over the last week or so I’ve built my own mathematical infectious disease model. I’ve done this for many different reasons (which I’ll discuss below), but what started off as one blogpost quickly mushroomed! So this is just part one (of five, or six, or maybe ten!).
Beware of building models
A couple of weeks ago, I was struggling to understand some of the proposals coming from the UK government and its scientific advisors, so I used the ‘SIR disease model’ and an Excel spreadsheet to try to understand the spread of Covid-19 and reach some conclusions about the sorts of numbers we could expect. The numbers looked pretty scary, and I could easily reach a couple of million deaths with semi-sensible numbers. At the time, I posted this on Facebook:
Hmmm…I decided to whittle away some time using Excel to do some SIR disease modelling on C-19. The conclusion is – I’m glad I’m wearing my brown underpants!
I got some stick from the family for that one! I wanted to blog about my spreadsheet (so that other geeks out there could check it for mistakes, and because some folks might find it an interesting way to while away some lockdown time plugging in numbers), but I was starting to question the wisdom of it. However, what really got me thinking about why I would post my really simple model was the remarks made on Radio 4’s Inside Science (2nd April 2020 episode) about ‘Armchair Epidemiologists’. On 6th April, Robert Halfon (chair of the UK Education select committee) went a bit further on Radio 4’s The World At One and commented on how unhelpful ‘ACDCs’ (Armchair Commentators Discussing Coronavirus) are. Why build my own model, when there are far better ones out there? What does my own model tell me that the other papers won’t?
First of all, there’s the challenge of building my own model: this involved the maths behind the model, but it also produced numerical data of a kind I hadn’t worked with before, so I needed to develop a ‘dashboard’ within the Excel spreadsheet to help me easily understand what the calculations were telling me. Secondly, I want to see whether I can use my model to answer questions that I have but the professional modellers don’t ask. Thirdly, as the information we have about Covid-19 changes, it’s not clear to me what those changes mean: can I use my model to understand whether the ‘latest data’ is good or bad?
But finally, I found that as I built my model I was doing an ‘in silico experiment’. As an educator I would always encourage students to take their knowledge from lectures and books and ‘apply it’, perhaps in a computer program or a laboratory experiment. Anyone who learns needs to move from ‘store-bought knowledge’ to ‘practical understanding’. For me, building this spreadsheet consolidates my practical understanding in two ways. Firstly, as I put values into the model I need to understand where those values come from, and in doing this ‘in silico’ experiment I get closer to an understanding of the strengths and weaknesses of the information I’m using. I am not just reading numbers and accepting them; I’m using them sceptically. My model also shows me how important those numbers are: for example, if a paper states that the Ro value is estimated at between 1.4 and 1.9, what does the difference between those two values actually mean? Secondly, Covid-19 is the greatest societal challenge I have seen in my lifetime so far; I need to turn on my brain and probe, question, and challenge what I’m being told. My family and I have given up our liberty and freedom (and we may need to give up more before this is all over): I need to have a better understanding – a personal understanding – of why.
That’s why I modelled the spread of Covid-19 for myself.
I must emphasise that this is a personal journey of understanding and development. I’m putting the model on the blog because I’m hoping some of my data-geek friends will look at it, comment on it, and perhaps help me make it a better one.
My model will not compare to the rigorous and detailed modelling tools being used by professional epidemiologists: perhaps my Facebook post was bluster in the face of shock. In any case, academic groups such as the Global Infectious Disease Analysis unit at Imperial College are asking different questions, such as ‘how to modify society’s behaviours to reduce spread at the level of a collection of interacting individuals’, whereas I’m just modelling gross disease spread through a population.
A new disease!
Since I’m putting this online as an ACDC, I can’t forget that I have ‘Dr’ in front of my name and a career in healthcare sciences. It’s easy to push my model to 15 million UK deaths with ‘selective’ use of some of the information we have available. So, from here on, I’m modelling a new disease: Divoc-91. I’ve based the modelling parameters on Covid-19, but if you play with the spreadsheet and find the numbers scary, please remember the spreadsheet is – like the disease Divoc-91 – made up, and not real.
The SIR model
For my spreadsheet I’m using a simple SIR disease model. The SIR disease model is about 100 years old now, and in the last few weeks there have been a lot of new YouTube videos explaining it: the one that started me off is here, with a recent follow-up video about the SIR variants here. Most of the information (and the equations I used) can be found on the Wikipedia page. I’ve not solved the equations in the SIR model by ‘fitting’ them to the current Covid-19 outbreak; I’m just using the maths to understand the spread of Covid-19 rather than to make predictions. I’ll let the real experts get those wrong! (Sorry, cheap dig! Any modelling will contain errors and variability, unless it’s a model about something that has already happened!)
The SIR model assumes the population is split into three groups: the number of Susceptible people, the number of Infected people, and the number of Recovered people who will not get the disease again (this group also includes those who have died). The SIR model uses equations to predict how the number of S, I or R people goes up and down. There are two important things to remember: firstly, people only move from the susceptible group to the infected group and then to the recovered group (not from susceptible straight to recovered, or from recovered back to susceptible); and secondly, ‘the number of susceptible people’ plus ‘the number of infected people’ plus ‘the number of recovered people’ must equal the whole population number. (There are SIR variants that deal with these issues: take a look at the video link above.) In a simplified text form, the SIR equations are:
How fast the number of Susceptible people changes per day = – a x S x I / pop’n number
where a is a constant value we need to work out, S is the number of susceptible people, and I is the number of infected people. The minus sign is there because, as the disease spreads, the number of people in the susceptible group will go down.
How fast the number of Recovered people changes per day = b x I
where b is a constant value we need to work out and I is the number of infected people. There is no minus sign here because, as people recover, the number of people in the recovered group will go up.
This leaves us with the number of infected people, I. Each day it changes by the number of people leaving the susceptible group (and becoming infected), minus the number of people leaving the infected group (and recovering). So:
How fast the number of Infected people changes per day = (a x S x I / pop’n number) – (b x I)
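For anyone comparing these text equations with the Wikipedia page, here is a sketch of the same three equations in standard notation, with N for the population number and t for time in days (Wikipedia writes β where I use a, and γ where I use b). Adding the three rates together gives zero, which is the maths behind the rule that S + I + R always equals the whole population.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\frac{a\,S\,I}{N} \\
\frac{dI}{dt} &= \frac{a\,S\,I}{N} - b\,I \\
\frac{dR}{dt} &= b\,I \\
\frac{dS}{dt} + \frac{dI}{dt} + \frac{dR}{dt} &= 0
  \quad\Longrightarrow\quad S + I + R \text{ stays equal to } N
\end{aligned}
```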
Maybe, if I have time, I can play about with some of the SIR variants in other blogposts.
The way I do these calculations is to use the number of people in the S, I and R groups on day 1 to work out the ‘How fast the number of S, I and R people changes per day’ values for day 1, and then add those ‘How fast’ values to the day 1 numbers to get the day 2 numbers (and so on for days 3, 4, 5, etc.).
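Here is a minimal Python sketch of that day-by-day procedure, assuming a and b are already known. The function name simulate_sir and the example numbers (a million people, 10 initial cases, a = 0.3, b = 0.1) are hypothetical, picked only to show the mechanics; they are not the values from my spreadsheet. Numerically, stepping forward like this is Euler's method with a step of one day.

```python
def simulate_sir(population, initial_infected, a, b, days):
    """Step the SIR equations forward one day at a time.

    Returns a list of (S, I, R) tuples: day 1 first, then one entry per step.
    """
    s = float(population - initial_infected)  # everyone else starts susceptible
    i = float(initial_infected)
    r = 0.0                                   # nobody has recovered yet
    history = [(s, i, r)]
    for _ in range(days):
        # Today's 'how fast' values, worked out from today's S and I
        new_infections = a * s * i / population   # people leaving S, joining I
        new_recoveries = b * i                    # people leaving I, joining R
        # Add the 'how fast' values to today's numbers to get tomorrow's
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical example values, for illustration only
history = simulate_sir(population=1_000_000, initial_infected=10,
                       a=0.3, b=0.1, days=200)
s, i, r = history[-1]
print(f"After 200 steps: S={s:,.0f}  I={i:,.0f}  R={r:,.0f}")
```

Because everyone who leaves one group is added to the next, each row of history should still add up to the population number, which is also a handy check to build into a spreadsheet version.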
There is one other equation that helps me too:
The Ro value (the average number of people a single infected individual will spread the disease to when everyone else is susceptible) is given by the equation:
Ro = a/b
This means that I can use expected Ro values to help me work out a or b.
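Rearranging Ro = a/b gives a = Ro x b. The sketch below assumes, as is common for SIR models but not stated above, that b is 1 divided by the average number of days a person stays infectious; the 7-day period, the population and the function names are hypothetical placeholders. It reruns the daily stepping for Ro = 1.4 and Ro = 1.9, the range mentioned earlier, to show what that difference means for the size and timing of the peak.

```python
def rates_from_ro(ro, infectious_days):
    """Return (a, b), ASSUMING b = 1 / infectious period and a = Ro * b."""
    b = 1.0 / infectious_days
    a = ro * b
    return a, b


def peak_infected(population, initial_infected, a, b, days):
    """Daily SIR steps; return (peak number infected, day of the peak)."""
    s, i = float(population - initial_infected), float(initial_infected)
    peak, peak_day = i, 0
    for day in range(1, days + 1):
        new_infections = a * s * i / population
        new_recoveries = b * i
        s -= new_infections
        i += new_infections - new_recoveries
        if i > peak:
            peak, peak_day = i, day
    return peak, peak_day


# Hypothetical illustration: 1 million people, 10 initial cases and a
# 7-day infectious period (an assumption, not a Covid-19 estimate).
for ro in (1.4, 1.9):
    a, b = rates_from_ro(ro, infectious_days=7)
    peak, day = peak_infected(1_000_000, 10, a, b, days=730)
    print(f"Ro={ro}: a={a:.3f}/day, peak of {peak:,.0f} infected on day {day}")
```

Fixing b and changing only Ro keeps the comparison fair: any difference in the peak comes from a alone.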
(Note that the full equation is R = s x a/b, where s = S/total population (see here); this R is the effective reproduction number, which falls as people leave the susceptible group. However, because at the start of the model almost everyone is susceptible, s is very close to 1, so Ro = a/b is a decent approximation.)
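To see how long the s = 1 approximation holds, this sketch tracks s x a/b day by day as people leave the susceptible group. The values a = 0.25 and b = 0.1 (so Ro = 2.5), the population and the function name are hypothetical. With the daily stepping used above, the number infected only rises while a x s is bigger than b, so the infected curve peaks on the first day that s x a/b drops to 1 or below.

```python
def effective_r_by_day(population, initial_infected, a, b, days):
    """Return two lists, one entry per day: s * a / b, and the number infected."""
    s_count, i = float(population - initial_infected), float(initial_infected)
    r_eff, infected = [], []
    for _ in range(days + 1):
        s = s_count / population          # fraction still susceptible
        r_eff.append(s * a / b)
        infected.append(i)
        # Same daily step as before
        new_infections = a * s_count * i / population
        new_recoveries = b * i
        s_count -= new_infections
        i += new_infections - new_recoveries
    return r_eff, infected


a, b = 0.25, 0.1   # hypothetical: Ro = a/b = 2.5
r_eff, infected = effective_r_by_day(1_000_000, 10, a, b, days=365)
peak_day = max(range(len(infected)), key=infected.__getitem__)
print(f"Day 0: s x a/b = {r_eff[0]:.3f}")
print(f"Peak infections on day {peak_day}: s x a/b = {r_eff[peak_day]:.3f}")
```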
That’s the philosophical and mathematical rationale behind why and how I built the model. In part II, I explain how S, I, R, a and b are calculated.