COVID-19 Spread Forecast

Why Forecasting?

The outbreak of an epidemic is a process that authorities often need to control. However, the number of infected people does not correspond to the number of people with a confirmed diagnosis. Only the number of diagnosed people is available as a confirmed data source, while the number of actual infections can be significantly higher.

The faster the infection spreads, the larger the gap between these two numbers and the more difficult it is to establish a solid control strategy.

It gets even more difficult if people are infectious but not diagnosable for a longer period of time, either because infectious people do not show symptoms yet or because symptoms are mild in a large number of cases.

This is what is called a dead time system: a system that reacts to changes in its surroundings with a delay. Only after that delay do the changes become visible in the numbers. So if authorities decide to apply restrictions, the trend of the infection rate will only change after that delay time. That makes it extremely difficult to make the right decisions.

Covid-19 is such a dead time system.

When decisions are made without a forecast of their results (which only become visible in the future), the whole process is prone to errors due to late action and subsequent overreaction.

No forecast can be perfect; in fact it can be totally wrong if some assumptions were wrong. But it can at least give an idea of how the system reacts to changes in principle. And if the assumptions are right and the model parameters are stable, a forecast can provide fairly solid information for the near future, even if the results for the distant future are totally wrong. That information might be sufficient to make good decisions.

We could not find any good forecasting information for Covid-19 on the web. There are some good publications of mathematical models, but unfortunately without real calculations using real numbers. So we made our own.

And again, a disclaimer: We are not professionals in virology. We have a solid understanding of math, and this is only an attempt to use the available case numbers to create a model that seems sufficiently accurate to us. A virus might behave differently than we assumed. People in a country might behave differently than we assumed. This can change everything, and our calculation can be totally wrong.

SO DO NOT USE THESE RESULTS FOR ANY OTHER PURPOSE THAN RESEARCH AND STUDY OF THE PHENOMENON ITSELF.

What is a numerical time domain model?

Numerical modelling can provide solutions for problems that cannot be solved by purely analytical means. The individual calculations of a numerical method are in general simpler than those of analytical methods, which can involve highly complex formulas. Numerical modelling is a good choice for complex problems that need to be solved on a computer. The downside is that more calculations are needed to reach a solution than with an analytical approach.

Numerical modelling in the time domain means that time is sliced into timeslots. For each timeslot a set of computations is performed to obtain the state of the system in the next timeslot. It sounds quite simple, and in fact it is.
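The idea of stepping from one timeslot to the next can be illustrated with a minimal sketch. The function names simulate and step, and the toy growth example, are hypothetical and chosen only for illustration; they are not taken from the actual source code.

```javascript
// Generic time domain stepping: each new state is computed from the previous one.
// `step` is a hypothetical update function supplied by the caller.
function simulate(initialState, step, slots) {
  const states = [initialState];
  for (let t = 1; t <= slots; t++) {
    states.push(step(states[t - 1], t));
  }
  return states;
}

// Toy example (not the epidemic model): a quantity growing by 10% per timeslot.
const growth = simulate(100, (value) => value * 1.1, 3);
console.log(growth);
```

The epidemic model described below follows the same pattern, only with a richer state and update rule.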

Assumptions

First we made some assumptions. The most important one is that there is no immunity at the beginning and everybody could possibly get infected (this might, hopefully, be wrong). So the number of infectable people at the beginning is exactly the population of the country.

Another assumption is that the virus itself will not change much in terms of how easily it spreads under the same conditions. According to specialists, most viruses tend to spread more easily as they mutate, but we neglect this for simplicity.

The third unknown is how many deaths are to be expected compared to the total number of infections. As authorities only count diagnosed people, it is certainly possible that diagnosable people never get a diagnosis and are therefore not counted. But we had to decide which death rate to use in the model. At first we used the figure of 0.5%, which is known from South Korea. We think the lowest known figure should be used in any calculation of death rates, because we assume that countries with higher rates simply did not diagnose all diagnosable people. Of course this might also be the case in South Korea, so we expect the actual death rate to be even lower than 0.5% when people get proper medical treatment. We also do not take into account that overloading of the medical infrastructure may increase the death rate at a later point in time.

As Germany shows a good step response to the restrictions that were applied, we were able to calculate both the death delay time and the actual death rate from that step response. We derived a death rate of 0.40% and a mean delay of 13 days between infection and death. From now on we use these values as our assumption.

Because the calculation deals with large numbers, we do not track effects down to single persons or single regions in a country. We treat the model as a system of mean values, so numbers may be higher in some regions and lower in others.

Furthermore, we implemented methods to model social restrictions that vary over time. We do not know the exact effect of different social restrictions, as nobody has ever tried this before in Germany, so we use an estimate at this point. After some time the model is able to learn which type of social restriction causes which numerical reduction of infection rates. We can also learn this from other countries that are ahead in time, but countries with different social, political and technical backgrounds cannot be compared that simply. For example, we assume that closing schools will reduce overall social activity by 20%. At this point in time we have no idea how much reduction further restrictions will bring.
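A time-variable restriction could be represented roughly as follows. This is only a sketch: the names socialFactorOn and measures, the start day, and the choice to combine several measures by multiplication are all assumptions made for illustration. Only the 20% reduction for closed schools comes from the text.

```javascript
// Each measure has a start day and a fractional reduction of social activity.
// Combining several measures multiplicatively is an assumption of this sketch.
function socialFactorOn(day, measures) {
  return measures
    .filter((m) => day >= m.startDay)
    .reduce((factor, m) => factor * (1 - m.reduction), 1);
}

// Hypothetical schedule: schools close on day 10 (the start day is invented).
const measures = [{ name: "schools closed", startDay: 10, reduction: 0.2 }];

console.log(socialFactorOn(5, measures));  // before any restriction
console.log(socialFactorOn(10, measures)); // after schools are closed
```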

Lastly, it is assumed that a person can get the disease only once in the near future. The model assumes that once someone has had the disease, that person will either become immune after some time or die. That person cannot get the disease a second time.

Data Sources and how calculation works

First we want to thank Johns Hopkins University for providing the history of actual cases and deaths for most countries in the world in a computer-readable format on GitHub. This is our main data source, and this work is greatly appreciated. Please note their terms of use for that data if you plan to use our calculation or results.

Now we want to explain how the model itself works. As described before we are using a numerical time domain model. We use a time slice width of one day as this also matches the rate of data in the repository of Johns Hopkins University. The model is calculated day by day.

First, the number of infectious people (people who are able to infect another person) on the current day is calculated from the number of new infections in the preceding days. We define a time window in the history within which an infection must have occurred for that person to be infectious today. Summing the new infections in that window gives the number of infectious people today (Forecast Infectious).
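A minimal sketch of this window count might look like the following. The function name countInfectious and the parameters minAge and maxAge (the bounds of the window, in days since infection) are hypothetical; the text does not state the actual window bounds.

```javascript
// newInfections[d] holds the new infections on day d (index 0 = first day).
// An infection counts as infectious today if it happened between
// minAge and maxAge days ago (both bounds are assumed parameters).
function countInfectious(newInfections, today, minAge, maxAge) {
  let infectious = 0;
  for (let age = minAge; age <= maxAge; age++) {
    const day = today - age;
    // Days before the start of the history contribute nothing.
    if (day >= 0 && day < newInfections.length) {
      infectious += newInfections[day];
    }
  }
  return infectious;
}

// Example: infections from 1 to 3 days ago are infectious on day 4.
console.log(countInfectious([1, 2, 3, 4, 5], 4, 1, 3));
```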

From the number of infectious people today we can derive the number of new infections today (Forecast new infections). This is done by multiplying the number of infectious people by a constant that defines how many infectable persons one infectious person will infect in one day. That result is then multiplied by the ratio of infectable persons to the total population and by a factor of social interaction. The second multiplication takes into account that, as time goes by and more people become immune, it gets less likely for an infectious person to meet someone who is still susceptible to the disease. The third multiplication enables us to reduce the factor of social interaction as authorities apply social restrictions. This factor starts at 1 and is reduced over time as restrictions are introduced.
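The three multiplications can be written as a single expression. The sketch below uses the hypothetical names newInfectionsToday, infectionRate and socialFactor, and the numbers in the example are invented for illustration only.

```javascript
// infectious:    number of infectious people today
// infectable:    number of people still susceptible
// population:    total population of the country
// infectionRate: infections caused per infectious person per day (assumed name)
// socialFactor:  1 without restrictions, smaller once restrictions apply
function newInfectionsToday(infectious, infectable, population, infectionRate, socialFactor) {
  return infectious * infectionRate * (infectable / population) * socialFactor;
}

// Invented example: 1000 infectious people, half of the population still
// infectable, social activity reduced to 80%.
console.log(newInfectionsToday(1000, 40e6, 80e6, 0.3, 0.8));
```

At the start of an outbreak the ratio of infectable people to population is close to 1 and the social factor is 1, so growth is close to exponential. Both terms only slow the spread later on.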

Now we accumulate that number of new infections per day over history. This results in the estimated number of infections (Forecast Infected incl. Dead).

The estimated number of infectable persons (people who are susceptible to the disease) is then calculated by subtracting the estimated number of infected people from the total population. This results in Forecast Infectable.
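Accumulating the new infections and deriving the infectable population could be sketched like this. The helper names accumulate and infectableSeries are hypothetical, and the numbers in the example are invented.

```javascript
// Running total of a daily series, e.g. new infections -> total infected.
function accumulate(series) {
  let total = 0;
  return series.map((value) => (total += value));
}

// Infectable people per day = population minus everyone infected so far.
function infectableSeries(population, infectedTotal) {
  return infectedTotal.map((infected) => population - infected);
}

const infectedTotal = accumulate([1, 2, 3]);
console.log(infectedTotal);
console.log(infectableSeries(100, infectedTotal));
```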

The number of people who could actually get a positive test result in a diagnosis (Forecast Diagnosable) is simply the curve of Forecast Infected delayed by 9 days, which we assume to be the mean time between infection and diagnosis. The model parameters were tuned so that this curve fits the actual number of confirmed cases in history. Once the parameters are derived, the curve allows a forecast of confirmed cases for the near future.
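Delaying a curve by a fixed number of days amounts to shifting the series. A minimal sketch follows, with the hypothetical helper name delaySeries. Filling the first days with zero is an assumption of this sketch.

```javascript
// Shift a daily series to the right by delayDays.
// Days before the delay has elapsed are filled with 0 (an assumption).
function delaySeries(series, delayDays) {
  return series.map((_, day) => (day >= delayDays ? series[day - delayDays] : 0));
}

// Invented example with a delay of 2 days; the model in the text uses 9 days
// between infection and diagnosis.
console.log(delaySeries([1, 2, 3, 4], 2));
```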

The number of accumulated deaths (Forecast Dead) is calculated on the assumption that 0.5% of infections will result in death. So the curve of Forecast Infected (which should in fact be a rather accurate estimate of real infections) is multiplied by 0.5% and then delayed by the mean time between infection and death in case a person dies. This delay has been derived from data of other countries to be around 14 days.
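Scaling and delaying the infection curve might be sketched as follows. The names forecastDead, deathRate and deathDelay are hypothetical, and the short example uses an invented 2-day delay so that the effect is visible in a few data points.

```javascript
// infectedTotal: accumulated infections per day (Forecast Infected)
// deathRate:     fraction of infections that result in death, e.g. 0.005
// deathDelay:    mean days between infection and death, e.g. 14
function forecastDead(infectedTotal, deathRate, deathDelay) {
  return infectedTotal.map((_, day) =>
    // Before the delay has elapsed no deaths are forecast (an assumption).
    day >= deathDelay ? infectedTotal[day - deathDelay] * deathRate : 0
  );
}

// Invented example data with a shortened delay of 2 days.
console.log(forecastDead([1000, 2000, 4000, 8000], 0.005, 2));
```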

The forecast of ICU load (people in need of intensive care) is a rectangular window function over the history of new infections. It is derived from a rolling sum over the history of new infections in an interval of icuDuration days, multiplied by a factor icuRate. icuRate is the proportion of infected people needing intensive care after some time. icuDuration is the mean time a person will stay in the ICU. That curve is then delayed by icuDelay, because it takes some time from infection until intensive care is needed. icuDelay is set to 9 days and icuDuration to 15 days.
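The rolling sum with delay can be sketched in a few lines. The parameter names icuRate, icuDuration and icuDelay follow the text, while the function name forecastIcuLoad, the exact window boundaries and the example values (including the icuRate of 0.1) are assumptions made for illustration.

```javascript
// For each day, sum the new infections that happened between icuDelay and
// icuDelay + icuDuration - 1 days ago, then scale by icuRate.
// This equals a rolling sum over icuDuration days, delayed by icuDelay.
function forecastIcuLoad(newInfections, icuRate, icuDuration, icuDelay) {
  return newInfections.map((_, day) => {
    let sum = 0;
    for (let age = icuDelay; age < icuDelay + icuDuration; age++) {
      const source = day - age;
      if (source >= 0) sum += newInfections[source];
    }
    return sum * icuRate;
  });
}

// Invented example: constant 100 new infections per day, shortened window.
console.log(forecastIcuLoad([100, 100, 100, 100, 100, 100], 0.1, 2, 1));
```

With constant new infections the ICU load levels off once the window is full, which is the behaviour expected of a rectangular window.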

Interested in detail?

All calculations are done in your browser in JavaScript, so everybody is able to see the source code, copy and check it, and apply changes to it. Data from Johns Hopkins University is downloaded in real time from the GitHub repository. For research, the code may be used according to the terms of the GNU General Public License version 3 or later. Get the source