Bigger problems than spreadsheets

By Pete North - October 5, 2020

It finally happened. A story about Excel spreadsheets is more exciting than Brexit. A data error led to 15,841 positive tests being left off the official daily figures. Public Health England set up an automatic process to pull this data together into Excel templates so that it could then be uploaded to a central system and made available to the NHS Test and Trace team, as well as other government computer dashboards.

Unlike the rest of the media I’m not going to speculate on why they chose to do it the way they did. For data preparation exercises such as these, Excel is as good as any tool. The problem is that PHE’s developers picked an old file format, known as XLS, to do this, and as a consequence each template could handle only about 65,000 rows of data rather than the one million-plus rows that Excel is actually capable of.
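For the technically minded, a cap like this is easy to guard against. An XLS worksheet tops out at 65,536 rows and an XLSX worksheet at 1,048,576. What follows is a minimal sketch in Python of a check that fails loudly instead of quietly dropping rows. The function name `check_fits` and the single header row are my own assumptions for illustration and say nothing about how PHE's process is actually written.

```python
# Worksheet row limits for the two Excel file formats.
MAX_ROWS = {"xls": 65_536, "xlsx": 1_048_576}


def check_fits(n_rows: int, file_format: str, header_rows: int = 1) -> None:
    """Raise ValueError if n_rows data rows will not fit in one worksheet.

    header_rows is an assumption: one row reserved for column headings.
    """
    limit = MAX_ROWS[file_format] - header_rows
    if n_rows > limit:
        raise ValueError(
            f"{n_rows} rows exceed the {file_format.upper()} limit of "
            f"{limit} data rows; {n_rows - limit} rows would be lost"
        )
```

Calling `check_fits(70_000, "xls")` raises an error, whereas the same call with `"xlsx"` passes silently.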

Why that happened is not yet explained. It could have been a coding oversight or even a typo. I’ve done it myself. That a data transfer routine failed last thing on a Friday is very possibly the biggest non-story in the galaxy. To handle the problem, PHE is now breaking down the test result data into smaller batches to create a larger number of Excel templates. That should ensure none hit their cap.
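The batching fix is simple enough to sketch. This is not PHE's code, only an illustration of the idea in Python: cut the rows into chunks that each stay under a chosen cap. The name `split_into_batches` and the cap of 65,535 data rows (the XLS limit of 65,536 less an assumed header row) are my own choices.

```python
from typing import Iterator, List, Sequence


def split_into_batches(rows: Sequence, max_rows: int) -> Iterator[List]:
    """Yield consecutive chunks of rows, each at most max_rows long."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    for start in range(0, len(rows), max_rows):
        yield list(rows[start:start + max_rows])


# Hypothetical data: 150,000 rows standing in for test results.
rows = list(range(150_000))
batches = list(split_into_batches(rows, max_rows=65_535))
print(len(batches))
```

No batch can hit the cap, and nothing is dropped so long as every batch is actually written out and uploaded.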

One could ask why the data standard has been set out the way it has, but I know from experience that there are very often good reasons for less orthodox approaches. I’ve had to employ some questionable methods in the past to overcome various snags. I’m prepared to give them the benefit of the doubt and say this could have happened to anybody – and I’m certain it does.

The story is not that an import routine failed. That goes with the territory. It’s not even relevant or important that Excel is used in the process. Complex Excel systems are used by Airbus, JP Morgan, Lloyds and the MoD. Nor is the recipient system based on Excel, as certain MPs are making out. The serious lapse here is that the data was added as a single day’s increment and subsequently published as such without anyone querying it.

One would have thought that, with the curve failing to track Whitty’s famous “plausible scenario”, a sudden leap like that would have set alarm bells off before publication. I recall seeing the jump and immediately suspecting something was up. It simply didn’t look credible – and it certainly shouldn’t have looked credible to anyone in the driving seat.
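Even a crude automated check would flag a leap of that kind for a human to look at before publication. Here is a rough sketch in Python, assuming a simple rule: flag any day whose count exceeds the median of the previous seven days by more than half. The function name, the window, the threshold and the figures are all made up for illustration, and this is no substitute for an epidemiologist's judgement.

```python
from statistics import median
from typing import List, Sequence


def flag_suspicious_jumps(
    daily_counts: Sequence[int], window: int = 7, max_ratio: float = 1.5
) -> List[int]:
    """Return indices of days exceeding max_ratio times the trailing median."""
    flagged = []
    for i in range(window, len(daily_counts)):
        baseline = median(daily_counts[i - window:i])
        if baseline > 0 and daily_counts[i] > max_ratio * baseline:
            flagged.append(i)
    return flagged


# Hypothetical daily figures, not real case data.
counts = [100, 105, 110, 108, 112, 115, 120, 118, 300]
print(flag_suspicious_jumps(counts))  # [8]
```

The median is used in place of the mean so that one odd day does not drag the baseline up and mask the next.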

The scandal here is one of two things. Either the data is published directly with no quality checking by epidemiologists, or there is quality checking and nobody smelled a rat. The former is bad, but if it’s the latter, then we have much bigger problems than malfunctioning spreadsheets.