
An Epidemiology Glossary for Programmers

You might be hearing a lot of people say “Well, I am not an epidemiologist, but here is my opinion anyway…” these days.

So, to continue that trend, I will say the same thing. I am not an epidemiologist. 🙂

However, I will try my best not to give you an opinion on anything outside my subject matter expertise. (That said, my father is a retired biostatistician who studied epidemics, so I am a lot more familiar with the phrase “clinical trial” and its general importance than many people are.)

What is epidemiology?

Here is what Google says.

I am not going to explain all the basics of epidemiology in this article. Instead, I will point out a tool which can help you get started.

A few weeks back, I started participating in a Kaggle competition which was trying to answer questions about COVID-19.

You might already know this, but there is a MASSIVE number of organizations conducting clinical trials all around the world, trying to understand more about this virus. And a lot of them are publishing their results as preprints (i.e. papers not yet published in journals).

Here is an amazing stat:

Through all of 2019, there were ~3600 research papers published on topics (loosely) related to epidemiology. By April 15th 2020, there were already 8000+ research papers on the same topics for the year, many of them discussing clinical trials and many of them still preprints. I learnt about this stat while working on my tool, which I describe later.

In other words, there are a lot of articles being published very quickly and the Kaggle competition is asking if Machine Learning can help the medical community process all this information in a timely manner.

The Programmer’s viewpoint

Now, if you are like other programmers, you are usually concerned with things like input and output. And you would like to see some structure.

As it turns out, there is a visualization tool called an evidence gap map (EGM), which lets epidemiologists and the medical community quickly identify:

a) all research papers published about specific variables of interest

b) gaps in the current research so they know where to focus future efforts

Here is an example:

Interventions

The Y axis lists what epidemiologists call “interventions”. These are the things being studied, for example a particular drug or a vaccine.

Outcomes

The X axis has various “outcomes”. What was the mortality rate? How many people ended up on ventilators? etc.

Study Design

You can also see that there are many circles, in different colors. Usually, the color of a circle indicates the type of study. Some studies are very rigorous: they use a large sample size and are carefully randomized and controlled. Other studies are less rigorous, and their results are not easy to generalize across an entire population. Also, if it is a web-based EGM, clicking on a circle will usually show you the list of research papers behind it.

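For example, there is something called the PRISMA statement (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), which provides a checklist for how systematic reviews and meta-analyses should be reported. A checklist like this helps readers judge how rigorous a review was.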

Usually, the evidence gap maps are created based on pre-decided interventions and outcomes. In other words, the interventions and outcomes are static.
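If you think in terms of input and output, a static evidence gap map is roughly a grid keyed by (intervention, outcome), where each cell holds the studies found for that pair and an empty cell is a gap. Here is a minimal sketch in Python. The `Study` record, the axis labels, and the sample entries are all made up for illustration and do not come from any real map.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    title: str
    intervention: str
    outcome: str
    design: str  # e.g. "randomized trial", "observational"


def build_egm(studies, interventions, outcomes):
    """Group studies into cells keyed by (intervention, outcome)."""
    cells = defaultdict(list)
    for s in studies:
        # Static map: ignore anything outside the pre-decided axes.
        if s.intervention in interventions and s.outcome in outcomes:
            cells[(s.intervention, s.outcome)].append(s)
    return cells


def find_gaps(cells, interventions, outcomes):
    """Return the (intervention, outcome) pairs with no studies at all."""
    return [(i, o) for i in interventions for o in outcomes if not cells.get((i, o))]


# Hypothetical axes and studies, for illustration only.
INTERVENTIONS = ["drug A", "vaccine B"]
OUTCOMES = ["mortality", "ventilation"]
STUDIES = [
    Study("Paper 1", "drug A", "mortality", "randomized trial"),
    Study("Paper 2", "drug A", "mortality", "observational"),
    Study("Paper 3", "vaccine B", "ventilation", "observational"),
]

egm = build_egm(STUDIES, INTERVENTIONS, OUTCOMES)
gaps = find_gaps(egm, INTERVENTIONS, OUTCOMES)
print(gaps)
```

The `design` field is what a real map would typically use to pick the color of each circle, and the list stored in each cell is what you would see when you click on it.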

Dynamic Evidence Gap Map

Since COVID-19 is still not very well understood, there are lots of interventions and outcomes that the medical community is interested in. A static map would be too large to be practical.

So I built a dynamic evidence gap map for the dataset provided in the Kaggle competition.

You can use the dynamic EGM tool as a glossary – look up different terms and then go do your own studies on that topic.

In case you are wondering where I got all the values for the dropdown lists – an epidemiologist joined the forum and created a medical dictionary which is used to populate these dropdowns.
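One way to picture a dynamic map: the user picks terms from the dropdowns, and the grid is built on the fly by matching those terms against the text of each paper. The sketch below is a simplified illustration using plain substring matching, not the actual implementation of the tool. The dictionary structure, the `dynamic_egm` function, and the sample abstracts are all hypothetical.

```python
def matches(text, synonyms):
    """True if any synonym appears in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in synonyms)


def dynamic_egm(papers, dictionary, interventions, outcomes):
    """Build a grid for only the terms the user selected.

    papers: dict of title -> abstract
    dictionary: dict of term -> list of synonyms (hypothetical structure)
    """
    grid = {}
    for i in interventions:
        for o in outcomes:
            grid[(i, o)] = [
                title
                for title, abstract in papers.items()
                if matches(abstract, dictionary[i]) and matches(abstract, dictionary[o])
            ]
    return grid


# Hypothetical dictionary and abstracts, for illustration only.
DICTIONARY = {
    "drug A": ["drug a", "compound a"],
    "mortality": ["mortality", "death"],
    "ventilation": ["ventilator", "ventilation"],
}
PAPERS = {
    "Paper 1": "We report deaths among patients given Compound A.",
    "Paper 2": "Ventilator use in patients treated with drug A.",
    "Paper 3": "A survey of mortality trends.",
}

grid = dynamic_egm(PAPERS, DICTIONARY, ["drug A"], ["mortality", "ventilation"])
for cell, titles in grid.items():
    print(cell, titles)
```

Substring matching is crude (a short term can match inside an unrelated phrase), so treat this only as a way to see the shape of the input and output. The synonym lists stand in for the role the medical dictionary plays in the dropdowns.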
