Background:

The ongoing COVID-19 pandemic has infected 39.6 million people and killed 1.11 million worldwide. The United States has had one of the highest infection rates, with 8.14 million people infected and 200,000 killed as of October 2020.

In parallel with efforts to create a vaccine for the highly infectious disease, researchers have undertaken extensive efforts to document and analyze the array of COVID-19 symptoms reported by patients around the world. For example, Wang et al. analyzed epidemiological, demographic, clinical, laboratory, radiological, and treatment data from patients in Wuhan. By directly comparing the outcomes of critically ill and noncritically ill patients, Wang’s study provides strong evidence that common COVID-19 symptoms include fever, fatigue, dry cough, and abnormal lymphocyte count. Several similar follow-up studies have either confirmed these findings or provided evidence of new COVID-19 symptoms.

Blueprint Datathon has partnered with Enya.AI this year to host the COVID-19 Symptoms vertical. Enya.AI recently launched FeverIQ, a breakthrough privacy-preserving COVID-19 symptom tracker which is part of a global effort to collect symptoms and test data from patients all around the world. To date, FeverIQ has collected COVID-19 symptoms and associated diagnostic test data from over 3.6 million patients.


The Research Problem:

FeverIQ is a gold mine for epidemiological researchers and data scientists, providing a secure route around the numerous regulatory and political barriers to collecting and sharing COVID-19 health data. The dataset relies on four main diagnostic statistics “designed to capture the similarity of the user’s symptoms to four preconfigured diagnosis vectors”. You can find extensive information on how these scores are calculated and how they can be interpreted here.

To make the lawyers happy: The Enya.ai data below is meant exclusively for use by the Stanford Blueprint Datathon teams for the duration of the datathon. All team members should have signed the Enya.ai Non-Disclosure Agreement (NDA) during registration; if this is not the case, contact the Blueprint organizers immediately, before accessing the data. Data should not be shared outside of this context without prior written approval.

As a starting point, we ask you to consider the following questions when designing your experiment and interpreting your findings:

  1. Can you derive a computational model to predict COVID-19 PCR diagnostic test results based on patient symptom data reported in the form of FeverIQ’s diagnostic metrics?
  2. Do you see underlying patterns between certain symptoms? Can you explain your observations and support your analysis with data from similar epidemiological studies?
  3. Do any stand-alone symptom metrics seem to predict COVID-19 PCR diagnostic test results with notable accuracy? Consider strategically selected combinations as well.
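
As one possible way to approach question 3, the sketch below scores each stand-alone metric by its ROC AUC against the PCR result, i.e. the probability that a randomly chosen positive patient has a higher metric value than a randomly chosen negative one. It is only an illustration under stated assumptions: the column names `metric_a` and `metric_b` are hypothetical placeholders rather than FeverIQ's actual field names, and the values are made-up toy numbers, not FeverIQ data.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_metrics(
    columns: Dict[str, List[float]], labels: List[int]
) -> List[Tuple[str, float]]:
    """Return (metric name, AUC) pairs, most discriminative first.

    Distance from 0.5 is used for sorting, so a metric that is strongly
    *inversely* related to the test result also ranks highly.
    """
    aucs = [(name, roc_auc(values, labels)) for name, values in columns.items()]
    return sorted(aucs, key=lambda item: abs(item[1] - 0.5), reverse=True)


# Toy, made-up values purely for illustration; NOT FeverIQ data.
# 1 = positive PCR test, 0 = negative PCR test.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_columns = {
    "metric_a": [0.9, 0.8, 0.7, 0.3, 0.2, 0.1],  # hypothetical column name
    "metric_b": [0.5, 0.1, 0.9, 0.4, 0.6, 0.2],  # hypothetical column name
}
ranking = rank_metrics(toy_columns, toy_labels)
for name, auc in ranking:
    print(f"{name}: AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of patients, so it is only suitable for small samples; on the full dataset a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` would be a more practical choice. The same ranking can be repeated on combined scores to explore the "strategically selected combinations" mentioned above.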

The Data: