So what I wanted to talk about today is the free data set that the company has made available to COVID-19 researchers. Am I correct in thinking that offering is live on the AWS Data Exchange (ADX) right now?
That’s correct. It is live, and it is being refreshed nightly.
How did you guys come up with the idea for this offering?
We were receiving a lot of requests from various organizations looking for access to our data for COVID-19 research purposes. Obviously, the Tectonix post put a big spotlight on our location data, and how useful that can be for showing how people are moving around infected areas. We are a launch partner with AWS for their data exchange, so we thought that would be an excellent mechanism to rapidly get our data set into the hands of people who are doing research, or who are fighting the spread of the disease directly.
We also have a separate COVID-19 data set that is for sale to our enterprise clients. But for a group like the Cleveland Clinic, or a researcher at a university, we’re not going to charge them. They can get access to the data via one click, then they can take that data in-house and do deep analysis on it.
What does the free data set for researchers include, versus the paid data set for enterprise clients?
They are the exact same data. The only difference between the products is who we are authorizing to purchase it. We vet anyone asking for free data to make sure that they are indeed using it for research purposes. We then approve their request, and they get access to the entire historical data set.
We have two different data sets included in either package. The first one covers just the US, and it’s about 175 gigabytes of location data per day. So it is a hyper-dense, hyper-accurate location data set that can be used to determine social distancing success, potential areas of new outbreaks, et cetera. The second data set is global, excluding the EU 28 because of GDPR restrictions.
A lot of countries have actually suspended some of their restrictions on data. Brazil, for example, put their new data protection law on hold until this crisis is over. That’s because the free flow of data is absolutely necessary to help measure the spread of the virus. I think everybody has realized that the solution to this lies not exactly in preventing further outbreaks, but in spreading them out so that health care systems don’t become overwhelmed. That’s what social distancing is useful for, and that’s what we mean by “lower the curve.” We’re not going to be able to turn this off like a light switch, but using certain tools we can make it a lot more manageable.
That’s what we’re trying to do with this offering: give researchers a way to measure how well social distancing and other measures are working so far. Our data can, I believe, give an accurate picture of that down to a neighborhood level. That extra level of granularity is important. Just looking at the states as a whole can be wildly misleading. A map of movement and social distancing in Michigan is going to look a lot different than a map just of Detroit, for example.
What kind of research projects are you hoping people will use this data for?
Obviously the impact of social distancing is a huge one, and is one that we’ve already had some success with. Prediction of upcoming hotspots is probably the single most valuable research question that can be answered. Governor Cuomo just said that New York is actually shuffling ventilators around the state, day by day, based on where hotspots are popping up. That’s crazy. If they could have a predictive strategy for that kind of procedure, versus a reactive strategy, maybe that would work better for the whole country. If we could know where a hotspot is coming, and allocate the medical equipment that we desperately need ahead of the outbreak, it could have a huge impact on treatment. To me, that is probably the single strongest use case out of our particular data set. We have tons of density, which tells us when people are and aren’t social distancing.
That density is a strength for sure, because it gives a more complete picture. But we also know it can be a challenge for some researchers, who need to wade through 175 gigs of data in order to get, for example, just Pennsylvania data. The reason why we are releasing such a large data set, though, is that we want it to be applicable to as many use cases as possible. If it takes some researchers longer to narrow it down, we believe the advantages that our density offers more than make up for that.
What would you say to someone who says that from a business standpoint it doesn’t make sense to give away your product for free?
If you look at the market, lots of folks are doing this. Google is giving away a ton of its data for free. The greater good is more important than the immediate dollar. We could sell this data, and we do know how much it’s worth, but we’re not doing this for our bottom line. There is a greater good at play. X-Mode has always been about Location Data for Good. That’s what Picket (our free data for social good initiative) is all about. And while this is not exactly a Picket project, it definitely goes hand in hand with it. We are doing good, and showing people that location data can be used for these beneficial use cases.