Med-DL | Extracting medical entities from social media


In a pandemic, what use is social media?

AI can now analyze user-generated internet data to inform public health policy during epidemics

Post-traumatic-stress symptoms spiked on Twitter with the declaration of the state of emergency in the USA.

During the COVID-19 pandemic, many people shared their health conditions and concerns on social media, creating an opportunity to monitor population health at an unprecedented scale. Yet it is extremely difficult to find meaningful information in this ocean of data.

The Social Dynamics group at Bell Labs Cambridge developed Med-DL, a novel deep learning algorithm that mines large-scale social media data to reliably extract mentions of medical symptoms and diseases. The algorithm is based on state-of-the-art recurrent neural networks and can discover health mentions in large-scale, noisy, unstructured text. For example, it can discover mentions in posts such as:

“Hi all, idk about everyone else but my sleep has been super disrupted during this whole quarantine thing. I've been finding it really difficult to fall asleep at night and I often wake up early morning from pretty vivid dreams. [...] I do have some anxiety, but I don't take medications for it because I want to find more natural ways to help it.”

“I am having such a hard time right now. I have had my anxiety under control for many years but this pandemic has really brought it to the surface again. Today has been the worst. I’m shaking right now.”
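To make the idea of "extracting mentions" concrete, here is a minimal sketch of the last step of a typical sequence-tagging pipeline: turning per-token labels into symptom phrases. It assumes the tagger emits labels in the common BIO scheme (B = beginning, I = inside, O = outside); the source does not say which labelling scheme Med-DL uses, and the function name `decode_bio`, the `SYMPTOM` label, and the hand-written tags are all hypothetical. The neural network that would predict the tags is not shown.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse per-token BIO tags into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush the one in progress, if any.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # Continuation of the entity in progress.
            current_tokens.append(token)
        else:
            # "O" tag (or an orphan "I-" tag): close any open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hand-written tags for illustration only (not Med-DL output).
tokens = ["I", "do", "have", "some", "anxiety", "and", "disrupted", "sleep"]
tags = ["O", "O", "O", "O", "B-SYMPTOM", "O", "B-SYMPTOM", "I-SYMPTOM"]
print(decode_bio(tokens, tags))
```

In a real system the tags would come from the trained model rather than being written by hand; the decoding step itself stays the same.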

With this tool at hand, the researchers analyzed geolocated tweets in the U.S. during the first three months of the pandemic and discovered which medical conditions were frequently mentioned (producing the health taxonomy below). Previous research into the psychological effects of quarantines had found that symptoms of post-traumatic stress disorder (PTSD) tend to emerge. For the first time, the Bell Labs researchers were able to track post-traumatic-stress symptoms over time across the entire United States. They indeed found a spike in mentions of disturbed sleep, feelings of isolation, irritability, guilt, fight-or-flight response, disturbing thoughts, and mental distress.
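One simple way to track a symptom through time, once mentions have been extracted, is to compute the daily share of posts that mention it and flag days that sit well above a baseline period. The sketch below illustrates that idea only; it is not the researchers' method. The function names, the z-score threshold, and the toy data are all made up for illustration.

```python
from collections import Counter
from datetime import date
from statistics import mean, stdev


def daily_mention_rate(posts, symptom):
    """posts: iterable of (day, set_of_extracted_symptoms).
    Returns {day: fraction of that day's posts mentioning `symptom`}."""
    totals, hits = Counter(), Counter()
    for day, symptoms in posts:
        totals[day] += 1
        if symptom in symptoms:
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}


def spike_days(rates, baseline_days, z_threshold=2.0):
    """Return non-baseline days whose rate exceeds the baseline mean
    by more than `z_threshold` baseline standard deviations."""
    baseline = [rates[d] for d in baseline_days]
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return []  # a flat baseline gives no meaningful z-score
    return [d for d, r in rates.items()
            if d not in baseline_days and (r - mu) / sigma > z_threshold]


def make_day(day, n_posts, n_hits):
    """Build synthetic posts: the first n_hits mention 'disturbed sleep'."""
    return [(day, {"disturbed sleep"} if i < n_hits else set())
            for i in range(n_posts)]


# Synthetic toy data, NOT real measurements.
d1, d2, d3, d4 = (date(2020, 3, n) for n in (1, 2, 3, 4))
posts = make_day(d1, 4, 1) + make_day(d2, 4, 0) + make_day(d3, 4, 1) + make_day(d4, 4, 3)

rates = daily_mention_rate(posts, "disturbed sleep")
spikes = spike_days(rates, baseline_days=[d1, d2, d3])
print(rates)
print(spikes)
```

Normalizing by the number of posts per day matters here: overall posting volume changes over time, so raw mention counts alone could rise without any change in how often the symptom is discussed.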

Med-DL goes well beyond the current COVID-19 pandemic, as it can detect virtually any medical symptom. It represents the first general-purpose tool that is able to analyze user-generated internet data to track the spread of diseases in advance of official statistics, and ultimately makes “infodemiology” (the use of user-generated data to inform public health policy) a reality.

Reddit health taxonomy: mental community graph.

Publications

  • The Healthy States of America: Creating a Health Taxonomy with Social Media. ICWSM 2021
  • Extracting Medical Entities from Social Media. CHIL 2020
