Michael Paul
 
Ph.D. Candidate [CV]

Department of Computer Science
Center for Language and Speech Processing
Johns Hopkins University
Hackerman Hall 321
3400 North Charles Street
Baltimore, MD 21218


Google Scholar | Twitter

About me: I am a final-year PhD student in computer science at Johns Hopkins University, advised by Mark Dredze and Jason Eisner. Before coming here, I was an undergraduate at the University of Illinois at Urbana-Champaign. I interned at Twitter and Microsoft Research in the summers of 2011 and 2013-2014. I am currently supported by a Microsoft Research PhD Fellowship as well as a Dean's fellowship from the Whiting School of Engineering. My research interests include natural language processing, text mining, and machine learning, with an emphasis on building unsupervised models that find meaningful patterns in large text collections. I'm interested in applications to social media and health informatics.

Research
2014
David A. Broniatowski, Michael J. Paul, Mark Dredze. Twitter: Big data opportunities (letter). Science 345(6193): 148. [article]
[pdf]
Michael J. Paul and Mark Dredze. Discovering health topics in social media using topic models. PLOS ONE 9(8): e103408. [article]
[pdf]
[data]
Ahmed Abbasi, Donald Adjeroh, Mark Dredze, Michael J. Paul, Fatemeh Mariam Zahedi, Huimin Zhao, Nitin Walia, Hemant Jain, Patrick Sanvanson, Reza Shaker, Marco D. Huesch, Richard Beal, Wanhong Zheng, Marie Abate, Arun Ross. Social media analytics for smart health. IEEE Intelligent Systems 29(2): 60-80. Mar-Apr 2014.
  • Our article in this collection: M. Dredze and M.J. Paul, "Natural language processing for health and social media"  
[article]
[preprint]
Byron C. Wallace, Michael J. Paul, Urmimala Sarkar, Thomas A. Trikalinos, Mark Dredze. A large-scale quantitative analysis of latent factors and sentiment in online doctor reviews. Journal of the American Medical Informatics Association (JAMIA) 21(6), 1098-1103. [article]
[preprint]
Shiliang Wang, Michael J. Paul, Mark Dredze. Exploring health topics in Chinese social media: an analysis of Sina Weibo. AAAI Workshop on the World Wide Web and Public Health Intelligence, Quebec City. July 2014. [paper]
[slides]
Michael J. Paul, Mark Dredze, David Broniatowski. Challenges in influenza forecasting and opportunities for social media. AAAI Workshop on the World Wide Web and Public Health Intelligence, Quebec City. July 2014. [slides]
Mark Dredze, Renyuan Cheng, Michael Paul, David Broniatowski. HealthTweets.org: a platform for public health surveillance using Twitter. AAAI Workshop on the World Wide Web and Public Health Intelligence, Quebec City. July 2014. [paper]
[slides]
[website]

2013
David A. Broniatowski, Michael J. Paul, Mark Dredze. National and Local Influenza Surveillance through Twitter: An Analysis of the 2012-2013 Influenza Epidemic. PLOS ONE 8(12): e83672. [article]
[pdf]
Michael Paul, Eric Horvitz, Ryen White. Understanding Cancer Patients through Search Engine Query Logs. 2nd International Conference on Digital Disease Detection (DDD), San Francisco. September 2013. [rapid fire talk] [slides]
[video]
Michael J. Paul, Byron C. Wallace, Mark Dredze. What Affects Patient (Dis)satisfaction? Analyzing Online Doctor Ratings with a Joint Topic-Sentiment Model. AAAI Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI), Bellevue, WA. July 2013. [paper]
[data]
[slides]
Mark Dredze, Michael J. Paul, Shane Bergsma, Hieu Tran. Carmen: A Twitter Geolocation System with Applications to Public Health. AAAI Workshop on Expanding the Boundaries of Health Informatics Using AI (HIAI), Bellevue, WA. July 2013. [paper]
[code]
[slides]
Alex Lamb, Michael J. Paul, Mark Dredze. Separating Fact from Fear: Tracking Flu Infections on Twitter. 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), Atlanta. June 2013. [paper]
[data]
[slides]
[video]
Michael J. Paul and Mark Dredze. Drug Extraction from the Web: Summarizing Drug Experiences with Multi-Dimensional Topic Models. 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), Atlanta. June 2013. [paper]
[code]
[slides]
[video]

2012
Michael J. Paul and Mark Dredze. Factorial LDA: Sparse Multi-Dimensional Models of Text. Advances in Neural Information Processing Systems (NIPS 2012), Lake Tahoe, Nevada. December 2012. [25% acceptance] [paper]
[code]
Michael J. Paul and Mark Dredze. Experimenting with Drugs (and Topic Models): Multi-Dimensional Exploration of Recreational Drug Discussions. In the AAAI 2012 Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text, Arlington, VA. November 2012. [full paper] [paper]
[slides]

Alex Lamb, Michael J. Paul, Mark Dredze. Investigating Twitter as a Source for Studying Behavioral Responses to Epidemics. In the AAAI 2012 Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text, Arlington, VA. November 2012. [paper]

Atul Nakhasi, Ralph J. Passarella, Sarah G. Bell, Michael J. Paul, Mark Dredze, Peter J. Pronovost. Malpractice and Malcontent: Analyzing Medical Complaints in Twitter. In the AAAI 2012 Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text, Arlington, VA. November 2012. [paper]

Ralph J. Passarella, Atul Nakhasi, Sarah G. Bell, Michael J. Paul, Peter J. Pronovost, Mark Dredze. Twitter as a Source for Learning about Patient Safety Events. In the AMIA 2012 Annual Symposium (American Medical Informatics Association), Chicago, IL. November 2012. [oral presentation]
Michael J. Paul. Mixed Membership Markov Models for Unsupervised Conversation Modeling. In the 2012 Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2012), Jeju Island, Korea. July 2012. [25% acceptance] [paper]
[code]
[slides]
Michael J. Paul and Jason Eisner. Implicitly Intersecting Weighted Automata using Dual Decomposition. In the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2012), Montreal, Canada. June 2012. [paper]
[poster]

William M. Darling, Michael J. Paul and Fei Song. Unsupervised Part-of-Speech Tagging in Noisy and Esoteric Domains With a Syntactic-Semantic Bayesian HMM. In the EACL 2012 Workshop on Semantic Analysis in Social Media, Avignon, France. April 2012. [paper]


2011
Michael J. Paul and Mark Dredze. You are what you Tweet: Analyzing Twitter for Public Health. In the proceedings of the 5th International AAAI Conference on Weblogs and Social Media (ICWSM 2011), Barcelona, Spain. July 2011. [24% acceptance] [paper]
[slides]
[video]
Michael J. Paul and Mark Dredze. A Model for Mining Public Health Topics from Twitter. Technical Report. Johns Hopkins University. 2011. [paper]
Delip Rao, Michael Paul, Clayton Fink, David Yarowsky, Timothy Oates, Glen Coppersmith. Hierarchical Bayesian Models for Latent Attribute Detection in Social Media. In the proceedings of the 5th International AAAI Conference on Weblogs and Social Media (ICWSM 2011), Barcelona, Spain. July 2011. [short paper] [paper]
Roxana Girju and Michael J. Paul. Modeling Reciprocity in Social Interactions with Probabilistic Latent Space Models. Natural Language Engineering 17(1), pages 1-36. Cambridge University Press 2011. [paper]
[article]
[data]

2010
Michael J. Paul, ChengXiang Zhai and Roxana Girju. Summarizing Contrastive Viewpoints In Opinionated Text. In the proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP 2010), pages 65-75, MIT, Cambridge, Massachusetts. October 2010. [25% acceptance] [paper]
[slides]
[data]
Michael Paul and Roxana Girju. Comparative Scientific Research Analysis with a Language-Independent Cross-Collection Model. In the proceedings of XXVI Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN 2010), Valencia, Spain. September 2010. [paper]
Michael Paul and Roxana Girju. A Two-Dimensional Topic-Aspect Model for Discovering Multi-Faceted Topics. In the proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI-10), pages 545-550, Atlanta, Georgia. July 2010. [26.9% acceptance] [paper]
[slides]
[code]

2009
Michael Paul. Cross-Collection Topic Models: Automatically Comparing and Contrasting Text. Undergraduate Thesis, advised by Roxana Girju. Department of Computer Science, University of Illinois at Urbana-Champaign. 2009. [paper]
[slides]
Michael Paul and Roxana Girju. Topic Modeling of Research Fields: An Interdisciplinary Perspective. In the proceedings of Recent Advances in Natural Language Processing (RANLP 2009), Borovets, Bulgaria. September 2009. [paper]
Michael Paul and Roxana Girju. Cross-Cultural Analysis of Blogs and Forums with Mixed-Collection Topic Models. In the proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP 2009), pages 1408-1417, Singapore. August 2009. [paper]
[code]
[data]
Michael Paul, Roxana Girju, Chen Li. Mining the Web for Reciprocal Relationships. In the proceedings of the 13th Conference on Computational Natural Language Learning (CoNLL 2009), Boulder, Colorado. June 2009. [paper]
[data]

2008
Michael Paul and Roxana Girju. AIRTA: An Automatic Interdisciplinary Research Topic Advisor. [extended abstract] NSF-sponsored Symposium on Semantic Knowledge Discovery, Organization and Use - Demo session, New York University. November 2008. [paper]
[poster]