2014 AAAI Discovery Informatics Workshop

Discovery Informatics: Scientific Discoveries Enabled by AI

Sunday, July 27, 2014

Co-located with AAAI 2014

Quebec City, Quebec

Discovery Informatics encompasses research on intelligent systems in support of scientific discoveries. At the core of Discovery Informatics research is modeling and capturing some aspect of the scientific processes that can lead to new discoveries. The focus of this workshop will be on new discoveries resulting from intelligent systems that use AI techniques, highlighting the importance of the discovery, the challenges that led to requiring an AI approach, and understanding the generality of the approach taken for other science problems and domains.

We are happy to announce the workshop's invited speakers:

  • Phil Bourne, Associate Director for Data Science, National Institutes of Health
  • Paul Cohen, Program Manager, Defense Advanced Research Projects Agency
  • Tom Dietterich, Professor of Computer Science at Oregon State University and President-Elect of AAAI


Download the workshop schedule.

Workshop Description

Discovery Informatics encompasses research on intelligent systems in support of scientific discoveries. At the core of Discovery Informatics research is modeling and capturing some aspect of the scientific processes that can lead to new discoveries.

The focus of this workshop will be on new discoveries resulting from intelligent systems that use AI techniques, highlighting the importance of the discovery, the challenges that led to requiring an AI approach, and understanding the generality of the approach taken for other science problems and domains. Topics include but are not limited to machine reading from scientific articles, information integration and model synthesis, scientific knowledge modeling and inference, planning data analysis and experiment tasks, and learning from scientific data.


The workshop will be focused on discussion, based on three invited talks, seven long paper presentations, and three short paper presentations. There will also be a session with short previews of six AAAI/IAAI conference papers relevant to the workshop topics.



Invited Speakers

Phil Bourne: "Ask not what the NIH can do for you; ask what you can do for the NIH"
Abstract: The NIH is about to embark on a series of initiatives intended to stimulate the development of an ecosystem surrounding biomedical research as a digital enterprise. The discovery informatics community can be a major contributor to that enterprise. What those initiatives are and how you could be involved will be discussed.

Biography: Philip E. Bourne PhD is the Associate Director for Data Science (ADDS) at the National Institutes of Health. Formerly, he was Associate Vice Chancellor for Innovation and Industry Alliances and Professor in the Department of Pharmacology and Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California San Diego. He was also Associate Director of the RCSB Protein Data Bank. Bourne's professional interests focus on service and research. He serves the national biomedical community through contributing ways to maximize the value (and hence accessibility) of scientific data. His research focuses on relevant biological and educational outcomes derived from computation and scholarly communication. This implies algorithms, text mining, machine learning, metalanguages, biological databases, and visualization applied to problems in systems pharmacology, evolution, cell signaling, apoptosis, immunology and scientific dissemination. He has published over 300 papers and 5 books, one of which sold over 150,000 copies. Bourne is committed to furthering the free dissemination of science through new models of publishing and better integration and subsequent dissemination of data and results, which as far as possible should be freely available to all. He is the co-founder and founding Editor-in-Chief of the open access journal PLOS Computational Biology. Bourne is a Past President of the International Society for Computational Biology, an elected fellow of the American Association for the Advancement of Science (AAAS), the International Society for Computational Biology (ISCB) and the American Medical Informatics Association (AMIA).

Paul Cohen: "Big Mechanism"
Abstract: Dr. John Snow's nineteenth century maps of cholera deaths in London were a kind of big data, but it took Snow's human ingenuity to infer from these data that a water pump was probably a causal mechanism of disease transmission. Nearly two centuries on, big data is vastly bigger, but human ingenuity is still required to infer causal mechanisms. DARPA's Big Mechanism program aims to change that. Big mechanisms are large, explanatory models of complicated systems in which interactions have important causal effects. The collection of big data is increasingly automated, but the creation of big mechanisms remains a human endeavor made increasingly difficult by the fragmentation and distribution of knowledge. To the extent that the construction of big mechanisms can be automated, it could change how science is done. The first challenge problem for the Big Mechanism program is cancer signaling pathways. The program has three primary technical areas: Computers should read abstracts and papers in cancer biology to extract fragments of cancer pathways. Next, they should assemble these fragments into complete pathways of unprecedented scale and accuracy, and should figure out how pathways interact. Finally, computers should determine the causes and effects that might be manipulated, perhaps even to prevent or control cancer. Although the domain of the Big Mechanism program is cancer biology, the overarching goal of the program is to develop technologies for a new kind of science in which research is integrated more or less immediately -- automatically or semi-automatically -- into causal, explanatory models of unprecedented completeness and consistency.

Biography: Dr. Paul Cohen joined DARPA as a program manager in September 2013. His research interests span artificial intelligence and include machine learning, language, vision, semantic technology, data analysis, information theory and education informatics. Dr. Cohen joined DARPA from the University of Arizona, where he is professor and founding director of the university’s School of Information: Science, Technology and Arts. He has also served as head of the university’s department of computer science. Prior to joining the University of Arizona, Dr. Cohen worked at the University of Southern California. At that institution, he served as director of the Center for Research on Unexpected Events and deputy director of the Intelligent Systems Division. He began his career as professor of computer science at the University of Massachusetts. Dr. Cohen has published nearly 200 peer-reviewed articles and is the author of Empirical Methods for Artificial Intelligence (MIT Press, 1995). He is co-author of five books and has contributed chapters to another 20 books. He is an elected Fellow of the Association for the Advancement of Artificial Intelligence. Dr. Cohen holds a Doctor of Philosophy degree in Computer Science and Psychology from Stanford University, a Master of Science degree in Psychology from the University of California, Los Angeles and a Bachelor of Science degree in Psychology from the University of California, San Diego.

Tom Dietterich: "Constructing a Continent-Scale Bird Migration Model to Understand Bird Decision Making"
Abstract: The BirdCast team (Cornell Lab of Ornithology, Oregon State, and U Mass, Amherst) is building a large, multi-species migration model for the US. One of the key goals of this project is to formulate and test hypotheses about bird migration. For example, what signals are birds using to decide when to begin migration, when to stop over, and when to continue? Are they waiting for favorable winds? Suitable temperatures? Humidity? Food availability? Are they on an absolute time schedule or is there temporal flexibility? This talk will describe our model, which is a latent variable graphical model expressed in Dan Sheldon's Collective Graphical Model formalism. Two challenges will be discussed: (a) the computational challenges of fitting this model and (b) the representational and inferential challenges of working with scientific hypotheses represented as latent variables.

Biography: Dr. Dietterich is Distinguished Professor and Director of Intelligent Systems in the School of Electrical Engineering and Computer Science at Oregon State University, where he joined the faculty in 1985. In 1987, he was named a Presidential Young Investigator for the NSF. In 1990, he published, with Dr. Jude Shavlik, the book entitled Readings in Machine Learning, and he also served as the Technical Program Co-Chair of the National Conference on Artificial Intelligence (AAAI-90). From 1992 to 1998 he held the position of Executive Editor of the journal Machine Learning. The Association for the Advancement of Artificial Intelligence named him a Fellow in 1994, and the Association for Computing Machinery did the same in 2003. In 2000, he co-founded a new, free electronic journal: The Journal of Machine Learning Research, and he is currently a member of the Editorial Board. He served as Technical Program Chair of the Neural Information Processing Systems (NIPS) conference in 2000 and General Chair in 2001. He is Past-President of the International Machine Learning Society, a member of the IMLS Board, and he also serves on the Advisory Board of the NIPS Foundation. He is currently President-Elect of the Association for the Advancement of Artificial Intelligence and will serve a 2-year term as President from 2014 to 2016. Dr. Dietterich currently pursues interdisciplinary research at the boundary of computer science, ecology, and sustainability policy. He is PI (with Carla Gomes of Cornell) of a 5-year NSF Expedition in Computational Sustainability. He is part of the leadership team for OSU's Ecosystem Informatics programs including the NSF Summer Institute in Ecoinformatics.

Accepted Long Papers

  • Gully Burns and Hans Chalupsky.
    ‘It's All Made Up’ - Why we should stop building representations based on interpretive models and focus on experimental evidence instead
  • Ishanu Chattopadhyay and Hod Lipson.
    Data Smashing: Uncovering lurking order in data
  • Ishanu Chattopadhyay and Hod Lipson.
    Distilling Evidence Of Long-range Direction-specific Causal Cross-talk In Molecular Evolution Of Retro-viral Genomes
  • Ken-Ichi Fukui, Daiki Inaba and Masayuki Numao.
    Discovery of Damage Patterns in Fuel Cell and Earthquake Occurrence Patterns by Co-occurring Cluster Mining
  • Ashok Goel and David Joyner.
    Computational Ideation in Scientific Discovery: Interactive Construction, Evaluation and Revision of Conceptual Models
  • Kazjon Grace and Mary Lou Maher.
    Using computational creativity to guide data-intensive scientific discovery
  • Emily Leblanc, Marcello Balduccini and William Regli.
    Towards a Content-Based Material Science Discovery Network

Accepted Short Papers

  • Tim Clark, Carole Goble, Paolo Ciccarese.
    Discoveries and Anti-Discoveries on the Web of Argument and Data
  • Anita de Waard.
    Ten Habits of Highly Effective Data
  • Kevin M. Livingston, Michael Bada, William A. Baumgartner Jr., Lawrence E. Hunter.
    Semantically Integrating Biomedical Databases to Support Inference

AAAI/IAAI Conference Paper Highlights

  • Hazem Radwan Ahmed, Janice I. Glasgow.
    Pattern Discovery in Protein Networks Reveals High-Confidence Predictions of Novel Interactions
  • Gadi Aleksandrowicz, Hana Chockler, Joseph Y. Halpern, Alexander Ivrii.
    The Computational Complexity of Structure-Based Causality
  • Alnur Ali, Rich Caruana, Ashish Kapoor.
    Learning with Model Selection
  • James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua B. Tenenbaum, Zoubin Ghahramani.
    Automatic Construction and Natural-Language Description of Nonparametric Regression Models
  • Jun Yu, Rebecca A. Hutchinson, Weng-Keen Wong.
    A Latent Variable Model for Discovering Bird Species Commonly Misidentified by Citizen Scientists
  • Jun Yu, Weng-Keen Wong, Steve Kelling.
    Clustering Species Accumulation Curves to Identify Skill Levels of Citizen Scientists Participating in the eBird Project


Yolanda Gil (Co-Chair), Information Sciences Institute and Department of Computer Science, University of Southern California

Lawrence Hunter (Co-Chair), Department of Pharmacology (Denver) and Department of Computer Science (Boulder), University of Colorado

Program Committee

Elizabeth Bradley, University of Colorado Boulder

Gully APC Burns, University of Southern California

Ishanu Chattopadhyay, Cornell University

Tim Clark, Harvard University

Anita De Waard, Elsevier Labs

Michel Dumontier, Stanford University

Saso Dzeroski, Jozef Stefan Institute

Susan L. Epstein, The City University of New York

Paul Groth, Vrije Universiteit Amsterdam

Ashok K. Goel, Georgia Institute of Technology

Melissa Haendel, Oregon Health & Science University

Jim Hendler, Rensselaer Polytechnic Institute

Haym Hirsh, Cornell University

Rinke Hoekstra, Vrije Universiteit Amsterdam

Vasant Honavar, Pennsylvania State University

David Jensen, University of Massachusetts Amherst

David Kale, University of Southern California

Peter Karp, SRI International

Craig Knoblock, University of Southern California

Hod Lipson, Cornell University

Yan Liu, University of Southern California

Claire Monteleoni, The George Washington University

Mark Musen, Stanford University

Nigam Shah, Stanford University

Loren Terveen, University of Minnesota

Natalia Villanueva-Rosales, University of Texas at El Paso

Kiri Wagstaff, NASA/JPL


Submissions can be in three categories:

  • Abstracts of articles, already published or soon to appear, that describe discoveries made with AI systems. Abstracts should be 1 page in length.
  • Articles describing ongoing work that has the potential of leading to new discoveries. Articles should be at most 8 pages.
  • Position papers with unique perspectives on discovery informatics. Position papers should be at most 4 pages.

Submissions should use the AAAI style files.

Submissions can be made through this site.

Important Dates

Submission deadline: April 10, 2014

Notification date: May 1, 2014

Deadline for authors to submit accepted papers: May 15, 2014

Workshop date: July 27, 2014


The workshop will be co-located with AAAI 2014.


To register for the workshop, please use the AAAI 2014 registration site.