2012 Discovery Informatics Workshop

Discovery Informatics Workshop: Science Challenges for Intelligent Systems

February 2-3, 2012

Highlights

An executive summary of the workshop report is available: Executive Summary of the 2012 NSF Discovery Informatics Workshop Report.
The final workshop report is available: 2012 NSF Discovery Informatics Workshop Report.
A slide presentation is available: 2012 NSF Discovery Informatics Workshop Presentation, given at NSF in June 2012.
Join us at this follow-on event: 2012 Discovery Informatics Symposium, on November 2-4, 2012 in Washington DC.

What is Discovery Informatics?

The synergies between advances in computing and advances in science open the doors to exciting research agendas in computer science. Scientific questions have motivated computer science research in many areas, including distributed sensor networks, high-end computing, distributed systems, scalable databases, statistical and data mining algorithms, computer networks, and the web itself. Scientists now have the means to collect and process unprecedented amounts of data to understand phenomena that could not be studied before, from climate change to social networks to phylogenetics.

A new community of Discovery Informatics is emerging to understand the role of information and intelligent systems research in improving and innovating scientific processes in ways that will accelerate discoveries. Although computing has become central to science, there are important hallmarks in the 21st century that remain largely unaddressed and where AI research plays a central role.

First, discovery processes are increasingly complex and broader in scope. They remain largely human driven, and human cognitive limitations have become a bottleneck. New approaches are needed to address this complexity.

Second, data must be connected more closely than ever to the models of the phenomena under study. The current separation of models and data is hurting our ability to test and improve models. We must improve our understanding of how to link data with models of the phenomena under study.

Third, science is an increasingly social endeavor. Recent systems enable citizen volunteers to contribute large amounts of data, annotations, or complex processing results that lead to scientific discoveries. We need to design new approaches to harness human abilities in all forms to contribute to science.

Addressing the ambitious research agendas put forward by many scientific disciplines requires meeting a multitude of challenges in intelligent systems, information sciences, and human-computer interaction. There are many aspects of the scientific discovery process that our community could help automate, facilitate, or make more efficient through artificial intelligence techniques. For example, although considerable efforts have been directed toward data modeling and integration, these activities continue to demand large investments of scientists’ time and effort. The scientific literature continues to grow and is becoming more and more unmanageable for researchers operating in the most active disciplines. Better interfaces for collaboration, visualization, and understanding would significantly improve scientific practice. Scientific data, publications, and tools could be published in open formats with appropriate semantic descriptions and metadata annotations to improve sharing and dissemination. Opportunities for broader participation in well-defined scientific tasks enable human contributors to provide large amounts of data, annotations, or complex processing results that could not otherwise be obtained. These are just some examples of areas where artificial intelligence techniques could make a difference. Improvements and innovations across the spectrum of scientific processes and activities will have a profound impact on the rate of scientific discoveries. This workshop provided a forum for researchers interested in understanding the role of AI techniques in improving or innovating scientific processes.

Workshop Description

In order to address the ambitious research agenda put forward by many science disciplines, many challenges must be addressed in the areas of information sciences, intelligent systems, and human-computer interaction. Data modeling and integration still require large investments of scientist time and effort. The scientific literature grows so quickly in many areas that it becomes unmanageable for scientists. Many aspects of the scientific discovery process are often largely manual and could be automated, improved, or made more efficient. Better interfaces for collaboration, visualization, and understanding would significantly improve scientific practice. At the same time, recent research in information and intelligent systems has opened up new opportunities for scientific discovery. Social computing has been successfully demonstrated as a novel approach to scientific discoveries. Social robotics is an emerging area that presents new opportunities to redesign data collection. The Semantic Web offers radically new paradigms for data publication and sharing.

The goal of this workshop is to investigate the opportunities that scientific discoveries present to information sciences and intelligent systems as a new area of research called discovery informatics.

Possible themes in discovery informatics for discussion at the workshop include:

  • Efficient experimentation and discovery processes: Scientific discovery processes involve many steps that are often managed or executed manually. While a lot of research has been devoted to managing complex data analysis processes in distributed resources for large-scale execution, many aspects of the management of those processes have been neglected, such as: How can scientists formulate data analysis workflows efficiently? How can systems best support and document the repeated hypothesis-experiment-test cycle that leads to discoveries? What kinds of intelligent reasoning can be brought to bear in the experimentation process? What roles can robots play in experimental processes? What new scientific discoveries could be enabled by robots? How can collaborative and cross-disciplinary process formulation be facilitated? How can discovery processes be shared, reused, and efficiently adapted to new questions? Can scientific processes be made easily reproducible, repeatable, and repurposable? Is it possible to set formal guarantees that a dataset has been fully exploited and no opportunities for discoveries remain? Can we quantify the cost of a scientific question in terms of resources to explore the hypothesis space?
  • Practical issues in learning models from science data: Many discoveries are the result of applying data mining and machine learning techniques to experimental data. In recent years, there has been a lot of research on data-intensive computing and on novel machine learning algorithms. However, many key aspects of learning from data remain largely open research questions, such as: What kind of assistance can be provided to scientists to help them formulate hypotheses and then create experiments to test them with data? How can hypotheses be linked to data needs and therefore be used to control data collection instruments? What are appropriate methodologies to address data cleaning and preparation? What feature selection techniques are appropriate for given kinds of data or for a given question posed by a scientist? How can human ideas and insight be combined with machine learning algorithms? How could a system highlight what is unusual about a dataset that warrants further investigation? How can insightful visualizations become commonplace?
  • Social computing for science: The rise of the social web has uncovered social computing as a completely new approach to discoveries. From protein folding to proving theorems, anyone with a little training can become a participant and even contribute to novel discoveries. This is a very recent area that is still highly exploratory and has many open research questions: What scientific tasks are amenable to social computing approaches? How can tasks be organized piecemeal to enable many contributors to understand what is expected of them? How can different contributor roles be defined and assigned to optimize the formulation of scientific tasks? How can we facilitate the development of reusable software infrastructure for social computing in science? How can we develop social computing approaches that enable K-12 students to take more active roles in scientific discoveries as a novel way to integrate research and education?

These themes, as well as other salient themes that may arise from discussions at the meeting, will be articulated in detail in the final workshop report outlining the challenges and opportunities in discovery informatics.

The workshop will be held in the Gallery III meeting room of the Hilton Arlington in Arlington, VA (one block from NSF).

Slides and notes from the sessions are available under the Documents tab


Thursday February 2, 2012

  • 8:00-8:30 Continental Breakfast
  • 8:30-9:00 Introductions, welcome, meeting plans
  • 9:00-10:30 Plenary session: Themes in Discovery Informatics
  • 10:30-11:00 Break
  • 11:00-12:00 Plenary session: Themes in Discovery Informatics (continued)
  • 12:00-1:00 Working lunch: Planning breakout topics
  • 1:00-3:00 Breakout sessions: Elaborating themes in Discovery Informatics
    1. Topic 1 (Gallery III Room)
    2. Topic 2 (Renoir Room)
    3. Topic 3 (da Vinci Suite)
  • 3:00-3:30 Break
  • 3:30-4:30 Plenary session: Breakout reports on Elaborating themes in Discovery Informatics
  • 4:30-5:30 Breakout sessions: Vision Scenarios for Science Areas
    1. Topic 1 (Gallery III Room)
    2. Topic 2 (Renoir Room)
    3. Topic 3 (da Vinci Suite)
  • 5:30-6:00 Plenary session: Breakout reports on Vision Scenarios for Science Areas

Friday February 3, 2012

  • 8:00-8:30 Continental Breakfast
  • 8:30-9:30 Plenary session: What each of us learned from yesterday
  • 9:30-10:00 Plenary session: Planning breakout groups
  • 10:00-10:30 Break
  • 10:30-12:00 Breakout sessions: Discovery Informatics Challenges in vision scenarios
    1. Topics 1 & 2 (Gallery III Room)
    2. Topic 3 (Renoir Room)
    3. Topic 4 (da Vinci Suite)
  • 12:00-1:00 Working lunch with presentations from breakout groups
  • 1:00-2:00 Workshop report planning
  • 2:00-3:00 Final presentation
  • 3:00-4:00 Q&A with NSF and government attendance
  • 4:00-... All available start to draft final report

Location

The workshop will be held in the Gallery III meeting room of the Hilton Arlington in Arlington, VA (one block from NSF). The address is 950 North Stafford Street, Arlington, VA 22203.

Travel Information

The Hilton Arlington is holding a block of rooms for the workshop until Wednesday January 11, 2012. To reserve one of these rooms, make a reservation requesting the code "UOS", either through the hotel web site or by calling 1-800-Hiltons.

Submitting Expenses for Reimbursement

Flights must be booked as a round trip from the home city to Washington, be reasonably priced, and be economy fare. If your travel plans involve other stops, or if you have any questions about flight arrangements, please contact the organizers before booking the tickets to confirm the requirements for reimbursement in those cases.

Transportation and meals will be reimbursed when receipts are provided and the expense is within reason.

Hotel room charges will be paid directly from the workshop account. Room Internet charges and other incidental expenses will not be covered.

Reimbursements must be pre-approved by the organizers. Car rental, Internet, and other expenses will not be reimbursed.

Expenses submitted after February 15 will not be reimbursed.

The information contained in the reimbursement guide needs to be provided. The guide also specifies detailed requirements for reimbursement.

Invited Participants

  • Cecilia Aragon, University of Washington (interaction and visualization)
  • Phil Bourne, University of California San Diego (biology, future scientific publications)
  • Elizabeth Bradley, University of Colorado (qualitative reasoning)
  • Will Bridewell, Stanford University (machine learning and discovery)
  • Paolo Ciccarese, Harvard University (ontologies and semantic web)
  • Susan Davidson, University of Pennsylvania (databases and provenance)
  • Helena Deus, Digital Enterprise Research Institute (semantic web)
  • Yolanda Gil, University of Southern California (workflows and semantic web)
  • Clark Glymour, Carnegie Mellon University (philosophy of science, causality)
  • Carla Gomes, Cornell University (constraint reasoning and sustainability)
  • Alexander Gray, Georgia Institute of Technology (data mining and astrophysics)
  • Haym Hirsh, Rutgers University (social computing)
  • Larry Hunter, University of Colorado Denver (natural language and biology)
  • David Jensen, University of Massachusetts Amherst (machine learning)
  • Kerstin Kleese van Dam, Pacific Northwest National Laboratory (semantic scientific data management)
  • Vipin Kumar, University of Minnesota (machine learning and climate)
  • Pat Langley, Arizona State University (computational scientific discovery)
  • Hod Lipson, Cornell University (robotics)
  • Huan Liu, Arizona State University (social computing)
  • Yan Liu, University of Southern California (data mining and biology)
  • Miriah Meyer, University of Utah (scientific visualization)
  • Andrey Rzhetsky, University of Chicago (genetics)
  • Steve Sawyer, Syracuse University (social computing)
  • Alex Schliep, Rutgers University (bioinformatics)
  • Christian Schunn, University of Pittsburgh (cognitive science and discovery)
  • Nigam Shah, Stanford University (ontologies and semantic web)
  • Karsten Steinhaeuser, University of Minnesota (data mining and climate)
  • Alex Szalay, The Johns Hopkins University (astrophysics and citizen science)
  • Loren Terveen, University of Minnesota (interaction and social computing)
  • Raul E. Valdes-Perez, Vivisimo Inc. (commercialization, knowledge-based discovery)
  • Evelyne Viegas, Microsoft Research (semantic computing)

Government Observers

  • Dr. Josh Alspector, IDA
  • Dr. Mitra Basu, NSF CISE/CCF
  • Dr. Bonnie Dorr, DARPA
  • Dr. Le Gruenwald, NSF CISE/IIS
  • Dr. Vasant Honavar, NSF CISE/IIS
  • Dr. David Jakubek, OSD
  • Dr. Jia Li, NSF MPS/DMS
  • Dr. Mark Luker, NCO NITRD
  • Dr. Wen Masters, ONR
  • Dr. Michael Nelson, Georgetown University
  • Dr. Grace Peng, NIH NIBIB
  • Dr. Marc Rigas, NSF OD/OCI
  • Dr. Edwina Rissland, NSF CISE/IIS
  • Dr. Tom Russell, NSF OD/OIA
  • Dr. Carey Schwartz, ONR
  • Dr. Abdul Shaikh, NIH NCI
  • Dr. Julia Skapik, AAAS Science and Technology Fellow
  • Dr. George Strawn, NCO NITRD
  • Dr. Kenneth Whang, NSF CISE/IIS
  • Dr. Maria Zemankova, NSF CISE/IIS
  • Dr. Fen Zhao, NSF CISE/CCF

Final Workshop Report

The final workshop report is available: 2012 NSF Discovery Informatics Workshop Report.
A slide presentation is available: 2012 NSF Discovery Informatics Workshop Presentation, given at NSF in June 2012.

Records from the Meeting Sessions

Sessions appear in reverse chronological order


Participant Backgrounds and Contributions

Participants contributed materials prior to the workshop, such as current interests, position papers, and relevant publications:


Reports from Previous Workshops

Several workshops have been organized in recent years on topics relevant to the proposed workshop, although the topic of discovery informatics is itself new. A series of workshops on the topic of Cyber-enabled Discovery and Innovation has been held in recent years, including the NSF Symposium on Cyber-Enabled Discovery and Innovation, held in September 2007, the NSF workshop on data mining and cyber-enabled discovery for innovation, held in October 2007, and the SC07 session on supercomputing and CDI, held in November 2007.

Related workshops on creativity and scientific discovery include the NSF Symposium on Computational Approaches to Creativity in Science, held in March 2008; the NSF Workshop on Knowledge Management and Visualization Tools in Support of Discovery, held in March 2008; and the NSF Innovation and Discovery Workshop: The Scientific Basis of Individual and Team Innovation and Discovery, held in August 2006.

Many workshops have been held concerning scientific challenges for cyberinfrastructure, including the NSF Workshop on Cyberinfrastructure for the Atmospheric Sciences in the 21st Century, held in June 2004, and the NSF SBE-CISE Workshop on Cyberinfrastructure and the Social Sciences, held in March 2005.

Sponsorship

This workshop is sponsored by the Division of Information and Intelligent Systems of the Directorate for Computer and Information Science and Engineering at the National Science Foundation under grant number IIS-1151951.