LSST Informatics and Statistics Science Collaboration

Support material for a new science collaboration proposal

Letter of intent

To the LSST Science Collaboration reviewers:

This is a Letter of Intent to propose the creation of a new LSST Science
Collaboration on Data Sciences.  Its membership will include astronomers
with significant research expertise in astroinformatics and
astrostatistics, together with information scientists (statisticians and
computer scientists) who collaborate in such research and who bring to
the collaboration expertise from fields, such as biology and the
geosciences, that also face discovery and analysis challenges with large
data sets.

This collaboration will pursue research and provide consultation on challenging
astroinformatics and astrostatistics problems that will arise in pursuing the
science goals of other Science Collaborations.  It will devote special effort
to classes of problems that are cross-cutting, arising in diverse astrophysical
applications.  It will focus on science-driven methodology research, and not
(directly) on data management issues.  Examples of the types of problems to be
addressed include:

- Discovery issues:
  * Controlling the number of false detections, accounting for test 
    multiplicity
  * Dimension reduction to facilitate both guided and serendipitous
    discovery
  * Supervised, semi-supervised, and unsupervised classification in
    large data sets
  * Flexible and adaptive (semi-parametric and nonparametric) transient
    detection in time series
  * Cross-matching between large catalogs with accurate accounting for
    directional uncertainties
  * Faint source detection in multi-epoch/wavelength data cubes

- Modeling and analysis issues:
  * Design and comparison of photometric redshift algorithms, including
    calibration of redshift uncertainties
  * Flexible modeling of multivariate distributions (e.g., luminosity functions,
    size distributions, redshift distributions, Tully-Fisher and fundamental
    plane relations, color-magnitude diagrams) accounting for truncation,
    censoring, and measurement error
  * Detecting and modeling correlated temporal behavior across wavelength
    bands

The methodological research will be undertaken in close coordination with the
work of other Science Collaborations whose science efforts will depend on
accurate and efficient data science methodology.

This collaboration's work will contribute significantly to LSST science in
two ways.  It will improve the efficiency of LSST science efforts by
coordinating research on challenging data science problems of interest to
multiple Science Collaborations and to LSST science efforts of the broader
astronomical community.  It will also enable improved and even new science
by providing a channel for bringing cutting-edge information science expertise
to bear on challenging LSST data science problems.

In accord with the cross-cutting nature of the work of the proposed
collaboration, its membership will include many scientists who are members
of other Science Collaborations, or who are applying for membership in
response to the 2009 call for applications.

The following astronomers and information scientists have expressed
enthusiastic interest in participating in the proposed Data Sciences
Collaboration, although the final participant list may be revised as
we prepare the proposal:

[A preliminary participant list of over three dozen astronomers and
information scientists appeared here; see the Team page for the 
final list.]

We recognize that this collaboration is unusual in that it does not focus on
a specific astronomical science topic.  If the LOI reviewers have advice on
how to tailor our proposal to the application process, or on other forums
for our proposal, we would be grateful for a prompt response to guide our
proposal effort.

On behalf of the collaboration, I thank you in advance for your consideration
of our forthcoming proposal.

Tom Loredo
Dept. of Astronomy
Cornell University