LSST Informatics and Statistics Science Collaboration

Support material for a new science collaboration proposal

Team members

Astronomers & physicists:

  1. Joshua Bloom, University of California, Berkeley   Web page
  2. Kirk Borne, George Mason University   Web page
  3. Robert Brunner, University of Illinois at Urbana-Champaign  
  4. Tamas Budavari, Johns Hopkins University   Web page
  5. Douglas Burke, Harvard-Smithsonian Center for Astrophysics   Web page
  6. David F. Chernoff, Cornell University   Web page
  7. James M. Cordes, Cornell University   Web page Web page
  8. George Djorgovski, California Institute of Technology   Web page
  9. Eric Feigelson, Penn State University   Web page
  10. L. Samuel Finn, Penn State University   Web page
  11. Peter Freeman, Carnegie Mellon University   Web page
  12. Matthew Graham, California Institute of Technology   Web page Web page
  13. Carlo Graziani, University of Chicago   Web page
  14. Jon Hakkila, College of Charleston   Web page
  15. William Jefferys, University of Vermont and University of Texas at Austin   Web page
  16. Vinay Kashyap, Harvard-Smithsonian Center for Astrophysics  
  17. Kevin Knuth, University at Albany   Web page
  18. Donald Q. Lamb, University of Chicago   Web page
  19. Thomas Loredo, Cornell University   Web page
  20. Ashish Mahabal, California Institute of Technology  
  21. Bruce McCollum, California Institute of Technology  
  22. Christopher Miller, Cerro Tololo Inter-American Observatory   Web page
  23. Misha (Meyer) Pesenson, California Institute of Technology  
  24. Vahe Petrosian, Stanford University   Web page
  25. Andy Ptak, Johns Hopkins University   Web page
  26. Jeffrey Scargle, NASA Ames Research Center   Web page
  27. Aneta Siemiginowska, Harvard-Smithsonian Center for Astrophysics   Web page
  28. Ben Wandelt, University of Illinois at Urbana-Champaign   Web page
  29. Michael Way, NASA Ames Research Center   Web page
  30. Martin Weinberg, University of Massachusetts Amherst   Web page

Information scientists:

  1. Jogesh Babu, Penn State University   Web page
  2. James Berger, Duke University and Statistical and Applied Mathematical Sciences Inst.   Web page
  3. Adam M. Brazier, Cornell University   Web page Web page
  4. Merlise Clyde, Duke University   Web page
  5. Ian Davidson, University of California, Davis   Web page
  6. Bradley Efron, Stanford University   Web page
  7. Chris Genovese, Carnegie Mellon University   Web page
  8. Alexander Gray, Georgia Institute of Technology   Web page
  9. Woncheol Jang, University of Georgia   Web page
  10. Eric D. Kolaczyk, Boston University   Web page
  11. Ji Meng Loh, Columbia University   Web page
  12. John Rice, University of California, Berkeley   Web page
  13. Joseph Richards, Carnegie Mellon University   Web page
  14. David Ruppert, Cornell University   Web page
  15. Naoki Saito, University of California, Davis   Web page
  16. Chad Schafer, Carnegie Mellon University   Web page
  17. Jiayang Sun, Case Western Reserve University   Web page
  18. David van Dyk, University of California, Irvine   Web page
  19. Larry Wasserman, Carnegie Mellon University   Web page
  20. Robert Wolpert, Duke University   Web page
  21. Michael Woodroofe, University of Michigan   Web page

Member roles

Astronomers & physicists:

Joshua Bloom (active; U.C. Berkeley) is PI of a newly funded NSF/CDI grant to develop a framework to classify time series light curves from massive data streams. Fast and parallelized statistics and machine learning algorithms form the basis of the new research. This work, in conjunction with efforts of the ISSC team, should be a powerful addition to LSST. Also, as co-chair of the LSST Transients Working Group he will be an important scientific liaison to this Group.

Kirk Borne (core & ISSC Chair; GMU) is a faculty member in both astrophysics and computational & data sciences at George Mason University and has been a leader in the development of data mining, scientific databases, semantic e-Science, virtual observatories, and other aspects of astroinformatics. He will contribute to ISSC efforts in these areas, particularly in regard to detection and characterization of rare types of objects and behaviors in LSST data. He will also lead the ISSC E/PO and citizen science efforts. He has been a member of the LSST Data Products Working Group, Data Mining Interest Group, Galaxies Science Collaboration, and Education/Public Outreach Core Team.

Robert Brunner (active; U. Illinois) will contribute to collaboration efforts on the application of machine learning to object characterization and distance estimation, and on improved statistical characterization of the distribution of matter in the Universe. He is a member of the LSST AGN and LSS Science Collaborations.

Tamas Budavari (active; JHU) is interested in observational cosmology and galaxy evolution. He has analyzed some of the largest astronomy catalogs to date with scalable statistical algorithms. Most relevant to the proposed collaborative work is his recent research on the cross-identification of independent detections of astronomical objects, and on the probabilistic constraints of photometric measurements on physical properties of celestial sources. He is a member of the LSS/BAO Science Collaboration.

Douglas Burke (active; CfA) is a member of the Science Data Systems group at the Chandra X-ray Center, which is responsible for the development and support of the data-reduction software provided to users of the Chandra X-ray satellite. His primary interest is in the use of semantic technologies to address data integration and analysis problems; he is also interested in data algorithms for pipeline analysis and astrostatistics.

David Chernoff (active; Cornell) is interested in use of LSST data for AGN population modeling (with Loredo & Ruppert) and for constraining cosmic superstrings via searches for superstring lensing of stars within the Galaxy, which produces unique, highly repetitive achromatic flux variations. This work requires methodological developments for modeling data with measurement errors and selection effects, and for detecting novel signals in time series.

S. George Djorgovski (active; Caltech) is interested in exploration of observable parameter spaces, using novel machine learning and data mining and visualization techniques, including discovery of rare or new types of objects or phenomena, multivariate correlations, etc. He is a co-founder of the VO effort, and is a member of a few LSST Science Collaborations.

Eric Feigelson (core; Penn State) has 25 years of experience in the interface between astronomy and statistics. He will help find and evaluate statistical approaches to LSST challenges, draft and edit documents, and liaise between the astronomical and information sciences communities. He is Associate Director of CASt, co-founder of the Summer School in Statistics for Astronomers, co-organizer of the cross-disciplinary conferences Statistical Challenges in Modern Astronomy, a member of the new Virtual Astronomy Observatory Science Council, and a member of the LSST Weak Lensing Science Collaboration.

Lee Samuel Finn (active; Penn State) will contribute to collaboration efforts on analysis of multi-messenger data sets involving LSST and gravitational wave detector data. He is a member of the LIGO Scientific Collaboration (LSC) and the North American Nanohertz Observatory for Gravitational-waves (NANOGrav) Collaboration.

Peter Freeman (active; CMU) is in the InCA group. His primary interest is the development and application of astrostatistical algorithms, particularly within the context of large-scale structure (e.g., estimation of the CMB foreground, photometric redshifts, and the dark energy EOS parameter) but also within other contexts (e.g., source detection).

Matthew Graham (active; Caltech) is interested in the development and application of semantic technologies to astronomy, particularly in the area of object characterization for transient programs.

Carlo Graziani (active; University of Chicago) is a member of the University of Chicago FLASH Center. His area of expertise is in the application of Bayesian statistical methods to astrophysical datasets. His current research centers on improving distance modulus estimates from Type Ia supernovae for use in Dark Energy studies, and in the comparison of SN Ia light curves to the output of models run on massively parallel computing platforms.

Jon Hakkila (active; College of Charleston) is interested in the application of astrostatistical algorithms to a variety of astronomical problems, with a focus on gamma-ray bursts (e.g., classification, pulse fitting). He is also interested in fundamental observational astronomical issues such as source detection and deconvolution of observations from instrumental and sampling biases.

Bill Jefferys (support; University of Texas at Austin and University of Vermont) has expertise in Bayesian statistics and its applications to astronomy, including astrometry, binary and exoplanet orbit estimation, and Cepheid light curve modeling. He will consult with the team on issues involving the interface between statistics and astronomy, both in research and education.

Vinay Kashyap (active; CfA) is a CHASC member. He specializes in solar and stellar coronal astrophysics, with a view to understanding the observable variances and dependencies seen between X-ray luminosities and spectral type, age, rotation rate, binarity, composition, flare rate, temperature structure, etc. In addition, he has worked on a number of astronomical analysis algorithms, including wavdetect, which is the de facto standard for X-ray source detection.

Kevin H. Knuth (active; SUNY Albany) will contribute expertise in Bayesian data analysis and source separation techniques toward the detection, classification and characterization of astrophysical objects, as well as the estimation of distances from redshifts.

Don Lamb (active; University of Chicago) is director of the University of Chicago Flash Center and a member of the Swift science team. His current interests include modeling of gamma-ray bursts, simulations of Type Ia supernovae using high-performance computers, analysis and interpretation of observations of these transients, and the application of Bayesian statistical methods to astrophysical datasets.

Thomas Loredo (core; Cornell U.) has devoted most of his career to astrostatistics research, largely focused on developing new Bayesian approaches to problems straining the limits of conventional methods. He has developed new methods for analyzing both time series and spectroscopy data in high energy astrophysics, analyzing exoplanet radial velocity data (including adaptive scheduling of observations), and analyzing number-size "log(N)-log(S)" distributions of GRBs and TNOs. He leads two newly-funded astronomer/statistician collaborations, working on semiparametric modeling of dynamic spectra from GRBs and SNe Ia, flexible modeling of AGN luminosity functions, and cross-matching of survey catalogs with significant astrometric uncertainties; these projects are all directly relevant to analysis of LSST data.

Ashish Mahabal (active; Caltech) is interested in classification of transients using new as well as archival data and a variety of techniques. He has been involved in past sky surveys like DPOSS and Palomar-Quest, and is involved in the ongoing Catalina Realtime Transient Survey (CRTS) for transients in general and in the Palomar Transient Factory (PTF) for blazars in particular. He is also part of the LSST transients group. He is involved in the VOEventNet and SkyAlert efforts that collect diverse pieces of data on transients and alert subscribers about them. He is also interested in application of statistical techniques to astronomical data and in Citizen Science.

Bruce McCollum (active; Caltech) has experience in astronomical data quality assessment. He will contribute to collaboration efforts on intelligent image processing applications for identifying and classifying data anomalies and for performing source classifications.

Christopher Miller (active; CTIO) is in the InCA group. His primary interest is the application of computational and statistical algorithms towards cosmological and astrophysical challenges. His scientific focus includes the CMB, clusters of galaxies, large-scale structure, and active galactic nuclei. He is also the designer and project scientist for the Pitt/CMU SDSS Value-Added-Catalog as well as the NOAO Virtual Observatory Portal.

Misha Pesenson (active; Caltech) will contribute to collaboration efforts on the application of image processing, artificial intelligence and machine learning to analysis and visualization of large, multitemporal, multidimensional data sets. A specific objective will be developing new paradigms that enlarge the scope of information processing from Euclidean to curved spaces. This effort is presently supported by the National Geospatial Intelligence Agency.

Vahe Petrosian (active; Stanford U.) will contribute to determination of multivariate distributions of characteristics of astronomical sources, with particular attention to correlations among these characteristics, from biased and truncated data. Some of his past work in this area was done in collaboration with Efron.

Andrew Ptak (active; Johns Hopkins University) is interested in astrostatistics concerning the modeling of complex distributions (with experience in Bayesian analysis of luminosity functions) and astrostatistics in X-ray astronomy. He is also serving as project scientist for a proposed X-ray survey mission (the Wide-Field X-ray Telescope) that promises to provide important X-ray source correlations with LSST data, and is generally interested in methodology concerning multi-wavelength analysis of extragalactic objects.

Jeff Scargle (active; NASA Ames Research Center) will contribute methods and algorithms for automatic detection and characterization of transient sources, as well as for cross-detecting and cross-analyzing transients between LSST and other observations at other wavelengths.

Aneta Siemiginowska (active; SAO) is at the Chandra X-ray Center and is a member of the CHASC team. Her primary interests are in AGN and quasar science. She will contribute to the statistical characterization of source populations, source detection, photometry and photometric redshifts, multi-wavelength source modeling, and image analysis.

Benjamin Wandelt (active; U. Illinois) is a theoretical cosmologist working on large scale structure probes of dark energy and the cosmic microwave background. Based on his experience in CMB data analysis he is planning to contribute to the development of statistical and computational methods to extract information about the cosmological parameters, the initial perturbations and the nature of dark energy from LSST data. He is involved in LSST supernova, large scale structure and weak lensing science collaborations.

Michael Way (support; NASA/GISS) will work on helping to better utilize advanced regression methods from the machine learning community such as Gaussian Process Regression. These can better characterize object types and distances than many presently used methods. These will in turn be utilized to better understand the clustering scales of different kinds of galaxies at different redshifts, how these scales may evolve, and the implications for cosmological models.

Martin D. Weinberg (active; University of Massachusetts) is interested in characterizing properties of galaxies for inference of morphological and dynamical evolution with environment on large scales. He is also working on using star count analyses to investigate the evolution of stellar populations and dynamical structure in Local Group members. He leads the theory group at UMass that developed the Bayesian Inference Engine, an MPI parallelized software package for supercomputing clusters tuned to perform Bayesian inference simulations for model comparison and hypothesis testing with large datasets.

Information scientists:

Jogesh Babu (core; PSU) is the Director of CASt; in that capacity he has co-organized numerous astrostatistics meetings and summer schools, and consulted with numerous astronomers on diverse astrostatistics problems. His areas of statistics expertise relevant to LSST include: bootstrap and other resampling methods; nonparametric methods; inference for misspecified models; goodness-of-fit testing; analysis of massive datasets; inference on finite populations; density quantile estimation.

James O. Berger (support; Duke University) has central research interests in discovery issues arising from the need to control multiplicity of testing, as well as in modeling and analysis, especially model selection. His current astronomical methodological work is in exoplanet detection (with Loredo, Chernoff, & Clyde), which can be viewed as a time series model selection problem; he has also worked on nonparametric modeling of Cepheid light curves. Berger is a recipient of a MacArthur Prize Fellowship.

Merlise Clyde (support; Duke University) has expertise in Bayesian experimental design and model choice, including model selection and methods to combine models. Her current astrostatistics research includes model selection and design for exoplanet detection (with Loredo, Chernoff, & Berger). She has also developed Bayesian methods for nonparametric regression using wavelets, kernels and other overcomplete representations.

Ian Davidson (active; U.C. Davis) has expertise in data mining and machine learning, most relevantly the design of provably efficient, knowledge-enhanced mining algorithms that allow domain experts to encode expectations about what is not novel and not interesting. His work includes clustering, dimension reduction and anomaly detection. He is a member of the LSST AGN collaboration.

Bradley Efron (support; Stanford U.) has made seminal contributions to both applied and theoretical statistics. Efron is the inventor of bootstrap resampling; other areas of focus include simultaneous estimation (e.g., within a population) and multiple testing. His main application areas currently are biostatistics and astrostatistics, where he has worked on population distribution estimation with Petrosian. Efron has been awarded the National Medal of Science and a MacArthur Prize Fellowship. Efron served as President of the American Statistical Association in 2004.

Christopher Genovese (active; Carnegie Mellon University) studies nonparametric inference in high-dimensional and complex statistical models. One of the founders of the International Computational Astrostatistics (InCA) Group at Carnegie Mellon, he has worked on a variety of astronomical and cosmological problems, including inference for the cosmic microwave background, the dark energy equation of state, and source detection.

Alexander Gray (core; Georgia Tech) has expertise in multivariate statistics and machine learning, in particular the design of fast algorithms for allowing such methods to scale to massive datasets. His work includes density estimation, classification, regression, component analyses and dimension reduction, clustering, n-point statistics, cross-matching of catalogs, image de-blending, measurement error modeling, and anomaly detection, among others. He is a member of the LSST AGN collaboration.

Woncheol Jang (support; University of Georgia) has worked on nonparametric statistical inferences for massive high dimensional data. He has been working on galaxy clustering and time series analysis for variable stars and will contribute to source detection and multiple testing problems.

Eric Kolaczyk (support; Boston University) has worked on multiscale methods for analyzing astronomical time series and imaging data from photon counting instruments. He will consult with the team on multiscale statistical modeling, sparse statistical modeling, dimension reduction methods, and anomaly detection.

Ji Meng Loh (support; Columbia U. & AT&T) has expertise in spatial statistics, and has done work in estimating spatial correlations from astronomy data, in spatial modeling, bootstrap and in anomaly detection.

John Rice (active; Berkeley) has expertise in time series analysis, functional data analysis and multiple testing. He is a member of the Taiwanese American Occultation Survey. He has developed fast methods for blind searches for gamma-ray pulsars. He is also currently involved in developing statistical methods for identifying transient sources from the Palomar Transient Factory survey.

Joseph Richards (active; CMU) will contribute in the areas of non-linear dimensionality reduction methods, unsupervised and semi-supervised statistical inference, and non-parametric models for estimation in large astronomical databases.

David Ruppert (active; Cornell) has expertise in Bayesian modeling and computation (including MCMC methods), nonparametric estimation including density estimation, measurement error correction and deconvolution, heteroscedastic data, and splines for semiparametric statistical modeling. He is collaborating with Loredo and Chernoff on developing new methods for flexibly modeling AGN luminosity functions, and for cross-matching catalogs with significant direction uncertainties.

Naoki Saito (support; University of California, Davis) is an expert on computational harmonic analysis and its applications, including feature extraction, pattern recognition, statistical signal and image processing, image analysis, human and machine perception, and geophysical inverse problems. He will consult with the team on feature extraction from astronomical data, estimation of spatially-varying point spread functions (PSFs), and deconvolution of astronomical images using such PSF estimates.

Chad Schafer (active; CMU) is in the InCA group. He develops novel statistical methodology to address inference problems in cosmology and astronomy. Projects include the development of methods for constructing optimally precise confidence regions for cosmological parameters, for estimating bivariate luminosity functions, and for estimating properties of galaxies via low-dimensional representations of their emission spectra.

Jiayang Sun (support; CWRU) is an expert on simultaneous inference and multiple testing; biased sampling and measurement error problems; imaging, data mining and bioinformatics; mixtures, semiparametrics and nonparametrics. She has collaborated with astronomers on studying galaxy inner halos.

David van Dyk (active; University of California, Irvine) is the statistician leader of CHASC. He has developed methods for spectral analysis, image analysis, feature detection, feature modeling, and calibration that rely on multiscale, computer-evaluated, and/or highly-structured statistical models. He will consult with the team on computational methods, Bayesian methods, and methods for photon limited observations.

Larry Wasserman (active; CMU) is in the InCA group. He develops nonparametric statistical methods for problems in cosmology and astronomy. Projects include the development of methods for constructing nonparametric confidence intervals for cosmological parameters, estimating the equation of state of dark energy, filament detection, and estimating the peculiar velocity field.

Robert L. Wolpert (active; Duke University) is an expert in the theory and application of stochastic processes, and in Bayesian statistical theory and implementation. Much of his recent work is in nonparametric and non-Gaussian Bayesian analysis where stochastic processes (point processes, Levy processes, diffusions, etc.) are used to construct prior distributions in high dimensional problems, and in development of numerical procedures for deriving and exploring the associated posterior distributions. Wolpert is in a newly-funded collaboration (with Loredo, Graziani, & Hakkila) using these techniques to model complex time variability of astrophysical sources.

Michael Woodroofe (active; U. Michigan) will contribute his expertise in model-free statistical inference to the collaboration. Woodroofe is the lead statistician in the UMich astrostatistics collaboration studying dark matter in dwarf galaxies.

Collaborations

Among the proposed ISSC membership are members of three longstanding and productive astrostatistics collaborations:

InCA — International Computational Astrostatistics Group

The InCA group (formerly PiCA, the Pittsburgh Computational Astrostatistics group) is hosted at Carnegie Mellon University and University of Pittsburgh, and includes participants at University of Washington, CTIO, and University of Portsmouth. The InCA team currently includes 28 astronomers, statisticians, and computer scientists, including faculty members, senior researchers, and post-docs. InCA began in 1999 with support from the NSF's Knowledge and Distributed Intelligence Initiative; subsequent operations have been funded by a variety of NSF and NASA grants. InCA research has largely focused on data analysis problems in cosmology. InCA research has played a prominent role in SDSS data analysis; InCA researchers have led the field in using modern proximity data structures to enable sophisticated statistical analysis of large data sets. Recent InCA work has brought statistics research on false discovery rate control and dimension reduction to bear on cosmology problems.

More information about the InCA group is available at InCAGroup.org.

The following proposed ISSC team members are affiliated with InCA: Freeman, Genovese, Miller, Park, Richards, Schafer, and Wasserman. Core team member Gray was affiliated with InCA before moving to Georgia Tech.

CHASC — California-Harvard Astrostatistics Collaboration

The California-Harvard Astrostatistics Collaboration currently comprises 17 astronomers and statisticians, mostly affiliated with CfA and the statistics departments at Harvard and UC Irvine. CHASC began as an informal collaboration between Harvard statisticians and Chandra scientists in 1996. Its activities were initially supported via funding from the Chandra X-ray Center; subsequent support has been via NASA and NSF grants from both astronomy and statistics programs. Much of their astrostatistics work has focused on problems in X-ray astronomy (spectroscopy and imaging), particularly on detailed modeling of diverse and complex instrumental effects (response functions, pulse pile-up), and on modeling photon counting data in the low-counts, non-Gaussian regime. Current research is developing techniques to quantitatively propagate systematic error in spectroscopy and image analyses, and CHASC interests are expanding beyond X-ray astronomy to include detailed, computer-model-based analysis of multicolor stellar color-magnitude diagrams. CHASC members also wrote an influential critique on astronomers' use of model selection techniques (e.g., the F test and likelihood ratio tests).

More information about CHASC is available at the CHASC web site.

The following proposed ISSC team members are affiliated with CHASC: Kashyap, Meng, Park, Siemiginowska, van Dyk (Freeman was formerly a CHASC affiliate before moving to CMU and joining InCA).

CASt — The Center for Astrostatistics

The Center for Astrostatistics at Penn State was created in 2003 as an outgrowth of nearly two decades of astrostatistical research and activity. Supported by NSF and NASA grants from both astronomy and statistics divisions, they have conducted research on the treatment of upper limits, symmetric linear regression techniques, multivariate classification, data streaming algorithms for gigascale data sets, and faint source detection in the presence of non-Gaussian noise. CASt publications have received ~1000 citations.

Starting in 1991, CASt Director Babu and Co-Director Feigelson have organized the premier cross-disciplinary gatherings of astronomers and statisticians in the Statistical Challenges in Modern Astronomy I/II/III/IV conferences. They have run astrostatistical sessions at meetings in both fields, released specialized software packages, and written numerous review articles. CASt operates a Web site receiving 100,000 hits/yr providing software, curricular and bibliographic services in astrostatistics. Of particular value to astronomers, CASt has resources and contacts among the larger astrostatistical community as evidenced by the semester-long 2006 Astrostatistics Program, organized by the center in collaboration with SAMSI. CASt has been regularly organizing summer schools at Penn State to train astronomers and physicists in advanced statistical methods for handling a diversity of statistical issues confronting astronomy, space sciences, and high energy particle physics. CASt has been organizing similar summer schools since 2007 in India in collaboration with the Indian Institute of Astrophysics. CASt-trained statistician Dr. Hyunsook Lee has a postdoctoral position at CfA and collaborates with ISSC team members associated with CHASC; she maintains The AstroStat Slog, a widely read statistics-in-astronomy blog.

More information about CASt is available at the CASt web site.

ISSC team members Babu and Feigelson are the Director and Co-Director of CASt. Members Berger, Djorgovski and Rice are on the CASt advisory board. van Dyk, Finn, Loredo, Wasserman and Weinberg are CASt associates.

Smaller collaborations

Our proposal team also includes members of several smaller astro/info collaborations, including the following: