Jim Cordes 520 Space Sciences Building jmc33@cornell.edu

This course builds on a foundation of probability and statistics to explore, develop, and apply algorithms for discovering objects and events in astronomical data; for inferring sophisticated models of object populations using frequentist and Bayesian methods; and for visualizing and presenting results to address fundamental questions with persuasive, data-based arguments. Topics include time-series analysis; clustering and classification algorithms; genetic algorithms; Markov chain Monte Carlo methods; and neural networks. Analysis projects involve investigating simulated and real data using Python and Jupyter notebooks. The emphasis is on understanding the fundamentals of algorithms and developing the expertise to choose appropriate methods for top-level goals.

Course syllabus: pdf

Lectures, reading material, and assignments will be posted here as we go.

Lecture 1: Introduction to the Course pdf

Reading for the next two lectures: probability basics. Here are several reference options; after reading item 1, make a quick pass through item 2, 3, or 4. Key topics to be emphasized in class include random variables, probability density functions, examples such as the Gaussian distribution, and the Central Limit Theorem.

Relevance: we will treat data as 'processes' that are collections of random variables, and we need to be able to manipulate their probabilities.

1. Chapter 1 and portions of Chapter 5 from Gregory:

all sections except 5.5.4, 5.6, 5.7.3, 5.8.3, 5.8.4, 5.11.1, 5.13.2, 5.14, 5.15

2. Course Notes (slides 2,3 summarize main elements we will use)

3. Another approach: Laws of Probability, Bayes' theorem, and the Central Limit Theorem (Babu)

4. Another approach: Basics of Probability from CS229 Stanford
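As a quick companion to the readings above, here is a minimal sketch of the Central Limit Theorem in action: sums of independent uniform random variables approach a Gaussian. The sample sizes and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Sum n independent Uniform(0,1) variables; by the Central Limit Theorem
# the sum approaches a Gaussian with mean n/2 and variance n/12.
n, trials = 12, 100_000
sums = rng.random((trials, n)).sum(axis=1)

print(sums.mean())  # close to n/2 = 6
print(sums.std())   # close to sqrt(n/12) = 1
```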

Lecture 2: Basics of Probability and Processes. pdf

Lecture 3: Probability Applications. pdf

Lecture 4: Correlation functions, transformation of variables, Fourier transforms pdf

Homework Assignment 1: A simple demonstration of frequentist and Bayesian methods. Jupyter notebook: Frequentist_Bayesian_Example.ipynb

Address the questions asked in the notebook by writing up what you did and what your conclusions are. You can add this to the notebook, but RENAME the notebook with your name prepended to the notebook name. Alternatively, you can submit (by email) a text-edited document converted to PDF (again with a filename that includes your name). Due Thursday, Feb 7.
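As a hedged warm-up for the frequentist-vs-Bayesian theme of the assignment (this is not the notebook itself, just an illustrative coin-flip sketch with made-up counts): the frequentist point estimate is the maximum-likelihood fraction, while the Bayesian answer is a full posterior distribution.

```python
import numpy as np
from scipy.stats import beta

# Coin-flip example contrasting the two approaches: a frequentist point
# estimate vs. a Bayesian posterior for the success probability p.
heads, flips = 7, 10

p_mle = heads / flips                      # frequentist MLE
# With a uniform Beta(1,1) prior, the posterior is Beta(heads+1, tails+1).
posterior = beta(heads + 1, flips - heads + 1)

print(p_mle)                     # 0.7
print(posterior.mean())          # 8/12, close to but not equal to the MLE
print(posterior.interval(0.95))  # 95% credible interval
```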

Reading:

1. Fourier transforms: Appendix B sections B.1 through B.4.2 of Gregory

2. Fourier transforms: Course notes PDF

3. Linear Shift Invariant Systems PDF
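As a small companion to the Fourier-transform reading, a sketch of recovering a sinusoid's frequency with NumPy's FFT. The sampling parameters are arbitrary; the signal is chosen to have an integer number of cycles so its power falls in a single frequency bin.

```python
import numpy as np

# Sample a 5 Hz sinusoid for 1 second at 64 Hz.
fs, f0 = 64, 5
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * f0 * t)

# The discrete Fourier transform concentrates the power at f0.
X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
peak = freqs[np.argmax(np.abs(X))]
print(peak)  # 5.0
```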

Notes: Correlation Functions as a Diagnostic Tool pdf

Lecture 5: Linear models, cost functions, solutions, covariance matrices pdf

Reading:

1. Linear least squares pdf

2. Sections 10.1-10.6 in Gregory
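To accompany the linear least-squares reading, a minimal sketch of the normal-equations solution and the parameter covariance matrix for a straight-line model. The simulated data and noise level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fit y = a + b*x by classical least squares using a design matrix.
x = np.linspace(0, 1, 50)
sigma = 0.1
y = 1.0 + 2.0 * x + sigma * rng.standard_normal(x.size)

A = np.column_stack([np.ones_like(x), x])   # design matrix
# Normal equations: theta_hat = (A^T A)^{-1} A^T y
theta = np.linalg.solve(A.T @ A, A.T @ y)
cov = np.linalg.inv(A.T @ A) * sigma**2     # parameter covariance matrix

print(theta)                   # close to the true parameters [1, 2]
print(np.sqrt(np.diag(cov)))   # 1-sigma parameter uncertainties
```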

Homework Assignment 2: due February 21 (Thursday) pdf

Lecture 6: Iterative linear regression; overfitting, detection and classification pdf

Reading:

1. DFT, FFT Usage, Fourier-based Power Spectra pdf

Lecture 7: Detection/classification of aligned features, Matched Filtering pdf

Reading:

1. Matched Filtering (first 18 slides) pdf

2. Chapter 39.4 The Single Neuron as a Classifier, in Information Theory, Inference, and Learning Algorithms (MacKay)

http://www.inference.org.uk/itila/book.html

3. Single layer neural networks (notebook from Sebastian Raschka)

https://sebastianraschka.com/Articles/2015_singlelayer_neurons.html
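A minimal sketch of the matched-filtering idea from this lecture: correlate noisy data against a known pulse template, and the filter output peaks near the pulse location. The pulse shape, amplitude, and location are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Matched filtering: correlate noisy data with a known pulse template.
n = 512
template = np.exp(-0.5 * ((np.arange(21) - 10) / 3.0) ** 2)  # Gaussian pulse
data = rng.standard_normal(n)                                # unit-variance noise
true_loc = 200
data[true_loc:true_loc + 21] += 5.0 * template               # inject a pulse

mf = np.correlate(data, template, mode='same')               # matched filter
est = np.argmax(mf)
print(est)  # close to the pulse center, true_loc + 10 = 210
```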

Lecture 8: Detection/classification of misaligned features, Matched Filtering II pdf

Jupyter notebooks: these are for the hackathon in the Lecture 9 class

1. Classical least squares demonstration (matrix algebra, covariance matrix, cost function, etc.) notebook

2. Least squares fit done iteratively in a simple network notebook

3. Pulse detection (classification) with a simple network (aligned pulses) notebook

4. Misaligned Pulse detection (first take) notebook

5. A neural net in 11 lines of Python

https://iamtrask.github.io/2015/07/12/basic-python-network/

Lecture 9 (19 Feb) Hackathon to investigate simple networks

Lecture 10: Matched filtering examples, information and entropy pdf

Reading:

1. Information and entropy methods (notes): first 20 pages pdf

2. Maximum Entropy Probabilities (Chapter 8 of Gregory)
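As a small numerical companion to the information/entropy reading, a sketch of the Shannon entropy of a discrete distribution, illustrating that among distributions on a fixed set of outcomes, the uniform one maximizes entropy.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete probability distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # terms with p=0 contribute nothing
    return -np.sum(p * np.log2(p))

# The uniform distribution on 4 outcomes has the maximum entropy, 2 bits.
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(entropy([0.7, 0.1, 0.1, 0.1]))      # lower
```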

Lecture 11: Localization of objects; precision issues pdf

Reading: convolutional neural network configuration examples

1. 1D CNN for time series (Ackermann, uses Keras)

2. Character recognition (LeNet) (LeCun et al. 1998)

Understanding LeNet Link

3. Deep Learning for Multimessenger Astrophysics George and Huerta (2018)

Reading: Deep learning and back propagation

1. Back propagation in a multilayered network (Hertz et al., pp. 115-120) Pages from Hertz et al.

2. Deep Learning from CS229 Stanford

3. Back propagation from CS229 Stanford

Lecture 12: Localization in the Fourier domain; information and entropy; data challenge. pdf

Homework Assignment 3: due March 26 (note revised due date) pdf

Data for Assignment 3:

1. Pulsar data (the gzipped file is 28MB; unzipped 100MB; be patient with download) pulsar data

2. Fast radio burst data set 1 numpy file

3. Fast radio burst data set 2 numpy file

Code (notebooks) for Assignment 3:

1. Pulsar data analysis ipynb file

2. Matched filtering principles and examples ipynb file

3. Detection example with MF (ROC curves) ipynb file

Lecture 13: Convolutional Neural Networks; principal component analysis (brief) pdf

Lecture 14: CNNs (brief); principal component analysis pdf

Lecture 15: PCA example and signal detection; clustering examples pdf

1. Fourier Power Spectra pdf

2. Jupyter notebook for principal component analysis (PCA) ipynb file

3. Jupyter notebook for FFT Demo ipynb file

4. Jupyter notebook for False alarms in power spectra ipynb file
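As a compact companion to the PCA notebook above, a sketch of principal component analysis via the singular value decomposition: for strongly correlated 2-D data, nearly all the variance lies along the first principal component. The data are simulated with arbitrary parameters.

```python
import numpy as np

rng = np.random.default_rng(3)

# PCA by SVD on correlated 2-D data.
n = 1000
u = rng.standard_normal(n)
X = np.column_stack([u, 0.9 * u + 0.1 * rng.standard_normal(n)])

Xc = X - X.mean(axis=0)                 # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)         # fraction of variance per component

print(explained)  # the first component carries nearly all the variance
```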

Lecture 16: Clustering algorithms and sampling theorem pdf

Reading

1. Chapter 20 of MacKay on clustering

2. Extra: Chapter 22 of MacKay on Gaussian mixture models (Advanced topic)
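A minimal sketch of the k-means (Lloyd's) algorithm discussed in the MacKay reading: alternate between assigning points to their nearest center and moving each center to the mean of its assigned points. The two blobs and the deterministic initialization are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two well-separated 2-D blobs.
X = np.vstack([rng.standard_normal((100, 2)),         # blob near (0, 0)
               rng.standard_normal((100, 2)) + 8.0])  # blob near (8, 8)

k = 2
centers = X[[0, 100]]   # deterministic start: one point from each blob
for _ in range(20):
    # Assignment step: each point goes to its nearest center.
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: move each center to the mean of its assigned points.
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(np.sort(centers[:, 0]))  # near 0 and 8
```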

Lecture 17: Lomb-Scargle spectrum, CLEAN algorithms, NN elements pdf

Notebooks:

1. Aliasing demonstration ipynb

Notes:

1. CLEAN Algorithm (Notes_A6523_CLEAN_Algorithm_2017.pdf)

Lecture 18: Nonuniform sampling and Lomb-Scargle spectrum, Nonlinear Models pdf
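The Lomb-Scargle spectrum handles nonuniformly sampled data where a plain FFT cannot be used directly. A minimal sketch using SciPy's implementation (one of several available; the signal, noise level, and frequency grid are arbitrary):

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(5)

# Recover a 2 Hz sinusoid from irregularly spaced samples.
t = np.sort(rng.uniform(0, 10, 200))          # nonuniform sample times
y = np.sin(2 * np.pi * 2.0 * t) + 0.3 * rng.standard_normal(t.size)

freqs = np.linspace(0.1, 5.0, 500)            # trial frequencies in Hz
pgram = lombscargle(t, y, 2 * np.pi * freqs)  # expects angular frequencies
best = freqs[np.argmax(pgram)]
print(best)  # near 2 Hz
```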

Lecture 19: Comments on spectral line detection, nonlinear models pdf

Notebooks:

1. CLEAN demonstration with two nearby spectral lines ipynb

2. Masking of one spectral line by another ipynb

Reading/Articles:

1. Chapters 11, 12 (Gregory): Nonlinear modeling and MCMC

2. Genetic Algorithms: Principles of Natural Selection Applied to Computation (Forrest) pdf

3. Introduction to astroML: Machine Learning in Astrophysics (VanderPlas et al.) pdf

Lecture 20: Markov processes and Gaussian process modeling (brief) pdf

Handwritten notes on Markov Chains pdf

Lecture 21: Gaussian process modeling and MC pdf

Notebooks:

1. Generation and plotting of Markov time series ipynb

2. Gaussian process modeling: basic principles and examples ipynb
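A minimal sketch of the basic Gaussian-process idea in the notebook above: draw a realization of a zero-mean process by building a covariance matrix from a kernel and multiplying its Cholesky factor into white noise. The squared-exponential kernel and its correlation length are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(6)

# Draw one realization of a zero-mean Gaussian process with a
# squared-exponential covariance kernel.
t = np.linspace(0, 10, 200)
ell = 1.0                                            # correlation length
K = np.exp(-0.5 * (t[:, None] - t[None, :])**2 / ell**2)
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(t)))    # jitter for stability
sample = L @ rng.standard_normal(len(t))             # one GP realization

print(sample.shape)  # (200,)
```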

Lecture 22: MC and MCMC pdf

Notebooks:

1. Example of rejection sampling ipynb

2. Demo of toy MCMC (Gaussian proposal and Gaussian target PDFs) ipynb
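A minimal Metropolis sketch along the same lines as the toy MCMC demo above (a hedged illustration, not the notebook itself): a symmetric Gaussian proposal targeting a standard Gaussian, with the accept/reject step done in log space.

```python
import numpy as np

rng = np.random.default_rng(7)

def log_target(x):
    return -0.5 * x**2          # log of N(0,1), up to a constant

# Metropolis sampler with a symmetric Gaussian proposal.
x, chain = 0.0, []
for _ in range(50_000):
    prop = x + rng.normal(0.0, 1.0)
    # Accept with probability min(1, target(prop)/target(x)).
    if np.log(rng.random()) < log_target(prop) - log_target(x):
        x = prop
    chain.append(x)

chain = np.array(chain[5_000:])  # discard burn-in
print(chain.mean(), chain.std()) # near the target's mean 0 and std 1
```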

Lecture 23: MCMC and Bayesian inference for a Poisson model pdf

Notebooks:

1. Bayesian inference for a straight-line model (from https://dfm.io/emcee/current/user/line/ with modifications) ipynb

Scanned handwritten notes:

1. Bayesian inference for Poisson events in two intervals pdf

Other resources:

1. Software tool for drawing neural network diagrams link

2. Architectures of convolutional neural networks with definitions of layer types, choices of architecture, efficiency issues link

cordes@astro.cornell.edu