Astronomy 4523/6523: Modeling, Mining, and Machine Learning in Astronomy

Spring 2019     Tuesday-Thursday 1:25-2:40 pm     301 Space Sciences Building
Jim Cordes     520 Space Sciences Building

This course builds upon a foundation of probability and statistics to explore, develop, and apply algorithms for discovering objects and events in astronomical data, for inference of sophisticated models for populations of objects using frequentist and Bayesian methods, and for visualization and presentation of results to address fundamental questions using persuasive, data-based arguments. Topics include time-series analysis; clustering and classification algorithms; genetic algorithms; Markov Chain Monte Carlo methods; and neural networks. Analysis projects include investigation of simulated and real data using Python and Jupyter notebooks. The emphasis is on understanding the fundamentals of algorithms and developing expertise for choosing appropriate methods to address top-level goals.
Course syllabus: pdf
Lectures, reading material, and assignments will be posted here as we go.

Lecture 1: Introduction to the Course pdf
Reading for the next two lectures: Probability basics. Here are several reference options. After reading item 1, make a quick run through item 2, 3, or 4. Key topics to be emphasized in class include random variables, probability density functions, examples such as the Gaussian distribution, and the Central Limit Theorem.

Relevance: We will treat data as 'processes' that are collections of random variables, so we need to be able to manipulate their probabilities.
1. Chapter 1 and portions of Chapter 5 from Gregory:
       all sections except 5.5.4, 5.6, 5.7.3, 5.8.3, 5.8.4, 5.11.1, 5.13.2, 5.14, 5.15
2. Course Notes (slides 2,3 summarize main elements we will use)
3. Another approach: Laws of Probability, Bayes' theorem, and the Central Limit Theorem (Babu)
4. Another approach: Basics of Probability from CS229 Stanford
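
As a quick illustration of the Central Limit Theorem mentioned in the reading note above, the following is a minimal sketch, not taken from the course materials. It assumes NumPy is available; the function name `sample_means` and the choice of a uniform parent distribution are illustrative only.

```python
import numpy as np

def sample_means(n_terms, n_trials=20000, seed=0):
    """Return n_trials sample means, each over n_terms uniform(0, 1) draws."""
    rng = np.random.default_rng(seed)
    draws = rng.uniform(0.0, 1.0, size=(n_trials, n_terms))
    return draws.mean(axis=1)

n_terms = 30
means = sample_means(n_terms)

# A uniform(0, 1) variable has mean 1/2 and variance 1/12, so the mean of
# n_terms independent draws should have standard deviation sqrt(1/(12 n_terms)).
predicted_std = np.sqrt(1.0 / (12.0 * n_terms))
print(f"mean of sample means: {means.mean():.4f} (expected 0.5)")
print(f"std of sample means:  {means.std():.4f} (predicted {predicted_std:.4f})")

# Fraction of sample means within one predicted standard deviation of 0.5.
# For an exactly Gaussian distribution this fraction is about 0.683.
frac = np.mean(np.abs(means - 0.5) < predicted_std)
print(f"fraction within one sigma: {frac:.3f}")
```

Although each draw is uniform rather than Gaussian, the distribution of the sample means is already close to Gaussian for 30 terms.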

Lecture 2: Basics of Probability and Processes. pdf

Lecture 3: Probability Applications. pdf
Homework: Assignment 1: A simple demonstration of frequentist and Bayesian methods. Jupyter notebook: Frequentist_Bayesian_Example.ipynb

Address the questions asked in the notebook by writing up what you did and what you concluded. You can add the write-up to the notebook, but rename the notebook with your name prepended to the notebook name. Alternatively, you can submit (by email) a text-edited document converted to PDF, also with a file name that includes your name.     Due Thursday, Feb 7.
      1. Fourier transforms: Appendix B sections B.1 through B.4.2 of Gregory
      2. Fourier transforms: Course notes PDF
      3. Linear Shift Invariant Systems PDF

Lecture 4: Correlation functions, transformation of variables, Fourier transforms pdf
Notes: Correlation Functions as a Diagnostic Tool pdf
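
The link between correlation functions and Fourier transforms in the Lecture 4 topics can be sketched as follows. This is a minimal example under stated assumptions, not the course's own code: it estimates a normalized autocorrelation function as the inverse FFT of the power spectrum, zero-padding to avoid circular wrap-around. The function name `autocorrelation_fft` is hypothetical.

```python
import numpy as np

def autocorrelation_fft(x):
    """Normalized autocorrelation at lags 0..n-1, computed through the FFT."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()                      # remove the mean before correlating
    spectrum = np.fft.rfft(x, n=2 * n)    # zero-pad to length 2n (no wrap-around)
    acf = np.fft.irfft(np.abs(spectrum) ** 2, n=2 * n)[:n]
    return acf / acf[0]                   # normalize so that acf[0] == 1

rng = np.random.default_rng(0)
x = rng.normal(size=500)                  # white noise: ACF is near zero for lag > 0
acf = autocorrelation_fft(x)
print(acf[:5])
```

The result agrees with a direct lag-by-lag sum (for example `np.correlate` in `"full"` mode), but the FFT route scales as n log n rather than n squared.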

Lecture 5: Linear models, cost functions, solutions, covariance matrices pdf
1. Linear least squares pdf
2. Sections 10.1-10.6 in Gregory
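
The Lecture 5 topics (linear model, cost function, solution, covariance matrix) can be illustrated with a short sketch. It assumes a straight-line model with independent Gaussian errors of known, equal standard deviation `sigma`; the function name `linear_least_squares` and the simulated data are illustrative and not from the course notes.

```python
import numpy as np

def linear_least_squares(design, y, sigma):
    """Solve the normal equations and return (parameters, covariance matrix).

    Assumes independent errors with equal, known standard deviation sigma.
    """
    normal = design.T @ design                     # A^T A
    theta = np.linalg.solve(normal, design.T @ y)  # minimizes sum of squared residuals
    cov = sigma ** 2 * np.linalg.inv(normal)       # parameter covariance matrix
    return theta, cov

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 200)
true_theta = np.array([1.0, 0.5])                  # intercept, slope (simulated truth)
sigma = 0.3
design = np.column_stack([np.ones_like(t), t])     # columns: constant term, t
y = design @ true_theta + rng.normal(0.0, sigma, t.size)

theta_hat, cov = linear_least_squares(design, y, sigma)
errors = np.sqrt(np.diag(cov))
print("estimates:", theta_hat)
print("1-sigma errors:", errors)
```

The off-diagonal element of `cov` shows that the intercept and slope estimates are correlated when `t` is not centered on zero.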

Homework: Assignment 2 Due February 21 (Thursday) pdf

Lecture 6: Iterative linear regression; overfitting, detection and classification pdf
1. DFT, FFT Usage, Fourier-based Power Spectra pdf
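
One common way to do linear regression iteratively, as named in the Lecture 6 title, is gradient descent on the mean squared error. The sketch below is an assumption about what "iterative" could look like, not necessarily the method used in the lecture; the function name `gradient_descent_fit`, the learning rate, and the step count are illustrative choices.

```python
import numpy as np

def gradient_descent_fit(design, y, learning_rate=0.1, n_steps=500):
    """Minimize the mean squared residual by plain gradient descent."""
    theta = np.zeros(design.shape[1])
    n = y.size
    for _ in range(n_steps):
        residual = design @ theta - y
        gradient = (2.0 / n) * (design.T @ residual)  # gradient of mean squared error
        theta -= learning_rate * gradient
    return theta

rng = np.random.default_rng(3)
t = np.linspace(-1.0, 1.0, 100)                       # centered abscissa helps convergence
design = np.column_stack([np.ones_like(t), t])
y = design @ np.array([2.0, -1.5]) + rng.normal(0.0, 0.2, t.size)

theta_iter = gradient_descent_fit(design, y)
theta_direct, *_ = np.linalg.lstsq(design, y, rcond=None)
print("iterative:", theta_iter)
print("direct:   ", theta_direct)
```

For a linear model the iterative and direct solutions converge to the same answer; the iterative form matters because it carries over to models, such as networks, that have no closed-form solution.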

Lecture 7: Detection/classification of aligned features, Matched Filtering pdf
1. Matched Filtering (first 18 slides) pdf
2. Chapter 39.4 The Single Neuron as a Classifier, in Information Theory, Inference, and Learning Algorithms (MacKay)
3. Single layer neural networks (notebook from Sebastian Raschka)
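
For the aligned case in Lecture 7, a matched filter reduces to a template-weighted sum of the data. The following is a minimal sketch assuming white Gaussian noise of known standard deviation and a pulse whose shape and position are known; the function name `matched_filter_snr` and the Gaussian pulse are illustrative, not taken from the slides.

```python
import numpy as np

def matched_filter_snr(data, template, sigma):
    """Template-weighted sum, scaled to unit variance under noise alone."""
    template = np.asarray(template, dtype=float)
    norm = sigma * np.sqrt(np.dot(template, template))
    return np.dot(template, data) / norm

rng = np.random.default_rng(2)
n = 256
t = np.arange(n)
template = np.exp(-0.5 * ((t - 128) / 5.0) ** 2)   # known pulse shape and position
sigma, amplitude = 1.0, 3.0

noise_only = rng.normal(0.0, sigma, n)
with_pulse = amplitude * template + rng.normal(0.0, sigma, n)

snr_noise = matched_filter_snr(noise_only, template, sigma)
snr_pulse = matched_filter_snr(with_pulse, template, sigma)
expected_snr = amplitude * np.sqrt(np.dot(template, template)) / sigma
print(f"noise only: {snr_noise:.2f}")
print(f"with pulse: {snr_pulse:.2f} (expected about {expected_snr:.2f})")
```

Thresholding this statistic turns detection into a two-class decision (pulse present or absent), which is the same kind of weighted-sum-plus-threshold decision a single neuron makes.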

Lecture 8: Detection/classification of misaligned features, Matched Filtering II pdf
Jupyter notebooks for the hackathon in the Lecture 9 class:
1. Classical least squares demonstration (matrix algebra, covariance matrix, cost function, etc.) notebook
2. Least squares fit done iteratively in a simple network notebook
3. Pulse detection (classification) with a simple network (aligned pulses) notebook
4. Misaligned Pulse detection (first take) notebook
5. A neural net in 11 lines of Python
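
When the pulse position is unknown (the misaligned case of Lecture 8), one standard approach is to cross-correlate the data with the template and take the lag of the peak. The sketch below is an illustration under the assumptions of white noise and a known pulse shape, and is not the content of the notebooks listed above; `estimate_arrival` is a hypothetical name.

```python
import numpy as np

def estimate_arrival(data, template):
    """Return the index where the template best matches the data, and the correlation."""
    # With an odd-length, centered template, mode="same" aligns output index i
    # with the template centered on data[i].
    xcorr = np.correlate(data, template, mode="same")
    return int(np.argmax(xcorr)), xcorr

rng = np.random.default_rng(4)
n, width, true_pos = 512, 4.0, 317                 # simulated values
lags = np.arange(-20, 21)                          # 41-sample centered template
template = np.exp(-0.5 * (lags / width) ** 2)

t = np.arange(n)
pulse = 5.0 * np.exp(-0.5 * ((t - true_pos) / width) ** 2)
data = pulse + rng.normal(0.0, 1.0, n)

est_pos, xcorr = estimate_arrival(data, template)
print(f"true position: {true_pos}, estimated: {est_pos}")
```

Searching over lags this way costs more noise trials than the aligned case, so the detection threshold generally needs to be higher for the same false-alarm rate.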

Lecture 9 (19 Feb): Hackathon to investigate simple networks

James M. Cordes