FINAL EXAM: You can download the Final Exam here as of 11:15am on 03/20. It has also been emailed to you, and is available on Gradescope and in Piazza. **You must read, sign, and adhere to the Academic Integrity Pledge** (per Question 1). This means no collaboration with other humans, and no use of resources (online or otherwise) beyond what is listed on the exam. You **may** use your course notes, the recommended course textbooks, and the resources on the course webpage: Course Notes and the slides from lectures. You may also refer to old Piazza posts for this course, but do not post or answer new ones, other than private posts to the instructor for clarifying questions.

- Write the solutions on your own paper, or on a tablet, or type them on a computer.
- Submit your exam through Gradescope by 2:50pm. Any later submissions will be flagged and will require a good explanation for late submission.
- Good Luck!

Final Office Hours are on Monday, March 16, 10am-12pm, and Thursday, March 19, 8:30-9:30am and 2-3pm. These are virtual office hours, through Zoom. A link to the office hour meeting is in the Google Calendar, has been sent to the class by email, and is posted in Piazza.

The final lecture was recorded in a new system called EVT (Educational Vision Technology). You can click here to view and interact with the lecture, a transcript of the blackboard work, and a transcript of what I said. **Please send me feedback** on this! I have had it installed as a test for a model of online teaching for Spring Quarter, and would like to hear what you think of it.

The grading scheme for the course has some new additions. In particular, you **need not take the final exam**; two of the optional grading schemes do not include the final exam. You do not need to choose: I will use whichever grading scheme gives you the highest score.

The Final Exam will be **take-home**. It will be posted here, and on Piazza, at 11:15am on Friday, March 20. It will include an **academic integrity pledge** for you to read, copy out, and sign (you may do this during the time from 11:15-11:30). You should write the solutions on your own paper; you need not print the exam. You will turn in the exam on Gradescope (just as you have with all other coursework), with a due time of 2:50pm; you are supposed to stop working on the exam at 2:30pm. In case of technical issues, Gradescope will be set to accept late exams until 7pm; if you do not submit by 2:50pm, however, your exam will be marked as late, and you will need to contact me to explain why it was not submitted until later.

The Final Lab Project deadline has been extended to Wednesday, March 18, at 11:59pm.

Homework 4 has been extended to Monday, March 16, at 11:59pm.

The Take-Home Midterm Exam took place February 10-11. You may download the midterm and solutions here:

182.Midterm.pdf

182.Midterm.Solutions.pdf (The solution to 4(b) has been corrected to replace a 1/N with the correct 1/m.)

The midterm has been graded, and grades released on Gradescope. The regrade request window will be open from Wednesday, February 19 at 8am through Friday, February 21 at 8pm.

Homework 2 and Lab 2 have been graded and released on Gradescope. The regrade request window will be open from Wednesday, February 12 at 8am through Friday, February 14 at 8pm.

Homework 1 and Lab 1 have been graded and released on Gradescope. The regrade request window for both will be open from Tuesday, January 28 at 8am through Friday, January 31 at 8pm.

Welcome to DSC 155 & MATH 182: Hidden Data in Random Matrices, in "Winter" 2020!

Textbook: There is no single textbook that treats the material in this advanced topics course. We will largely be using the instructor's lecture notes as source material. The topics come from several areas: linear algebra, probability theory, mathematical statistics, and more specialized topics like random matrix theory. Here are four auxiliary texts that will be of use for the material in the first part of the course.

**Linear Algebra**

*Linear Algebra*, by Stephen H. Friedberg, Arnold J. Insel, and Lawrence E. Spence; ISBN 978-0134860244

**Probability Theory**

*Introduction to Probability*, by David F. Anderson, Timo Seppäläinen, and Benedek Valkó; DOI 10.1017/9781108235310

**Probability for Data Science**

*Probability for Data Science*, prob140.org/textbook

**Mathematical Statistics**

*Mathematical Statistics and Data Analysis*, by John A. Rice; ISBN 978-8131519547

**These are not required textbooks.** You may find them useful as auxiliary sources, but you need not purchase them for this course.

Coursework: There are 5 homework assignments (starting in Week 2); they are posted below.

There are 5 data science labs, done as group work during lab sessions (but to be turned in individually); they can be accessed through DataHub.

There will be a final lab project, due on Wednesday, March 18.

There will be one take-home midterm exam and a final exam; dates, times, and locations are posted below.

DataHub is UC San Diego's implementation of JupyterHub: a web-based live coding platform. We will use it to host the Data Science Labs component of this course. After classes have begun, you should be able to log in to DataHub and access the DSC 155 & MATH 182 Data Science Labs. Your first discussion section will be devoted to helping you become familiar with Jupyter notebooks.

Piazza is an online discussion forum; we will use Piazza for this course. It will allow you to post messages (openly or anonymously) and answer posts made by your fellow students about course content, homework, exams, etc. The instructor and TA will also monitor and post to Piazza regularly. You can sign up here.

**Note:** Piazza has an opt-in "Piazza Careers" section which, if you give permission, will share statistics about your Piazza use with potential future employers. It also has a "social network" component, based on other students who've shared a Piazza-based class with you, that comes with the usual warnings about privacy concerns. Piazza is fully FERPA compliant, and is an allowed resource at UCSD. Nevertheless, you are not required to use Piazza if you do not wish to.

Gradescope is an online tool for uploading and grading assignments and exams (it is now under the umbrella of Turnitin). You will turn in your homework and labs through Gradescope, and you will access your graded exams there as well. Access the class Gradescope site here.

Name | Role | Office | Email |
--- | --- | --- | --- |
Todd Kemp | Instructor | APM 5202 | tkemp@ucsd.edu |
Denise Rava | Teaching Assistant | APM 2220 | drava@ucsd.edu |

Meeting | Day(s) | Time | Location |
--- | --- | --- | --- |
Lecture A00 (Kemp) | Monday, Wednesday, Friday | 1:00pm - 1:50pm | APM B402A |
Lab A01 (Rava) | Wednesday | 5:00pm - 5:50pm | APM B432 |
Lab A02 (Rava) | Wednesday | 6:00pm - 6:50pm | APM B432 |
Take-Home Midterm Exam | Monday, Feb 10 | 2:00pm | (Home) |
Take-Home Final Exam | Friday, Mar 20 | 11:30am - 2:29pm | (Home) |

Here are lecture notes on topics outside any of the recommended preparatory textbooks.

182.Notes.pdf last updated March 5, 2020.

Here are my lecture notes on Random Matrix Theory. They are intended for a reader who has taken graduate courses in (measure-theoretic) probability
theory, complex analysis, and real analysis, and who has some familiarity with (enumerative) combinatorics. Section 4.1 contains material related
to the discussion in Section 2.4 of our current course notes.

RMT.Notes.pdf

The lectures are typically given via tablet, on notes/slides with some information prepared before lecture, and some filled-in during the lecture. Below, you will find the before and after slides for each lecture (as they are produced).

DSC 155 & MATH 182 is a one-quarter topics course on the theory of *Principal Component Analysis* (PCA), a tool from statistics
and data analysis that is extremely widespread and effective for understanding large, high-dimensional data sets. PCA is a method of projecting
a high-dimensional data set onto a lower-dimensional affine subspace (of chosen dimension) that best fits the data (in the least-squares sense),
or equivalently maximizes the preserved variance of the original data. It is a computationally efficient algorithm (utilizing effective numerical
methods for the singular value decomposition of matrices), and so is often a first stop for advanced analytics of big data sets. The algorithm
produces a canonical basis for the projected subspace; these vectors are called the *principal components*. The question of choosing the dimension
for the projection is subtle. In virtually all real-world applications, a "cut-off phenomenon" occurs: a small number (usually 2 or 3) of the singular
values of the sample covariance matrix for the data account for a large majority of the variance in the data set.
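To make the description above concrete, here is a minimal NumPy sketch of PCA via the singular value decomposition. It is an illustration only, not the course's lab code: the helper name `pca` and the synthetic data (three strong directions plus unit-variance noise) are hypothetical. It centers the data at the sample mean (which is why the fitted subspace is affine), takes the thin SVD, and reports the fraction of total variance carried by each component.

```python
import numpy as np


def pca(X, d):
    """Hypothetical helper: PCA of the rows of X (n samples x p features).

    Returns the top-d principal components (orthonormal rows), the coordinates
    of each sample in that basis, the projections of the samples onto the
    best-fitting d-dimensional affine subspace, and the fraction of total
    variance carried by every component.
    """
    mu = X.mean(axis=0)      # the best-fitting affine subspace passes through the mean
    Xc = X - mu              # centered data
    # Thin SVD: Xc = U @ diag(s) @ Vt, with s sorted in decreasing order.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:d]                # principal components (as rows)
    scores = Xc @ components.T         # coordinates in the d-dimensional subspace
    X_hat = mu + scores @ components   # least-squares projections of the samples
    # Sample covariance eigenvalues are s**2 / (n - 1); normalize to fractions.
    explained = s**2 / np.sum(s**2)
    return components, scores, X_hat, explained


# Synthetic example: 3 strong directions (variances 100, 36, 16) plus unit noise.
rng = np.random.default_rng(0)
n, p = 500, 20
latent = rng.normal(size=(n, 3)) * np.array([10.0, 6.0, 4.0])
basis = np.linalg.qr(rng.normal(size=(p, 3)))[0].T   # 3 x p, orthonormal rows
X = latent @ basis + rng.normal(size=(n, p))

components, scores, X_hat, explained = pca(X, d=3)
print(np.round(explained[:6], 3))   # the first three fractions dominate the rest
```

In this toy setting the variance fractions drop sharply after the third value, a miniature version of the cut-off phenomenon; with real data, where that drop occurs (if at all) is exactly the subtle question described above.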

A full (and honestly still developing) theoretical understanding of this "cut-off phenomenon" has only arisen in the last 15 years, using the tools
of random matrix theory. The result, known as the *BBP Transition* (named after Jinho Baik, Gerard Ben Arous, and Sandrine Peche, who discovered it in 2005),
explains the phenomenon by analyzing outlier singular values in low-rank perturbations of random covariance matrices. It uses a simple model
(standard in signal processing and information theory) of a signal in a noisy channel to tease out exactly when an outlier will appear in PCA of noisy data,
and further predicts subtler effects, which statistical methods currently under development aim to correct in order to produce even more accurate data analysis.
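For concreteness, here is a small sketch of the quantitative prediction in the simplest rank-one spiked covariance model, as it is commonly stated for Gaussian data in the proportional regime where the number of features $p$ and samples $n$ both grow with $p/n \to \gamma$. The population covariance is assumed to be $I + \theta v v^T$ for a unit vector $v$ and spike strength $\theta > 0$. The function names are hypothetical, and the formulas are the standard limiting values, not a derivation from these notes.

```python
import math


def bbp_top_eigenvalue(theta, gamma):
    """Limiting largest eigenvalue of the sample covariance matrix in the
    rank-one spiked model with population covariance I + theta * v v^T and
    aspect ratio gamma = p / n (hypothetical helper; standard limit formula)."""
    if theta <= math.sqrt(gamma):
        # Below the BBP threshold: no outlier; the top eigenvalue sticks to
        # the right edge of the Marchenko-Pastur bulk.
        return (1 + math.sqrt(gamma)) ** 2
    # Above the threshold: an outlier separates from the bulk.
    return (1 + theta) * (1 + gamma / theta)


def bbp_eigenvector_overlap(theta, gamma):
    """Limiting squared cosine between the top sample eigenvector and the
    true spike direction v (0 below the threshold)."""
    if theta <= math.sqrt(gamma):
        return 0.0
    return (1 - gamma / theta**2) / (1 + gamma / theta)


gamma = 0.25                                # e.g. p = 250 features, n = 1000 samples
print(bbp_top_eigenvalue(0.4, gamma))       # 2.25: below the threshold sqrt(gamma) = 0.5
print(bbp_top_eigenvalue(1.0, gamma))       # 2.5: an outlier above the bulk edge 2.25
print(bbp_eigenvector_overlap(1.0, gamma))  # 0.6: the sample direction is only partly aligned
```

The second function illustrates one of the subtler effects: even above the threshold, the top sample eigenvector is only partially aligned with the true signal direction, and both predictions change continuously as $\theta$ crosses $\sqrt{\gamma}$.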

The goal of this course is to present and understand the PCA algorithm, and then analyze it to understand how (and when) it works. Time permitting, we will then apply these ideas to some current problems of interest in data science and computer science, such as community detection in large (random) networks.

There is no textbook that covers this material, so we will sample from different sources (and prepare instructor lecture notes along the way). Because this is a brand new experimental course, which has never been taught at UCSD (or anywhere else), it is hard to say in advance what the schedule of topics will be. Instead, here is a point-form list of topics we intend to cover, in the order they will be covered.

- Review of relevant linear algebra concepts (linear equations; rank and nullity; orthogonal rotations and projections; eigenvalues and eigenvectors), and introduction to singular value decomposition.
- Review of relevant topics from probability (distributions and densities; random vectors, covariance, and correlation).
- Introduction to basic statistical concepts: unbiased estimators, maximum likelihood estimation, sample covariance, (linear) least squares and regression.
- Principal Component Analysis (PCA):
  - as best approximating *d*-dimensional affine subspace projection
  - as highest variance *d*-dimensional affine subspace projection
  - equivalence of these two characterizations
  - finding the principal components: singular value decomposition
  - complexity analysis
  - which *d* should we pick? Demonstration of cut-off phenomenon in many real data sets.
- Noise without signal: spectral statistics of totally random data sets:
  - method of moments
  - Wigner's semicircle law
  - Marchenko-Pastur laws for Gaussian rectangular matrices
  - universality of eigenvalue statistics
- Outlier singular values:
  - robustness of the bulk singular values
  - spiked covariance models
  - the BBP transition
- Application of the BBP transition to understand the generic behavior of PCA, and discussion of fine-tuning in recent work of Johnstone and Donoho.
- Introduction to community detection: the stochastic block model.
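As an illustration of the "method of moments" and "Wigner's semicircle law" items: for a Wigner matrix $X$ scaled so that its off-diagonal entries have variance $1/N$, the semicircle law predicts that the normalized traces $\frac{1}{N}\mathrm{tr}(X^k)$ approach the moments of the semicircle distribution on $[-2, 2]$, which are $0$ for odd $k$ and the Catalan number $C_{k/2}$ for even $k$. The NumPy sketch below checks this numerically for a single sample matrix. The Gaussian (GOE-type) construction and the helper names are assumptions made for illustration; this is a numerical sanity check, not a proof.

```python
import numpy as np
from math import comb


def catalan(m):
    """m-th Catalan number: the 2m-th moment of the semicircle law on [-2, 2]."""
    return comb(2 * m, m) // (m + 1)


def wigner_matrix(N, rng):
    """Hypothetical helper: symmetric Gaussian (GOE-type) matrix scaled so that
    off-diagonal entries have variance 1/N and diagonal entries variance 2/N."""
    A = rng.normal(size=(N, N))
    return (A + A.T) / np.sqrt(2 * N)


rng = np.random.default_rng(1)
N = 1000
eigs = np.linalg.eigvalsh(wigner_matrix(N, rng))

for k in range(1, 7):
    empirical = np.mean(eigs**k)          # equals (1/N) * trace(X**k)
    predicted = catalan(k // 2) if k % 2 == 0 else 0
    print(k, round(float(empirical), 3), predicted)

print("largest eigenvalue:", round(float(eigs.max()), 3))  # close to the edge 2
```

Computing the moments from the eigenvalues rather than from matrix powers keeps the check cheap and numerically stable; the small discrepancies that remain shrink as $N$ grows.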

**Prerequisite:** The prerequisites are Linear Algebra (MATH 18) and Probability Theory (MATH 180A).
MATH 109 (Mathematical Reasoning) is also strongly recommended as a prerequisite or corequisite. Also: MATH 102
(Applied Linear Algebra) would be beneficial, but is not required.
For the lab component of the course, some familiarity with Python and MATLAB is helpful, but not
required.

**Lecture:** Attending the lecture is a fundamental part of the course; you are
responsible for material presented in the lecture *whether or not it is discussed in the notes.*
You should expect questions on the homework and exams that will test your understanding of concepts discussed in the lecture.

**Homework:** Homework assignments are posted below, and will be due at 11:59pm
on the indicated due date. You must turn in your homework through Gradescope; if you have produced it on paper,
you can scan it or simply take clear photos of it to upload. It is allowed and even
encouraged to discuss homework problems with your classmates and your instructor and TA, but your final write up of your
homework solutions must be your own work.

**Labs:** The data science labs are accessible through DataHub. The turn-in components
should be exported as pdf files and turned in through Gradescope; they are due at 11:59pm on the dates indicated on the labs.

**Lab Project:** You will choose a real-world high-dimensional data set, and implement the PCA algorithm to analyze it. You will use the tools
explored in this class to give a careful analysis of how the PCA algorithm performed, what it discovered about the data, and what structural shortcomings
were evident in the analysis. Topics and data sets are to be approved by the instructor.

**Take-Home Midterm Exam:** There will be a single take-home midterm exam, available immediately after the lecture on Monday, February 10,
due the following day before midnight. You are free to use any paper / online resources you like during the exam, but **collaboration with other people
is not allowed**. This will be enforced by the honor system; be warned: we will grade carefully, looking for evidence of collaboration on the exam, and
any suspicious cases will be reported as academic integrity violations (with likely severe penalties).

**Final Exam:** The final examination will be held at the date and time stated above.

- It is your responsibility to ensure that you do not have a schedule conflict involving the final examination; you should not enroll in this class if you cannot take the final examination at its scheduled time.
- The exam will be **open book**: you may use the recommended course textbooks, your course notes, and all resources on this course webpage (slide/note transcripts before/after, course notes, and the EVT recording of the final lecture) during the exam. Please do not use other resources. You may not collaborate or interact with other humans during the exam.

**Administrative Links:** Here are two links regarding UC San Diego policies on exams:

- Exam Responsibilities An outline of the responsibilities of faculty and students with regard to final exams
- Policies on Examinations The Academic Senate policy regarding final examinations (These are the rules!)

**Regrade Policy:**

- Your exams, homeworks, and labs will be graded using Gradescope. You will be able to request regrades *directly from your TA* through Gradescope for a specified window of time. Be sure to make your request within the specified window of time; no regrade requests will be accepted after the deadline.

**Note:** Your grader will consider your regrade request only if you have explained clearly, thoroughly, and politely why you think an error in grading was made.

**Grading:**
Your cumulative average will be determined by whichever of the following four weighted averages is highest (for you):

- 15% Homework, 15% Labs, 20% Take-Home Midterm, 10% Lab Project, 40% Final Exam
- 15% Homework, 15% Labs, 10% Take-Home Midterm, 20% Lab Project, 40% Final Exam
- 25% Homework, 25% Labs, 30% Take-Home Midterm, 20% Lab Project
- 25% Homework, 25% Labs, 20% Take-Home Midterm, 30% Lab Project
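Since whichever scheme is best for you is used, your cumulative average is simply the maximum of four weighted sums. Here is a small Python sketch for checking your own standing; the function name and the sample scores are hypothetical, and entering 0 for the Final Exam models not taking it.

```python
# Weights (in percent) of the four grading schemes above, in the order
# (Homework, Labs, Take-Home Midterm, Lab Project, Final Exam).
SCHEMES = [
    (15, 15, 20, 10, 40),
    (15, 15, 10, 20, 40),
    (25, 25, 30, 20, 0),
    (25, 25, 20, 30, 0),
]
assert all(sum(weights) == 100 for weights in SCHEMES)  # each scheme totals 100%


def cumulative_average(scores):
    """Return the highest weighted average over all four schemes.

    `scores` lists percentages in the same order as the weights; use 0 for
    the Final Exam if you did not take it (the last two schemes ignore it).
    """
    return max(sum(w * s for w, s in zip(weights, scores)) / 100 for weights in SCHEMES)


# Hypothetical scores, skipping the final: the fourth scheme gives the best result.
print(cumulative_average((90, 85, 80, 95, 0)))  # 88.25
```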

Your course grade will be determined by your cumulative average at the end of the quarter, and will be based on the following scale:

Letter Grade | Minimum Cumulative Average |
--- | --- |
A+ | 97 |
A | 93 |
A- | 90 |
B+ | 87 |
B | 83 |
B- | 80 |
C+ | 77 |
C | 73 |
C- | 70 |

The above scale is guaranteed: for example, if your cumulative average is 80, your final grade will be *at least* B-. However,
your instructor may adjust the above scale to be more generous.
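The guaranteed scale is a simple threshold lookup. Here is a minimal sketch of a hypothetical helper that returns the lowest letter grade the posted scale guarantees for a given cumulative average. The scale stops at C-, so the sketch returns `None` below 70 rather than guessing a grade the page does not state.

```python
# Guaranteed cutoffs from the scale above, from highest to lowest.
CUTOFFS = [
    (97, "A+"), (93, "A"), (90, "A-"),
    (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"),
]


def guaranteed_grade(average):
    """Lowest letter grade guaranteed for a cumulative average (0-100).

    The instructor may adjust the scale to be more generous, so the actual
    grade can be higher. Returns None below 70, where no cutoff is posted.
    """
    for cutoff, letter in CUTOFFS:
        if average >= cutoff:
            return letter
    return None


print(guaranteed_grade(80))     # B-  (the example in the text)
print(guaranteed_grade(88.25))  # B+
```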

**Academic Integrity:** UC San Diego's
code of academic integrity
outlines the expected academic honesty of all students and faculty, and details the consequences for academic dishonesty.
The main issues are cheating and plagiarism, of course, for which we have a zero-tolerance policy. (Penalties for these
offenses always include assignment of a failing grade in the course, and usually involve an administrative penalty, such
as suspension or expulsion, as well.) However, academic integrity also includes things like giving credit where credit
is due (listing your collaborators on homework assignments, noting books or papers containing information you used in
solutions, etc.), and treating your peers respectfully in class. In addition, here are a few of our expectations for
etiquette in and out of class.

**Entering/exiting class:** Please arrive on time and stay for the entire class/section period. If, despite your best efforts, you arrive late, please enter quietly through the rear door and take a seat near where you entered. Similarly, in the rare event that you must leave early (e.g. for a medical appointment), please sit close to the rear exit and leave as unobtrusively as possible.

**Noise and common courtesy:** When class/section begins, please stop your conversations. Wait until class/section is over before putting your materials away in your backpack, standing up, or talking to friends. Do not disturb others by engaging in disruptive behavior. Disruption interferes with the learning environment and impairs the ability of others to focus, participate, and engage.

**Electronic devices:** Please do not use devices (such as cell phones, laptops, tablets, iPods) for non-class-related matters while in class/section. No visual or audio recording is allowed in class/section without prior permission of the instructor (whether by camera, cell phone, or other means).

**E-mail etiquette:** You are expected to write as you would in any professional correspondence. E-mail communication should be courteous and respectful in manner and tone. Please do not send e-mails that are curt or demanding.

Weekly homework assignments are posted here. Homework is due by 11:59pm on the posted date, through Gradescope. Late homework will not be accepted.

Due Date | Homework | Solutions |
--- | --- | --- |
01/17/20 | 182.HW1.pdf | 182.HW1.Solutions.pdf |
01/31/20 | 182.HW2.pdf | 182.HW2.Solutions.pdf |
03/06/20 | 182.HW3.pdf | 182.HW3.Solutions.pdf |
03/16/20 | 182.HW4.pdf | 182.HW4.Solutions.pdf |