# 5.  Orthogonality and Least Squares

§5.1 Introduction

Today we will talk about orthogonality and least squares.  Given a basis for a vector space, a linear algebra student should begin salivating, much like Pavlov's dog, and ask, can I convert this into an orthonormal basis?  (Please avoid salivating on the computer equipment.)  The idea is, of course, that orthogonal bases are much easier to work with and lend themselves nicely to various computations, such as finding vector coordinates with respect to such a basis and projecting vectors onto various subspaces of a vector space.  Closely tied to this idea is the concept of orthogonal matrices: square matrices whose columns form an orthonormal set.  Despite this simple definition, they possess many nice properties, some of which we will discuss in this lab.  As far as applications go, we will talk about least squares.  It is a problem that often arises with experimental data, wherein we try to find a function that most closely models our collected data.  So, sit back, relax, and enjoy the last lab of the quarter.

§5.2 Orthogonality

We start with vector orthogonality.  Two vectors x and y  are said to be orthogonal if their inner product is 0; i.e.,

< x, y > = x^T y = 0

Remember that the MATLAB command that calculates inner products looks like

>> x'*y

or alternatively, you could enter

>> dot(x,y)

where "dot" stands for "dot product".
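The lab itself uses MATLAB, but the same inner product is easy to reproduce anywhere.  Here is a minimal plain-Python sketch (the two vectors are hypothetical examples, not the lab's v, w, x, y, z):

```python
# A plain-Python sketch of the inner product < x, y > = x^T y.
# The specific vectors here are made up for illustration.
def dot(x, y):
    """Standard inner product of two vectors given as lists."""
    return sum(xi * yi for xi, yi in zip(x, y))

x = [1, 2, 0]
y = [2, -1, 5]
print(dot(x, y))   # 0, so these two vectors are orthogonal
```

A zero inner product is exactly the orthogonality test described above.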

Enter the following vectors into MATLAB.  Remember to use semicolons between entries so that you produce column vectors.

Exercise 5.1

(a)  List all maximal orthogonal subsets of the above set.

Note: B is called a maximal orthogonal subset of X if there is no larger orthogonal subset of X containing B.  In other words, this question is asking you to group the vectors v, w, x, y, and z in as many ways as possible so that all the vectors in each group are orthogonal to each other, and so that you can't add any other vector to a group that is orthogonal to the ones already in it.  For example, the subset {w, x} contains two vectors that are orthogonal to each other, and no other vector is orthogonal to both of these.  But this is only one example; there are others.

What is the maximum number of non-zero orthogonal vectors that you can find in R3?  What about Rn?  Explain.

(b)  Pick the largest subset from part (a) and normalize all the vectors using the following command:

>> v = v/norm(v)

That is, we use the norm() command to calculate the norm of the vector v and then divide by the norm to get a vector of length 1.  (The above normalizes the vector v, so you will need to change the letter "v" to other letters to normalize the other vectors.)

Store the resulting vectors in MATLAB as columns of a matrix W.  Enter them in alphabetical order from left to right.

Let us take a closer look at the resulting matrix W.  It is an example of what are called orthogonal matrices.  They are square matrices whose columns form an orthonormal set.

Exercise 5.2

(a)  Calculate W^T W.  What do you get?  Why should we expect this result?

(b)  Enter the following vectors into MATLAB: a = (1, 1, 0)^T and b = (2, 0, 3)^T.  Compute the norm of b and the norm of Wb.  What do you notice?  Compute the dot product < a, b > and compare it to < Wa, Wb >.  One of the fundamental properties of orthogonal transformations is that they preserve norms of vectors and the standard inner product.  In fact, we could have alternatively defined an orthogonal transformation as one that preserves the standard inner product.

(c)  It should be clear that our matrix W is invertible (the columns are linearly independent since they are orthogonal).  Hence we can compute W^(-1).  Use inv(W) to compute the inverse of W and compare it to W^T.  What do you observe?
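The three properties in Exercise 5.2 can also be checked numerically outside MATLAB.  The sketch below uses a 2 x 2 rotation matrix as a stand-in orthogonal matrix (not the lab's W) and verifies that Q^T Q = I and that norms and inner products are preserved:

```python
import math

# Illustrative sketch: a 2x2 rotation matrix is orthogonal.  The matrix and
# test vectors here are made up; the exercise itself uses W in MATLAB.
t = 0.7
Q = [[math.cos(t), -math.sin(t)],
     [math.sin(t),  math.cos(t)]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def norm(v):
    return math.sqrt(dot(v, v))

a, b = [1.0, 1.0], [2.0, 0.0]

# Norms and the standard inner product are preserved (up to rounding):
print(abs(norm(matvec(Q, b)) - norm(b)) < 1e-12)
print(abs(dot(matvec(Q, a), matvec(Q, b)) - dot(a, b)) < 1e-12)

# Q^T Q should be the identity: entry (i, j) is the dot of columns i and j.
QtQ = [[dot([Q[k][i] for k in range(2)],
            [Q[k][j] for k in range(2)]) for j in range(2)] for i in range(2)]
print(all(abs(QtQ[i][j] - (1 if i == j else 0)) < 1e-12
          for i in range(2) for j in range(2)))
```

All three checks print True, mirroring what you should observe with W in MATLAB.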

Remark 5.1  Orthogonal matrices are incredibly important in geometry and physics.  One of the reasons is that orthogonal 2 x 2 matrices with positive determinant represent rigid motions of the plane that keep the origin fixed.  Similarly, 3 x 3 orthogonal matrices with positive determinant represent rigid motions of space keeping the origin fixed.  If you allow the determinant to be negative, you also allow reflections through a line in the case of the plane, or through a plane in the case of 3-space.  In any case, n x n orthogonal matrices are precisely the distance-preserving transformations of n-space that keep the origin fixed.  As an easy exercise, try to see geometrically why all distance-preserving transformations of the plane that keep the origin fixed must be either reflections or rotations.  As a more challenging exercise, try to classify all distance-preserving transformations of the plane.

§5.3 Orthogonal Projections and Gram-Schmidt Process

First, we will look at orthogonal projections onto a vector.  Hopefully, you were able to determine in §5.2 above that the vectors v and w are not orthogonal.  (In this section, we will look at vectors that are not orthogonal.  This is because orthogonal projections are trivial if the vectors we start with are orthogonal: projecting a vector onto an orthogonal vector gives 0, the zero vector, as its orthogonal projection.)  Suppose we want to find the orthogonal projection of v onto w; let's denote it by v̄.  The projection formula tells us that

v̄ = ( < v, w > / < w, w > ) w

Note that if w is normalized to length 1, you need not divide by <w, w> since it will be equal to 1 already.
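The projection formula above translates directly into code.  Here is a plain-Python sketch with made-up vectors (the lab's v and w are entered in MATLAB instead):

```python
# Sketch of the projection formula: proj of v onto w is (<v,w>/<w,w>) w.
# The vectors below are hypothetical examples.
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def proj(v, w):
    """Orthogonal projection of v onto w."""
    c = dot(v, w) / dot(w, w)
    return [c * wi for wi in w]

v, w = [3.0, 4.0], [1.0, 0.0]
print(proj(v, w))   # [3.0, 0.0]
```

Note that when w already has length 1, the denominator < w, w > is 1, matching the remark above.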

Here's a pictorial depiction of what we are doing

It is not possible to enter the symbol v̄ into MATLAB.  Instead, when defining the vector v̄, name the variable "vbar".

Exercise 5.3

(a)  Find v̄ = (the orthogonal projection of v onto w) and z = (the component of v orthogonal to w).  (Note from the picture that z = v - v̄.)  Then write v as the sum of these two vectors.  (These are the vectors v and w from exercise 5.1.)

(b)  Use MATLAB to check that z is orthogonal to v̄.  (Keep in mind that rounding errors introduced at each step may give you an answer that doesn't quite look like what you might expect.)

The idea of projecting a vector onto a subspace is easily extended from the above case by looking at an orthogonal basis for the subspace, projecting the vector onto each basis element, and then adding them all up to form the orthogonal projection.
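The recipe just described, projecting onto each element of an orthogonal basis and summing, can be sketched in plain Python.  The basis vectors below are hypothetical (they are not the lab's x and y from exercise 5.1):

```python
# Projection onto a subspace: sum the projections onto each vector of an
# orthogonal basis.  u1 and u2 are a made-up orthogonal pair in R^3.
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def proj(v, w):
    # orthogonal projection of v onto the single vector w
    c = dot(v, w) / dot(w, w)
    return [c * wi for wi in w]

u1, u2 = [1.0, 1.0, 0.0], [1.0, -1.0, 0.0]   # orthogonal basis of a plane
v = [3.0, 3.0, 3.0]

# Project onto each basis vector, then add the projections up.
p = [s + t for s, t in zip(proj(v, u1), proj(v, u2))]
print(p)   # [3.0, 3.0, 0.0]
```

Here the result is v with its component perpendicular to the plane removed, exactly as in the single-vector case.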

Exercise 5.4  Use MATLAB to find the projection of the vector (3, 3, 3)^T onto the subspace spanned by the vectors {x, y}.  (These vectors are from exercise 5.1.)

As you can see, in order for us to project a vector onto a subspace, we must be able to come up with an orthogonal basis for that subspace.  As you know, the Gram-Schmidt process accomplishes just that.  The MATLAB command that performs the Gram-Schmidt process is called qr().  (Note that this is qr, and not gr.)  Given a matrix A, it performs the QR-factorization algorithm, and returns an orthogonal matrix Q whose columns are obtained from the columns of A by using Gram-Schmidt, and an upper triangular matrix R, such that A = QR.
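To make the process concrete, here is a plain-Python sketch of classical Gram-Schmidt applied to the columns a = (1, 1, 0)^T and b = (2, 0, 3)^T used in Example 5.1 below:

```python
import math

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: returns an orthonormal list of vectors
    spanning the same subspace as the input list."""
    basis = []
    for v in vectors:
        w = v[:]
        # subtract the projection onto each previously found unit vector
        for q in basis:
            c = sum(a * b for a, b in zip(w, q))
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = math.sqrt(sum(wi * wi for wi in w))
        if n > 1e-12:          # skip vectors that are (numerically) dependent
            basis.append([wi / n for wi in w])
    return basis

Q = gram_schmidt([[1.0, 1.0, 0.0], [2.0, 0.0, 3.0]])
print(Q)
```

The two resulting unit vectors agree with the columns of MATLAB's Q below up to sign; qr() is free to flip the sign of a column (and the corresponding row of R), and both choices are valid orthonormal bases.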

Example 5.1

Let's try to find an orthonormal basis for the subspace spanned by the vectors a and b from exercise 5.2.  We start by defining a new matrix A with a and b as its columns.

>> A = [a b]

A =

1     2
1     0
0     3

Next, we apply the qr() command.

>> [Q,R]=qr(A)

Q =

-0.7071     0.3015    -0.6396
-0.7071    -0.3015     0.6396
0          0.9045     0.4264

R =

-1.4142     -1.4142
0           3.3166
0           0

Remark 5.2  There is a slight ambiguity above that results from the fact that our matrix is not square.  The output consists of a 3 x 3 matrix Q.  However, when we perform Gram-Schmidt on our two vectors we should get two (instead of three) orthonormal vectors.  That is why we are going to use a shortened version of the command.  Type the following and notice the difference in output.

>> [Q,R]=qr(A,0)

Q =

-0.7071     0.3015
-0.7071    -0.3015
0          0.9045

R =

-1.4142     -1.4142
0           3.3166

We get that an orthonormal basis for the subspace spanned by a and b is given by the two columns of Q above.

Exercise 5.5

(a)  The following is a basis for R3: {(1, 2, 1)^T, (2, 1, 2)^T, (1, 1, 2)^T}.  As in the example above, turn it into an orthonormal basis.  You should obtain an orthogonal matrix Q, whose columns are the vectors obtained by performing Gram-Schmidt on the above set.

(b)  Store the eigenvalues of Q in a vector v as follows:

>> v = eig(Q)

Note that in addition to the eigenvalue 1, the two other eigenvalues are complex numbers that are conjugate to each other.  That's quite all right, since solving the characteristic equation may result in complex solutions.  What we will do next is examine the nature of these eigenvalues.  Let's enter the following into MATLAB:

>> norm(v(2))

This gives the absolute value of the second entry in the vector v, that is, the absolute value of the second eigenvalue.  Do the same for the third eigenvalue.  What do you observe?

Remark 5.3  (Euler's Theorem) It is always the case that the eigenvalues of any orthogonal matrix have absolute value 1. Now, suppose we have a 3x3 orthogonal matrix T whose determinant is equal to 1. Its characteristic polynomial will be of degree 3, and hence will have a real root. This root must be ±1, since the absolute value of any eigenvalue of an orthogonal matrix is 1. Thus we have the following options for the set of eigenvalues of T:

{±1, ±1, ±1} or {±1, z, z̄}

where z is a complex number and z̄ is its conjugate.  In any case, the product of the eigenvalues must equal the determinant of the matrix, which is 1, and so the only possibilities are:

{1, 1, 1}, {1, -1, -1}, and {1, z, z̄}.

It follows that any 3x3 orthogonal matrix whose determinant is equal to one must have 1 as an eigenvalue.  If v is an eigenvector corresponding to the eigenvalue 1, then T(tv) = tTv = tv for any real number t.  So T fixes the entire axis on which v lies.  Now, remember that T is an orthogonal matrix, so it represents some rigid motion of space, and since it leaves some axis fixed, it must be a rotation around that axis!

This fact lies at the heart of Euler's celebrated theorem, which states that a motion of a rigid body with one point fixed must be a rotation around some axis.  This is quite striking if you think about it.  Imagine that you are moving your linear algebra textbook around in your hands in such a way that its center always stays in the same place.  You may be doing very chaotic things to the book: turning it to the right or left, rotating it up or down.  You may stop moving it completely, only to start again moments later.  But as long as the center of the book stays in the same place, at the end of all the moving around you will have done nothing to the book but a very orderly rotation about some axis.
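The claim that eigenvalues of orthogonal matrices have absolute value 1 is easy to check numerically in a small case.  The sketch below takes a 2x2 rotation matrix (trace 2 cos t, determinant 1), finds its eigenvalues from the characteristic polynomial λ² - 2 cos(t) λ + 1 = 0 via the quadratic formula, and checks their moduli; the angle t is an arbitrary choice:

```python
import cmath
import math

# Eigenvalues of the 2x2 rotation matrix R(t): roots of the characteristic
# polynomial  λ^2 - (trace) λ + (det) = λ^2 - 2 cos(t) λ + 1.
t = 1.2                                # arbitrary rotation angle
tr = 2 * math.cos(t)                   # trace of R(t)
disc = cmath.sqrt(tr * tr - 4)         # discriminant (negative, so complex)
lam1 = (tr + disc) / 2
lam2 = (tr - disc) / 2
print(abs(lam1), abs(lam2))            # both moduli equal 1, up to rounding
```

The two eigenvalues come out as cos(t) ± i sin(t): a conjugate pair on the unit circle, exactly as Remark 5.3 predicts.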

§5.4 Least Squares

Let us now turn our attention to something a bit more practical.  Suppose we are interested in studying the relationship between temperature and pressure.  The first step is to obtain some experimental data.  In order to do that, we attach a temperature and pressure sensor to a metal container.  Starting at room temperature (75°F) we slowly heat the air inside our container to 195°F and record temperature and pressure readings.  We get a total of five data points:

| Temperature (°F) | Pressure (lb/sq in.) |
|------------------|----------------------|
| 75               | 15                   |
| 100              | 23                   |
| 128              | 26                   |
| 159              | 34                   |
| 195              | 38                   |

Since we suspect the relationship to be linear, we will be looking for a straight line with the equation y = mx + b, where y is the pressure, x the temperature, m the slope, and b the y-intercept.  We would like this line to pass through the data points we collected from the experiment.  Plugging these data points into the equation of the line, we obtain the following system of equations:

b + 75m = 15
b + 100m = 23
b + 128m = 26
b + 159m = 34
b + 195m = 38

We can attempt to solve this system of equations by first rewriting it in matrix form:

[ 1   75 ]         [ 15 ]
[ 1  100 ]  [ b ]  [ 23 ]
[ 1  128 ]  [ m ] =[ 26 ]
[ 1  159 ]         [ 34 ]
[ 1  195 ]         [ 38 ]

Unfortunately, this system does not have a solution, even though the underlying physical law is linear.  This is due to the fact that it is not possible to record data with perfect accuracy.

Since it is not possible to solve the above system, we use the least squares method to find the closest solution.  That is, we want to get the best fit line.

Remark 5.4  By "best fit line", we mean a line that minimizes the sum of the squares of the distances from each point to the line.

To simplify notation, let us assign names to the matrices and vectors of the above system: call the 5 x 2 coefficient matrix B, the vector of unknowns (b, m)^T c, and the vector of pressures on the right-hand side d, so that the system reads Bc = d.

To remind you of the idea behind least squares, recall that we are trying to find a vector c that minimizes the distance || Bc - d ||.  Of course, we can never hope for this norm to be zero, since Bc = d has no solutions.  However, the equation Bc = proj_V d, where V is the column space of B, does have a solution.  This is because proj_V d lies in the column space of B, and B hits every vector in its column space.  The solution to this equation turns out to minimize the norm || Bc - d ||.  (You can look up why in your linear algebra book.)  In the following exercise we will solve the equation Bc = proj_V d directly.
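The exercise below solves Bc = proj_V d in MATLAB via qr().  An equivalent route, sketched here in plain Python with the lab's five data points, is to solve the normal equations B^T B c = B^T d, which for one line reduce to a 2x2 system solvable by Cramer's rule (the equivalence of the two routes is the standard least-squares fact, not something extra proved here):

```python
# Least-squares line fit to the lab's data via the normal equations
# B^T B c = B^T d, written out by hand for the 2x2 case.
temps = [75.0, 100.0, 128.0, 159.0, 195.0]
pressures = [15.0, 23.0, 26.0, 34.0, 38.0]

n = len(temps)
Sx = sum(temps)
Sy = sum(pressures)
Sxx = sum(x * x for x in temps)
Sxy = sum(x * y for x, y in zip(temps, pressures))

# The normal equations are  [[n, Sx], [Sx, Sxx]] (b, m)^T = (Sy, Sxy)^T.
# Solve the 2x2 system by Cramer's rule:
det = n * Sxx - Sx * Sx
b = (Sy * Sxx - Sx * Sxy) / det
m = (n * Sxy - Sx * Sy) / det
print(b, m)   # intercept and slope of the best-fit line y = m x + b
```

The resulting (b, m) should agree with the vector c you compute in Exercise 5.6, up to rounding.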

Exercise 5.6

(a)  Enter the matrix B and the vector d into MATLAB.  Now follow the steps below to compute proj_V d:

1. We must first find an orthonormal basis for V.  For this we use the qr() command.  Type in:

>> [Q, R] = qr(B,0)

2. The columns of Q form an orthonormal basis for V.  Let us give them their own names.  Type in:

>> x = Q(:,1)
>> y = Q(:,2)

3. Now we are ready to compute proj_V d.  Recall that to do this, we must add up the projections of d onto each element of the orthonormal basis of V.  Typing in the following will accomplish just that:

>> v = dot(x,d)*x + dot(y,d)*y

Remember to include all your input and output in your write-up.

(b)  Now solve the equation Bc = proj_V d = v, by typing in:

>> c = B\v

Check that your answer is correct by entering:

>> B*c - v

and making sure the answer is zero.  (Remember that MATLAB may return a very small number instead of zero due to rounding error.)

(c)  Now let's compare the answer in the previous part to what MATLAB gives us using the built-in least squares routine called "lscov".  To use it, we must specify three parameters: the matrix B, the vector d, and a covariance matrix X, which you don't have to worry about in this course.  Type in the following command:

>> c1 = lscov(B, d, eye(5))

(Note that for the matrix X we just took a 5 x 5 identity matrix, but don't worry about why.)  How does your answer to this part compare to your answer in part (b)?

In the next exercise we will plot our data points, along with the best-fit line, to illustrate the effectiveness of the least squares method.

Exercise 5.7

(a)  What is the equation of the best fit line?  (Remember that the equation of the line is y = mx + b, and the vector c you obtained in the previous question is equal to (b, m)^T.)

(b)  Use the equation of the line you obtained in part (a) to calculate the pressure at the following temperatures: 35°F, 170°F, and 290°F.

(c)  We can now plot the data points and our line to see visually whether the approximation is good.  Type the following:

>> x=B(:,2);
>> y=d;
>> t=0:1:300;
>> z=polyval([c(2);c(1)],t);
>> plot(x,y,'x',t,z)

Remark 5.5  polyval() evaluates the polynomial whose coefficients are specified by a vector (highest degree first) at the given points.
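In plain Python, the same highest-degree-first evaluation can be sketched with Horner's rule (this is an illustrative stand-in, not MATLAB's implementation):

```python
# polyval-style evaluation: coefficients are given highest degree first,
# so [2.0, 3.0] represents the line 2t + 3.
def polyval(coeffs, t):
    r = 0.0
    for c in coeffs:
        r = r * t + c   # Horner's rule
    return r

print(polyval([2.0, 3.0], 5.0))   # 2*5 + 3 = 13.0
```

In the exercise above, the vector [c(2); c(1)] reorders c = (b, m)^T into this highest-degree-first convention, so polyval evaluates y = m t + b along the whole range of t.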

§5.5 Feedback
We're almost done, but before we finish we'd like to get some (optional) feedback from you so that we can improve this lab.  If you have any comments, write them (by hand if you like) on a separate page at the end of your assignment, so that we can tear it off before we give back your assignment.

 Feedback (Optional) Do you have any suggestions to help improve this lab?  This includes errors or potential errors, typos, unclear statements or questions, or anything else that comes to mind.  The more specific you can be, the easier it will be for us to improve the lab.

§5.6 Conclusion

So what have we learned in this lab?  We've learned that the concept of orthogonal bases cannot be overemphasized.  They are, arguably, the most convenient bases one can use to perform computations.  The related idea of orthogonal matrices is also very important.  Not only do such matrices constitute an area of intense study in mathematics, but they also manifest themselves in physics as rigid motions of space.  We've also talked about the least squares method and we've learned that it is quite useful in analyzing all sorts of experimental data.  As long as we have some idea about the relationship between the data, we can use least squares to make predictions and extend the results beyond the scope of our experiment.