Hypertext Maths Dictionary

Index

Adjoint

The transpose of the complex conjugate of a matrix. Often called Hermitian Conjugate.

Return to index

Basis of a space

A basis of a space is a set of vectors that can generate any point in the space by linear combination. The vectors may be linearly independent, such as the x,y,z axes of 3D Cartesian space, but don't have to be.
The smallest set of basis vectors is called a minimal spanning set, and they are linearly independent, and therefore orthogonal.

Return to index

Bayes' Theorem

The conditional probability
P(B|A)=P(A|B)P(B)/P(A)

Return to index

Conditional Probability

The probability of event A given event B is expressed as P(A|B) and can be calculated from:
P(A|B)=P(A & B)/P(B)
where P(B) is the probability of event B and P(A&B) is the joint probability of A and B.

Return to index

Constrained Optimisation

Optimisation is the process of minimising or maximising a function. Constrained optimisation is optimisation subject to constraints on the functional parameters or value.
eg: find the maximum of y=x^2+3 for x1
see Linear Programming and Lagrange Multipliers

Return to index

Correlation

The extent of the correspondence between two random variables. Can be calculated by
cor(x,y)=cov(x,y)/sqrt(vx x vy)
where cor(x,y) is the correlation between x and y, cov(x,y) is the covariance of x and y and vx, vy are the variances of x and y.
If x and y are uncorrelated, then they are independent. They are positively correlated if x increases as y does and negatively correlated if x decreases as y increases.
The correlation matrix Ca of vector A of length n is the n x n matrix the entries of which are the pairwise correlations of the elements of A. The i,jth element of Ca is the correlation of the ith and jth elements of A.

Return to index

Covariance

Is a measure of the association between two random variables, X and Y.
Is calculated as the expected value of the product of their deviations from their means.
cov(X,Y)=E((X-Mx)(Y-My)) (=E(XY)-MxMy =0 if X,Y are independent)
Mx= E(X) is the mean of X
A covariance matrix is a matrix, the entries of which are the covariances of the pairs of variables in a set.
e.g. if a set of variables is defined as {x,y,z} and the covariances are respectively cov(x,y) cov(x,z) cov(y,z) then the covariance matrix is:

������� -������������������������� -

������� |cov(x,x) cov(x,y) cov(x,z)|

������� |cov(x,y) cov(y,y) cov(y,z)|

������� |cov(x,z) cov(y,z) cov(z,z)|

������� -������������������������� -

Note that cov(y,x)=cov(x,y) so the covariance matrix is symmetric.
Also, cov(x,x)=E((X-Mx)^2)=variance(x)
See Correlation

Return to index

DES Cryptography

A 'secret key' method of enciphering a message.
DES is the Data Encryption Standard, an encryption block cipher defined and endorsed by the U.S. government in 1977 as an official standard; the details can be found in the latest official FIPS (Federal Information Processing Standards) publication concerning DES [NIS93b]. It was originally developed at IBM. DES has been extensively studied since its publication and is the most well-known and widely used cryptosystem in the world.
DES is a symmetric cryptosystem: when used for communication, both sender and receiver must know the same secret key, which is used both to encrypt and decrypt the message. DES can also be used for single-user encryption, such as to store files on a hard disk in encrypted form. In a multi-user environment, secure key distribution may be difficult; public-key cryptography provides an ideal solution to this problem (see RSA Cryptography).
DES has a 64-bit block size and uses a 56-bit key during encryption. It is a 16-round Feistel cipher and was originally designed for implementation in hardware.
The algorithm is as follows:
Feistel ciphers are a special class of iterated block ciphers where the ciphertext is calculated from the plaintext by repeated application of the same transformation or round function.
In a Feistel cipher the text being encrypted is split into two halves (32 bits each for DES). The round function f is applied to one half using a subkey (derived from the 56 bit secret key) and the output of f is exclusive-ored with the other half. The two halves are then swapped. For DES, the function f is a 'permutation' operation where the bits of the block to be coded are shuffled according to the value of the subkey. Each round follows the same pattern except for the last round where there is no swap. For DES, there are 16 rounds (iterations).
Decoding is the same as encoding, using the same key, which is why this is called a symmetric encryption process.
[Fei73] H. Feistel. Cryptography and Computer Privacy, Scientific American, May 1973.

Return to index

Dynamic Programming

Based on Bellman's principle, is a method to produce the optimal solution for an n step process based on the optimal solution to an n-1 step process, which commences with the optimal solution of the first step. (Compare this to proof by induction?) (BTW, Bellman's principle of optimality should not be confused with Lewis Carrol's Bellmans principle, "What is said thrice is true." There does however seem to be some similarity. :-) )

Return to index

Eigenvector

An eigenvector of a matrix A is a vector e such that eA=Le where L is a scalar. L is known as an eigenvalue of A corresponding to the eigenvector e.

Return to index

Gradient

The gradient of a scalar field is a vector field of the first derivatives of the scalar function along each axis. eg

Where i and j denote the unit vectors along the x and y axes.

Return to index

Hermitian

A matrix is hermitian if it is equal to its conjugate transpose. For a real matrix, this means if and only if (iff) it is symmetric

Return to index

Hessian

A Hessian matrix H of a function is the matrix of its second differentials evaluated at a point, giving a measure of the curvature at that point.

At a minimum of the function, H is positive definite and at a maximum, negative definite. It is indeterminate at a saddle.
If the function is of quadratic form,
f(x)=x'Hx+b'x+c
then the Hessian can be used to map differences in position into differences in gradient.
If g1 and g2 are the gradients at x1 and x2 respectively, then
g2-g1=H(x2-x1).

Return to index

Inner Product

The inner product of two vectors is their product in a given inner product space.
In particular in Euclidean space it is the scalar that results from the sum of the products of the components of the vector for each axis. e.g.
if A=ax+by+cz
and B=dx+ey+fz
then their inner product
C=A.B=ad+be+cf
A.B can also be calculated as |A||B|cos(g) where g is the angle between the vectors A and B
(Bold, such as A refers to vector, and x,y and z are the unit cartesian axes for 3d space)

Return to index

Hidden Markov Model

A hidden Markov model is a stochastic automaton consisting of a set of states qi each with an associated conditional probability distribution p(xn|qi) for an output vector xn, and transition probabilities P(qj|qi) for transitions between states. It is called hidden because the states are not seen, only the output vectors.

Return to index

Joint Probability

The probability of event A and event B is expressed as P(A&B).
For independant variables, P(A&B)=P(A)P(B).
eg for a two-up toss, the prob of a head and a tail is 1/2*1/2
P(A&B) is also known as the prob of A 'intersection' B, from the Venn diagram showing the classes in sample space where B is the event 'at least one tail' and A is the event 'at least one head'. The intersection of the two classes is the subclass, 'a head and a tail'.

Return to index

Kalman Filter

A Dynamic Programming method that produces optimum (in least mean square sense) estimates of the n+1th state of a linear dynamical system based on the estimate for the nth state.

Reference: Haykin. Adaptive Filter Theory. Prentice Hall. ISBN 0-13-004052-5 025

Return to index

Karhunen-Loeve transform

Also known as Hotelling, eigenvector, or principal component transform. Often called simply the KL transform, especially in the image coding field. Based on the statistical properties of a set of data points, often images. Principal uses are data compression and image rotation.
Imagine an NxN image transmitted a number of times. Make a set of vectors xi corresponding to each copy of the image such that xi=[xi1,xi2.......xiM]' where M=NxN and the xin are the pixels of each copy of the image, de-rasterised. The covariance matrix of the xi vectors is Cx, and is an M x M matrix. The average image is mx which has pixel values averaged over all the xi.
Let ei and Li, for i=1,2...M be the eigenvectors and corresponding eigenvalues of Cx, arranged so that L1 >= L2 >=L3 etc.
An MxM transformation matrix A is given by making the rows of A equal to the eigenvectors, ei, of Cx. The KL transform of an image x is then simply found by multiplying the centralised image vector (x-mx) by A to get y=A(x-mx). y is the KL transform of x. The properties of y are that it is uncorrelated, and that it is possible to reconstruct x from y using x=A'y+mx.
Consider the transform as an expansion,
y=e0(x-mx)+e1(x-mx)+e2(x-mx)....
then important features of the KL are that the terms of the expansion due to the eigenvectors are ranked in order of significance and those terms associated with smaller eigenvalues can be dropped for purposes of compression or approximation. The terms are orthogonal and also the term associated with the largest eigenvalue can be considered a rotation in the measurement space that aligns the new coordinate system with the direction of largest variance, as shown in the 2D example below, where points in x=[x1,x2] space are the measurements, and points in y=[y1,y2i] are in the transformed space, in which y1=e0(x) is the variable with largest variance. Note that y1 and y2 are orthogonal.

See also Singular Value Decomposition.

Return to index

Linear Programming or Linear Optimisation

Linear programming is constrained optimisation where both the function to be optimised and the constraints are linear functions of the independent variables.
eg: find the maximum of

One method that can be used for this problem is the Simplex Method.

Return to index

Linear Regression

A process for fitting a straight line to a set of observations.
e.g. Given a set of N measurements of the color values in images of faces {R,G,B} find a linear equation R=gG+bB+c that fits the observations best in a least square error sense. This can be done by writing an equation that describes a figure of merit for the fit called Chi squared.

To minimise Chi squared, and hence optimise the fit, differentiate Chi squared wth respect to g,b and c and set the 3 resulting equations to 0.
Rewrite the equations as
SP=R where

and then solve for P and hence the optimum values of g,b and c using

There is more to this, of course. One should also consider the final value of Chi squared as a measure of how good the parameters g,b and c are, ie how well the line models the actual observations. If there are values of the variance of the R G and B measurements made, they can also be used to improve the accuracy of Chi squared.
Reference: Numerical Recipes in 'C'. 2nd Ed. Cambridge Uni. Press

Return to index

Lagrange multipliers

For certain constrained optimisation problems, where the gradients of the constraints are independent at the minimum x0, the problem can be reduced to an unconstrained optimisation problem by reformulating the problem with the addition of Lagrange multipliers, which are real numbers, multiplying the constraint functions.

See also Constrained Optimisation and Linear Programming

Return to index

Laplacian

The Laplacian of a point in a scalar field is the sum of the second derivatives along the coordinate axes. It is also twice the average curvature at that point. Given the Hessian matrix H for the function, the Laplacian can be calculated as the trace of H. (The sum along the diagonal)

Return to index

Orthogonal

Two vectors are orthogonal if their inner product is zero. In Cartesian space this means that they are perpendicular. If the vectors are of unit length, they are called orthonormal. Orthonormal vectors can be used to define the basis of a space.

Return to index

Positive definite

A matrix A is positive definite if X'AX > 0 for every real nonzero column matrix X.

Return to index

Restricted normal form

A formulation of a Linear Optimisation problem that conforms to a set of rules. This form is suitable for solution via the Simplex method. The rules are as follows:

1. Linear in the variables and in the objective function. E.g. z=2x₂-4x₃is the objective function. This is the function to optimise. (If the Simplex method is to be used, the problem needs to be formulated as a maximisation.)

_2.Equality or non-negativity constraints only on the variables. E.g. x₁,x₂,x₃,x₄ >=0 and x₁=2-6x₂+x₃, x₄=8+3x₂-4x₃

3. Each equality constraint must have at least one variable that has a positive coefficient and that appears in that one constraint only. See the example in 2 above where this is satisfied for x₁ and x₄.

Linear optimisation tasks not in restricted normal form can be re-written in that form with the introduction of dummy variables to enforce rules 2 and 3. The objective function must also be rewritten taking the dummy variables into account. The dummy variables can be removed at an early stage in the Simplex method.

Reference: Numerical Recipes in C, Second edition, W. Press, S. Teukolsky, W.Vetterling, B.Flannery

Return to index

RSA public key cryptography

A method of encryption that relies on the difficulty of factorising large numbers.
Method: Choose two large primes p and q. n=pq is called the modulus. Choose a number e such that e is less than n and such that e and (p-1)(q-1) have no common factors except 1. (i.e. e is relatively prime to (p-1)(q-1)) Find d such that (p-1)(q-1) is a factor of(ed-1)
e is called the public exponent, and d is the private exponent.
(n,e) is the public key and (n,d) is the private key.
The following holds true: For a message m, c=(m^e)modulo n is the cipher and m=(c^d) modulo n is the recovered message. Anyone who knows the public key (n,e) can encipher a message, but only the holder of the secret private key (n,d) can decipher it.
In practise, this method of encryption is often used to encipher the secret key of a secret key encryption method, such as DES so that it can be transmitted with a DES encrypted message. Only the holder of the secret key for the RSA encryption method can recover the secret DES key and hence decode the DES encrypted message.
RSA encryption and similar public key methods can also be used for authentication. Given a cipher c=(m^d) modulo n where (n,d) is Fred's private key, then if Fred sends m and c to George, who has (n,e) which is Fred's public key, George can calculate (c^e)modulo n which should be equal to m if the message is authentic.
Example, with small primes. (i.e. not secure, just explanatory)

Choose p=5
Choose q=3
=>n=15
Choose e=7 Is less than n and has no common factors with (p-1)(q-1)=4*2=8.
To choose d, if 8 is a factor of 7*d-1 then 7*d-1=s*8 where s is a dummy integer.
d has to be an integer and so does s.
d=15 =>s= 13 Choose d=15 (THIS IS THE HARD PART!)

So (15,7) is the public key and (15,15) is the private key.

IMPORTANT NOTE! Some pathological cases

1)m has to be less than n-1 for this to work. ( % = modulo )
PROOF: (x^y)%z =(x*x*...)%z =(x%z*x%z*....)%z =((x%z)^y)%z because modulo arithmatic states that 2 numbers are equivalent if they differ by an integer times the modulus. i.e x=(x+n*z)%z.
Using this, we see that(m^e)%n=((m%n)^e)%n and if m>n-1 then m%n=x (where n>x ) then during encryption, (m^e)%n=(x^e)%n=y but then during decryption(y^d)%n=x not m and although x and m might be equivalent under modulo n this doesn't help recover the encrypted message because we don't know how many n's to add to x to get m.

2) m=n gives 0 for everything.

3) m=n-1 -> c=1 if e is even and c=m if e is odd, which isn't interesting

4) c often = m e.g. m=10 ->c=10 for n=15 regardless of the value of e.

TEST: choose m=13 as the plaintext message.
c=(13^7)%15 = 7 is the encrypted text
m'=(7^15)%15 =13 is the decoded message and = m

Return to index

Simplex Method for Linear Optimisation

A method, due to Dantzig, of solving linear optimisation problems that have been converted to restricted normal form. It is best explained by example.

Problem: Maximise z=2x₂-4x₃ with x₁,x₂,x₃,x₄ >=0 and x₁=2-6x₂+x₃, x₄=8+3x₂-4x₃

Solution:

1. Identify right-hand and left-hand variables and tabulate the problem. Here z ,x₁ and x₄are left-hand variables because they appear on the left of the equations, and x₂ and x₃ are right hand variables, because they appear only in the right-hand side of the equations. This particular example is not a unique arrangement, because the equations could be rerwritten with x2 as a left-hand variable and x1 as a right-hand variable. (In fact, the Simplex method relies on this, as can be seen later.)

Table 1

		X₂	X₃
z	0	2	-4
X₁	2	-6	1
X₄	8	3	-4

2. Next, examine the coefficients in the columns of the z row. If both of the coefficients for the variables x₂ and x₃ (the 2 and the -4 in this example) are negative, any increase in the right-hand variables x₂ and x₃ will cause a decrease in z, and so the equation for z is an objective function that is maximised by x₂ and x₃ being zero, and the values of x₁ and x₄ can be found from them, and so we are finished. In this case, however, we do have a positive entry in the column belonging to x₂, and it has the value 2. So, how much can we increase x₂ by before one of the left hand variables is driven negative? (They all must be >=0, by the constraints given). The answer is 2/6, given looking down the x₂ column until we find a negative coefficient (-6) which is then called the pivot element, and then by dividing the value in the constant column, 2, by the magnitude of the pivot element, which is 6, to give 2/6. This can be explained by considering the equation for x₁, x₁=2-6x₂+x₃ which goes to 0 for x₂=2/6 (remember that x₁ must be >=0 for all values of x₂ and x₃, so we set x₃ to 0 when finding the maximum value for x2). So x₂ can take the maximum value of 2/6. Note that if no pivot element can be found, i.e. all the entries in the chosen column are positive, then this means that the equation for z is unbounded because x₂ can increase indefinitely without breaking any constraints. This also finishes the problem.

_3.Now, give x₂ the value of 2/6, and convert x₂ into a left-hand value by rewriting the equation for x₁. You can see that what was the x₁ row and the x₂ columns pivot about the -6, when we write x₂=1/3-1/6x₁+1/6x₃and substitute the new value for x₂ into the z equation giving z=2(1/3-1/6x₁+1/6x₃)-4x₃=2/3-1/3x₁-11/3x_3.Similarly, the equation for x₄ changes to x₄=9-1/2x₁-7/2x₃

4. Next, repeat step 1 and construct a new table using the new formulation.

		X₁	X₃
z	2/3	-1/3	-11/3
X₂	1/3	-1/6	1/6
X₄	9	-1/2	-7/2

5. Repeat steps 2 and 3, until a formulation is reached where there is no pivot element, i.e. all the entries in the z row (ignoring the constant) are negative. Such is the situation that we have in the Table 2 above. Now the values of x1,x2,x3,x4 that maximize the z function can be read from the table. z has a maximum of 2/3 when x₂=1/3, x₄=9 and x₁ and x₃ are 0.

Summing up, the principle of this technique is to rewrite the equations to arrive at an equation for z that has negative coefficients for all the right-hand variables, which directly gives the maximum for z when the right-hand variables are 0, and gives the values of the remaining, left-hand, variables from their equations.

Note that this method can be carried out using software, rather than manually. Also note that is not called the Simplex method because of its comparitive simplicity. The name arises from the formal meaning of the mathematical term simplex, which describes the most elementary figure for a given dimensionality.

Reference: Numerical Recipes in C, Second edition, W. Press, S. Teukolsky, W.Vetterling, B.Flannery

Return to index

Singular

A square matrix that has a determinant equal to zero. Not having an inverse.

Return to index

Singular Values

Any of the square roots of the eigenvalues of the product of a real matrix A with its transpose A'

Return to index

Singular Value Decomposition (SVD)

The representation of a matrix A as USU' where U is a unitary matrix, U' is its adjoint and S is a diagonal matrix, the elements of which are the singular values of A arranged in descending order. The rows of U are the eigenvectors of AA'. The value of this is that A can be approximated in a least square error optimal fashion by zeroing the smaller singular values in S.

Return to index

Taylors Series

Is the expansion of a function into a series.

where f' denotes first derivative of f(x) with respect to x, f'' the second, etc. The series may be truncated to form an approximation.

Return to index

Unitary matrix

A matrix for which the Hermitian conjugate is its inverse.
A real matrix is called unitary if it is the inverse of its transpose, and also called an orthogonal matrix, because any pair of distinct rows or columns are orthogonal vectors.

Return to index

Variance

Var(x) is the square of the standard deviation of a random variable x and is calculated using:
Var(x)=E((x-E(x))^2)
where E(x) is the expected value of x.

Simple variance
The estimator of the variance of a population:

Mx =E(x) is the mean of x. The variance, and hence the standard deviation, can be calculated on the run, as measurements are made, by accumulating the sum of the measurements and calculating the mean at every point, then accumulating the sum of the differences squared between the measurement and the mean then dividing by n-1.

Return to index

Back