The transpose of the
complex conjugate of a matrix. Often called Hermitian Conjugate.
A basis of a space is a set
of vectors that can generate any point in the space by linear combination. The
vectors may be linearly independent, such as the x,y,z axes of 3D Cartesian
space, but don't have to be.
The smallest set of basis vectors is called a minimal spanning set, and they are
linearly independent, and therefore orthogonal.
The conditional
probability
P(B|A)=P(A|B)P(B)/P(A)
The probability of event A
given event B is expressed as P(A|B) and can be calculated from:
P(A|B)=P(A & B)/P(B)
where P(B) is the probability of event B and P(A&B) is the joint
probability of A
and B.
Optimisation is the process
of minimising or maximising a function. Constrained optimisation is
optimisation subject to constraints on the functional parameters or value.
eg: find the maximum of y=x^2+3 for x1
see Linear Programming and Lagrange Multipliers
The extent of the
correspondence between two random variables. Can be calculated by
cor(x,y)=cov(x,y)/sqrt(vx x vy)
where cor(x,y) is the correlation between x and y, cov(x,y) is the covariance of x and y and vx, vy are the variances of x and y.
If x and y are uncorrelated, then they are independent. They are positively
correlated if x increases as y does and negatively correlated if x decreases as
y increases.
The correlation matrix Ca of vector A of length n is the n x n
matrix the entries of which are the pairwise correlations of the elements of A.
The i,jth element of Ca is the correlation of the ith and jth elements
of A.
Is a measure of the
association between two random variables, X and Y.
Is calculated as the expected value of the product of their deviations from
their means.
cov(X,Y)=E((X-Mx)(Y-My)) (=E(XY)-MxMy =0 if X,Y are independent)
Mx= E(X) is the mean of X
A covariance matrix is a matrix, the entries of which are the covariances of
the pairs of variables in a set.
e.g. if a set of variables is defined as {x,y,z} and the covariances are
respectively cov(x,y) cov(x,z) cov(y,z) then the covariance matrix is:
- -
|cov(x,x) cov(x,y) cov(x,z)|
|cov(x,y) cov(y,y) cov(y,z)|
|cov(x,z) cov(y,z) cov(z,z)|
- -
Note that cov(y,x)=cov(x,y) so the covariance matrix is symmetric.
Also, cov(x,x)=E((X-Mx)^2)=variance(x)
See Correlation
A 'secret key' method of
enciphering a message.
DES is the Data Encryption Standard, an encryption block cipher defined and
endorsed by the U.S. government in 1977 as an official standard; the details
can be found in the latest official FIPS (Federal Information Processing
Standards) publication concerning DES [NIS93b]. It was originally developed at
IBM. DES has been extensively studied since its publication and is the most
well-known and widely used cryptosystem in the world.
DES is a symmetric cryptosystem: when used for communication, both sender and
receiver must know the same secret key, which is used both to encrypt and
decrypt the message. DES can also be used for single-user encryption, such as
to store files on a hard disk in encrypted form. In a multi-user environment,
secure key distribution may be difficult; public-key cryptography provides an
ideal solution to this problem (see RSA Cryptography).
DES has a 64-bit block size and uses a 56-bit key during encryption. It is a
16-round Feistel cipher
and was originally designed for implementation in hardware.
The algorithm is as follows:
Feistel ciphers are a special class of iterated block ciphers where the
ciphertext is calculated from the plaintext by repeated application of the same
transformation or round function.
In a Feistel cipher the text being encrypted is split into two halves (32 bits
each for DES). The round function f is applied to one half using a subkey
(derived from the 56 bit secret key) and the output of f is exclusive-ored with
the other half. The two halves are then swapped. For DES, the function f is a
'permutation' operation where the bits of the block to be coded are shuffled
according to the value of the subkey. Each round follows the same pattern
except for the last round where there is no swap. For DES, there are 16 rounds
(iterations).
Decoding is the same as encoding, using the same key, which is why this is
called a symmetric encryption process.
[Fei73] H. Feistel. Cryptography and Computer Privacy,
Scientific American, May 1973.
Based on Bellman's
principle, is a method to produce the optimal solution for an n step process
based on the optimal solution to an n-1 step process, which commences with the
optimal solution of the first step. (Compare this to proof by induction?) (BTW,
Bellman's principle of optimality should not be confused with Lewis Carrol's
Bellmans principle, "What is said thrice is true." There does however
seem to be some similarity. :-) )
An eigenvector of a matrix
A is a vector e such that eA=Le where L is a scalar. L is known as an
eigenvalue of A corresponding to the eigenvector e.
The gradient of a scalar
field is a vector field of the first derivatives of the scalar function along
each axis. eg
Where i and j denote the
unit vectors along the x and y axes.
A matrix is hermitian if it
is equal to its conjugate transpose. For a real matrix, this means if and only
if (iff) it is symmetric
A Hessian matrix H
of a function is the matrix of its second differentials evaluated at a point,
giving a measure of the curvature at that point.
At a minimum of the
function, H is positive definite and at a maximum, negative definite. It is
indeterminate at a saddle.
If the function is of quadratic form,
f(x)=x'Hx+b'x+c
then the Hessian can be used to map differences in position into differences in
gradient.
If g1 and g2 are the gradients at x1 and x2
respectively, then
g2-g1=H(x2-x1).
The inner product of two
vectors is their product in a given inner product space.
In particular in Euclidean space it is the scalar that results from the sum of
the products of the components of the vector for each axis. e.g.
if A=ax+by+cz
and B=dx+ey+fz
then their inner product
C=A.B=ad+be+cf
A.B can also be calculated as |A||B|cos(g) where g is the
angle between the vectors A and B
(Bold, such as A refers to vector, and x,y and z
are the unit cartesian axes for 3d space)
A hidden Markov model is a
stochastic automaton consisting of a set of states qi each with an associated conditional
probability distribution p(xn|qi) for an output vector xn, and transition probabilities P(qj|qi)
for transitions between states. It is called hidden because the states are not
seen, only the output vectors.
The probability of event A
and event B is expressed as P(A&B).
For independant variables, P(A&B)=P(A)P(B).
eg for a two-up toss, the prob of a head and a tail is 1/2*1/2
P(A&B) is also known as the prob of A 'intersection' B, from the Venn
diagram showing the classes in sample space where B is the event 'at least one
tail' and A is the event 'at least one head'. The intersection of the two
classes is the subclass, 'a head and a tail'.
A Dynamic
Programming method
that produces optimum (in least mean square sense) estimates of the n+1th state
of a linear dynamical system based on the estimate for the nth state.
Reference: Haykin. Adaptive Filter Theory. Prentice Hall. ISBN 0-13-004052-5
025
Also known as Hotelling,
eigenvector, or principal component transform. Often called simply the KL
transform, especially in the image coding field. Based on the statistical
properties of a set of data points, often images. Principal uses are data
compression and image rotation.
Imagine an NxN image transmitted a number of times. Make a set of vectors xi
corresponding to each copy of the image such that xi=[xi1,xi2.......xiM]' where
M=NxN and the xin are the pixels of each copy of the image, de-rasterised. The covariance
matrix of the xi
vectors is Cx, and is an M x M matrix. The average image is mx which has pixel
values averaged over all the xi.
Let ei and Li, for i=1,2...M be the eigenvectors and corresponding eigenvalues of Cx, arranged so that L1 >= L2
>=L3 etc.
An MxM transformation matrix A is given by making the rows of A equal to the
eigenvectors, ei, of Cx. The KL transform of an image x is then simply found by
multiplying the centralised image vector (x-mx) by A to get y=A(x-mx). y is the
KL transform of x. The properties of y are that it is uncorrelated, and that it is possible to
reconstruct x from y using x=A'y+mx.
Consider the transform as an expansion,
y=e0(x-mx)+e1(x-mx)+e2(x-mx)....
then important features of the KL are that the terms of the expansion due to
the eigenvectors are ranked in order of significance and those terms associated
with smaller eigenvalues can be dropped for purposes of compression or
approximation. The terms are orthogonal and also the term associated with the largest
eigenvalue can be considered a rotation in the measurement space that aligns
the new coordinate system with the direction of largest variance, as shown in
the 2D example below, where points in x=[x1,x2] space are the measurements, and
points in y=[y1,y2i] are in the transformed space, in which y1=e0(x) is the
variable with largest variance. Note that y1 and y2 are orthogonal.
See also Singular
Value Decomposition.
Linear programming is constrained
optimisation where
both the function to be optimised and the constraints are linear functions of
the independent variables.
eg: find the maximum of
One method that can be used
for this problem is the Simplex Method.
A process for fitting a
straight line to a set of observations.
e.g. Given a set of N measurements of the color values in images of faces
{R,G,B} find a linear equation R=gG+bB+c that fits the observations best in a
least square error sense. This can be done by writing an equation that
describes a figure of merit for the fit called Chi squared.
To minimise Chi squared, and hence optimise the fit, differentiate Chi squared
wth respect to g,b and c and set the 3 resulting equations to 0.
Rewrite the equations as
SP=R where
and then solve for P and hence the optimum values of g,b and c using
There is more to this, of course. One should also consider the final value of
Chi squared as a measure of how good the parameters g,b and c are, ie how well
the line models the actual observations. If there are values of the variance of
the R G and B measurements made, they can also be used to improve the accuracy
of Chi squared.
Reference: Numerical Recipes in 'C'. 2nd Ed. Cambridge Uni. Press
For certain constrained
optimisation problems, where the gradients of the constraints are independent
at the minimum x0, the problem can be reduced to an unconstrained optimisation
problem by reformulating the problem with the addition of Lagrange multipliers,
which are real numbers, multiplying the constraint functions.
See also Constrained
Optimisation and Linear
Programming
The Laplacian of a point in
a scalar field is the sum of the second derivatives along the coordinate axes. It
is also twice the average curvature at that point. Given the Hessian matrix H for the function, the Laplacian can be calculated as the trace of H.
(The sum along the diagonal)
Two vectors are orthogonal
if their inner product is zero. In Cartesian space this means that they are perpendicular. If
the vectors are of unit length, they are called orthonormal. Orthonormal
vectors can be used to define the basis of a space.
A matrix A is
positive definite if X'AX > 0 for every real nonzero column matrix X.
A formulation of a Linear
Optimisation problem that conforms to a set of rules. This form is suitable for
solution via the Simplex method.
The rules are as follows:
1.
Linear
in the variables and in the objective function. E.g. z=2x2-4x3 is
the objective function. This is the function to optimise. (If the Simplex
method is to be used, the problem needs to be formulated as a maximisation.)
Linear optimisation tasks
not in restricted normal form can be re-written in that form with the introduction
of dummy variables to enforce rules 2 and 3. The objective function must also
be rewritten taking the dummy variables into account. The dummy variables can
be removed at an early stage in the Simplex method.
Reference: Numerical
Recipes in C, Second edition, W. Press, S. Teukolsky, W.Vetterling, B.Flannery
A method of encryption that
relies on the difficulty of factorising large numbers.
Method: Choose two large primes p and q. n=pq is called the modulus. Choose a
number e such that e is less than n and such that e and (p-1)(q-1) have no
common factors except 1. (i.e. e is relatively prime to (p-1)(q-1)) Find d such
that (p-1)(q-1) is a factor of(ed-1)
e is called the public exponent, and d is the private exponent.
(n,e) is the public key and (n,d) is the private key.
The following holds true: For a message m, c=(m^e)modulo n is the cipher and
m=(c^d) modulo n is the recovered message. Anyone who knows the public key
(n,e) can encipher a message, but only the holder of the secret private key
(n,d) can decipher it.
In practise, this method of encryption is often used to encipher the secret key
of a secret key encryption method, such as DES so that it can be transmitted with a DES
encrypted message. Only the holder of the secret key for the RSA encryption
method can recover the secret DES key and hence decode the DES encrypted
message.
RSA encryption and similar public key methods can also be used for authentication.
Given a cipher c=(m^d) modulo n where (n,d) is Fred's private key, then if Fred
sends m and c to George, who has (n,e) which is Fred's public key, George can
calculate (c^e)modulo n which should be equal to m if the message is authentic.
Example, with small primes. (i.e. not secure, just explanatory)
Choose p=5
Choose q=3
=>n=15
Choose e=7 Is less than n and has no common factors with (p-1)(q-1)=4*2=8.
To choose d, if 8 is a factor of 7*d-1 then 7*d-1=s*8 where s is a dummy
integer.
d has to be an integer and so does s.
d=15 =>s= 13 Choose d=15 (THIS IS THE HARD PART!)
So (15,7) is the public key
and (15,15) is the private key.
IMPORTANT NOTE! Some
pathological cases
1)m has to be less than n-1
for this to work. ( % = modulo )
PROOF: (x^y)%z =(x*x*...)%z =(x%z*x%z*....)%z =((x%z)^y)%z because modulo
arithmatic states that 2 numbers are equivalent if they differ by an integer
times the modulus. i.e x=(x+n*z)%z.
Using this, we see that(m^e)%n=((m%n)^e)%n and if m>n-1 then m%n=x (where
n>x ) then during encryption, (m^e)%n=(x^e)%n=y but then during
decryption(y^d)%n=x not m and although x and m might be equivalent under modulo
n this doesn't help recover the encrypted message because we don't know how
many n's to add to x to get m.
2) m=n gives 0 for
everything.
3) m=n-1 -> c=1 if e is
even and c=m if e is odd, which isn't interesting
4) c often = m e.g. m=10
->c=10 for n=15 regardless of the value of e.
TEST: choose m=13 as the
plaintext message.
c=(13^7)%15 = 7 is the encrypted text
m'=(7^15)%15 =13 is the decoded message and = m
A method, due to Dantzig,
of solving linear optimisation problems that have been converted to restricted
normal form. It is
best explained by example.
Problem: Maximise z=2x2-4x3
with x1,x2,x3,x4 >=0 and x1=2-6x2+x3,
x4=8+3x2-4x3
Solution:
1.
Identify
right-hand and left-hand variables and tabulate the problem. Here z ,x1
and x4 are left-hand variables because they appear on the left of
the equations, and x2 and x3 are right hand variables,
because they appear only in the right-hand side of the equations. This particular
example is not a unique arrangement, because the equations could be rerwritten
with x2 as a left-hand variable and x1 as a right-hand variable. (In fact, the
Simplex method relies on this, as can be seen later.)
Table 1
|
X2 |
X3 |
|
z |
0 |
2 |
-4 |
X1 |
2 |
-6 |
1 |
X4 |
8 |
3 |
-4 |
2.
Next,
examine the coefficients in the columns of the z row. If both of the
coefficients for the variables x2 and x3 (the 2 and the
-4 in this example) are negative, any increase in the right-hand variables x2
and x3 will cause a decrease in z, and so the equation for z is an
objective function that is maximised by x2 and x3 being
zero, and the values of x1 and x4 can be found from them,
and so we are finished. In this case, however, we do have a positive entry in
the column belonging to x2, and it has the value 2. So, how much can
we increase x2 by before one of the left hand variables is driven
negative? (They all must be >=0, by the constraints given). The answer is
2/6, given looking down the x2 column until we find a negative
coefficient (-6) which is then called the pivot element, and then by dividing
the value in the constant column, 2, by the magnitude of the pivot element,
which is 6, to give 2/6. This can be explained by considering the equation for
x1, x1=2-6x2+x3 which goes to 0 for
x2=2/6 (remember that x1 must be >=0 for all values of
x2 and x3, so we set x3 to 0 when finding the
maximum value for x2). So x2 can take the maximum value of 2/6. Note
that if no pivot element can be found, i.e. all the entries in the chosen
column are positive, then this means that the equation for z is unbounded
because x2 can increase indefinitely without breaking any
constraints. This also finishes the problem.
3.
Now,
give x2 the value of 2/6, and convert x2 into a left-hand
value by rewriting the equation for x1. You can see that what was
the x1 row and the x2 columns pivot about the -6, when we
write x2=1/3-1/6x1+1/6x3 and substitute the
new value for x2 into the z equation giving z=2(1/3-1/6x1+1/6x3)-4x3=2/3-1/3x1-11/3x3.
Similarly, the equation for x4 changes to x4=9-1/2x1-7/2x3
4.
Next,
repeat step 1 and construct a new table using the new formulation.
|
X1 |
X3 |
|
z |
2/3 |
-1/3 |
-11/3 |
X2 |
1/3 |
-1/6 |
1/6 |
X4 |
9 |
-1/2 |
-7/2 |
5.
Repeat
steps 2 and 3, until a formulation is reached where there is no pivot element,
i.e. all the entries in the z row (ignoring the constant) are negative. Such is
the situation that we have in the Table 2 above. Now the values of x1,x2,x3,x4
that maximize the z function can be read from the table. z has a maximum of 2/3
when x2=1/3, x4=9 and x1 and x3 are
0.
Summing up, the principle
of this technique is to rewrite the equations to arrive at an equation for z
that has negative coefficients for all the right-hand variables, which directly
gives the maximum for z when the right-hand variables are 0, and gives the
values of the remaining, left-hand, variables from their equations.
Note that this method can
be carried out using software, rather than manually. Also note that is not
called the Simplex method because of its comparitive simplicity. The name
arises from the formal meaning of the mathematical term simplex, which
describes the most elementary figure for a given dimensionality.
Reference: Numerical Recipes
in C, Second edition, W. Press, S. Teukolsky, W.Vetterling, B.Flannery
A square matrix that has a
determinant equal to zero. Not having an inverse.
Any of the square roots of
the eigenvalues
of the product of a real matrix A with its transpose A'
The representation of a matrix
A as USU' where U is a unitary matrix, U' is its adjoint and S is a diagonal matrix,
the elements of which are the singular values of A arranged in descending
order. The rows of U are the eigenvectors of AA'. The value of this is
that A can be approximated in a least square error optimal fashion by
zeroing the smaller singular values in S.
Is the expansion of a
function into a series.
where f' denotes first derivative of f(x) with respect to x, f'' the second,
etc. The series may be truncated to form an approximation.
A matrix for which the Hermitian
conjugate is its
inverse.
A real matrix is called unitary if it is the inverse of its transpose, and also
called an orthogonal
matrix, because any pair of distinct rows or columns are orthogonal vectors.
Var(x) is the square of the
standard deviation of a random variable x and is calculated using:
Var(x)=E((x-E(x))^2)
where E(x) is the expected value of x.
Simple variance
The estimator of the variance of a population:
Mx =E(x) is the mean of x. The variance, and hence the standard deviation, can
be calculated on the run, as measurements are made, by accumulating the sum of
the measurements and calculating the mean at every point, then accumulating the
sum of the differences squared between the measurement and the mean then
dividing by n-1.
Back