reduntex.txt in /html/abs
Some notes concerning redundant (hence non-orthogonal) expansions
Hans G. Feichtinger
started Dec. 16th, 1994
Physics has obtained a lot of insight into the behavior of matter by the
observation that macroscopic (i.e. large) pieces of matter can be understood
as a collection of smaller objects (such as the physical atoms), which
have quite a lot of similarity among each other (but there are also many
different ones). For example, the reaction of a body under a field of
forces can be understood quite well as the superposition of the action
of this field on the individual atoms. [ I do not elaborate on the
scientific and philosophical relevance of the - should I say - invention:
of the idea of a physical force, because nobody has ever seen a force
by itself, but always only concluded indirectly that such a "thing" must
exist, because why should a piece of matter otherwise speed up, or change
direction, or just "fall" towards the center of the earth].
There is also the observation that in order to identify what is usually
called the "center of mass" within a physical body (let us think of
some ball, or cone, or cube) one can apply integral calculus, which
is more elegant than summations over an extremely large sum of forces
acting upon single atoms, whose position one does not know anyway.
However, from the mathematical point of view (since Riemannian sums
are actually convergent towards the Riemannian integral, for example)
it what not be contradictive to think that the matter has some kind
of continuity, and to idealize this piece of matter as a "continuous
integral" over an infinite variety of infinitely small "generalized
atoms" which neither have mass nor extension (but as a whole make up
this piece of matter).
When it comes to functions and operators (or linear functionals)
similar ideas can be used: We can see a (for simplicity) continuous
function f as the "list of its values" {f(x),x \in \R},
where for each point on the real line x
we have some function value f(x). Although a function which
takes only non-zero values over finitely many points (or even just
a single one) does have integral zero, this (uncountable) collection
of values HAS a non-zero integral (at least if f is positive and non-zero).
However, the integral (or any other linear functional on a
space of functions, or even a linear operator)
might be understood in the same way as the sum of actions on some
"larger" , "elementary building-blocks" which make up the function
f altogether. For example, if f happens to be a polynomial function
we have the "natural" expression for f(x) as a sum over the monomial
functions x > x^p, with p = 0,1,2... k (the order of the polynomial f).
In other cases trigonometric functions or other systems of classical
functions might be more natural (for interpretation). However, one
must say that in principle (from a logical point of view) there is
no real argument for one or the other "representation" or "expansion"
of a given function (signal in the engineering termology), it is rather
a question of convenience (in practical terms), or of the underlying
modelling, which makes one description more likely than another one.
The questions, what the "true nature" of a given function is (in
other word, whether it is "truly" a sum over this or another collection
of building blocks) belongs rather to the field of philosophy and
will not be discussed here.
Let us come back to the idea of matter being a superposition of, now what
should we say? True atoms (with "nothing in between"), or a space filling
collection of atoms (like in a crystallographic packing), or a continuity
of "phantomic" atoms? Each of them has its virtue, but in some cases
it is even very reasonable to think of atoms not something FIXED in space,
but moving around, with some probability to be at a certain position
more or less frequently, with maybe a center for the locations of
high probability. In that case there is also NO reason, that there
are places in space (actually all of 3D space will have this property
for a natural modelling) where SEVERAL atoms may be found, with some
probabilty (the total probability over those atoms being of course not
larger than one). What does this "physical" model tell us about the
decomposition of functions. I would like to argue in the following way:
Whereas orthogonal systems (once fixed) have all the beauty of some
crystallographic lattice, it might be the case, that in order to expand
a given signal a certain choice of a (complete) orthonormal system might
be relevant, whereas in the case of the atomic decomposition (as we probably
think of our physical world) one has less strict order, but still a unique
description (we speak about the atoms which make up some piece of matter).
In mathematical terms one might compare this with a so-called Riesz basis,
which is a collection of linear independent atoms in a Banach space
(complete normed space), which allows to represent any element in that
linear space as a series ("infinite sum", i.e. the limit of finite
partial sums) in a uniquely defined way. Observe, that such a series
expansion is a much stronger requirement than just the approximation
of a function f by finite linear combination of "atoms" (or building
blocks). In the first case (series expansion) one is only allow
to add more and more terms, and essentially (in the long run) has to
take precautions that by just adding more and more terms one comes
closer and closer to the function f (to be expanded), WITHOUT ever
getting far away from it late in the process of summation. In the
second case it is enough that for a pre-given degree of approximation \eps
one can find some finite linear combination of atoms such that one
can achieve \eps-approximation. However, it is allowed to take a
completely NEW set of building blocks whenever one has to get to
better approximation, and does NOT have to use the previously used
atoms. In real life this difference can be explained by the difference
in planning for an large building. Whereas in the second situation
it means that in order to "enlarge" some big house one is allowed
to tier it down and destruct it and just build a new one of appropriate
size, the first situation corresponds more to the situation, maybe
given to an architect, that (at least for some while) he should take
precautions that the given house can be expanded more and more
(WITHOUT destruction going on) to fulfill better and better the
needs for more and more space (in the sense of office space, if this
house might be a large office building).
Picking up our main line of ideas we compare now the previously
explained idea of "particles having some probability of being located
here and there" with a somewhat "overlapping", or in mathematical
terms, linear DEPENDENT family of atoms. In that case a given function
may have MANY different representations (as a series), but one cannot
say that any of those expansions is the "true" or the "real" one (if
such an argument does not come from some modelling process).
A point in case would be the description of a little model house
built by a kid from LEGO "bricks", but everybody knows that there
are MANY different ways of building the "same" house.
We may take this analogy even further and argue in the following way:
Thinking of LEGO bricks it is obvious that the way how a given
"block-house" can be built, and how many "stones" we need will depend
very much on the collection of available atoms. If the collection is
large the advantage will be that for certain situations one needs only
VERY FEW terms. If the house has the same window-front on all four
sides one might just use four copies of the same side-wall plus a roof
in order to complete the house. From the point of view of variety one
then has to make a choice. Either one restricts the possibilities
(everybody has experienced as a child that certain things cannot
be built, using only a limited collection of LEGO bricks, one woule
like to have more and different ones), or one enlarges the number and
variety of possible bricks. However there the danger is immanent that
the number of different types of blocks will increase quickly beyond
any reasonable size, it is hard to keep track of them, and in a given
situation it might be actually hard to find a good collection of bricks
fitting "best" to a given plan, because with the number of building
blocks the number of possible variations (in building the house) might
increase and finding even only a "near optimal" choice among those
many possibilities might become an intractible combinatorial optimization
problem. So at the end (taking the average over a large number of
completely different houses to be built) it might be really easier
to just go back to the old system of using the small standard bricks,
which have all more or less equal size, and using many of them, but
in a straigforward way in order to raise the house according to the
plans of our architect.
Going back to our mathematical problem, we would like to match the
given function (our plan) using a given set of "atoms" (building blocks).
Then, in order to do this efficiently, one has to say what that means:
Do we want to have an efficient way of choosing the atoms AND then
the appropriate coefficents, are the atoms given and do we want to
minimize the total (square) sum of the coefficients needed, do we
hope that as many functions (e.g. those in a large subspace of our
function/signal space) can be obtained with a rather samll number
of non-zero terms? Is speed of the calculation the optimization
criterion, will those coefficients and maybe the parameters of
the corresponding atoms be stored somewhere, or transmitted. In that
case the speed of synthesis (the availability of a fast synthesis
algorithm) might be a matter of concern. There are many real world
situations where really such differences occur. In one case one
tries to transmit some image from a satellite down to earth. The
computer on the satellite has limited capacity, but the information
channel (connecting it with the station on the earth) might be even
more limited. On the other hand the computers down on the earth
are much more powerfull, so synthesis might be much less of a problem
in this case. Conversely, in the case of digital broad-casting the
task is also to bring high-level information (such as CD-quality
sound) to the end-user. In that case one certainly will have to
design the overall system in such a way that allows for cheap
receivers (you may think of the hope for allowing a single
signal processor card do the synthesis work, even for stereo
signals received), whereas in this case the sender's side can
be much more expensive. Indeed, there are only comparatively few
(and well financed) radio stations, but millions of customers which
might not like to pay a lot of money for very expensive receivers.
Finally, in some other systems only the amount of information stored
is really relevant (e.g. for image data base systems).
There is a further aspect, that might go along with the choice
of a concrete system of building blocks: interpretation of the
given representation or the coefficients used. Here, for the case
of image analysis or digital signal processing the terms "multiresolution"
and time-frequency localization are used in many differnent ways.
One speeks about coarse and fine details of some image, or low and
high frequencies in a given sound signal (in some melody). Certainly,
schemes of coefficients which have such interpretations (such as
wavelet or Gabor- expansions) have advantages for the practical use
over others, because in that case coefficients are "not just numbers"
but carry extra information which may be very useful for the "user".
..redundancy...
relevance for pattern recognition tasks..