Trend analysis
Our human
brain learns by pattern recognition, or trend analysis. A pattern is a set of
rules that repeats itself across occurrences. Let's say that we study
the occurrence of an event which is random in nature, with each outcome bearing a
specific probability weight. This probability weight is supposed to be our
"pattern", or "repeating rule", in the trend that is
observed. A random experiment (observing the number of occurrences of a likely event
among the total number of trials) would be enough to provide the required
pattern in most cases where the data at hand is limited to only the
occurrence of the event. But if we extend our available resources to other
variable parameters that change in accordance with the change in the outcome of
the event being studied, then we may extend the probability density function
from a mere constant to a relation involving the new parameters.
A general
probability equation is just a constant value ($C_x$) which gives the
probability of occurrence of the incident $x$, given that $x \in U$, where $U$ is the set
containing all the possible occurrences. Even subjects like artificial
intelligence depend upon pattern matching or trend analysis for evaluation.
Probabilistic analysis is at the heart of all discoveries; the above-defined
constant $C_x$ can be extended to an expression which involves more than one
parameter, i.e. let there be a probabilistic relation $P(x, p_1, p_2, \ldots, p_n)$
which includes the parameter $C_x$ and forms an $\mathbb{R} \to \mathbb{R}$
relation. In this article I am explicitly going to prove that any
differentiable and continuous function defined on the real numbers that is used
to predict an attribute of a system can be expressed as a linear
relation of a probability density function, where the density expresses the probability of a
specific attribute of that system. If we define the function to be continuous
and differentiable over the real numbers, then the function $f(x)$ can
be defined as:
$$f(x) = a\,p(x) + b, \qquad a, b \in \mathbb{R}.$$
We know that $p(x)$ takes values in the region $[0, 1]$,
which is a small subset of the set of real numbers; but by setting an
infinitely large value for $a$ and defining the function $p(x)$ such that its
values are infinitesimally small compared to $a$, we can obtain any real
relation we would like to form (including periodic, non-periodic and exponential
relations).
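As a concrete, finite illustration of this construction (the interval $[0, L]$ and the example function below are chosen only for definiteness, and are not part of the argument itself): suppose $f$ is continuous on $[0, L]$ and attains its minimum value $c$ there. Setting
$$a = \int_{0}^{L} \bigl(f(x) - c\bigr)\,dx, \qquad p(x) = \frac{f(x) - c}{a},$$
gives $p(x) \ge 0$ with $\int_{0}^{L} p(x)\,dx = 1$, so $f(x) = a\,p(x) + c$ is exactly a linear relation of a probability density. For instance, $f(x) = x^2$ on $[0, 1]$ yields $c = 0$, $a = \tfrac{1}{3}$ and the density $p(x) = 3x^2$.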
But that is not enough; we have not yet defined the actual attribute
expressed by a probabilistic function defined like that. Let there be a machine
$S$ defined such that, each time it gets an input, it increments the weight
field associated with that input (assuming that each input field is associated
with one weight field). Now assume that a hypothetical user inputs a data
set, giving the more favoured values as inputs more often and the less favoured values for
a limited number of times; carrying this process on indefinitely,
thereby inputting a value $x \in \mathbb{R}$ at least once, trains the system to
respond specifically to the inputs received. The weight values are assumed to be
the output values.
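A minimal sketch of such a machine in Python (the names `WeightMachine`, `feed` and `p` are illustrative, not part of the definition above): each input increments its weight field, and dividing a weight by the total count gives the relative probability of that input.

```python
from collections import defaultdict

class WeightMachine:
    """Sketch of the machine S: one weight field per distinct input."""

    def __init__(self):
        self.weights = defaultdict(int)  # input value -> weight field
        self.total = 0                   # total number of inputs seen

    def feed(self, x):
        """Each input increments the weight field associated with it."""
        self.weights[x] += 1
        self.total += 1

    def p(self, x):
        """Relative likelihood that x was the input at a randomly
        chosen instant during training (empirical probability)."""
        return self.weights[x] / self.total if self.total else 0.0


# Training: favoured values are fed more often than less favoured ones.
machine = WeightMachine()
for value in [2, 2, 2, 3, 3, 5]:
    machine.feed(value)

print(machine.p(2))  # 0.5   -- the most favoured input
print(machine.p(5))  # ~0.167
```

Dividing each weight by the running total is the discrete counterpart of the normalisation condition stated below.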
So, the function
which maps each input to its associated weight value forms the relative
probabilistic distribution of the likelihood of a particular value in
$\mathbb{R}$ being inputted at a randomly chosen time instant during the
training process. This demands that each input included in the function
be inputted a finite number of times. For any function $f(x)$,
$$p(x) = \lim_{a \to \infty} \frac{f(x) - c}{a}.$$
Also, the other definition that can be added to this is,
$$\int_{-\infty}^{\infty} p(x)\,dx = 1.$$
This clearly shows that $p(x)$ depends on the number of times a
particular variable gets inputted to the system through the infinite number of
attempts. We may use the above analysis in commonly accepted and well-known
trend analysis algorithms used by computers (and humans) in analysing
observable trends in a changing system, which changes w.r.t. a specific
parameter, generally time. It is clear that a trend analysis has
variable parameters and one or more constant parameters, which are generally
the rules that are employed in extracting specific parameters from the
observable system.
Analysis based on series of recorded numbers
Let's consider
that a series of numbers is inputted to the analysing system. The system's
motive is to find the appropriate number that satisfies the recorded pattern.
One way to correlate a set of $n$ abstract numbers is by relating them to a
general polynomial expression of degree $n-1$ with $n$ unknown coefficients, substituting
each value of the series $k_1, k_2, \ldots, k_n$ at arithmetically progressing positions, and
solving the resulting $n$ equations to arrive at unique values for each of
the unknown coefficients. But this method is straightforward and of little
value to us on a large scale, and so the necessity of more rigorous methods to correlate
the observable pattern arises. Moreover, since a set of $n$
numbers can be related in infinitely many ways (a polynomial of any higher degree
can also be made to pass through them), this generalises the above
method of pattern matching: we cannot exactly define what kind of pattern
should be looked for.
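As a sketch of this straightforward method (using NumPy; the function name and the choice of positions $1, \ldots, n$ are illustrative assumptions): build one linear equation per recorded number and solve for the $n$ unknown coefficients.

```python
import numpy as np

def fit_polynomial(series):
    """Fit a degree-(n-1) polynomial through the n recorded numbers,
    taking their positions 1..n as the progressive inputs."""
    n = len(series)
    positions = np.arange(1, n + 1)
    # Row i is [1, x_i, x_i^2, ..., x_i^(n-1)]: one linear equation
    # per recorded number, n equations in n unknown coefficients.
    vandermonde = np.vander(positions, n, increasing=True)
    return np.linalg.solve(vandermonde, np.array(series, float))

coeffs = fit_polynomial([1, 4, 9, 16])  # squares: k_i = i^2
print(np.round(coeffs, 10))             # ~[0. 0. 1. 0.] -> f(x) = x^2
```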
So there are
numerous kinds of patterns we can observe from a set of numbers. Let's say that
we wish to observe the pattern based on the rate of change of each value w.r.t.
its preceding value. So if we have a series $k_1, k_2, \ldots, k_n$, then we will
have another difference series $\Delta k_1, \Delta k_2, \ldots, \Delta k_{n-1}$. We
can say that $\Delta k_1 = k_2 - k_1$, $\Delta k_2 = k_3 - k_2$, and so on. If
the values entered into the system actually form a polynomial series, then the
exact succeeding value can be predicted by this "difference method". From the
series $S$ defined as $S = k_1, k_2, \ldots, k_n$ we can define another series $S'$
as $S' = \Delta k_1, \Delta k_2, \ldots, \Delta k_{n-1}$.
Observing closely,
we can say that the size of the series $S'$ is one less than the size of the
series $S$. This goes on as we keep applying the difference method to obtain $S'',
S''', \ldots$ (i.e., $S''$ is defined as $(S')'$). At some point, the series
reduces to a single term. We may represent that final term as $S^n$
(for simplicity, we express $S''$ as $S^2$ and so on). It's easy to show that a
series of $n+1$ terms generated by a polynomial of degree $n$, by substituting integral values
in increasing order, ends with exactly one term after $n$ difference
operations, since each difference operation lowers the degree by one and shortens
the series by one term. For example, the squares $1, 4, 9$ (degree $2$) become
$3, 5$ after one operation and the single term $2$ after two. Let the series $S$
consist of $k$ terms generated by a polynomial $f$ of degree $n$ ($k > n$). Then, choosing
$n+1$ consecutive terms from the $k$ terms and applying the
difference operation successively $n$ times, we get $S^n$, which contains
just one term.
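A short sketch of the difference method as a predictor (plain Python; the helper name is mine): keep taking difference series until the terms are constant, then add the last element of each level back up to extend the original series by one term.

```python
def predict_next(series):
    """Predict the next term of a polynomial-generated series by the
    difference method described above."""
    # Build S, S', S'', ... until the current series is constant.
    levels = [list(series)]
    while len(levels[-1]) > 1 and any(v != levels[-1][0] for v in levels[-1]):
        prev = levels[-1]
        levels.append([b - a for a, b in zip(prev, prev[1:])])
    # The bottom level is constant; fold the last elements back up.
    next_term = 0
    for level in reversed(levels):
        next_term += level[-1]
    return next_term

print(predict_next([1, 4, 9, 16, 25]))    # squares -> 36
print(predict_next([1, 8, 27, 64, 125]))  # cubes   -> 216
```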
This trend
analysis will again be of practical use only if the given series is known to
be generated by a polynomial; the existence of wrong terms in the
series causes it to deviate completely from the series defined by the polynomial.
The efficiency of algorithms in solving these kinds of trend analysis problems depends
greatly on the influence of "noise" inputs, i.e. inputs that do not fit into the
actual trend that's being analysed. Many efficient methods ignore such noise
values to a great extent.
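The article does not fix a particular noise-handling method, but as one common example (a sketch, not the difference method above): a least-squares fit of a low-degree polynomial averages a noisy term out, where exact interpolation would reproduce it and deviate badly when extrapolating.

```python
import numpy as np

# The squares 1..36, with the fifth term corrupted (25 recorded as 27).
positions = np.arange(1, 7)
noisy = np.array([1, 4, 9, 16, 27, 36], float)

# Exact degree-5 interpolation reproduces the noise and extrapolates badly.
exact = np.polyfit(positions, noisy, deg=5)
print(np.polyval(exact, 7))    # ~19, far from the true next square 49

# A degree-2 least-squares fit treats the corrupted term as error.
robust = np.polyfit(positions, noisy, deg=2)
print(np.round(robust, 2))     # ~[0.96, 0.42, -0.6], still close to x^2
print(np.polyval(robust, 7))   # ~49.6, near the true value 49
```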
copyright © 2015 K
Sreram, all rights reserved.