
Random Process Learning Notes (1)


Chinese version: 随机过程 学习笔记(一)

Origin & Basic Information

just for fun…

Course Video: https://www.bilibili.com/video/BV1wj411k7Tj

Instructor & Time: Zhang Hao (Spring 2023)

Contents:

Annotation Key:

Note Content & References:

Lesson 1: Introduction

In the first lecture, the instructor clearly stated several key organizational issues:

Besides this: The instructor emphasized the importance of writing, especially drafting on paper when working through abstract concepts and taking notes when learning new material. This echoes the Feynman learning method, i.e., no input without output. The instructor also set higher standards, namely analytical solutions and visualizations, regarding these two as the basis for our understanding of and intuition about a concept or phenomenon. Stating restrictive conditions for recommendation letters in this part of the course may further enhance organizational transparency. A simple example: "After the results come out, I will announce a list of students. If you are not on this list, it doesn't mean I will never write you a recommendation letter, only that the possibility decreases."

For my use: The instructor naturally wove a lively explanation of the honor code and his own values into the course organization content: we should value students as researchers, emphasizing strengths rather than weaknesses.

Lesson 2: Correlation

What is a Random Process?

What is a Random Variable?

In probability theory, a probability space consists of a sample space, a σ-algebra, and a probability measure: $(\Omega, \mathcal F, \text{P})$. A random variable is a function mapping the sample space to the real numbers, denoted $\hat{x}: \Omega \rightarrow \mathbb{R}$. Thus the random variable itself has no randomness; the randomness comes from the elements of the sample space, whose occurrence in a statistical experiment is uncertain.

We can characterize a random variable by its distribution function $F_{\hat{x}}(x)$, but more often we use its density $f_{\hat{x}}(x)$, the derivative of the distribution function. This is because the probability that the random variable lies in a set $A$ can then be found by integrating the density over $A$.

$$F_{\hat{x}}(x)=P(\hat{x} \leq x) \tag {1}$$ $$f_{\hat{x}}(x)=\frac{\text d}{\text dx}F_{\hat x}(x) \tag {2}$$ $$P(\hat{x}\in A)=\int_A f_{\hat{x}}(x)\text dx \tag {3}$$

Additionally, we can roughly characterize a random variable by its mean and variance.

$$\mathbb{E}(\hat{x})=\int^{+\infty}_{-\infty} x f_{\hat{x}}(x)\text dx \tag{4}$$ $$ \text {Var} (\hat{x}) = \mathbb{E}\big(\hat{x} - \mathbb{E}(\hat{x})\big)^2 \tag{5}$$
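As a quick numerical sanity check (my own sketch, not from the lecture; it assumes NumPy/SciPy and uses a standard normal as the example variable), equations (1)–(5) can be verified directly:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Example random variable: standard normal (my choice, not from the lecture).
rv = stats.norm(loc=0, scale=1)

# (1) Distribution (CDF) and (2) density (its derivative) at a point.
x = 0.5
print("F(x) =", rv.cdf(x))            # F(0.5)
print("f(x) =", rv.pdf(x))            # f(0.5)

# (3) P(X in A) by integrating the density over A = [-1, 1].
p_A, _ = quad(rv.pdf, -1.0, 1.0)
print("P(X in [-1,1]) =", p_A)        # ~0.6827 for a standard normal

# (4) Mean and (5) variance via the defining integrals.
mean, _ = quad(lambda t: t * rv.pdf(t), -np.inf, np.inf)
var, _ = quad(lambda t: (t - mean) ** 2 * rv.pdf(t), -np.inf, np.inf)
print("E[X] =", mean, " Var[X] =", var)
```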

Besides this: I specifically checked the definition of random variables when studying probabilistic machine learning and made detailed notes. I will upload them when I have time and leave a link here for reference. (TO-DO)

For my use: The instructor’s logic was very clear when introducing the basic structure, starting from the overall picture and the basics. He only left symbols on the board without any extra text, accompanied by explanations. For beginners, this is very easy to understand and record.


A random process is a set of random variables. More precisely, it is a set of random variables indexed by a real parameter t. When these random variables are ordered in time, they are usually called a random process; when they are indexed by space rather than time, they are called random fields.

$$ X(t) = \hat{x}_1, \hat{x}_2, \hat{x}_3, \dots \tag{Random Process}$$

In general, we denote a continuous-time random process by $X(t)$, as in the formula above. A discrete-time random process is denoted $X_n$. Here t and n range over the real numbers and the natural numbers, respectively. A random process can also be viewed as a function of two variables, $X(\omega, t)$, where $\omega$ is an element of the sample space and t is time.
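To make the two-variable view $X(\omega, t)$ concrete, here is a minimal simulation sketch (my own illustration, using a symmetric random walk as the example process): each row of the array is one sample path (a fixed $\omega$), each column a fixed time $n$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete-time random process X_n: a symmetric random walk (illustrative choice).
# Rows index omega (which sample path occurred); columns index time n.
n_paths, n_steps = 5, 100
steps = rng.choice([-1, 1], size=(n_paths, n_steps))
X = np.cumsum(steps, axis=1)

# Fix omega, vary n: one realization (one row) is an ordinary sequence.
print("path 0:", X[0, :10])

# Fix n, vary omega: one column is a sample of the random variable X_n.
print("X_10 across paths:", X[:, 10])
```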

The focus of studying a random process (in this course) is:

Besides this: A martingale (鞅) is a sequence of random variables with a specific property: given the past information, the expected value of the future equals the current value. Every martingale is a random process, but not every random process is a martingale.

We can express the core property of a martingale as $\mathbb{E} [X_t|\mathcal F_s]=X_s$, where s and t are times with $s<t$, and $\mathcal F_s$ represents all the information up to time s. When the equality becomes $\leq$, we call the process a supermartingale; when it becomes $\geq$, a submartingale.

In essence, when the conditional expectation of the future is at least the current value, the process is a submartingale, i.e., the expected trend is non-decreasing.
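A small Monte Carlo sketch (my own, with a symmetric random walk as the example martingale) makes this concrete: the average future increment given the present is approximately zero, and adding a positive drift turns the walk into a submartingale.

```python
import numpy as np

rng = np.random.default_rng(1)

# Symmetric random walk: a classic martingale (illustrative example).
n_paths, s, t = 100_000, 20, 50
steps = rng.choice([-1, 1], size=(n_paths, t))
X = np.cumsum(steps, axis=1)

# E[X_t | F_s] = X_s means the future increment X_t - X_s has zero
# conditional mean; averaging over paths should give ~0.
increment = X[:, t - 1] - X[:, s - 1]
print("mean of X_t - X_s:", increment.mean())   # close to 0

# Adding a positive drift makes it a submartingale: increments average > 0.
Y = np.cumsum(steps + 0.1, axis=1)
print("submartingale drift:", (Y[:, t - 1] - Y[:, s - 1]).mean())  # ~ +3.0
```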

Correlation: Binary Relation

To characterize the relationship between two random variables, there are three possible results: independent, uncorrelated (but dependent), or correlated.

By definition, two variables are independent if their joint density equals the product of their marginal densities. In layman's terms, no matter what value variable X takes, the distribution of variable Y remains the same; they do not influence each other.

However, being non-independent (dependent) and being correlated are two different notions. In some cases two variables are dependent, yet their correlation coefficient is zero; we then call them uncorrelated. Only when the coefficient is nonzero do we call them correlated.

Typically, two variables with a strong correlation show a clear common trend in a visualization: as the distribution of X shifts, the distribution of Y shifts in a fixed direction. If that direction is opposite, their correlation coefficient is negative.
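A standard concrete example of "dependent but uncorrelated" (my own illustration, not from the lecture): take X standard normal and $Y = X^2$. Y is completely determined by X, yet the correlation coefficient is approximately zero because $\mathbb{E}(XY) = \mathbb{E}(X^3) = 0$.

```python
import numpy as np

rng = np.random.default_rng(2)

x = rng.standard_normal(1_000_000)
y = x ** 2                      # fully dependent on x...

# ...yet uncorrelated: E[X^3] = 0 for a symmetric distribution.
print("corr(X, Y) ~", np.corrcoef(x, y)[0, 1])   # close to 0
```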

Besides this: Whether or not two quantities are correlated should be an empirical conclusion, but probability theory characterizes it abstractly. Note that this abstract independence still requires empirical comparison, or more direct assumptions, to conclude. The random variable for the event "It rained on September 25, 2023" (0 for no rain, 1 for rain) and…

To quantify the relationship between two random variables, we need a metric for the distance between them: the mean square.

$$\mathbb E (X-Y)^2 = \mathbb EX^2+\mathbb EY^2 - 2\mathbb E (XY) \tag{Mean Square}$$

The third term plays the key role here: the expectation of the product of X and Y, also known as the correlation of X and Y. When it is zero, the two are said to be uncorrelated. The correlation is also often replaced by the covariance $\mathbb E(X-\mathbb EX)(Y-\mathbb EY) = \mathbb E(XY)-\mathbb EX\,\mathbb EY$; since $\mathbb EX\, \mathbb EY$ is a fixed number, the two definitions carry the same information.
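Both identities are easy to verify from samples (a minimal sketch; the toy pair $Y = 0.5X + \text{noise}$ is my own choice):

```python
import numpy as np

rng = np.random.default_rng(3)

# Two correlated variables (illustrative: Y = 0.5*X + noise).
x = rng.standard_normal(100_000)
y = 0.5 * x + 0.5 * rng.standard_normal(100_000)

# Mean-square distance and its expansion E[X^2] + E[Y^2] - 2 E[XY].
lhs = np.mean((x - y) ** 2)
rhs = np.mean(x**2) + np.mean(y**2) - 2 * np.mean(x * y)
print(lhs, "~", rhs)

# Covariance identity: E[(X-EX)(Y-EY)] = E[XY] - EX*EY.
cov = np.mean((x - x.mean()) * (y - y.mean()))
print(cov, "~", np.mean(x * y) - x.mean() * y.mean())
```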

Correlation Coefficient

The correlation coefficient is the correlation divided by the square root of the product of the second moments $\mathbb{E}X^2$ and $\mathbb{E}Y^2$. Its absolute value is less than or equal to 1. A simple transformation of this inequality yields the Cauchy–Schwarz inequality.

$$\left|\frac{\mathbb{E}(XY)}{\sqrt{\mathbb{E}X^2\,\mathbb{E}Y^2}}\right|\leq 1$$ $$(\mathbb{E}(XY))^2 \leq \mathbb{E}X^2\,\mathbb{E}Y^2 \tag{Cauchy–Schwarz Inequality}$$

To prove the Cauchy–Schwarz inequality, we can view correlation as an inner product. Since correlation satisfies the defining properties of an inner product (the inner product of a variable with itself is non-negative; symmetry; bilinearity), proving that inner products satisfy this inequality proves that correlation does. So the conclusion is: correlation itself is an inner product.

Seeing correlation as an inner product naturally leads us to a geometric perspective.

$$\angle (X,Y)=\arccos\left(\frac{\langle X,Y\rangle}{\sqrt{\langle X,X\rangle\,\langle Y,Y\rangle}}\right)=\arccos\left(\frac{\mathbb{E}(XY)}{\sqrt{\mathbb{E}X^2\,\mathbb{E}Y^2}}\right)$$
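Numerically, this angle can be computed directly from samples (again a toy example of my own):

```python
import numpy as np

rng = np.random.default_rng(4)

x = rng.standard_normal(100_000)
y = 0.5 * x + 0.5 * rng.standard_normal(100_000)

# cos of the "angle" between X and Y under the inner product <X,Y> = E[XY].
cos_angle = np.mean(x * y) / np.sqrt(np.mean(x**2) * np.mean(y**2))
print("correlation coefficient:", cos_angle)   # ~0.707 for this toy pair
print("angle (radians):", np.arccos(cos_angle))
```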

Approximation

Consider a situation where we want random variable X to approximate random variable Y.

In this case, we can intuitively attempt a linear estimate, using the least squares method (LSM) to find the parameter $\alpha$ that minimizes the mean square error $\mathbb E\big[(\alpha X - Y)^2\big]$. The solution is:

$$\alpha = \frac{\mathbb E{(XY)}}{\mathbb E{(X^2)}}$$

Geometrically, using the earlier conclusion that correlation is an inner product, finding the parameter $\alpha$ is equivalent to finding the projection of the vector Y onto the direction of the vector X.
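A quick check of the formula and the projection picture (my own sketch): the residual $Y - \alpha X$ should be orthogonal to X, just as in a geometric projection.

```python
import numpy as np

rng = np.random.default_rng(5)

x = rng.standard_normal(100_000)
y = 2.0 * x + rng.standard_normal(100_000)   # target to approximate

# Closed form: alpha = E[XY] / E[X^2].
alpha = np.mean(x * y) / np.mean(x**2)
print("alpha ~", alpha)                       # close to 2.0

# Sanity check: the residual Y - alpha*X is orthogonal to X
# (zero correlation with X), exactly like a geometric projection.
residual = y - alpha * x
print("E[X * residual] ~", np.mean(x * residual))   # close to 0
```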

Studying the correlation between the values a random process takes at two times is equivalent to studying how the process is associated with itself across time. For example, for stock prices (a random process), we can study the relationship between the random variables $X(t)$ and $X(s)$ at times $t$ and $s$; these represent a future price and the current price, respectively.

From this, we can introduce a new definition: the function describing this relationship, $R_X(t,s) = \mathbb{E}[X(t)X(s)]$, is called the correlation function of the (continuous-time) random process $X(t)$.

If the correlation function is invariant under a common time shift, as in the formula below, the process is called wide-sense stationary. In other words, the relative time (the time difference) between two random variables is the only parameter that determines their relationship. In fact, wide-sense stationarity has one more condition to satisfy: the expectation of the process at any time must be a constant, $\mathbb{E}(X(t)) = m(t) = m$; for the processes considered here, this requirement is naturally satisfied.

$$R_X(t + T,s + T) = R_X(t,s)$$

This can help us reduce the two-variable function $R_X(t,s)$ to a one-variable function $R_X(\tau)$, where $\tau = t - s$.
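As a closing sketch (my own example, using an AR(1) sequence, a standard wide-sense stationary process), we can estimate $R_X(\tau)$ by averaging over $t$ and compare it with the theoretical value $a^{|\tau|}$:

```python
import numpy as np

rng = np.random.default_rng(6)

# A simple wide-sense stationary process (illustrative): an AR(1) sequence
# X_n = a * X_{n-1} + W_n, whose theoretical R_X(tau) is a^|tau| here.
a, n = 0.8, 200_000
w = rng.standard_normal(n) * np.sqrt(1 - a**2)
x = np.empty(n)
x[0] = rng.standard_normal()     # start from the stationary distribution
for i in range(1, n):
    x[i] = a * x[i - 1] + w[i]

# Estimate R_X(tau) = E[X(t + tau) X(t)] by averaging over t.
for tau in range(5):
    r = np.mean(x[tau:] * x[: n - tau])
    print(f"R_X({tau}) ~ {r:.3f}  (theory: {a**tau:.3f})")
```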