When ML meets Game Theory

I have always had a loose question in my mind that I am eager, but not yet able, to answer systematically: which problems could or couldn't be well solved by machine learning/deep learning methodologies. In particular, I want to claim (loosely) that even if one could build the most powerful machine learning model, it would not be useful for predicting financial or economic behavior in the world, because that behavior is essentially governed by game theory/mean field game theory, i.e., the outcome really depends on the other players in the game, so that no model could 'predict' it.

Admittedly, one can always propose some framework to argue for the effectiveness of ML in the financial world; however, I am eager to see very simple, naive models that formalize this problem (either supporting or refuting my claim) in an elegant way.

Invariant of Models

I have always been intrigued and interested by the concept of an invariant. Perhaps my first impression of this concept came from the mathematician Paul Gordan, who was known as the king of invariant theory. Later on, I was introduced to more examples of invariants; one funny example I remember was from Prof Xiangjun Wang: proving that there must be a swirl on any head covered with hair (an application of the hairy ball theorem, due to Brouwer).

However, I hold a belief/intuition that the essence of any model in the world is to understand and find the invariants of the problem. The idea here is very abstract and broad, but let me keep this post here and finish it gradually going forward.

First, let's focus on predictive regression models. For any ML model to be deployed, people tend to split the data randomly into a training set and a validation set. Why is it reasonable to do this? The underlying principle is that the data distribution (i.e., y\sim f(X)) is invariant under the random splitting (this can be proved). Next, when new test data is introduced, the reason we are able to use such data is the assumption that the data distribution will not change.

Another simple example is linear regression y=a x + b +\sigma \epsilon: as one can see, the essence of this model's predictive power comes from the invariant quantity \sigma; it has predictive power only if \sigma is unchanged. For a more general model, the essence of its predictive power comes from the invariance assumption on the model parameters.
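To make this concrete, here is a minimal MATLAB sketch (all numbers are hypothetical): fit y=ax+b on training data, and watch the out-of-sample error stay near \sigma only while the generating parameters (a, b, \sigma) remain invariant, then blow up once the slope drifts.

a = 2; b = 1; sigma = 0.5; n = 500;             % hypothetical "invariant" parameters
x_train = randn(n,1);
y_train = a*x_train + b + sigma*randn(n,1);
coef = polyfit(x_train, y_train, 1);            % fitted [a_hat, b_hat]
x_test  = randn(n,1);
y_test  = a*x_test + b + sigma*randn(n,1);      % same distribution: invariance holds
y_drift = (a+1)*x_test + b + sigma*randn(n,1);  % slope drifts: invariance broken
rmse_same  = sqrt(mean((polyval(coef,x_test) - y_test ).^2));
rmse_drift = sqrt(mean((polyval(coef,x_test) - y_drift).^2));
fprintf('out-of-sample RMSE: %.2f (invariant) vs %.2f (after drift)\n', rmse_same, rmse_drift);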

However, should people worry about the legitimacy of this assumption in any of these scenarios?

to be continued…….

Linear Algebra exercises.

  1. What are the eigenvalues of the n\times n matrix A=\begin{pmatrix} 1 & c & \dots & c\\ c & 1 & \dots & c\\ \vdots & \vdots & \ddots & \vdots\\ c & c & \dots & 1\end{pmatrix}?

Solution 1: Notice that A=c J+(1-c)I, where I,J are the identity matrix and the all-ones matrix, respectively. Assume v is an eigenvector associated with an eigenvalue \lambda; then

\begin{aligned}  cJv+(1-c)Iv&=\lambda v\\  \implies\quad c\Big(\sum_i v_i\Big)\overset{\rightarrow}{1}&=(\lambda+c-1) v.  \end{aligned}

Therefore, either v=\overset{\rightarrow}{1}, in which case c\sum_i v_i=cn=\lambda+c-1, i.e., \lambda_1=1+(n-1)c.

Otherwise, \sum_i v_i=0 and \lambda_{2,\dots,n}=1-c; the corresponding eigenspace is the (n-1)-dimensional subspace \text{span}\left\{\begin{pmatrix} 1\\ -1\\ 0\\ \vdots\\ 0\end{pmatrix}, \begin{pmatrix} 1\\ 0\\ -1\\ \vdots\\ 0\end{pmatrix}, \dots, \begin{pmatrix} 1\\ 0\\ 0\\ \vdots\\ -1\end{pmatrix}\right\}.

 

Solution 2: Matrix Determinant lemma (wikipedia: https://en.wikipedia.org/wiki/Matrix_determinant_lemma#Proof)

Statement: \text{det} (A+uv^T)=(1+v^TA^{-1}u)\text{det}(A), where A is an invertible matrix, u,v\in \mathbb{R}^{n,1}.

Note that A-\lambda I=(1-c-\lambda)I_{n\times n}+uv^T with u=v=\sqrt{c}\,\overset{\rightarrow}{1}_{n\times 1}, so the lemma (applied with (1-c-\lambda)I in place of the invertible matrix) gives the characteristic polynomial

\begin{aligned}  \det(A-\lambda I)=\Big(1+\frac{nc}{1-c-\lambda}\Big)(1-c-\lambda)^n=\big(1-\lambda+(n-1)c\big)(1-c-\lambda)^{n-1}.  \end{aligned}

Setting this to zero, the eigenvalues are \lambda_1=1+(n-1)c and \lambda_{2,\dots,n}=1-c, matching Solution 1.
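A quick numerical sanity check of this spectrum in MATLAB (with hypothetical values of n and c):

n = 5; c = 0.3;
A = (1-c)*eye(n) + c*ones(n);                   % A = cJ + (1-c)I
disp(sort(eig(A))');                            % numerical eigenvalues
disp(sort([1+(n-1)*c, (1-c)*ones(1,n-1)]));     % claimed eigenvalues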

Monty Hall problem revisited

This post writes down a Bayesian proof of the Monty Hall goat problem. Below is the statement of the problem from Wikipedia:

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

Analysis: the answer to this question depends on whether you are a Bayesian or a frequentist, although the former is the approach through which the answer is widely known (with hidden assumptions, such as the host choosing uniformly at random between the remaining doors if your first guess is correct).

Bayesian:

Denote by \{C=i\} the event that the car is behind door i, by \{I=i\} the event that your first guess is door i, and by \{H=i\} the event that the host opens door i to reveal a goat. Hence,

\begin{aligned}  P(C=2|I=1,H=3)  &=\frac{P(I=1,H=3|C=2)\cdot P(C=2)}{P(I=1,H=3|C=1)\cdot P(C=1)+P(I=1,H=3|C=2)\cdot P(C=2)+P(I=1,H=3|C=3)\cdot P(C=3)}\\  &=\frac{1\cdot\frac{1}{3}}{\frac{1}{2}\cdot\frac{1}{3}+1\cdot\frac{1}{3}+0\cdot\frac{1}{3}}\\  &=\frac{2}{3}.  \end{aligned}

Meanwhile, P(C=1|I=1,H=3)=\frac{1}{3}. Therefore, the correct choice is to switch from door 1 to door 2.
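As a sanity check on the 2/3 (not part of the Bayesian argument itself), a small Monte Carlo simulation in MATLAB, using the same hidden assumption that the host picks uniformly among the doors he is allowed to open:

trials = 1e5; wins_switch = 0;
for t = 1:trials
    car   = randi(3);                            % door hiding the car
    guess = 1;                                   % you always pick door No. 1
    allowed = setdiff(1:3, [guess car]);         % goat doors the host may open
    host  = allowed(randi(numel(allowed)));
    switched = setdiff(1:3, [guess host]);       % the door you switch to
    wins_switch = wins_switch + (switched == car);
end
fprintf('P(win by switching) ~ %.3f\n', wins_switch/trials);   % close to 2/3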

Frequentist:

However, a frequentist has no prior distribution. This approach would draw the inference as below:

Assume H_0: C=1 and H_a: C=2; then

P_{H_0}(H=3)=\frac{1}{2} is large (as a p-value), therefore we are not able to reject H_0; and vice versa for H_a.

Unfortunately, the frequentist approach cannot speak about the posterior probability, so it cannot answer the question of whether to switch or not.

Reproducing Kernel Hilbert Space

Since I have written about the simulation of fractional Brownian motion, I'll spend an hour writing about the RKHS.

Definition: A Hilbert space \mathcal{H} of functions on a set X is a RKHS if every evaluation functional \mathcal{F}_t, t\in X, is bounded: |\mathcal{F}_t[f]|=|f(t)|\leq M_t\|f\|_{\mathcal{H}}, \forall f\in \mathcal{H}

Theorem: If \mathcal{H} is a RKHS, then for each t \in X there exists a function K_t \in \mathcal{H} (called the representer of t) with the reproducing property

\mathcal{F}_t[f]=<K_t,f>_{\mathcal{H}}=f(t), \forall f\in\mathcal{H}

Therefore, K_t(t^{'})=<K_t,K_{t^{'}}>_{\mathcal{H}}.

Definition: K: X\times X\rightarrow\mathbb{R} is a reproducing kernel if it’s symmetric and positive definite.

Theorem: A RKHS defines a corresponding reproducing kernel. Conversely, a reproducing kernel defines a unique RKHS.

Once we have the kernel, if f(\cdot)=\sum_i \alpha_i K(t_i,\cdot) and g(\cdot)=\sum_j \beta_j K(t^{'}_j,\cdot),

then <f,g>_{\mathcal{H}}=\sum_i\sum_j \alpha_i \beta_j K(t_i, t^{'}_j)
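This inner product is just a quadratic form in the kernel Gram matrix, so it is easy to compute for finite expansions. A small MATLAB sketch, using a Gaussian kernel and hypothetical coefficients purely for illustration (any symmetric positive definite K works the same way):

K  = @(s,t) exp(-(s-t).^2/2);                   % illustrative reproducing kernel
t  = [0.1 0.5 0.9];  alpha = [1 -2 0.5];        % f = sum_i alpha_i K(t_i, .)
tp = [0.2 0.7];      beta  = [0.3 1.0];         % g = sum_j beta_j  K(t'_j, .)
G  = K(t', tp);                                 % Gram block K(t_i, t'_j)
ip = alpha * G * beta';                         % <f,g>_H = sum_ij alpha_i beta_j K(t_i, t'_j)
disp(ip)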

When it comes to fractional Brownian motion:

Theorem: For the fBm RKHS, the kernel K(x,x^{'})=R(x,x^{'}) is symmetric and positive definite, and there exists K^H(x,\cdot) s.t. K(x,x^{'})=\int K^H(x,y)K^H(x^{'},y)\,dy

Take the Wiener integral as an example:

\begin{array}{cc|ccccc}    \textit{dual space}&\textit{inner product}& E &\leftrightarrow& <\cdot,\cdot>_{\mathcal{H}} &\leftrightarrow& <\cdot,\cdot>_{L^{2}}\\    \delta_{t}(\cdot)& & B^H_t &\leftrightarrow& R(\cdot,t) &\leftrightarrow& K^H(t,\cdot)\\    f(\cdot)& & \textit{"}\int_0^T f(t)B^H_t\,dt\textit{"} &\leftrightarrow& K^H\circ(K^H)^{*}f &\leftrightarrow& (K^H)^{*}f    \end{array}

where K^H\circ f(t)=\int_0^T K^H(t,s)f(s)ds,

(K^H)^{*}\circ f(t)=\int_0^T K^H(s,t)f(s)ds and

K^H\circ(K^H)^{*}f(t)=\int_0^T R(t,s)f(s)ds

For example, when H=\frac{1}{2}, K^H(t,s)=1_{[0,t]}(s), and we can check

E(B_tB_s)=<t\wedge\cdot, s\wedge\cdot>_{\mathcal{H}}=<1_{[0,t]}(\cdot),1_{[0,s]}(\cdot)>_{L^2}=t\wedge s
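A tiny MATLAB check of the L^2 expression, with hypothetical t and s:

t = 0.7; s = 0.4;
y  = linspace(0, 1, 1e5+1);
ip = trapz(y, double(y<=t & y<=s));             % integral of 1_[0,t](y) * 1_[0,s](y)
fprintf('L2 inner product = %.4f, min(t,s) = %.4f\n', ip, min(t,s));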

Claim: By the previous theorem, the reproducing kernel R(t,s) uniquely defines a RKHS, the (completed) linear span of \{R(t,\cdot)\}, with the inner product <\cdot,\cdot>_{\mathcal{H}} above.

How to simulate a fractional Brownian motion?

A fractional Brownian motion B^H_t is a Gaussian process satisfying

a) B^H_0=0

b) B^H_t is Gaussian

c) E(B^H_tB^H_s)=R(t,s)=\frac{1}{2}(t^{2H}+s^{2H}-|t-s|^{2H})

As for simulation, we just need to simulate the correlated Gaussian (\Delta B^H_{t_1},\dots,\Delta B^H_{t_n})\sim N(0,\Sigma), where

t_i=i*T/n=i \Delta t,

\Delta B^H_{t_i}=B^H_{t_i}-B^H_{t_{i-1}}, i=1,\dots,n,

\Sigma_{i,j}=E(\Delta B^H_{t_i}\Delta B^H_{t_j})=\frac{(\Delta t)^{2H}}{2}\big(|i-j+1|^{2H}+|i-j-1|^{2H}-2|i-j|^{2H}\big)

Now the question is just simulating L'\cdot N(0,I_{n\times n}), where L is the Cholesky factor of \Sigma, i.e., \Sigma=L'L.

Method 1: compute \Sigma first and then use the Cholesky decomposition to obtain L. This works in theory, but not in practice, because it is too costly in time and space to store \Sigma, not to mention the O(n^3) decomposition step.
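Still, for modest n Method 1 is a usable baseline; a minimal MATLAB sketch (hypothetical n and H):

n = 500; H = 0.7; T = 1; dt = T/n;              % hypothetical grid and Hurst index
k = abs((1:n)' - (1:n));                        % |i-j|
Sigma = dt^(2*H)/2 * ((k+1).^(2*H) + abs(k-1).^(2*H) - 2*k.^(2*H));
L = chol(Sigma, 'lower');                       % Sigma = L*L'
B = [0; cumsum(L*randn(n,1))];                  % fBm path from correlated increments
plot((0:n)*dt, B);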

Method 2:

We denote \lambda(k)=\frac{(\Delta t)^{2H}}{2}\big((k+1)^{2H}+|k-1|^{2H}-2k^{2H}\big)=\Sigma_{i,i+k}, \quad k=0,\dots,n-1.

Define the circulant matrix whose first row is \big(\lambda(0),\lambda(1),\dots,\lambda(n-1),\lambda(n-2),\dots,\lambda(1)\big):

C=\begin{pmatrix}\lambda(0)&\lambda(1)&\lambda(2)&\dots&\lambda(n-1)&\lambda(n-2)&\dots&\lambda(1)\\ \lambda(1)&\lambda(0)&\lambda(1)&\dots&\lambda(n-2)&\lambda(n-1)&\dots&\lambda(2)\\ \lambda(2)&\lambda(1)&\lambda(0)&\dots&\lambda(n-3)&\lambda(n-2)&\dots&\lambda(3)\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots&&\vdots\\ \lambda(n-1)&\lambda(n-2)&\lambda(n-3)&\dots&\lambda(0)&\lambda(1)&\dots&\lambda(n-2)\\ \vdots&\vdots&\vdots&&\vdots&\vdots&\ddots&\vdots\\ \lambda(1)&\lambda(2)&\lambda(3)&\dots&\lambda(n-2)&\lambda(n-3)&\dots&\lambda(0)\end{pmatrix}_{(2n-2)\times (2n-2)},

of which \Sigma is the upper-left n\times n block.

The idea is the following: suppose C=DD^*=(D_1+iD_2)(D_1^T-i D_2^T), where D_1, D_2 \in \mathbb{R}^{(2n-2)\times (2n-2)}. Now generate i.i.d. \epsilon^1,\epsilon^2\sim N(0,I_{2n-2}) and denote

X_1+iX_2=D(\epsilon^1+i\epsilon^2)=D_1\epsilon^1-D_2\epsilon^2+i(D_2\epsilon^1+D_1\epsilon^2), \quad X_1,X_2\in \mathbb{R}^{2n-2}.

We can then conclude that X_1[1:n]\sim N(0,\Sigma) and X_2[1:n]\sim N(0,\Sigma).

Let Q be the unitary matrix defined by

Q_{j,k}=(2n-2)^{-1/2}\exp\big(-2\pi i\,\tfrac{jk}{2n-2}\big), \quad j,k=0,\dots,2n-3, and thus we can take D=Q\,\Lambda^{\frac{1}{2}}.

So how do we obtain \Lambda? The property of circulant matrices shows that

\lambda_k=\sum_{j=0}^{2n-3}c_j \exp\big(-2\pi i\,\tfrac{jk}{2n-2}\big), where (c_0,\dots,c_{2n-3}) is the first row of C, i.e.,

(\lambda_0,\dots,\lambda_{2n-3})^{'}=\sqrt{2n-2}\;Q\begin{pmatrix}\lambda(0)&\lambda(1)&\lambda(2)&\dots&\lambda(n-1)&\lambda(n-2)&\dots&\lambda(1)\end{pmatrix}^{'}, which is exactly the FFT of the first row of C, and \Lambda=\text{diag}(\lambda_0,\dots,\lambda_{2n-3}).
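A quick MATLAB check of this circulant/FFT fact, with a hypothetical small first row:

c = [2 0.5 0.1 0.5];                            % symmetric first row, m = 4
C = toeplitz(c);                                % equals the circulant with this first row
disp(sort(eig(C))');                            % eigenvalues of C
disp(sort(real(fft(c))));                       % FFT of the first row gives the same values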

Next, according to the idea stated before, X_1+iX_2=Q\,\Lambda^{\frac{1}{2}}(\epsilon^1+i\epsilon^2)=(2n-2)^{-1/2}\,\text{FFT}\big(\Lambda^{\frac{1}{2}}(\epsilon^1+i\epsilon^2)\big),

and the first n components of X_1 (and of X_2) are N(0,\Sigma).

Reference: Coeurjolly, J.-F. (2000), "Simulation and identification of the fractional Brownian motion: a bibliographical and comparative study", Journal of Statistical Software.

Matlab code:

function []=fBM(n,H)
% Simulate a fractional Brownian motion path on [0,1] with Hurst index H
% via circulant embedding and the FFT (Method 2 above).
r = nan(n+1,1); r(1) = 1;
for k=1:n
    r(k+1) = 0.5*((k+1)^(2*H) - 2*k^(2*H) + (k-1)^(2*H)); % autocovariance of the increments
end
r = [r; r(end-1:-1:2)];            % first row of the circulant matrix
lambda = real(fft(r))/(2*n);       % eigenvalues of the circulant matrix
W = fft(sqrt(lambda).*complex(randn(2*n,1),randn(2*n,1)));
W = n^(-H)*cumsum(real(W(1:n+1))); % rescale and sum the increments
plot((0:n)/n,W);                   % plot the fBm path
%Terminal=W(n+1);                  % terminal value of the fBm
end