\documentclass{article}

\title{Probability of points being ``positive linear optimal'' in high dimensionality}
\author{Ziao Wang}
\date{}

\usepackage{indentfirst}
\usepackage{amsmath,amssymb}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}
\usepackage{graphicx}
\graphicspath{{./}}
\usepackage{listings}

% Optional: Global settings for Python listings
\lstset{
  language=Python,
  basicstyle=\ttfamily\small,   % Monospaced font, small size
  showstringspaces=false,       % Do not show spaces in strings
  frame=single,                 % Add a frame around the listing
  breaklines=true               % Allow line breaking
}

\begin{document}

\maketitle

\section*{1. Background}
\section*{Lemma 1}

If $X$ follows a $d$-dimensional standard Gaussian distribution and $v$ is a fixed unit vector, then $X \cdot v \sim N(0,1)$.

Justification: the $d$-dimensional standard Gaussian distribution is rotationally symmetric (it looks the same in every direction, like a ball); alternatively, use the property that any linear combination of the components of a multivariate normal vector is itself a univariate normal random variable.
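As a quick numerical illustration (a minimal sketch with arbitrary values of $n$ and $d$, not part of the argument), projecting Gaussian samples onto a fixed unit vector should give values with mean close to $0$ and variance close to $1$:

\begin{lstlisting}
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 200000
v = rng.normal(size=d)
v /= np.linalg.norm(v)          # fixed unit vector
X = rng.normal(size=(n, d))     # n samples from a d-dimensional standard Gaussian
proj = X @ v                    # the projections X . v
print(proj.mean(), proj.var())  # should be close to 0 and 1
\end{lstlisting}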
\section*{Lemma 2}

When $n \gg 1$, the maximum of $n$ i.i.d. standard Gaussians is very unlikely to exceed $\sqrt{2\ln{n}}$ by much.

This claim relies on Lemma 5.3 of \href{https://psdey.web.illinois.edu/math595fa19/lec05.pdf}{``Concentration for Maximum'', lecture notes by Partha S. Dey (scribe: Hsin-Po Wang)}. Copying Lemma 5.3 here:

Suppose $M_n$ is the maximum of $n$ i.i.d. standard normals. Then

$$ P\left[M_n \geq \sqrt{2\log{n}} + \frac{x}{\sqrt{2\log{n}}}\right] \leq e^{-x}. $$
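As an illustration (a minimal numerical sketch with arbitrary values), the maximum of $n = 10^5$ i.i.d. standard Gaussians should land near, and rarely far above, $\sqrt{2\ln{n}} \approx 4.8$:

\begin{lstlisting}
import numpy as np

rng = np.random.default_rng(0)
n, trials = 100000, 50
maxes = rng.normal(size=(trials, n)).max(axis=1)   # max of n standard Gaussians, repeated
print(np.sqrt(2 * np.log(n)))                      # ~ 4.80
print(maxes.mean(), maxes.max())                   # typical and largest observed maximum
\end{lstlisting}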
\section*{Lemma 3}

If $x$ is standard Gaussian, its mean value given that $x>0$ is $\sqrt{\frac{2}{\pi}}$. That is, $x \sim N(0,1) \rightarrow E[x|x>0] = \sqrt{\frac{2}{\pi}}$.

Derivation: the density of $x$ conditioned on $x>0$ is $2\phi(x) = \sqrt{\frac{2}{\pi}}\, e^{-x^2/2}$ on $(0,\infty)$, so with the substitution $u = x^2/2$,

$$ E[x|x>0] = \sqrt{\frac{2}{\pi}} \int_{x=0}^\infty xe^{-x^2/2}\,dx = \sqrt{\frac{2}{\pi}} \int_{u=0}^\infty e^{-u}\, du = \sqrt{\frac{2}{\pi}}. $$
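A quick check (a minimal sketch): the empirical mean of the positive part of a standard Gaussian sample should be close to $\sqrt{2/\pi} \approx 0.7979$:

\begin{lstlisting}
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000000)
print(x[x > 0].mean())       # empirical E[x | x > 0]
print(np.sqrt(2 / np.pi))    # 0.7979...
\end{lstlisting}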
\section*{2. Problem}

Let $X_1,\ldots,X_n$ be i.i.d. samples from a $d$-dimensional standard Gaussian. We say sample $X_i$ is ``positive linear optimal'' (PLO) if there exists a $d$-dimensional vector $k$ whose elements are non-negative such that $X_i \cdot k > X_j \cdot k, \forall j\neq i$. Suppose $n \gg d \gg \ln{n}$. As $n\rightarrow \infty$, is every point almost always, with some intermediate probability, or almost never positive linear optimal?
\section*{2.1 Lower bound}

If $X_i$ is the greatest in some dimension $r$, then $X_i$ is certainly PLO: just choose $k=e_r$. The probability of being the greatest in a given dimension is $\frac{1}{n}$, and the dimensions are independent, so the probability of being the greatest in at least one dimension is $1-(1-\frac{1}{n})^d \sim \frac{d}{n} \rightarrow 0$. This lower bound is too loose.
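As a quick illustration of the $\frac{d}{n}$ estimate (a minimal Monte Carlo sketch with arbitrary values of $n$ and $d$), the fraction of samples that are the greatest in at least one dimension should be close to $d/n$:

\begin{lstlisting}
import numpy as np

rng = np.random.default_rng(0)
n, d = 20000, 100
X = rng.normal(size=(n, d))
winners = np.unique(X.argmax(axis=0))   # samples that are greatest in at least one dimension
print(len(winners) / n, d / n)          # both should be about 0.005
\end{lstlisting}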
\section*{2.2 Upper bound}

If $X_i$ is PLO, then $X_i$ must be Pareto optimal: for every $j$, we must not have $X_j > X_i$ in every dimension (i.e.\ $X_i$ dominated by $X_j$). The probability of being PLO is therefore upper bounded by the probability of being Pareto optimal. The latter can be estimated as follows: the probability of $X_i$ being dominated by a particular $X_j$ is the product over the dimensions, $\frac{1}{2^d}$. Assume that the events $X_j > X_i$ and $X_m > X_i$ are independent for $j \neq m$. Then the probability of $X_i$ being Pareto optimal is about $1-\frac{n}{2^d} \rightarrow 1$, also too loose.
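(The independence assumption can in fact be avoided: a union bound over the $n-1$ competitors gives the same estimate,

$$ P[X_i \text{ is Pareto optimal}] \;\geq\; 1 - \sum_{j\neq i} P[X_j \text{ dominates } X_i] \;=\; 1 - \frac{n-1}{2^d} \;\rightarrow\; 1.) $$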
\section*{3. Construction}

Claim: every sample is almost always PLO.

Construction: for sample $X_i$, we choose $k$ to be $\frac{X_i^{+}}{||X_i^{+}||}$, where $X_i^{+}$ is the vector formed by keeping the components of $X_i$ that are positive and setting the non-positive components to 0. By construction, $k$ is a unit vector with non-negative elements.
\section*{3.1 The Candidate's score}

Every component of $X_i^+$ follows a ``clipped'' standard Gaussian: the probability mass for negative values (50\% of it) is moved to 0. Therefore the ``score'' $S$ (we use this term for the dot product of a sample with $k$) of $X_i$ is
\begin{align*}
S &= X_i \cdot k = X_i^{+} \cdot k = X_i^{+} \cdot \frac{X_i^+}{||X_i^+||} = ||X_i^+||, \\
S^2 &= X_i^+ \cdot X_i^+ = \sum_{r=1}^d (X_{i,r}^+)^2 \sim d/2.
\end{align*}

(The last step uses the law of large numbers: by symmetry, $E[(X_{i,r}^+)^2] = \frac{1}{2}E[X_{i,r}^2] = \frac{1}{2}$, so the sum of $d$ i.i.d. terms concentrates around $d/2$.)
\section*{3.2 The Competitor's score}

$X_i$ must compete against the rest of the samples. For $X_j, j\neq i$: since $X_j$ is independent of $k$ and $k$ is a unit vector, we apply Lemma 1 and see that $X_j \cdot k \sim N(0,1)$.

Among the $n-1$ competitors, the strongest competitor's score $T$ is very unlikely to exceed $\sqrt{2\ln{n}}$ by much, according to Lemma 2. So $T^2 \sim 2\ln{n}$.
\section*{3.3 Comparison}

Since $d \gg \ln{n}$ in the problem statement, we have $S^2 \gg T^2$. Therefore every point is almost always PLO.
\section*{3.4 Variant: Allow competitors to exclude negative scores}

Consider what happens if competitors are also allowed to exclude their negative components from the dot product; equivalently, $X_1^+,\ldots,X_n^+$ are sampled from a clipped $d$-dimensional standard Gaussian. The candidate's score does not change; will the competitors' scores increase so much as to surpass the candidate? For each competitor $j$, its score is

$$ T_j = X_j^+ \cdot k = \frac{1}{||X_i^+||} \sum_{r=1}^d X_{jr}^+ \cdot X_{ir}^+. $$

Applying Lemma 3, we see that $E[X_{jr}^+] = \frac{1}{2}\cdot 0 + \frac{1}{2}E[x|x>0,\ x\sim N(0,1)] = \sqrt{\frac{1}{2\pi}}$. We also know that $E[(X_{jr}^+)^2] = \frac{1}{2}$, thus $Var[X_{jr}^+] = \frac{1}{2}-\frac{1}{2\pi}$.

For the mean (with $X_i$ held fixed),

$$ E[\,||X_i^+||\cdot T_j\,] = \sqrt{\frac{1}{2\pi}} \sum_{r=1}^d X_{ir}^+ := \sqrt{\frac{1}{2\pi}}\, s_1. $$

For the variance,

$$ Var[\,||X_i^+||\cdot T_j\,] = \sum_{r=1}^d \left(\frac{1}{2}-\frac{1}{2\pi}\right) (X_{ir}^+)^2 := \left(\frac{1}{2}-\frac{1}{2\pi}\right) s_2. $$

Since $||X_i^+||\cdot T_j$ is a sum of many independent terms, we can apply the central limit theorem over the component dimension (assuming $d \gg 1$): $T_j$ is approximately normally distributed,

$$ ||X_i^+||\cdot T_j \sim N\left(\sqrt{\frac{1}{2\pi}}\, s_1,\ \left(\frac{1}{2}-\frac{1}{2\pi}\right) s_2 \right) := N(\mu, \sigma^2). $$

Applying Lemma 2, among the $n$ competitors

$$ ||X_i^+||\cdot T_{max} \sim \mu + \sqrt{2\ln{n}}\cdot\sigma. $$

Recall that $||X_i^+|| = \sqrt{s_2}$, so

$$ T_{max} \sim \sqrt{\frac{1}{2\pi}}\, \frac{s_1}{\sqrt{s_2}} + \sqrt{\left(1-\frac{1}{\pi}\right)\ln{n}}. $$

By the law of large numbers, $s_1 \sim d\sqrt{\frac{1}{2\pi}}$ and $s_2 \sim \frac{d}{2}$.
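Substituting these values, the first term simplifies (a short arithmetic check):

$$ \sqrt{\frac{1}{2\pi}}\, \frac{s_1}{\sqrt{s_2}} \sim \sqrt{\frac{1}{2\pi}} \cdot \frac{d\sqrt{\frac{1}{2\pi}}}{\sqrt{d/2}} = \frac{d/(2\pi)}{\sqrt{d/2}} = \frac{\sqrt{d/2}}{\pi}. $$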
Hence

$$ T_{max} \sim \frac{\sqrt{d/2}}{\pi} + \sqrt{\left(1-\frac{1}{\pi}\right)\ln{n}}. $$

When $d \gg \ln{n}$, $T_{max} \sim \frac{1}{\pi}\sqrt{d/2}$ while $S \sim \sqrt{d/2}$. The strongest competitor is no longer orders of magnitude smaller than $S$ (thanks to the exclusion of negative scores), but instead a constant fraction (specifically, $\frac{1}{\pi}$) of $S$. Still, the conclusion does not change: $X_i$ almost always wins.
\section*{4. Numerical Sanity Check}

\begin{lstlisting}
import numpy as np

N = 100000
d = 1000
Xs = np.random.normal(size=(N, d))

# Find the metric vector
X0 = Xs[0]                       # We demonstrate that X0 is optimal
X0p = X0.clip(min=0)             # keep the positive components, zero out the rest
L2_X0p = np.linalg.norm(X0p)
X0p_hat = X0p / L2_X0p           # k = X0+ / ||X0+||, a non-negative unit vector
X0_score = np.dot(X0p_hat, X0)   # is also L2_X0p
print("X0_score: %.4f" % X0_score)

# Find the scores of all samples (X0 included)
scores = Xs @ X0p_hat
scores = np.partition(scores, -2)  # the two largest scores end up in the last two slots
print("1st score: %.4f, squared = %.4f ~ d/2 = %.2f" % (scores[-1], scores[-1]**2, d/2))
print("2nd score: %.4f, squared = %.4f < bd, bd ~ 2log(n) = %.4f" % (scores[-2], scores[-2]**2, 2*np.log(N)))

print("If we allow other candidates to remove their negative scores")
Xs = Xs.clip(min=0)              # every competitor also drops its negative components
scores = Xs @ X0p_hat
scores = np.partition(scores, -2)
dpart = np.sqrt(d/2)/np.pi
npart = np.sqrt((1-1/np.pi)*np.log(N))
print("2nd score: %.4f ~ sqrt{d/2pi^2} + sqrt{(1-1/pi)ln(n)} = %.4f" % (scores[-2], dpart+npart))
\end{lstlisting}
Example Result:

\begin{lstlisting}
X0_score: 21.2264
1st score: 21.2264, squared = 450.5582 ~ d/2 = 500.00
2nd score: 4.3707, squared = 19.1034 < bd, bd ~ 2log(n) = 23.0259
If we allow other candidates to remove their negative scores
2nd score: 9.6617 ~ sqrt{d/2pi^2} + sqrt{(1-1/pi)ln(n)} = 9.9191
\end{lstlisting}

\end{document} |