Self-Repellent Random Walks on General Graphs

Achieving Minimal Sampling Variance via Nonlinear Markov Chains

By: Vishwaraj Doshi, Jie Hu, Do Young Eun

Why?

  • Graphs are random ($\pm \varepsilon$)
  • A fully explored graph $\rightarrow$ an understanding of its topology
  • Seeing the same state twice is a waste of time

Why ⁉️

Methodology for this already exists: Markov Chain Monte Carlo (MCMC).

However, these methods have issues of their own:

  • Slow convergence time
  • Correlated sampling

Main Challenge 🛑

The real issue is the slow convergence of these MCMC techniques. This slowness stems from the time-reversibility requirement of MCMC and from poor sampling variance.

Proposed Solution 💡

The authors propose dynamically biasing certain transition probabilities away from frequently visited states, drastically cutting down on time spent "rewinding".

But obviously, we need to massage this a little bit, because if it were trivial, someone else would have done it already!

Formal Goal 🥅

The formal goal of this paper is to introduce a novel random-walk approach that efficiently explores a graph while minimizing sampling variance.

Assumptions 🧪

The authors work from the premise that a random walk which uses its own visit history to avoid revisiting states can converge faster than traditional approaches while still sampling the same target distribution.

Methodology ⚙️

Starting with a graph $\cal{G}(V, E)$, cast $\cal{G}$ into its matrix representation, which gives us:
  • $\cal A$ — adjacency matrix
  • $\cal N(i)$ — neighbor set of node $i$
  • $d_i$ — degree of node $i$
  • $\Sigma$ — probability simplex over the nodes
  • $P$ — base transition probability matrix
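The objects above can be built in a few lines. This is a minimal sketch assuming a small undirected example graph (a 4-cycle); the names `A`, `deg`, and `P` mirror the slide's notation, but the graph itself is an illustrative choice, not one from the paper.

```python
import numpy as np

# Illustrative undirected graph: a 4-cycle on nodes 0-3.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

deg = A.sum(axis=1)       # degree of each node
P = A / deg[:, None]      # simple-random-walk transition matrix

# Each row of P lives in the probability simplex over that node's neighbors.
assert np.allclose(P.sum(axis=1), 1.0)
```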

Methodology ⚙️

SRRW Transition Kernel

Now that the problem is formulated in a tractable manner, let's develop the kernel that drives the walk's decisions \[K_{i,j}[\mathbf{x}] = \frac{P_{ij}\, r_{\mu_j}(x_j)}{\sum_{k\in \cal N(i)} P_{ik}\, r_{\mu_k}(x_k)} \] If any of you have taken ML or DL courses, this softmax-style normalization may look familiar.
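One way to make the kernel concrete is to compute a single row of $K[\mathbf{x}]$. This sketch assumes the polynomial weighting $r_\mu(x) = (x/\mu)^{-\alpha}$, a 4-cycle base chain, a uniform target $\mu$, and $\alpha = 2$ — all illustrative choices for the demo, not the paper's experiments.

```python
import numpy as np

def srrw_row(P, x, mu, i, alpha=2.0):
    """One row of the SRRW kernel K[x]: reweight the base transitions
    P[i, :] by r_mu(x_j) = (x_j / mu_j)^(-alpha), then renormalize,
    mirroring the formula on the slide."""
    r = (x / mu) ** (-alpha)   # self-repellence weights
    w = P[i] * r               # zero wherever j is not a neighbor of i
    return w / w.sum()

# Illustrative 4-cycle base chain with uniform target mu.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)
mu = np.full(4, 0.25)

# Node 1 has been visited less than node 3, so from node 0 the walk
# now strongly prefers node 1 over node 3.
x = np.array([0.4, 0.1, 0.3, 0.2])
row = srrw_row(P, x, mu, 0)    # -> [0.0, 0.8, 0.0, 0.2]
```

Note how the base chain's even split between the two neighbors (0.5 each) gets skewed toward the under-visited one.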

Methodology ⚙️

Non-Linear Markov Chains

SRRWs take a probability distribution (the empirical measure of past visits) as an argument, which makes them implicitly non-linear.

This is where they differ from traditional "linear" kernels, which depend only on the current state of the Markov chain.

\[x_n = \frac{1}{n+1} \sum^{n}_{k=0} \delta_{X_k}\]

where $\delta_{X_k}$ is the Dirac measure (a point mass) at the state visited at step $k$, so $x_n$ is the empirical distribution of visits up to time $n$
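The formula above is just a running average of one-hot vectors, which can be updated incrementally. A minimal sketch, assuming an illustrative sample path on 4 states:

```python
import numpy as np

# Empirical measure x_n as a running average of point masses delta_{X_k}.
visits = [0, 1, 2, 1, 0, 3]   # illustrative sample path
n_states = 4

x = np.zeros(n_states)
for n, state in enumerate(visits):
    delta = np.zeros(n_states)
    delta[state] = 1.0             # Dirac point mass at X_n
    x += (delta - x) / (n + 1)     # incremental form of the 1/(n+1) average

# x now holds the fraction of time spent in each state: [1/3, 1/3, 1/6, 1/6]
```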

Methodology ⚙️

Tunable Parameters

The weighting function $r$ contains a tunable parameter $\alpha$, which can be viewed as the strength of the self-repellence.

If we set this parameter too high, we might be too strict in our selection of new states.

Similarly, if we set it too low, we essentially fall back to the base walk, selecting states as if at random.
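Putting the pieces together, we can simulate the walk and watch how $\alpha$ steers it. This is a sketch, not the paper's experiments: the 4-cycle graph, the $\alpha$ values, the step count, and the positive initialization of $x$ (a smoothing that keeps $r$ well-defined) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 4-cycle base chain with uniform target mu.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
P = A / A.sum(axis=1, keepdims=True)
mu = np.full(4, 0.25)

def run_srrw(alpha, steps=2000):
    """Simulate an SRRW: at each step, reweight the base transitions by
    (x_j / mu_j)^(-alpha) and update the empirical measure x."""
    x = np.full(4, 0.25)           # positive start keeps r well-defined
    state = 0
    for n in range(steps):
        r = (x / mu) ** (-alpha)   # self-repellence weights
        w = P[state] * r
        state = rng.choice(4, p=w / w.sum())
        delta = np.eye(4)[state]   # point mass at the new state
        x += (delta - x) / (n + 2) # running-average update
    return x

# Larger alpha should keep the empirical measure closer to uniform.
for alpha in (0.0, 2.0):
    x = run_srrw(alpha)
    print(alpha, np.abs(x - mu).max())
```

With $\alpha = 0$ the repellence weights are all 1 and we recover the base simple random walk.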

Proof✍️

The rest of the paper is effectively a series of small proofs to show:
  • Computational complexity
  • Near-optimality

Proof✍️

The authors show that the asymptotic sampling variance of the SRRW decreases at a rate of roughly $\cal O(1/\alpha)$, so turning up the repellence makes the estimator strictly better than the base MCMC chain.

Which is incredible!

Conclusion ✅

In conclusion, this paper was hard:
  • to do
  • to understand
  • to read

I don't blame the authors for this though, what they did was complex and hard. I came into this paper without the background required to understand it.

Thanks + Questions

Thanks for listening, please direct any questions to wwu@diegollanes.com