Stochastic limits, part 2: tails, memory, and the Joseph and Noah effects

In the last post, we discussed what happens to the sum \[\begin{equation} \lim_{N\to\infty} S_N - \mu N, \qquad S_N := \sum_{i=1}^{N} X_i, \end{equation}\]

with \(X_i\) drawn from some “nice” distribution with \(\mu := \mathbb{E}[X_i]\). In this scenario, \(\mathbf{S}_N(t) \to \mathbf{B}(t)\), Brownian motion. For this discussion, we’ll have to state a little more explicitly what we mean by “nice.” The two big features we needed are

  1. finite (first two) moments, i.e. \(\mathbb{E}[X], \mathbb{V}[X] < \infty\),
  2. independence, i.e. \(\mathbb{P}(X_i = x_i, X_j = x_j) = \mathbb{P}(X_i = x_i)\mathbb{P}(X_j = x_j)\).

With these in mind, it’s somewhat intuitive the limiting object is Brownian: a process with Gaussian, independent increments. However, we can now ask, what if we break one (or all) of these assumptions? Should we expect any limiting object? If one does exist, should we expect independent increments if we’re building an object out of non-independent things?


As with most mathematical questions, I think a sane first step is to do some caveman simulations. To do these, a little bit of insight is necessary. What is a particular distribution with infinite moments? A great case-study for use here is the Pareto distribution, characterized by its probability density function \[\begin{equation} f_X(x)= \begin{cases} \frac{\alpha x_\mathrm{m}^\alpha}{x^{\alpha+1}} & x \ge x_\mathrm{m}, \\ 0 & x < x_\mathrm{m}, \end{cases} \end{equation}\]

where \(x_m >0\) is some scale parameter and \(\alpha>0\) is some shape parameter. Weirdly, sampling from this distribution isn’t built into the vanilla R distribution, but we can easily define some commands that give us the standard R functionality for a distribution.

dpareto <- function(x, xm, alpha) ifelse(x > xm , alpha*xm**alpha/(x**(alpha+1)), 0)
ppareto <- function(q, xm, alpha) ifelse(q > xm , 1 - (xm/q)**alpha, 0 )
qpareto <- function(p, xm, alpha) ifelse(p < 0 | p > 1, NaN, xm*(1-p)**(-1/alpha))
rpareto <- function(n, xm, alpha) qpareto(runif(n), xm, alpha)
Here, we’ll just take \(x_m=1\) for convenience. The fact that makes the Pareto distribution relevant are its first two moments, first the mean \[\begin{equation} \mathbb{E}[X] = \begin{cases} \infty & \alpha \leq 1 \\ \frac{\alpha}{\alpha-1} & \alpha > 1, \end{cases} \end{equation}\] and variance \[\begin{equation} \mathbb{V}[X] = \begin{cases} \infty & \alpha \leq 2 \\ \frac{\alpha}{(\alpha-1)^2(\alpha-1)} & \alpha > 2. \end{cases} \end{equation}\]

Therefore, we have three scenarios of interest:

  1. If \(\alpha \in (0,1]\), then \(\mathbb{E}[X] = \infty\) and \(\mathbb{V}[X] = \infty\),
  2. if \(\alpha \in (1,2]\), then \(\mathbb{E}[X] = \infty\) but \(\mathbb{V}[X]\) is finite,
  3. if \(\alpha \in (2,\infty]\), both \(\mathbb{E}[X]\) and \(\mathbb{V}[X]\) are finite.

With this in mind, we can easily simulate \(S_N\) where \(X_i\) is drawn from these three regimes.

Infinite first two moments

Figure 1: \(S_n\) for \(1 \leq n \leq N\) with \(X_n \sim \text{pareto}(0,0.25)\).

Finite mean, infinite variance

Figure 2: \(S_n\) for \(1 \leq n \leq N\) with \(X_n \sim \text{pareto}(0,1.1)\).

Finite mean and variance

Figure 3: \(S_n\) for \(1 \leq n \leq N\) with \(X_n \sim \text{pareto}(0,2.5)\).

From this, we gain a bit of intuition. In the case where both moments are infinite (Figure 1) that the process is in some sense unstable, in that huge, fairly common jumps continue to cause the process to blow up. In Figure 2, if we make the mean finite, things get a little more predictable: jumps still occur but the magnitude is not huge previous to previous values. Finally, in Figure 3, with two finite moments, things smooth out and we get back the scenario discussed in the last post.

Relaxing independence

How do we simulate a process where each payoff is non-independent? A classical model for this is the autoregressive moving-average (ARMA) process, described by the relation \[\begin{equation} X_k := (1-\gamma)X_{k-1} + \gamma U_k, \tag{1} \end{equation}\] where \(U_k \sim \text{unif}[0,1]\). Note that this process does indeed have memory, as the value of \(X_k\) is determined by the previous value, \(X_{k-1}\). However, we can still define \(S_N := \sum_{i=1}^{N} X_i\) and ask, what is the behavior as \(N\to\infty\)? It was nice to subtract the mean behavior off (which we just saw isn’t always possible), but here, it actually is, by noting that taking expectations of (1), which yields \[\begin{equation} \mathbb{E}[X] = (1-\gamma)\mathbb{E}[X] + \gamma \mathbb{E}[U], \end{equation}\] which yields \[\begin{equation} \mu = \mathbb{E}[X] = \mathbb{E}[U] = \frac{1}{2}. \end{equation}\]

We can simulate this and subtract off the mean to see what the large \(N\) behavior increments.

Figure 4: Net worth \(S_n - n\mu\) for \(1 \leq n \leq N\) for \(X_n\) generated by an ARMA process with \(\gamma=1/2\).

In Figure 4, we see a familiar looking limiting object: Brownian motion. This tells us that perhaps, independence is not a strict requirement for a FCLT? If it is not, to what extent can we break this assumption?

Biblical analogies

Here, we simulated when things go “wrong,” with the classical central limit theorems and it turns out, these ideas have (archaic) names that referencing the Bible.

Noah effect

Noah on his ark surviving a statistically rare flood.

Figure 5: Noah on his ark surviving a statistically rare flood.

When we broke the finiteness of our samples, we got monumental jumps. This is what is referred to as a heavy-tailed behavior. That is, large values of \(X_i\) (which exist in the tail of the distribution) have a significant enough probability that they occasionally occur. This is sometimes referred to as the Noah effect, referencing the catstrophic flood encountered, for which Noah built his ark. In that context, the flood was unlike one ever seen before, which is exactly the case with events stemming from heavy-tails. While they may be rare enough that one of that magnitude may not have been previously seen, this rarity is offset by the extreme magnitude, and these events have a role in the long term behavior.

Joseph effect

Joseph received seven years of prosper and then seven years of famine.

Figure 6: Joseph received seven years of prosper and then seven years of famine.

When we removed the assumption of independence of our samples, events became correlated, as in, the system gained a memory. This memory may manifest as perioids of persisting with the same behavior: say, if \(X_i\) is positive, maybe \(X_{i+1}\) is more likely to also be positive. This lends itself to the story of Joseph in the bible. In a prophecy, Joseph sees that seven years of abundance will be followed by seven years of famine. That is, the behavior is persistent, or more technically, autocorrelated, which is sometimes called the Joseph effect. I’ve also heard this referred to as the “Netflix binge effect,” where once you watch an episode, there is a higher probability of continuing than just watching just a single episode.

Summarized theory

Self-similarity & tails

One way to understand the limiting behavior is to characterize the self-similarity of the resulting process. A process \(Z(t)\) is said to be self-similar if there exists an \(H>0\) such that \[\begin{equation} Z(at) = a^H Z(t), \end{equation}\]

where \(H\) is known as the Hurst parameter. Notice that Brownian motion is self-similar with \(H=1/2\). Both the Joseph and Noah effects result in growth that is in some sense greater than classical Brownian motion. This manfiests as \(H>1/2\), for different underlying reasons. \(H<1/2\) actually represents a negative time self-similarity, which may occur from say, thinking of \(Z(t)\) as a model of arrivals to a doctors office with scheduling to avoid conflicts.

To distinguish between these effects, we also want to characterize the tail behavior, or the existence of abnormally large events. One way to do so is by a parameter \(\alpha\), corresponding to the relation \[\begin{equation} \mathbb{P}(|X| > t) \sim C t^{-\alpha}, \qquad \text{ as } t \to \infty. \end{equation}\]

That is, \(\alpha\) effectively measures how quickly the tail of the distribution decays. For \(\alpha\) larger, the tail decays faster, killing off the possibility of statistically rare large events. For a process with independent increments, it can be shown that \(H=\alpha^{-1}\). For Brownian motion, we actually know the exact value of these, which is \(H=\alpha^{-1}=1/2\). For the Noah effect, we have independent increments, but the self-similarity and tails are larger than that of classical Brownian motion, so we an characterize the Noah effect precisely when \(H=\alpha^{-1} > 1/2\). The Joseph effect alone is when we don’t have long tails, but the self-similarity (or persistence) of our process is distinct, so this precisely translates to \(H > \alpha^{-1}=1/2\). We could even fathom an ultimately bad scenario with both the Noah and Joseph effects, which would manifest as \(H > \alpha^{-1} > 1/2\).

Convergence to a Lévy process

The rigorous requirements for convergence are too long and annoying to include here, but hopefully unsurprisingly they relate to \(H\) and \(\alpha\). Again, we can consider making our sum some functional by “connecting the dots,” \[\begin{equation} \mathbf{S}_N(t) := c_N^{-1/2}\left(S_{\lfloor Nt \rfloor} - \mu \lfloor N t \rfloor\right), \quad t\in[0,1]. \tag{2} \end{equation}\] Then, under some technical conditions \[\begin{equation} \mathbf{S}_N(t) \rightarrow \sigma \mathbf{S}(t), \quad \text{ as } N\to\infty, \tag{3} \end{equation}\] where \(\mathbf{S}(t)\) is a standard \((\alpha,\beta)\)-stable Lévy motion, distributed by \[\begin{equation} \mathbf{S}(t) \sim t^{1/\alpha} S_\alpha (1,\beta,0) = S_\alpha(t^{1/\alpha}, \beta,0). \tag{4} \end{equation}\]

Right away, in (4), we see that a self-similarity appears, characterized by \(\alpha\). When \(\alpha=2\), the stable Lévy is simply Brownian motion. However, when \(\alpha <2\), this can be thought of as a jump process, which looks a lot like Brownian motion just with non-zero mean increments. For \(0 < \alpha < 1\), paths grow at a reasonable rate, whereas \(1 < \alpha <2\), jumps become unbounded in variance. For \(\alpha > 2\), the motion is no longer stable and blows up in finite time. Hence, we can immediately see this characterizes the different scenarios of the Pareto distribution.

Convergence to Fractional Brownian Motion (FBM)

We need a way to generally characterize the non-independence of \(X_i\), which we can do supposing that it has the representation \[\begin{equation} X_i = \sum_{j=0}^{\infty} a_j Y_{i-j}, \qquad Y_j \sim \mathcal{N}(0,1). \tag{5} \end{equation}\] Then, suppose the weights satisfy \[\begin{equation} \sum_{j=0}^{\infty} a_j < \infty. \tag{6} \end{equation}\] Then, if (5) and (6) are satisfied, then it can be shown that \(\mathbb{V}[X_n] < \infty\). In this case, yet again define \(\mathbf{S}_N\) as in (2), and suppose that \(\mathbb{V}[S_n]\sim n^{2H}\). Then, we can state the FCLT for this case as Then, under some technical conditions \[\begin{equation} \mathbf{S}_N(t) \rightarrow \sigma \mathbf{Z}_H(t), \quad \text{ as } N\to\infty, \tag{7} \end{equation}\] where \(\mathbf{Z}_H(t)\) is fractional Brownian motion, which still is a Gaussian process, but with covariance structure \[\begin{equation} \text{covar}[\mathbf{Z}_H(t), \mathbf{Z}_H(\tau)] = \frac{1}{2}[t^{2H} + \tau^{2H} - (t-\tau)^{2H}]. \end{equation}\]

Recall that if \(H=1/2\) as with Brownian motion, the two different times \(t,\tau\) are uncorrelated. However, for any other value of \(H\), the times are correlated, meaning that there is either positive persistence (things stay the same) or negatively correlated (things are likely to change). This is, in essence, the Joseph effect.

Heavy tails plus dependence

There has been a significant body of work showing that even when you combine these two effects, you still (surprisingly) can get a limiting process. I’ll omit the details here but briefly mention that depending on the conditions, you get a pretty exotic fractional Lévy process.


The landscape of stochastic processes can be very daunting. However, limiting behavior provides a concrete and beautiful way of categorizing their behavior. The most classical case deals with when everything goes right. Suppose we take the nicest samples possible: independent, finite samples and use these as our building blocks. This is the realm of the ever-powerful Central Limit Theorem (CLT). Beyond this, we were able to articulate Donsker’s theorem, or the Functional Central Limit Theorem (FCLT), which provides convergence to Brownian motion and provides the Gaussian structure of the classical CLT.

Deviations from the assumptions of independence and finite moments lead us to two other classical stochastic processes, each of which embodies the aspect of these assumptions we “break.” If we remove the requirement that the moments of our increments are finite, this leads to large jumps, and the limiting behavior is the classical jump process, a Lévy motion. If we remove independence, this inherently adds a memory effect to our system, and we get out the classical model of a process with correlations, fractional Brownian motion. For me, these processes have counterintuitive, esoteric descriptions, but when we “build” them from scratch by specifying their increments, I think their behavior is far more digestable.


comments powered by Disqus