Markov Chain Monte Carlo

Markov Chain Monte Carlo

Markov Chain Monte Carlo (MCMC) is used when a probability distribution is known up to proportionality, but direct integration or exact sampling is too difficult. This is common in Bayesian inference, where the posterior may be high-dimensional and analytically intractable.

MCMC builds a Markov chain whose stationary distribution is the target distribution. After warm-up, the visited states behave like dependent samples from that target. Histograms, expectations, credible intervals, and posterior summaries can then be estimated from those samples.

Where is it used?

MCMC is used in Bayesian modeling, physics simulation, hierarchical medical models, ecological inference, and risk analysis when closed-form calculations are not available.

Best use

It can estimate posterior quantities for models where direct integration is not feasible.

Watch out for

It is notoriously slow. Taking millions of tiny random steps takes a lot of computing power and time.

i

Intuition

How to think about this algorithm

Imagine a landscape where height represents probability density. The sampler proposes nearby moves. Moves to higher-density regions are usually accepted, but some lower-density moves are accepted too.

That occasional willingness to move downhill matters. It keeps the chain from getting trapped at one local peak and lets it explore the distribution. The result is not a perfect map, but an approximation that improves with better proposals, diagnostics, and enough effective samples.

Interactive Diagram

Metropolis-Hastings MCMC Sampling

Click Step or Run 50 to generate proposed random steps. Accepted proposals move the sampler; rejected ones stay.

Target Distribution
Active Sampler
Samples Histogram
Samples Count: 1
Current Position: 5.000
Proposal Step Size (σ)
Standard Dev1.00
Key InsightMCMC proposal jumps are accepted proportionally to the target density ratio. Over time, the histogram of visited coordinates mirrors the target shape.

The Logic

Mathematical core for markov chain monte carlo

1. The Metropolis-Hastings Algorithm

To map out a target probability distribution P(x)P(x), the algorithm is currently at state xx. It proposes a random jump to a new state xx'. It then calculates an acceptance ratio (α\alpha):

α=min(1,P(x)P(x))\alpha = \min\left(1, \frac{P(x')}{P(x)}\right)

This ratio asks whether the proposed state is more probable than the current state. If P(x)P(x') is greater than P(x)P(x), the ratio is 1, and the algorithm always moves to the new spot. If the proposed state is less probable, the algorithm may still accept it with probability α\alpha. Otherwise, it stays at xx and records the current state again.

2. Why it works

Allowing occasional lower-density moves is crucial because it improves exploration. In practice, you still need convergence diagnostics, warm-up removal, and effective sample size checks before trusting the estimates.

Code Example

markov_chain_monte_carlo.py · reference implementation

Python
model_fitting.py
1import numpy as np
2
3# Let's build a simple MCMC walker!
4def target_density(x):
5    # A weird, two-peaked math equation we want to map
6    return np.exp(-0.5 * (x - 2)**2) + 0.5 * np.exp(-0.5 * (x + 2)**2)
7
8current_x = 0.0
9samples = []
10n_iterations = 10000
11
12for _ in range(n_iterations):
13    # Propose a random step
14    proposed_x = current_x + np.random.normal(0, 1.5)
15    
16    # Calculate if the new spot is better or worse
17    acceptance_ratio = target_density(proposed_x) / target_density(current_x)
18    
19    # Roll the dice to see if we accept the move!
20    if np.random.rand() < acceptance_ratio:
21        current_x = proposed_x
22        
23    samples.append(current_x)
24
25print(f"We took {len(samples)} steps to map the mountain!")

Strengths

  • It can estimate posterior quantities for models where direct integration is not feasible.

  • It does not require a Gaussian posterior and can represent skewed, correlated, or multi-modal distributions if the chain mixes well.

!

Limitations

  • It is notoriously slow. Taking millions of tiny random steps takes a lot of computing power and time.

  • It can be very difficult to know exactly when the algorithm has 'finished' mapping the mountain. You often have to run multiple walkers at the same time to check if they agree.

A

Key Assumptions

Scope conditions and interpretation notes

  • 1

    The Markov chain has the intended stationary distribution.

  • 2

    The chain mixes well enough that collected samples represent the target distribution.

R

References

Books and papers for deeper study

  • Gilks, W.R., Richardson, S. and Spiegelhalter, D. (1996) Markov Chain Monte Carlo in Practice. Boca Raton, FL: Chapman and Hall/CRC.