# Introduction to WebPPL

WebPPL is a probabilistic programming language based on Javascript. WebPPL can be used most easily through webppl.org. It can also be installed locally and run from the command line.

The deterministic part of WebPPL is a subset of Javascript.

New to functional programming or JavaScript? We will explain some of the constructs as they come up in WebPPL. For a slightly more substantial introduction, try this short introduction to the deterministic parts of the language.

The probabilistic aspects of WebPPL come from: distributions and sampling, marginal inference, and factors.

Probabilistic model: A mathematical mapping from a set of latent (unobservable) variables to aprobability distributionof observable outcomes or data. Aprobability distributionis simply a mathematical mapping between outcomes and their associated probability of occurrence.

### Some restrictions

Being a probabilistic programming language, WebPPL is more restrictive than JavaScript. Variables can be defined, but (unlike in JavaScript) their values cannot be redefined. For example, the following does not work:

```
var a = 0;
a = 1; // won't work
// var a = 1 // will work
var b = {x: 0};
b.x = 1; // won't work
// var b = {x: 1} // will work
```

This also means looping constructs (such as `for`

) are not available; we use functional programming constructs instead (like `map`

, to be explained below) to operate on arrays.
(Note that tensors are not arrays.)

### Building stochastic sampling functions

A function is a procedure that returns a value. Functions (often) take as input some number of arguments. You can define new functions in WebPPL, just as you would in JavaScript.

```
var myNewFunction = function(args){
return args[0] + args[1]
}
myNewFunction([2, 3])
```

Above, `args`

is the argument to the function `myNewFunction`

; based on the body (content) of `myNewFunction`

, `args`

should be a list.
What the function `myNewFunction`

does is add the 0th element of the list to the 1st element of the list (WebPPL is 0-indexed).

Exercise:Try running`myNewFunction([2, 3, 5])`

. What happens?

The above function is deterministic. It returns the same outcome every time it is called for the same input. To unfold the power of probabilistic programming, we also consider stochastic sampling functions. Sampling functions, like other functions, take in some number of arguments (often, the parameters of a probability distribution), do some (probabilistic) computation, and return the output. In their most basic form, sampling functions return a random value drawn from a known probability distribution.

```
flip(0.6)
```

`flip`

is a function: it takes an argument and returns a value.
What makes `flip`

special is that doesn’t return the same value every time you run it, even with the same arguments: It is a probabilistic function.

Exericise:Try running the code box above multiple times to see this.

`flip`

essentially flips a coin whose probability of landing on heads is given by the parameter value (above: `0.6`

).
(You can treat the value of `true`

as “heads” and `false`

as “tails”).

By chaining various sampling functions together we can create more complex sampling functions. Here is a minimally more complex example:

```
var two_coins = function(){
var c1 = flip(0.5);
var c2 = flip(0.5);
return c1 + c2; // adding Booleans coerces them to integers
}
// sample outcome of two coin flips (added together)
two_coins()
```

#### Built-in sampling functions

`flip()`

is a the simplest stochastic sampling function, but there are many more in WebPPL.
Check out the WebPPL documentation on the available distributions.

```
// returns a single sample from a standard normal
gaussian({mu: 0, sigma: 1})
```

Notice that the sampling functions of built-in probability distributions all start with a lower-case letter (see below for explanation of *distribution objects* which are instantiated by using a capital letter).

#### Higher-order functions: Munging randomness

*Higher-order functions*, i.e., functions that take other functions as arguments, are a key component of functional programming languages; they allow you chain functions or to repeat a computation (e.g., like you would want to do with a for-loop).
Two very useful higher-order functions are `repeat`

and `map`

.

```
repeat(10, flip)
```

`repeat()`

has two arguments: a number `10`

and a function `flip`

.
It returns the outcome of repeating `n`

(here, 10) calls to function (flipping a coin).
Notice that the function `flip`

is being referred to by its name; the function `flip`

is not being called (which you would achieve with: `flip()`

) before it is passed to `repeat`

.

Exercise: Try repeating`flip(0.6)`

. What needs to change from the original code box? (Hint:`repeat()`

wants to take a function as an argument. You can define new functions using the`function(arguments){body}`

construction, as in`var newFn = function(){...}`

`)`

Sometimes, we want to repeat a function but with different arguments. For this, we need `map`

. Here is an example of flipping five coins, each with a different bias:

```
var weights = [0.1, 0.3, 0.5, 0.7, 0.9]
map(function(w){ flip(w) }, weights)
```

`map`

takes two arguments: a function and a list. `map`

returns a list, which is the result of running the function over each element of the list.
If you are uncertain about the arguments to `repeat()`

, `map()`

, or other WebPPL functions, pop over to the WebPPL docs to get it all straight.

#### A brief aside for visualization

WebPPL (as accessed in the browser) provides basic visualization tools.
WebPPL-viz has a lot of functionality, the documentation for which can be found here.
The coolest thing about WebPPL-viz is the default `viz()`

function, which will take whatever you pass it and try to construct a reasonable visualization automatically.
(This used to be called `viz.magic()`

).

```
// visualize the output of this higher-order function call
repeat(10, flip)
```

Exercises:

- Try calling
`viz()`

on the output of the`repeat`

code box above.- Run the code box several times. Does the output surprise you?
- Try repeating the flip 1000 times, and run the code box several times. What has changed? Why?

#### Recursive functions

A recursive function is one that encodes a procedure in which the same function is called. It is easy to get stuck in an infinite regress with recursive function, so it’s often important to define a base case.

```
var firstEven = function(lst){
if ((first(lst) % 2) == 0) {
return first(lst)
} else {
return firstEven(rest(lst))
}
}
firstEven([1,3,2,4])
```

`firstEven`

takes in a list, checks to see if the first element is even, and returns that value if it is even, but otherwise calls itself using the rest of the list (everything except the first element) as the argument.

#### Challenge problem

We can also use recursive functions to build interesting stochastic functions.

```
var geometricCoin = function(){
...
}
```

Exercises:

- Make a function (
`geometricCoin`

) that flips a coin (of whatever weight you’d like). If it comes up heads, you call that same function (`geometricCoin`

) and add 1 to the result. If it comes up tails, you return the number 1.- Pass that function to repeat, repeat it many times, and create a picture using
`viz()`

### Distributions

So far, we looked at *sampling functions*, which produce single samples from probability distributions, e.g., the outcomes of randomly flipping coins (of a certain weight).
The probability distributions were *implicit* in those sampling functions; they specified the probability of the different return values (`true`

or `false`

).
When you repeatedly run the sampling function more and more times (for example: with `repeat`

), you approximate the underlying true distribution better and better.

WebPPL also represents probability distributions *explicitly*.
We call these explicit probability distributions: **distribution objects**.
Syntactically, this is denoted using a capitalized versions of the sampler functions.

```
// bernoulli(0.6) // same as flip(0.6); returns a single sample
var myDist = Bernoulli( { p: 0.6 } ) // create distribution object
viz( myDist ) // plot the distribution
```

(Note: `flip()`

is a cute way of referring to a sample from the `bernoulli()`

distribution.)

Exercise: Try running the above box many times. Does it change? Why not?

Distributions have parameters. (And different distributions have different parameters.)
For example, the Bernoulli distribution has a single parameter.
In WebPPL, it is called `p`

and refers to the probability of the coin landing on heads.
Click here for a complete list of distributions and their parameters in WebPPL.
We can call the distributions by passing them objects with the parameter name(s) as keys and parameter value(s) as values (e.g., `{p: 0.6}`

).
When a distribution is explicitly constructed, it can be sampled from by calling `sample()`

on that distribution.

```
var parameters = {p: 0.9}
var myDist = Bernoulli(parameters)
sample(myDist)
```

Exercise: Line by line, describe what is happening in the above code box.

### Properties of distribution objects

We just saw how you can call `sample()`

on a distribution, and it returns a value from that distribution (the value is sampled according to the its probability).
Distribution objects have two other properties that will be useful for us.
We get a glimpse at how WebPPL represents probability distributions as first-order objects when we use the method `print`

. (This example also show the difference between the brute `print()`

and the elegant `display()`

.)

```
var myDist = Bernoulli( { p: 0.6 } ) // create distribution object
display( myDist ) // show high-level representation
print( myDist ) // show true underlying representation
// (to see the last output you may have to run the code box twice (a glitch in "print"))
```

(Notice: using `print`

only works for discrete probability distributions (with finite support).)

We see that distribution objects are internally represented as a set of elements (the support) and the probability, or score (see below), of each element in the support.

Sometimes, we want to know what possible values could be sampled from a distribution.
This is called the **support** of a distribution and it can be accessed by calling `myDist.support()`

, where `myDist`

is some distribution object.

```
var myDist = Bernoulli( { p: 0.6 } ) // create distribution object
myDist.support() // the support of the distribution
```

Note that the support of many distributions will be a continuum, like a `Uniform({a, b})`

distribution which is defined over the bounds `a`

and `b`

.

Another common computation relating to probability distributions comes when you have a sample from distribution and want to know how probable it was to get that sample.
This is sometimes called “scoring a sample”, and it can be accessed by calling `myDist.score(mySample)`

, where `myDist`

is a distribution object and `mySample`

is a value.
Note that for the score of sample to be defined; that is, the sample must be an element of the support of the distribution.
WebPPL returns not the probability of `mySample`

under the distribution `myDist`

, but the natural logarithm of the probability, or the log-probability.
To recover the probability, use the javascript function `Math.exp()`

.

```
// the log-probability of true (e.g., "heads")
Bernoulli( { p : 0.9 } ).score(true)
```

Exercises:

- Try changing the parameter value of the Bernoulli distribution in the first code chunk (for the support). Does the result change? Why or why not?
- Try changing the parameter value of the Bernoulli distribution in the second code chunk (for the score). Does the result change? Why or why not?
- Modify the second code chunk to return the probability of
`true`

(rather than the log-probability). What is the relation between the probability of`true`

and the parameter to the Bernoulli distribution?

## Bayesian Inference in WebPPL

#### Warm-up: Constructing distributions with Infer

Sampling functions and explicit representations of probability distributions are the basic building blocks of a probabilistic programs.
These constructs can be thought of as ways of generating data. For example, if you wanted to generate a sequence of flips of a coin, you could use `repeat(15, flip)`

, like we saw above.

```
repeat(15, flip)
```

When you run the code-box above, the program returns to you one sample: a sample from the probability distribution defined by the program (a sequence of 15 flips).
What if we wanted more samples, to better understand the distribution? For example, how probable is it to get 14 heads in a series of 15 flips?
We could wrap the above code into a function, and call `repeat`

on that.

```
var repeat15flips = function(){ repeat(15, flip) }
repeat(3, repeat15flips)
```

As you might imagine, it’s going to be hard to get any intuition for this distribution by looking at many samples, because each sample is a list.
But we may not need to keep around the whole list for each sample.
If we want to know how probable it is to get 14 heads in a series of 15 flips, then all we need to keep around from each sample is the number of heads (rather than the full sequence).
To get the number of heads, we can call `sum()`

on the list of boolean values returned by `repeat15flips`

.

```
var sumRepeat15flips = function(){ sum(repeat(15, flip)) }
repeat(3, sumRepeat15flips)
```

Exercise:Try running 1000 samples, and wrap the output in a`viz`

`.

Presently, the data which results from the `repeat`

function call is represented as a list, but the procedure implicitly defines a probability distribution.
We can reify the sampling function into a probability distribution using the built-in function `Infer()`

.

```
var sumRepeat15flips = function(){ sum(repeat(15, flip)) }
Infer(sumRepeat15flips)
```

Turning the sampling function into a probability distribution has the advantage of being able to access properties of the distribution (like the score, support, and sampling; as described above).
Think of `Infer`

as a constructor, which takes as input a sampling function (no matter how complex) and returns a distribution object.
In this way, probabilistic programming languages can powerfully create distributions comprised of distributions in a fully generative way.

Exercise:: Using the newly constructed distribution object from`Infer`

, compute the probability of getting 14 heads from a series of 15 flips.

#### Bayesian inference

In science and life, we are often not interested in the question of how probable some data is but rather in the question of how probable a certain *hypothesis* may be given some data we have observed.
For example, imagine that you observe 14 heads from a series of 15 flips of a coin, but you’re actually uncertain about the weight of the coin (i.e., you have reason to believe the coin may be biased towards landing on heads or tails, but you don’t know which one and you don’t know how biased it may be).
Given the observed data (14 heads out of 15 flips), you can use Bayesian inference to make judgment about the likely weight of the coin.

To do this in a WebPPL program, we need to make two changes to the code above: We need to specify the **prior distribution** over the parameters (in this case, the prior distribution over coin weights) and we need to tell WebPPL about the data we observed, e.g., using the `observe`

function.

```
var model = function(){
var coin_weight = uniform(0, 1)
var sumRepeat15flips = function(){ sum(repeat(15, function() {flip(coin_weight)})) }
var myDist = Infer(sumRepeat15flips)
observe(myDist, 14)
return coin_weight
}
Infer(model)
```

Unfortunately, this will take a very long time to run.
Instead, we’ll take advantage of the fact that `Infer(sumRepeat15flips)`

is a known probability distribution. It is the Binomial distribution.

```
var model = function(){
var coin_weight = uniform(0, 1)
observe(Binomial({n: 15, p: coin_weight}), 14)
return coin_weight
}
Infer(model)
```

#### Observe, condition, and factor: distilled

The helper functions `condition`

, `observe`

, and `factor`

all have the same underlying purpose: Changing the probability of different program executions. For Bayesian data analysis, we want to do this in a way that computes the posterior distribution.

Imagine running a model function a single time.
In some lines of the model code, the program makes *random choices* (e.g., flipping a coin and it landing on heads, or tails).
The collection of all the random choices in an execution of every line of a program is referred to as the *program execution*.

Different random choices may have different (prior) probabilities (or perhaps, you have uninformed priors on all of the parameters, and then they each have equal probability).
What `observe`

, `condition`

, and `factor`

do is change the probabilities of these different random choices.
For Bayesian data analysis, we use these terms to change the probabilities of these random choices to align with the true posterior probabilities.
For BDA, this is usually achived using `observe`

.

`factor`

is the most primitive of the three, and `observe`

and `condition`

are both special cases of `factor`

.
`factor`

directly re-weights the log-probability of program executions, and it takes in a single numerical argument (how much to re-weight the log-probabilities).
`observe`

is a special case where you want to re-weight the probabilities by the probability of the observed data under some distribution. `observe`

thus takes in two arguments: a distribution and an observed data point.

`condition`

is a special case where you want to completely reject or rule out certain program executions.
`condition`

takes in a single *boolean* argum

Here is a summary of the three statements.

```
factor(val)
observe(Dist, val) === factor(Dist.score(val))
=== condition(sample(Dist) == val)
condition(bool) === factor(bool ? 0 : -Infinity)
```

Finally, here are three equivalent models for inferring a coin’s bias, one using `factor`

, one using `observe`

and one using `condition`

:

```
// model with observe
var model_observe = function(){
var coin_weight = uniform(0, 1)
observe(Binomial({n: 15, p: coin_weight}), 14)
return coin_weight
}
viz(Infer({model: model_observe, method: "rejection", samples: 5000}))
// model with condition
var model_condition = function(){
var coin_weight = uniform(0, 1)
var number_of_heads = binomial({n: 15, p: coin_weight})
condition(number_of_heads == 14)
return coin_weight
}
viz(Infer({model: model_condition, method: "rejection", samples: 5000}))
// model with factor
var model_factor = function(){
var coin_weight = uniform(0, 1)
factor(Binomial({n: 15, p: coin_weight}).score(14))
return coin_weight
}
viz(Infer({model: model_factor, method: "rejection", samples: 5000}))
```

Table of Contents