Dream Coder - Algorithm 3

Let us say you have a sample space S and an event s ∈ S that occurs with probability P(s) = 1/2. If we observe s, we are effectively cutting the sample space in half, so this event carries 1 bit of information.
Similarly, an event p with probability P(p) = 1/4 carries two bits of information, because observing it effectively cuts the sample space to a quarter.

So less likely events carry more information when they occur. In general, if x ∈ S and P: S → [0,1] is a probability function, then I(x) = -log2(P(x)).
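This definition can be checked directly in a few lines; the function name `information` is just a placeholder for illustration:

```python
import math

def information(p: float) -> float:
    """Information content, in bits, of an event with probability p."""
    return -math.log2(p)

# Halving the sample space carries 1 bit; quartering it carries 2 bits.
print(information(0.5))   # 1.0
print(information(0.25))  # 2.0
```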

Description of the Enumerate Algorithm

Setup and Notation

Let S denote the sample space of all possible programs, and let μ: S → [0,1] be a probability distribution over programs, such that μ(x) represents the probability of program x ∈ S. The information content of a program x is defined as:

I(x) = -log2(μ(x))

This measures the amount of information carried by observing x, and less probable programs carry more information. For example, a program with μ(x) = 1/2 carries 1 bit, while one with μ(x) = 1/4 carries 2 bits.

The goal of the enumerate procedure is to generate programs in approximately decreasing order of probability (or equivalently, increasing order of information) under μ. The algorithm works in two phases: a Heap Search Phase followed by a Parallel DFS Phase.


Heap Search Phase

Initialization

  1. Define a lower bound lowerBound=0 and an upper bound upperBound=Δ, where Δ>0 is a hyperparameter controlling the granularity of the search.

  2. Initialize a max-heap, where programs are ordered by their priority, defined as priority(x)=log2(μ(x)) (equivalently, higher probabilities are prioritized).

  3. Insert the initial empty program (or syntax tree) into the heap with priority 0.

Iterative Heap Processing

While the heap size is manageable (specifically, |heap| ≤ 10×CPUs):

  1. Pop a program ρ from the heap:

    • If ρ is complete (a fully-specified program):

      • Check whether lowerBound ≤ I(ρ) ≤ upperBound. If true, yield ρ.
    • Otherwise, compute the set of children children(ρ), where each child c ∈ children(ρ) fills in the next "hole" in ρ's syntax tree.

  2. For each child c:

    • Compute its information I(c) = -log2(μ(c)).

    • If I(c) ≤ upperBound (i.e., c is not too improbable), insert c into the heap with priority log2(μ(c)).

If the heap grows too large (|heap| > 10×CPUs), transition to the Parallel DFS Phase.
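The heap phase above can be sketched as follows. This is a minimal sketch, not the DreamCoder implementation: `info`, `children`, and `is_complete` are hypothetical callbacks supplied by the grammar, and a min-heap keyed on I(p) is used, which is equivalent to a max-heap on log2(μ(p)):

```python
import heapq
import itertools

def heap_search(root, info, children, is_complete, upper_bound, max_heap_size):
    """Best-first search over partial programs, ordered by information.

    info(p) returns I(p) = -log2(mu(p)); children(p) fills p's next hole;
    is_complete(p) tests whether p is a fully specified program.
    Returns (yielded programs, partial programs left on the heap).
    """
    tie = itertools.count()                 # break ties between equal priorities
    heap = [(info(root), next(tie), root)]
    yielded = []
    while heap and len(heap) <= max_heap_size:
        i, _, prog = heapq.heappop(heap)
        if is_complete(prog):
            if 0 <= i <= upper_bound:       # lowerBound = 0 in this phase
                yielded.append(prog)
        else:
            for c in children(prog):
                if info(c) <= upper_bound:  # prune overly improbable children
                    heapq.heappush(heap, (info(c), next(tie), c))
    return yielded, [p for _, _, p in heap]
```

For a toy grammar where each hole is filled with "a" or "b" at 1 bit apiece, `heap_search("", len, lambda p: [p + "a", p + "b"], lambda p: len(p) == 2, 2, 100)` yields all four two-symbol programs.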


Parallel DFS Phase

Task Distribution

  1. Distribute up to 10×CPUs programs from the heap to the CPUs for processing. Each CPU receives approximately 10 programs (partial programs) as independent tasks.

  2. Set the initial search bounds for the DFS to lowerBound=0 and upperBound=Δ.
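The task-distribution step can be sketched as a simple round-robin split of the leftover partial programs; `distribute` and `per_cpu` are illustrative names, not part of the original algorithm description:

```python
def distribute(partials, n_cpus, per_cpu=10):
    """Spread up to n_cpus * per_cpu partial programs across CPUs,
    giving each CPU an independent list of DFS tasks."""
    tasks = [[] for _ in range(n_cpus)]
    for i, p in enumerate(partials[: n_cpus * per_cpu]):
        tasks[i % n_cpus].append(p)
    return tasks
```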

Per-CPU DFS Procedure

Each CPU processes its assigned tasks sequentially. For each partial program ρ:

  1. Perform a depth-first search (DFS) to explore all completions of ρ.

  2. When a completion c is found:

    • Check if lowerBound ≤ I(c) ≤ upperBound.

    • If true, yield c.

  3. For incomplete programs:

    • Compute their children children(ρ).

    • For each child c:

      • Compute its information I(c).

      • If I(c) ≤ upperBound, push c onto the local DFS stack for further exploration.

Each CPU finishes all of its 10 assigned jobs sequentially but operates in parallel with other CPUs.
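One CPU's bounded DFS over the completions of a single partial program can be sketched like this (same hypothetical `info`/`children`/`is_complete` callbacks as before; pruning on `info(c) <= upper` is sound because information only grows as holes are filled):

```python
def dfs_completions(root, info, children, is_complete, lower, upper):
    """Explicit-stack DFS yielding completions of root whose
    information falls inside [lower, upper]."""
    stack = [root]
    out = []
    while stack:
        prog = stack.pop()
        if is_complete(prog):
            if lower <= info(prog) <= upper:
                out.append(prog)
        else:
            for c in children(prog):
                if info(c) <= upper:   # prune: I never decreases down the tree
                    stack.append(c)
    return out
```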

Lower Bound Adjustment

Once all CPUs complete their assigned jobs:

  1. Increase the lower bound by Δ, i.e., set lowerBound ← lowerBound + Δ.

  2. Adjust the upper bound accordingly, i.e., upperBound ← upperBound + Δ.

  3. Reuse the same set of programs from the heap for the next DFS phase, but now search for completions with higher information (i.e., less probable programs).
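Putting the bound adjustment together with the DFS gives the outer loop below, a sketch under the same assumed callbacks: the same partial programs are re-searched each round while [lowerBound, upperBound] shifts by Δ, so each round surfaces progressively less probable completions:

```python
def enumerate_rounds(partials, info, children, is_complete, delta, n_rounds):
    """Generator over completions, in rounds of increasing information.

    Round k keeps completions with k*delta <= I(p) <= (k+1)*delta,
    re-running a bounded DFS over the same partial programs each round.
    """
    lower, upper = 0.0, delta
    for _ in range(n_rounds):
        for root in partials:
            stack = [root]
            while stack:
                prog = stack.pop()
                if is_complete(prog):
                    if lower <= info(prog) <= upper:
                        yield prog
                else:
                    for c in children(prog):
                        if info(c) <= upper:  # prune branches beyond this round
                            stack.append(c)
        lower += delta   # shift the information window for the next round
        upper += delta
```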


Summary of the Algorithm

  1. The algorithm starts with a Heap Search Phase, which uses a max-heap to explore programs with information I(ρ) ≤ Δ.

  2. When the heap grows too large, the algorithm transitions to a Parallel DFS Phase:

    • Each CPU explores its assigned programs in parallel, searching for completions within the current information bounds [lowerBound,upperBound].

    • The bounds are incrementally shifted by Δ after each round, allowing the algorithm to progressively explore less probable programs.

This approach efficiently enumerates programs in decreasing order of probability while balancing parallelism and memory constraints.

A note on MDL (minimum description length):

Basically, the information of a program is an estimate of its minimum description length; that is what the upper bound controls: highly informative programs are longer and have lower probability.
Support/Figures/Pasted image 20241212220318.png
Support/Figures/Pasted image 20241212220349.png