DreamCoder - Algorithm 1

![[Support/Figures/Pasted image 20241210232148.png]]
*Figure: the Algorithm 1 pseudocode annotated by the numbered walkthrough below.*

  1. Initialize generative model parameters:
    We initialize the parameter vector $\theta \in \mathbb{R}^{|D|+1}$ randomly from a uniform distribution: one real-valued weight per component of the library $D$, plus one weight for variables. These weights define the prior $P(\rho \mid D, \theta)$, i.e., how likely each program is under the current library.

  2. Initialize beams for each task:
    For each task $x \in X$, the beam $B_x$ (a prioritized list of candidate programs) is initialized to the empty set: $B_x \leftarrow \emptyset$.
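
A minimal Python sketch of steps 1-2. The concrete library `D`, task list `X`, and the uniform sampling range are illustrative assumptions, not values from the paper:

```python
import random

# Illustrative stand-ins: D is the library of primitives, X the task set.
D = ["map", "fold", "cons", "+1"]
X = ["task_a", "task_b", "task_c"]

# Step 1: one real-valued weight per library component, plus one for
# variables, drawn uniformly at random (theta lives in R^{|D|+1}).
theta = [random.uniform(0.0, 1.0) for _ in range(len(D) + 1)]

# Step 2: an empty beam of candidate programs for every task.
beams = {x: set() for x in X}
```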

  3. Epoch loop:
    We enter an infinite loop over epochs. This allows the algorithm to continually refine its outputs, and it can be terminated at any point if desired.

  4. Random task permutation:
    The variable shuffle is assigned a random permutation of the tasks $X$, so that the minibatches differ from epoch to epoch.

  5. Minibatch loop:
    While shuffle is not empty, the algorithm loops over mini-batches of tasks, ensuring that tasks are processed in manageable subsets.

  6. Batch selection:
    A batch is extracted from shuffle, consisting of its first $B$ tasks ($B$ being the minibatch size). This batch will be processed together during the "wake" phase.

  7. Update shuffle:
    The first $B$ elements are removed from shuffle, preparing it for the next iteration; the loop skeleton below shows steps 3-7 together.
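
The epoch and minibatch bookkeeping of steps 3-7 fits in a few lines. A sketch assuming tasks are plain Python values and `B` is the minibatch size:

```python
import random

def minibatches(X, B):
    """Skeleton of steps 3-7: endless epochs over shuffled minibatches."""
    while True:                              # step 3: epoch loop
        shuffle = random.sample(X, len(X))   # step 4: random permutation of X
        while shuffle:                       # step 5: minibatch loop
            batch = shuffle[:B]              # step 6: take the first B tasks
            shuffle = shuffle[B:]            # step 7: drop them from shuffle
            yield batch
```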

  8. Enumeration (wake phase):
    The function enumerate is an abstract routine that takes:

    • A program distribution (e.g., $P(\cdot \mid D, \theta)$) that assigns probabilities to programs,
    • A timeout $T$ that limits the computation,
      and returns programs in approximately decreasing order of their probability.
      For each task $x$, the beam $B_x$ is updated by adding each program $\rho$ with $P[x \mid \rho] > 0$, meaning $\rho$ successfully solves the task.
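
A sketch of this wake-phase update. Both `enumerate_programs` and `likelihood` are assumed stand-ins (for the abstract enumerate routine and for $P[x \mid \rho]$); neither name comes from the paper:

```python
def wake_phase(enumerate_programs, prior, T, batch, likelihood, beams):
    """Step 8 sketch: grow each task's beam with programs that solve it.

    Assumed interfaces: enumerate_programs(dist, T) yields programs rho
    in roughly decreasing probability under dist until timeout T, and
    likelihood(x, rho) stands in for P[x | rho].
    """
    for rho in enumerate_programs(prior, T):
        for x in batch:
            if likelihood(x, rho) > 0:   # rho solves task x
                beams[x].add(rho)
```
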
  9. Training the recognition model (Dream Sleep phase):
    The recognition model $Q(\rho \mid x)$ is trained to minimize the loss $\mathcal{L}_{\text{MAP}}$, which is the negative log-likelihood of the programs in the beams $\{B_x : x \in X\}$. This phase updates the recognition model to better predict programs for tasks based on the current beams.
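
One plausible shape for this phase, sketched with PyTorch. Everything here is a simplifying assumption: the architecture, factorizing $Q(\rho \mid x)$ over the primitives a program uses, and training on a single best program per beam rather than the full $\mathcal{L}_{\text{MAP}}$ objective:

```python
import torch
import torch.nn as nn

class RecognitionModel(nn.Module):
    """Hypothetical Q: maps task features to log-probabilities over
    library components; Q(rho | x) is read off as a product over the
    primitives that rho uses."""
    def __init__(self, feature_dim, num_primitives):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 64), nn.ReLU(),
            nn.Linear(64, num_primitives),
        )

    def forward(self, features):
        return torch.log_softmax(self.net(features), dim=-1)

def dream_sleep_step(model, optimizer, examples):
    """One gradient step: maximize log Q of the beams' best programs.

    `examples` is assumed to be a list of (features, primitive_indices)
    pairs, where primitive_indices lists the library components used by
    the best program found for that task.
    """
    optimizer.zero_grad()
    loss = torch.tensor(0.0)
    for features, prim_idx in examples:
        log_q = model(features)              # log Q over primitives
        loss = loss - log_q[prim_idx].sum()  # -log Q(program | task)
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with made-up sizes: a 16-dim feature vector, 8 primitives.
# model = RecognitionModel(16, 8)
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# dream_sleep_step(model, opt, [(torch.randn(16), torch.tensor([0, 3, 3]))])
```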

  10. Re-enumeration (Wake phase):
    After updating the recognition model, we repeat the enumeration step for each task $x$, this time using the updated recognition model. Specifically, for each task we update the beam $B_x$ by adding programs $\rho$ enumerated from the distribution $Q(\rho \mid x)$, subject to the condition $P[x \mid \rho] > 0$. This step integrates learning from the recognition model into the beams.
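
The step 8 skeleton carries over, with the prior swapped for the task-conditioned distribution; `q_dist` is an assumed wrapper around the trained model:

```python
def wake_phase_with_recognition(enumerate_programs, q_dist, T,
                                batch, likelihood, beams):
    """Step 10 sketch: enumeration now driven by Q(rho | x) per task,
    rather than by the single task-independent prior."""
    for x in batch:
        for rho in enumerate_programs(q_dist(x), T):
            if likelihood(x, rho) > 0:   # keep only programs solving x
                beams[x].add(rho)
```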

  11. Prune beams:
    For each task $x$, we prune the beam $B_x$ to keep only the top $M$ programs, as measured by the joint probability $P[x, \rho \mid D, \theta] = P[x \mid \rho]\,P(\rho \mid D, \theta)$. This ensures that the beam remains computationally manageable by focusing only on the most promising programs.
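
A sketch of the pruning step; `joint_log_prob` is an assumed stand-in for the log of the joint probability above:

```python
import heapq

def prune_beams(beams, M, joint_log_prob):
    """Step 11 sketch: keep only the top-M programs in each beam,
    ranked by log P[x, rho | D, theta]."""
    for x in beams:
        beams[x] = set(heapq.nlargest(
            M, beams[x], key=lambda rho: joint_log_prob(x, rho)))
```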

  12. Abstraction Sleep phase:
    The ABSTRACTION function takes as input the current library $D$, the weights $\theta$, and the beams $\{B_x\}_{x \in X}$. It identifies useful subprograms across tasks and adds them to the library $D$, allowing the system to "abstract" common patterns into reusable components. This phase improves the efficiency and expressiveness of the library over time.
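
The real ABSTRACTION routine refactors programs under candidate abstractions and rescores the library; the sketch below keeps only the counting intuition (promote subprograms shared across solutions) and is not the paper's algorithm. `subtrees` and `min_uses` are assumptions:

```python
from collections import Counter

def abstraction_sleep(D, beams, subtrees, min_uses=2):
    """Heavily simplified abstraction phase: count recurring subprograms
    across all beams and add the shared ones to the library.

    subtrees(rho) is an assumed helper yielding the subexpressions of rho.
    """
    counts = Counter()
    for beam in beams.values():
        for rho in beam:
            counts.update(subtrees(rho))
    for fragment, n in counts.most_common():
        if n >= min_uses and fragment not in D:
            D.append(fragment)   # promote a reused subprogram
    return D
```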

  13. Yield updated components:
    The updated library $D$, recognition model $Q(\cdot \mid \cdot)$, parameters $\theta$, and the beams $\{B_x\}_{x \in X}$ are yielded back to the caller. This allows for external inspection or use of the current state of the system.
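
Steps 13-14 amount to structuring the whole procedure as a Python generator, so each epoch's state can be inspected or the loop stopped at will; phase bodies are elided and the names follow the earlier sketches:

```python
def dreamcoder(D, theta, model, X):
    """Outer loop sketch: yield the full state after every epoch."""
    beams = {x: set() for x in X}
    while True:
        # ... wake, dream sleep, and abstraction sleep phases ...
        yield D, model, theta, beams

# Usage: run for a few epochs, then stop.
# for epoch, state in enumerate(dreamcoder(D, theta, model, X)):
#     if epoch == 4:
#         break
```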

  14. Continue looping:
    The process repeats for the next epoch, incorporating new discoveries, improved recognition models, and a refined library.