DreamCoder - Algorithm 1

![[Support/Figures/Pasted image 20241210232148.png]]
*Figure: the Algorithm 1 pseudocode annotated by the numbered walkthrough below.*

  1. Initialize generative model parameters:
    We initialize the parameter vector $\theta \in \mathbb{R}^{|D|+1}$ randomly from a uniform distribution: one real-valued weight per component of the library $D$, plus one weight for variables. These weights define the prior $P(\rho \mid D, \theta)$, i.e., how likely each program is under the current library.

  2. Initialize beams for each task:
    For each task $x \in X$, the beam $B_x$ (a prioritized list of candidate programs) is initialized to the empty set: $B_x \leftarrow \emptyset$.
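
A minimal Python sketch of steps 1-2. The concrete library `D`, task list `X`, and the uniform sampling range are illustrative assumptions, not values from the paper:

```python
import random

# Illustrative stand-ins: D is the library of primitives, X the task set.
D = ["map", "fold", "cons", "+1"]
X = ["task_a", "task_b", "task_c"]

# Step 1: one real-valued weight per library component, plus one for
# variables, drawn uniformly at random (theta lives in R^{|D|+1}).
theta = [random.uniform(0.0, 1.0) for _ in range(len(D) + 1)]

# Step 2: an empty beam of candidate programs for every task.
beams = {x: set() for x in X}
```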

  3. Epoch loop:
    We enter an infinite loop over epochs. This allows the algorithm to continually refine its outputs, and it can be terminated at any point if desired.

  4. Random task permutation:
    The variable shuffle is assigned a random permutation of the tasks $X$, so that the minibatches differ from epoch to epoch.

  5. Minibatch loop:
    While shuffle is not empty, the algorithm loops over mini-batches of tasks, ensuring that tasks are processed in manageable subsets.

  6. Batch selection:
    A batch is extracted from shuffle, consisting of its first $B$ tasks ($B$ being the minibatch size). This batch will be processed together during the "wake" phase.

  7. Update shuffle:
    The first $B$ elements are removed from shuffle, preparing it for the next iteration; the loop skeleton below shows steps 3-7 together.
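
The epoch and minibatch bookkeeping of steps 3-7 fits in a few lines. A sketch assuming tasks are plain Python values and `B` is the minibatch size:

```python
import random

def minibatches(X, B):
    """Skeleton of steps 3-7: endless epochs over shuffled minibatches."""
    while True:                              # step 3: epoch loop
        shuffle = random.sample(X, len(X))   # step 4: random permutation of X
        while shuffle:                       # step 5: minibatch loop
            batch = shuffle[:B]              # step 6: take the first B tasks
            shuffle = shuffle[B:]            # step 7: drop them from shuffle
            yield batch
```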

  8. Enumeration (wake phase):
    The function enumerate is an abstract routine that takes:

    • A program distribution (e.g., $P(\cdot \mid D, \theta)$) that assigns probabilities to programs,
    • A timeout $T$ that limits the computation,
      and returns programs in approximately decreasing order of their probability.
      For each task $x$, the beam $B_x$ is updated by adding each program $\rho$ with $P[x \mid \rho] > 0$, meaning $\rho$ successfully solves the task.
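
A sketch of this wake-phase update. Both `enumerate_programs` and `likelihood` are assumed stand-ins (for the abstract enumerate routine and for $P[x \mid \rho]$); neither name comes from the paper:

```python
def wake_phase(enumerate_programs, prior, T, batch, likelihood, beams):
    """Step 8 sketch: grow each task's beam with programs that solve it.

    Assumed interfaces: enumerate_programs(dist, T) yields programs rho
    in roughly decreasing probability under dist until timeout T, and
    likelihood(x, rho) stands in for P[x | rho].
    """
    for rho in enumerate_programs(prior, T):
        for x in batch:
            if likelihood(x, rho) > 0:   # rho solves task x
                beams[x].add(rho)
```
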
  9. Training the recognition model (Dream Sleep phase):
    The recognition model $Q(\rho \mid x)$ is trained to minimize the loss $\mathcal{L}_{\text{MAP}}$, which is the negative log-likelihood of the programs in the beams $\{B_x : x \in X\}$. This phase updates the recognition model to better predict programs for tasks based on the current beams.
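
One plausible shape for this phase, sketched with PyTorch. Everything here is a simplifying assumption: the architecture, factorizing $Q(\rho \mid x)$ over the primitives a program uses, and training on a single best program per beam rather than the full $\mathcal{L}_{\text{MAP}}$ objective:

```python
import torch
import torch.nn as nn

class RecognitionModel(nn.Module):
    """Hypothetical Q: maps task features to log-probabilities over
    library components; Q(rho | x) is read off as a product over the
    primitives that rho uses."""
    def __init__(self, feature_dim, num_primitives):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 64), nn.ReLU(),
            nn.Linear(64, num_primitives),
        )

    def forward(self, features):
        return torch.log_softmax(self.net(features), dim=-1)

def dream_sleep_step(model, optimizer, examples):
    """One gradient step: maximize log Q of the beams' best programs.

    `examples` is assumed to be a list of (features, primitive_indices)
    pairs, where primitive_indices lists the library components used by
    the best program found for that task.
    """
    optimizer.zero_grad()
    loss = torch.tensor(0.0)
    for features, prim_idx in examples:
        log_q = model(features)              # log Q over primitives
        loss = loss - log_q[prim_idx].sum()  # -log Q(program | task)
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with made-up sizes: a 16-dim feature vector, 8 primitives.
# model = RecognitionModel(16, 8)
# opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# dream_sleep_step(model, opt, [(torch.randn(16), torch.tensor([0, 3, 3]))])
```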

  10. Re-enumeration (Wake phase):
    After updating the recognition model, we repeat the enumeration step for each task $x$, this time using the updated recognition model. Specifically, for each task we update the beam $B_x$ by adding programs $\rho$ enumerated from the distribution $Q(\rho \mid x)$, subject to the condition $P[x \mid \rho] > 0$. This step integrates learning from the recognition model into the beams.
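
The step 8 skeleton carries over, with the prior swapped for the task-conditioned distribution; `q_dist` is an assumed wrapper around the trained model:

```python
def wake_phase_with_recognition(enumerate_programs, q_dist, T,
                                batch, likelihood, beams):
    """Step 10 sketch: enumeration now driven by Q(rho | x) per task,
    rather than by the single task-independent prior."""
    for x in batch:
        for rho in enumerate_programs(q_dist(x), T):
            if likelihood(x, rho) > 0:   # keep only programs solving x
                beams[x].add(rho)
```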

  11. Prune beams:
    For each task $x$, we prune the beam $B_x$ to keep only the top $M$ programs, as measured by the joint probability $P[x, \rho \mid D, \theta] = P[x \mid \rho]\,P(\rho \mid D, \theta)$. This ensures that the beam remains computationally manageable by focusing only on the most promising programs.
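
A sketch of the pruning step; `joint_log_prob` is an assumed stand-in for the log of the joint probability above:

```python
import heapq

def prune_beams(beams, M, joint_log_prob):
    """Step 11 sketch: keep only the top-M programs in each beam,
    ranked by log P[x, rho | D, theta]."""
    for x in beams:
        beams[x] = set(heapq.nlargest(
            M, beams[x], key=lambda rho: joint_log_prob(x, rho)))
```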

  12. Abstraction Sleep phase:
    The ABSTRACTION function takes as input the current library $D$, the weights $\theta$, and the beams $\{B_x\}_{x \in X}$. It identifies useful subprograms across tasks and adds them to the library $D$, allowing the system to "abstract" common patterns into reusable components. This phase improves the efficiency and expressiveness of the library over time.
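
The real ABSTRACTION routine refactors programs under candidate abstractions and rescores the library; the sketch below keeps only the counting intuition (promote subprograms shared across solutions) and is not the paper's algorithm. `subtrees` and `min_uses` are assumptions:

```python
from collections import Counter

def abstraction_sleep(D, beams, subtrees, min_uses=2):
    """Heavily simplified abstraction phase: count recurring subprograms
    across all beams and add the shared ones to the library.

    subtrees(rho) is an assumed helper yielding the subexpressions of rho.
    """
    counts = Counter()
    for beam in beams.values():
        for rho in beam:
            counts.update(subtrees(rho))
    for fragment, n in counts.most_common():
        if n >= min_uses and fragment not in D:
            D.append(fragment)   # promote a reused subprogram
    return D
```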

  13. Yield updated components:
    The updated library $D$, recognition model $Q(\cdot \mid \cdot)$, parameters $\theta$, and the beams $\{B_x\}_{x \in X}$ are yielded back to the caller. This allows for external inspection or use of the current state of the system.
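
Steps 13-14 amount to structuring the whole procedure as a Python generator, so each epoch's state can be inspected or the loop stopped at will; phase bodies are elided and the names follow the earlier sketches:

```python
def dreamcoder(D, theta, model, X):
    """Outer loop sketch: yield the full state after every epoch."""
    beams = {x: set() for x in X}
    while True:
        # ... wake, dream sleep, and abstraction sleep phases ...
        yield D, model, theta, beams

# Usage: run for a few epochs, then stop.
# for epoch, state in enumerate(dreamcoder(D, theta, model, X)):
#     if epoch == 4:
#         break
```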

  14. Continue looping:
    The process repeats for the next epoch, incorporating new discoveries, improved recognition models, and a refined library.