# Synthetic problem generation

Synthetic problem generation lets you quickly produce random grammars and corresponding target individuals to benchmark search behavior without hand-crafting domain grammars.

This is useful for stress-testing operators, budgets, and configuration choices under controlled complexity.

## API

```{eval-rst}
.. autofunction:: geneticengine.grammar.synthetic_grammar.create_arbitrary_grammar
```

### Parameters (summary)
- **seed**: Controls reproducibility of the generated grammar.
- **non_terminals_count**: Number of abstract non-terminals to create.
- **recursive_non_terminals_count**: Number of non-terminals, counted from the end of the generated list, that may be recursive.
- **productions_per_non_terminal(rd)**: Callable returning how many productions each non-terminal has (called per non-terminal with a `random.Random`).
- **non_terminals_per_production(rd)**: Callable returning the arity (number of fields) per production (called per production with a `random.Random`).
- **base_types**: Set of terminal/base field types to allow as leaves (defaults to `{int, bool}`).
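
Because both count parameters are plain callables, they can return random values instead of constants. The stdlib-only sketch below assumes what the summary states: one call per non-terminal and one call per production, each receiving a seeded `random.Random`. `draw_shape` is a hypothetical helper that mimics that calling order. It is not part of GeneticEngine, and it only illustrates why a fixed seed yields a reproducible grammar shape.

```python
import random

# Hypothetical callables of the kind passed to create_arbitrary_grammar:
# each receives a random.Random and returns a count.
def productions_per_non_terminal(rd: random.Random) -> int:
    return rd.randint(1, 3)  # 1 to 3 alternatives per non-terminal

def non_terminals_per_production(rd: random.Random) -> int:
    return rd.randint(1, 2)  # 1 or 2 fields per production

def draw_shape(seed: int, non_terminals_count: int) -> list:
    """Hypothetical stand-in for the calling pattern: one draw per
    non-terminal, then one draw per production. Not the library's code."""
    rd = random.Random(seed)
    shape = []
    for _ in range(non_terminals_count):
        n_prods = productions_per_non_terminal(rd)
        shape.append([non_terminals_per_production(rd) for _ in range(n_prods)])
    return shape

# Each inner list holds the arities of one non-terminal's productions.
print(draw_shape(seed=0, non_terminals_count=3))
```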

The function returns `(nodes, root)` where `nodes` are the dynamically created classes (non-terminals and productions) and `root` is the designated root non-terminal.
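
Hand-written GeneticEngine grammars are usually abstract classes (non-terminals) plus dataclasses that subclass them (productions), and the generated `nodes` can be read the same way. The sketch below builds a tiny grammar of that shape at runtime using only the standard library. It is a hypothetical illustration of what "dynamically created classes" means, not how `create_arbitrary_grammar` is implemented; the names `NT0`, `P0_0`, and so on are made up.

```python
import types
from abc import ABC
from dataclasses import make_dataclass

# Abstract non-terminals, created at runtime (NT0 plays the role of root).
NT0 = types.new_class("NT0", (ABC,))
NT1 = types.new_class("NT1", (ABC,))

# Productions: dataclasses whose fields are typed with a non-terminal or a base type.
P0_0 = make_dataclass("P0_0", [("child", NT1)], bases=(NT0,))
P0_1 = make_dataclass("P0_1", [("value", int)], bases=(NT0,))
P1_0 = make_dataclass("P1_0", [("flag", bool)], bases=(NT1,))

nodes = [NT0, NT1, P0_0, P0_1, P1_0]
root = NT0

# A valid individual is any instance of a production that derives from root.
tree = P0_0(child=P1_0(flag=True))
print(tree)
```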

## Minimal example

```python
from geneticengine.grammar.grammar import extract_grammar
from geneticengine.grammar.synthetic_grammar import create_arbitrary_grammar
from geneticengine.random.sources import NativeRandomSource
from geneticengine.representations.tree.initializations import MaxDepthDecider
from geneticengine.representations.tree.treebased import TreeBasedRepresentation

# 1) Generate a random grammar
nodes, root = create_arbitrary_grammar(
    seed=0,
    non_terminals_count=3,
    recursive_non_terminals_count=2,
    productions_per_non_terminal=lambda rd: 2,
    non_terminals_per_production=lambda rd: 1,
)

# 2) Build a Grammar object
G = extract_grammar(nodes, root)

# 3) Create a random target individual (phenotype) from this grammar
r = NativeRandomSource(0)
rep = TreeBasedRepresentation(G, decider=MaxDepthDecider(r, G, G.get_min_tree_depth()))
genotype_target = rep.create_genotype(r, depth=8)  # tree genotype
P_target = rep.genotype_to_phenotype(genotype_target)  # phenotype used as the target
print(P_target)
```
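
A depth bound such as the one given to `MaxDepthDecider` only works if every non-terminal can still be completed within the remaining depth, which is why the example passes at least `G.get_min_tree_depth()`. The toy sketch below shows one common way to enforce such a bound: compute each non-terminal's minimum completion depth by fixpoint iteration, then only choose productions that fit the remaining budget. The dict-based grammar encoding and the helpers `min_depths`, `generate`, and `depth` are hypothetical and are not GeneticEngine internals.

```python
import random

# Hypothetical toy grammar: non-terminal -> productions, each a tuple of field
# types (either a non-terminal name or a base type listed in BASE).
GRAMMAR = {
    "Expr": [("Expr", "Expr"), ("Lit",)],
    "Lit": [("int",), ("bool",)],
}
BASE = {"int": lambda rd: rd.randint(0, 9), "bool": lambda rd: rd.random() < 0.5}

def prod_depth(fields, mins):
    # A production needs one level for itself plus its deepest non-terminal field.
    return 1 + max((mins[f] for f in fields if f in GRAMMAR), default=0)

def min_depths(grammar):
    """Fixpoint: smallest tree depth that can complete each non-terminal."""
    mins = {nt: float("inf") for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for fields in prods:
                d = prod_depth(fields, mins)
                if d < mins[nt]:
                    mins[nt], changed = d, True
    return mins

def generate(grammar, nt, rd, max_depth, mins):
    # Keep only productions that can still finish within max_depth.
    # Requires max_depth >= mins[nt], otherwise rd.choice gets an empty list.
    options = [f for f in grammar[nt] if prod_depth(f, mins) <= max_depth]
    fields = rd.choice(options)
    children = [
        generate(grammar, f, rd, max_depth - 1, mins) if f in grammar else BASE[f](rd)
        for f in fields
    ]
    return (nt, children)

def depth(tree):
    _, children = tree
    return 1 + max((depth(c) for c in children if isinstance(c, tuple)), default=0)

mins = min_depths(GRAMMAR)
tree = generate(GRAMMAR, "Expr", random.Random(0), max_depth=5, mins=mins)
print(mins, depth(tree))
```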

## Using with Genetic Programming

A common pattern is to generate a target individual and then define a distance-based fitness that measures how close each candidate is to that target. For example, using the Levenshtein (edit) distance between the stringified phenotypes, computed here with the third-party `polyleven` package:

```python
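from polyleven import levenshtein  # third-party package: pip install polyleven

from geneticengine.algorithms.gp.gp import GeneticProgramming
from geneticengine.evaluation.budget import EvaluationBudget
from geneticengine.problems import SingleObjectiveProblem
from geneticengine.random.sources import NativeRandomSource
from geneticengine.representations.tree.initializations import MaxDepthDecider
from geneticengine.representations.tree.treebased import TreeBasedRepresentation

# Assume G (the grammar) and P_target (the target phenotype) are created as above

def fitness_function(p):
    # Edit distance between string forms; 0 means p prints exactly like the target
    return levenshtein(str(p), str(P_target))

problem = SingleObjectiveProblem(
    fitness_function=fitness_function,
    minimize=True,  # smaller distance is better
    target=0,  # a distance of 0 is a perfect match
)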

r = NativeRandomSource(0)
alg = GeneticProgramming(
    problem=problem,
    budget=EvaluationBudget(100),
    representation=TreeBasedRepresentation(G, decider=MaxDepthDecider(r, G, G.get_min_tree_depth() + 10)),
    population_size=10,
    random=r,
)
ind = alg.search()[0]
print(ind, ind.get_fitness(problem))
```
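
If you would rather avoid the `polyleven` dependency, a plain-Python edit distance can stand in for `levenshtein` in `fitness_function`. The sketch below is the standard two-row dynamic-programming algorithm (unit cost for insertion, deletion, and substitution). It is likely much slower than `polyleven` on long phenotype strings, so treat it as a fallback for small experiments.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```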

## Complexity control tips
- **Grammar size**: Increase `non_terminals_count` and/or `productions_per_non_terminal` to grow the search space.
- **Depth/recursion**: Control recursion potential with `recursive_non_terminals_count`, and control tree growth with `MaxDepthDecider` and the depth you pass to `create_genotype`.
- **Arity**: Use `non_terminals_per_production` to increase/decrease branching factor per production.
- **Leaf variety**: Adjust `base_types` to change the terminal set (e.g., `{int, bool, float}`).
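
To see why these knobs interact so strongly, consider a toy model that does not reflect GeneticEngine's exact grammar semantics: one recursive non-terminal with `p` productions of arity `k`, plus `L` leaf choices (each leaf counted as a single option). The number of distinct trees of depth at most `d` then satisfies `T(0) = L` and `T(d) = L + p * T(d-1)**k`, so production count scales the space linearly per level while arity acts as an exponent. `count_trees` below is a hypothetical helper for that recurrence.

```python
def count_trees(depth: int, leaves: int, productions: int, arity: int) -> int:
    """Distinct trees of depth <= `depth` in the toy uniform grammar above."""
    total = leaves  # depth 0: only a leaf fits
    for _ in range(depth):
        # Either a leaf, or a production whose `arity` children are smaller trees.
        total = leaves + productions * total ** arity
    return total

for arity in (1, 2):
    print(arity, [count_trees(d, 2, 2, arity) for d in range(4)])
# arity 1: [2, 6, 14, 30]
# arity 2: [2, 10, 202, 81610]
```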

## End-to-end runnable example
See `examples/synthetic_grammar_example.py` for a complete script with CLI flags to vary grammar size, recursion, and arity.