Synthetic problem generation
Synthetic problem generation lets you quickly produce random grammars and corresponding target individuals to benchmark search behavior without hand-crafting domain grammars.
This is useful for stress-testing operators, budgets, and configuration choices under controlled complexity.
API
Parameters (summary)
seed: Controls reproducibility of the generated grammar.
non_terminals_count: Number of abstract non-terminals to create.
recursive_non_terminals_count: How many of the last non-terminals are allowed to be recursive.
productions_per_non_terminal(rd): Callable returning how many productions each non-terminal has (called per non-terminal with a
random.Random).non_terminals_per_production(rd): Callable returning the arity (number of fields) per production (called per production with a
random.Random).base_types: Set of terminal/base field types to allow as leaves (defaults to
{int, bool}).
The function returns (nodes, root) where nodes are the dynamically created classes (non-terminals and productions) and root is the designated root non-terminal.
Minimal example
from geneticengine.grammar.grammar import extract_grammar
from geneticengine.grammar.synthetic_grammar import create_arbitrary_grammar
from geneticengine.random.sources import NativeRandomSource
from geneticengine.representations.tree.initializations import MaxDepthDecider
from geneticengine.representations.tree.treebased import TreeBasedRepresentation
# 1) Generate a random grammar
nodes, root = create_arbitrary_grammar(
seed=0,
non_terminals_count=3,
recursive_non_terminals_count=2,
productions_per_non_terminal=lambda rd: 2,
non_terminals_per_production=lambda rd: 1,
)
# 2) Build a Grammar object
G = extract_grammar(nodes, root)
# 3) Create a random target individual (phenotype) from this grammar
r = NativeRandomSource(0)
rep = TreeBasedRepresentation(G, decider=MaxDepthDecider(r, G, G.get_min_tree_depth()))
G_target = rep.create_genotype(r, depth=8)
P_target = rep.genotype_to_phenotype(G_target)
print(P_target)
Using with Genetic Programming
A common pattern is to generate a target individual and then define a distance-based fitness to measure how close candidates are to that target. For example, using string distance over the stringified phenotypes:
from polyleven import levenshtein
from geneticengine.algorithms.gp.gp import GeneticProgramming
from geneticengine.evaluation.budget import EvaluationBudget
from geneticengine.problems import SingleObjectiveProblem
# Assume G and P_target are created as above
def fitness_function(p):
return levenshtein(str(p), str(P_target))
problem = SingleObjectiveProblem(
fitness_function=fitness_function,
minimize=True,
target=0,
)
r = NativeRandomSource(0)
alg = GeneticProgramming(
problem=problem,
budget=EvaluationBudget(100),
representation=TreeBasedRepresentation(G, decider=MaxDepthDecider(r, G, G.get_min_tree_depth() + 10)),
population_size=10,
random=r,
)
ind = alg.search()[0]
print(ind, ind.get_fitness(problem))
Complexity control tips
Grammar size: Increase
non_terminals_countand/orproductions_per_non_terminalto grow the search space.Depth/recursion: Control recursion potential with
recursive_non_terminals_count, and control tree growth withMaxDepthDeciderand the depth you pass tocreate_genotype.Arity: Use
non_terminals_per_productionto increase/decrease branching factor per production.Leaf variety: Adjust
base_typesto change the terminal set (e.g.,{int, bool, float}).
End-to-end runnable example
See examples/synthetic_grammar_example.py for a complete script with CLI flags to vary grammar size, recursion, and arity.