Boolean Expression Evaluation Test Manifold¶

Overview¶

The Boolean Expression Evaluation test manifold assesses a model's ability to correctly parse and evaluate boolean logic expressions with varying complexity, including parentheses, operator precedence, negation, and nested structures. This task tests fundamental logical reasoning, symbolic manipulation, and the ability to follow boolean operator precedence rules in computational contexts.

Task Description¶

Models are presented with boolean expressions containing True/False values, logical operators (and, or, ^, not), and parentheses with varying levels of nesting. The task requires accurate evaluation following standard boolean operator precedence rules, where "not" has highest precedence, followed by "and", then "or", with parentheses overriding default precedence.

Key Features:

Variable complexity: Expressions range from simple chains to deeply nested parenthetical structures
Operator precedence: Tests understanding of boolean logic order of operations (not > and > or)
XOR operations: Includes exclusive-or (^) with special syntax restrictions
Negation handling: Multiple levels of "not" operators with careful precedence management
Format variations: Expressions can be rendered in different boolean notations
Whitespace variation: Expressions may have inconsistent spacing to test robust parsing

Difficulty Progression and Complexity Factors¶

Number of terms (length):

3-5 terms: Basic precedence and simple logical operations
6-10 terms: Moderate complexity with multiple operators and negations
10+ terms: Extended expressions requiring sustained logical reasoning

Nesting Depth (max_depth):

Depth 0: Linear expressions testing operator precedence only
Depth 1: Single parenthetical level overriding precedence
Depth 2+: Nested structures requiring recursive logical evaluation

Negation Probability (prob_not, prob_not_after_not):

Controls frequency of "not" operators in expressions
Separate probability for chaining "not" operators (double/triple negation)
Higher values create more complex logical transformations

Boolean Format Variations:

TRUE_FALSE: Standard Python format (True/False, and/or/not)
T_F: Abbreviated format (T/F, &/|/~)
ON_OFF: Hardware-style format (ON/OFF, AND/OR/NOT)
BINARY: Programming format (1/0, &&/||/!)
YES_NO: Natural language format (YES/NO, and/or/not)

XOR Operator Restrictions:

The XOR operator (^) has special syntax constraints in Python
Cannot be immediately followed by "not" due to parsing ambiguity
Generator enforces separate AFTER_XOR state to prevent syntax errors
This restriction ensures all expressions can be evaluated with Python's eval()

Whitespace Randomization (prob_dewhitespace):

Probability 0.0-1.0 of removing spaces creates tokenization challenges
Tests model robustness to formatting variations

Test Case Generation¶

Algorithm Overview¶

The generator uses a state machine approach to create syntactically valid boolean expressions:

State-Driven Construction: Uses finite state machine with six states (START, AFTER_BOOLEAN, AFTER_OPERATOR, AFTER_XOR, AFTER_NOT, AFTER_OPEN_PAREN) to ensure logical validity
Probabilistic Structure: Randomly adds parentheses, operators, and negations based on configurable probabilities
XOR Syntax Safety: Special handling for XOR operator to prevent "^ not" sequences that cause Python syntax errors
Weighted Action Selection: Uses probability-based weighting to bias toward expression completion
Format Transformation: Post-processing to convert between different boolean notation systems

Expression Components¶

Boolean Values:

True/False (or format-specific equivalents)
Randomly selected for each term position

Logical Operators:

AND (and, &, AND, &&)
OR (or, |, OR, ||)
XOR (^, XOR, !=, xor)
NOT (not, ~, NOT, !)

Parentheses:

Configurable nesting depth (max_depth parameter)
Probabilistic insertion based on current state and depth limits
Automatic balancing ensures syntactic correctness

Special XOR Handling:

XOR operator (^) cannot be followed directly by NOT in Python
Generator maintains separate AFTER_XOR state to prevent invalid syntax
This constraint ensures all generated expressions are evaluable

Configuration Parameters¶

Generation Schema (`BooleanGenerationParams`)¶

class BooleanGenerationParams(BaseModel):
    count: int                           # Number of test cases to generate (> 0)
    length: int                          # Number of boolean terms (≥ 3)
    max_depth: int                       # Maximum bracket nesting depth (≥ 1)
    prob_open: float                     # Probability of adding open parenthesis (0-1, default: 0.4)
    prob_close: float                    # Probability of adding close parenthesis (0-1, default: 0.5)
    prob_not: float                      # Probability of adding not operator (0-1, default: 0.8)
    prob_not_after_not: float            # Probability of chaining not operators (0-1, default: 0.5)
    prob_dewhitespace: float             # Probability of removing whitespace (0-1, default: 0)
    boolean_format: BooleanFormat        # Output format for boolean values and operators

Boolean Format Options:

TRUE_FALSE (0): True/False, and/or/not/^
T_F (1): T/F, &/|/~/^
ON_OFF (2): ON/OFF, AND/OR/NOT/XOR
BINARY (3): 1/0, &&/||/!/!=
YES_NO (4): YES/NO, and/or/not/xor

Result Schema (`BooleanTestCaseResult`)¶

class BooleanTestCaseResult(BaseModel):
    input: str                          # The boolean expression to evaluate
    target: str                         # Correct boolean result ("True" or "False")

Example Test Cases¶

Simple Precedence (length=3, max_depth=0)¶

Input: "True and False or True"
Target: "True"
Format: TRUE_FALSE

Analysis: Tests basic operator precedence (and before or): (True and False) or True = False or True = True.

Negation Precedence (length=3, max_depth=1)¶

Input: "not True and False"
Target: "False"
Format: TRUE_FALSE

Analysis: NOT has highest precedence: (not True) and False = False and False = False.

Nested Parentheses (length=4, max_depth=2)¶

Input: "not ( ( not not True ) )"
Target: "False"
Format: TRUE_FALSE

Analysis: Multiple negation levels requiring careful evaluation: not((not(not True))) = not(not False) = not True = False.

XOR Operations (length=3, max_depth=0)¶

Input: "True ^ False ^ True"
Target: "False"
Format: TRUE_FALSE

Analysis: XOR is left-associative: (True ^ False) ^ True = True ^ True = False.

Alternative Format - Binary (length=4, max_depth=1)¶

Input: "!(1&&0)||1"
Target: "True"
Format: BINARY

Analysis: Binary notation with C-style operators: !(1 && 0) || 1 = !False || True = True || True = True.

Alternative Format - Hardware (length=3, max_depth=1)¶

Input: "NOT ( ON AND OFF )"
Target: "True"  
Format: ON_OFF

Analysis: Hardware-style boolean notation: NOT(ON AND OFF) = NOT(False) = True.

Complex Nested Structure (length=6, max_depth=3)¶

Input: "not ( True and ( False or not True ) )"
Target: "True"
Format: TRUE_FALSE

Analysis: Multiple precedence levels: not(True and (False or (not True))) = not(True and (False or False)) = not(True and False) = not False = True.

Whitespace Variation (length=4, max_depth=1, prob_dewhitespace=1.0)¶

Input: "True&&(False||True)"
Target: "True"
Format: BINARY

Analysis: No spaces between operators tests parsing robustness in binary format.

Cognitive Skills Tested¶

Core Competencies¶

Truth Table Knowledge: Understanding fundamental AND, OR, XOR, NOT operations
Symbolic Parsing: Correctly interpreting logical notation and operator symbols
Associativity: Proper left-to-right evaluation of same-precedence operators
Precedence Rules: Applying boolean operator precedence (not > and > or) consistently
Sequential Processing: Evaluating expressions step-by-step in correct logical order

Advanced Skills¶

Negation Handling: Managing multiple levels of logical negation
Format Recognition: Parsing different boolean notation systems
Working Memory: Tracking intermediate boolean values through complex nested evaluations
Short-Circuit Logic: Potential optimization recognition in evaluation paths

Special Implementation Notes¶

XOR Operator Constraints¶

The XOR operator (^) requires special handling due to Python syntax limitations:

Restriction: Cannot be immediately followed by "not" operator
Reason: "^ not" creates parsing ambiguity in Python's grammar
Solution: Generator uses separate AFTER_XOR state to prevent invalid combinations
Impact: Ensures all generated expressions are evaluable with Python's eval() function

This constraint reflects real-world programming considerations where certain operator combinations have syntactic limitations.

Applications¶

This test manifold evaluates capabilities essential for:

Logical Programming: Foundation for conditional logic and boolean algebra
Circuit Design: Understanding digital logic gate operations and combinations
Database Queries: Boolean conditions in WHERE clauses and filtering operations
Search Logic: Complex query construction with AND/OR/NOT operations
Formal Verification: Logical reasoning in software and hardware verification
Decision Trees: Boolean condition evaluation in algorithmic decision making
Programming Logic: Control flow conditions and boolean expression evaluation