Boolean Expression Evaluation Test Manifold¶
Overview¶
The Boolean Expression Evaluation test manifold assesses a model's ability to correctly parse and evaluate boolean logic expressions with varying complexity, including parentheses, operator precedence, negation, and nested structures. This task tests fundamental logical reasoning, symbolic manipulation, and the ability to follow boolean operator precedence rules in computational contexts.
Task Description¶
Models are presented with boolean expressions containing True/False values, logical operators (and, or, ^, not), and parentheses with varying levels of nesting. The task requires accurate evaluation following standard boolean operator precedence rules, where "not" has highest precedence, followed by "and", then "or", with parentheses overriding default precedence.
Key Features:
- Variable complexity: Expressions range from simple chains to deeply nested parenthetical structures
- Operator precedence: Tests understanding of boolean logic order of operations (not > and > or)
- XOR operations: Includes exclusive-or (^) with special syntax restrictions
- Negation handling: Multiple levels of "not" operators with careful precedence management
- Format variations: Expressions can be rendered in different boolean notations
- Whitespace variation: Expressions may have inconsistent spacing to test robust parsing
Difficulty Progression and Complexity Factors¶
Number of terms (length):
- 3-5 terms: Basic precedence and simple logical operations
- 6-10 terms: Moderate complexity with multiple operators and negations
- 10+ terms: Extended expressions requiring sustained logical reasoning
Nesting Depth (max_depth):
- Depth 0: Linear expressions testing operator precedence only
- Depth 1: Single parenthetical level overriding precedence
- Depth 2+: Nested structures requiring recursive logical evaluation
Negation Probability (prob_not, prob_not_after_not):
- Controls frequency of "not" operators in expressions
- Separate probability for chaining "not" operators (double/triple negation)
- Higher values create more complex logical transformations
Boolean Format Variations:
- TRUE_FALSE: Standard Python format (True/False, and/or/not)
- T_F: Abbreviated format (T/F, &/|/~)
- ON_OFF: Hardware-style format (ON/OFF, AND/OR/NOT)
- BINARY: Programming format (1/0, &&/||/!)
- YES_NO: Natural language format (YES/NO, and/or/not)
XOR Operator Restrictions:
- The XOR operator (^) has special syntax constraints in Python
- Cannot be immediately followed by "not" due to parsing ambiguity
- Generator enforces separate AFTER_XOR state to prevent syntax errors
- This restriction ensures all expressions can be evaluated with Python's eval()
Whitespace Randomization (prob_dewhitespace):
- Probability 0.0-1.0 of removing spaces creates tokenization challenges
- Tests model robustness to formatting variations
Test Case Generation¶
Algorithm Overview¶
The generator uses a state machine approach to create syntactically valid boolean expressions:
- State-Driven Construction: Uses finite state machine with six states (START, AFTER_BOOLEAN, AFTER_OPERATOR, AFTER_XOR, AFTER_NOT, AFTER_OPEN_PAREN) to ensure logical validity
- Probabilistic Structure: Randomly adds parentheses, operators, and negations based on configurable probabilities
- XOR Syntax Safety: Special handling for XOR operator to prevent "^ not" sequences that cause Python syntax errors
- Weighted Action Selection: Uses probability-based weighting to bias toward expression completion
- Format Transformation: Post-processing to convert between different boolean notation systems
Expression Components¶
Boolean Values:
- True/False (or format-specific equivalents)
- Randomly selected for each term position
Logical Operators:
- AND (and, &, AND, &&)
- OR (or, |, OR, ||)
- XOR (^, XOR, !=, xor)
- NOT (not, ~, NOT, !)
Parentheses:
- Configurable nesting depth (max_depth parameter)
- Probabilistic insertion based on current state and depth limits
- Automatic balancing ensures syntactic correctness
Special XOR Handling:
- XOR operator (^) cannot be followed directly by NOT in Python
- Generator maintains separate AFTER_XOR state to prevent invalid syntax
- This constraint ensures all generated expressions are evaluable
Configuration Parameters¶
Generation Schema (BooleanGenerationParams
)¶
class BooleanGenerationParams(BaseModel):
count: int # Number of test cases to generate (> 0)
length: int # Number of boolean terms (≥ 3)
max_depth: int # Maximum bracket nesting depth (≥ 1)
prob_open: float # Probability of adding open parenthesis (0-1, default: 0.4)
prob_close: float # Probability of adding close parenthesis (0-1, default: 0.5)
prob_not: float # Probability of adding not operator (0-1, default: 0.8)
prob_not_after_not: float # Probability of chaining not operators (0-1, default: 0.5)
prob_dewhitespace: float # Probability of removing whitespace (0-1, default: 0)
boolean_format: BooleanFormat # Output format for boolean values and operators
Boolean Format Options:
TRUE_FALSE
(0): True/False, and/or/not/^T_F
(1): T/F, &/|/~/^ON_OFF
(2): ON/OFF, AND/OR/NOT/XORBINARY
(3): 1/0, &&/||/!/!=YES_NO
(4): YES/NO, and/or/not/xor
Result Schema (BooleanTestCaseResult
)¶
class BooleanTestCaseResult(BaseModel):
input: str # The boolean expression to evaluate
target: str # Correct boolean result ("True" or "False")
Example Test Cases¶
Simple Precedence (length=3, max_depth=0)¶
Input: "True and False or True"
Target: "True"
Format: TRUE_FALSE
Negation Precedence (length=3, max_depth=1)¶
Input: "not True and False"
Target: "False"
Format: TRUE_FALSE
Nested Parentheses (length=4, max_depth=2)¶
Input: "not ( ( not not True ) )"
Target: "False"
Format: TRUE_FALSE
XOR Operations (length=3, max_depth=0)¶
Input: "True ^ False ^ True"
Target: "False"
Format: TRUE_FALSE
Alternative Format - Binary (length=4, max_depth=1)¶
Input: "!(1&&0)||1"
Target: "True"
Format: BINARY
Alternative Format - Hardware (length=3, max_depth=1)¶
Input: "NOT ( ON AND OFF )"
Target: "True"
Format: ON_OFF
Complex Nested Structure (length=6, max_depth=3)¶
Input: "not ( True and ( False or not True ) )"
Target: "True"
Format: TRUE_FALSE
Whitespace Variation (length=4, max_depth=1, prob_dewhitespace=1.0)¶
Input: "True&&(False||True)"
Target: "True"
Format: BINARY
Cognitive Skills Tested¶
Core Competencies¶
- Truth Table Knowledge: Understanding fundamental AND, OR, XOR, NOT operations
- Symbolic Parsing: Correctly interpreting logical notation and operator symbols
- Associativity: Proper left-to-right evaluation of same-precedence operators
- Precedence Rules: Applying boolean operator precedence (not > and > or) consistently
- Sequential Processing: Evaluating expressions step-by-step in correct logical order
Advanced Skills¶
- Negation Handling: Managing multiple levels of logical negation
- Format Recognition: Parsing different boolean notation systems
- Working Memory: Tracking intermediate boolean values through complex nested evaluations
- Short-Circuit Logic: Potential optimization recognition in evaluation paths
Special Implementation Notes¶
XOR Operator Constraints¶
The XOR operator (^) requires special handling due to Python syntax limitations:
- Restriction: Cannot be immediately followed by "not" operator
- Reason: "^ not" creates parsing ambiguity in Python's grammar
- Solution: Generator uses separate AFTER_XOR state to prevent invalid combinations
- Impact: Ensures all generated expressions are evaluable with Python's eval() function
This constraint reflects real-world programming considerations where certain operator combinations have syntactic limitations.
Applications¶
This test manifold evaluates capabilities essential for:
- Logical Programming: Foundation for conditional logic and boolean algebra
- Circuit Design: Understanding digital logic gate operations and combinations
- Database Queries: Boolean conditions in WHERE clauses and filtering operations
- Search Logic: Complex query construction with AND/OR/NOT operations
- Formal Verification: Logical reasoning in software and hardware verification
- Decision Trees: Boolean condition evaluation in algorithmic decision making
- Programming Logic: Control flow conditions and boolean expression evaluation