Skip to content

Boolean Expression Evaluation Test Manifold

Overview

The Boolean Expression Evaluation test manifold assesses a model's ability to correctly parse and evaluate boolean logic expressions with varying complexity, including parentheses, operator precedence, negation, and nested structures. This task tests fundamental logical reasoning, symbolic manipulation, and the ability to follow boolean operator precedence rules in computational contexts.

Task Description

Models are presented with boolean expressions containing True/False values, logical operators (and, or, ^, not), and parentheses with varying levels of nesting. The task requires accurate evaluation following standard boolean operator precedence rules, where "not" has highest precedence, followed by "and", then "or", with parentheses overriding default precedence.

Key Features:

  • Variable complexity: Expressions range from simple chains to deeply nested parenthetical structures
  • Operator precedence: Tests understanding of boolean logic order of operations (not > and > or)
  • XOR operations: Includes exclusive-or (^) with special syntax restrictions
  • Negation handling: Multiple levels of "not" operators with careful precedence management
  • Format variations: Expressions can be rendered in different boolean notations
  • Whitespace variation: Expressions may have inconsistent spacing to test robust parsing

Difficulty Progression and Complexity Factors

Number of terms (length):

  • 3-5 terms: Basic precedence and simple logical operations
  • 6-10 terms: Moderate complexity with multiple operators and negations
  • 10+ terms: Extended expressions requiring sustained logical reasoning

Nesting Depth (max_depth):

  • Depth 0: Linear expressions testing operator precedence only
  • Depth 1: Single parenthetical level overriding precedence
  • Depth 2+: Nested structures requiring recursive logical evaluation

Negation Probability (prob_not, prob_not_after_not):

  • Controls frequency of "not" operators in expressions
  • Separate probability for chaining "not" operators (double/triple negation)
  • Higher values create more complex logical transformations

Boolean Format Variations:

  • TRUE_FALSE: Standard Python format (True/False, and/or/not)
  • T_F: Abbreviated format (T/F, &/|/~)
  • ON_OFF: Hardware-style format (ON/OFF, AND/OR/NOT)
  • BINARY: Programming format (1/0, &&/||/!)
  • YES_NO: Natural language format (YES/NO, and/or/not)

XOR Operator Restrictions:

  • The XOR operator (^) has special syntax constraints in Python
  • Cannot be immediately followed by "not" due to parsing ambiguity
  • Generator enforces separate AFTER_XOR state to prevent syntax errors
  • This restriction ensures all expressions can be evaluated with Python's eval()

Whitespace Randomization (prob_dewhitespace):

  • Probability 0.0-1.0 of removing spaces creates tokenization challenges
  • Tests model robustness to formatting variations

Test Case Generation

Algorithm Overview

The generator uses a state machine approach to create syntactically valid boolean expressions:

  1. State-Driven Construction: Uses finite state machine with six states (START, AFTER_BOOLEAN, AFTER_OPERATOR, AFTER_XOR, AFTER_NOT, AFTER_OPEN_PAREN) to ensure logical validity
  2. Probabilistic Structure: Randomly adds parentheses, operators, and negations based on configurable probabilities
  3. XOR Syntax Safety: Special handling for XOR operator to prevent "^ not" sequences that cause Python syntax errors
  4. Weighted Action Selection: Uses probability-based weighting to bias toward expression completion
  5. Format Transformation: Post-processing to convert between different boolean notation systems

Expression Components

Boolean Values:

  • True/False (or format-specific equivalents)
  • Randomly selected for each term position

Logical Operators:

  • AND (and, &, AND, &&)
  • OR (or, |, OR, ||)
  • XOR (^, XOR, !=, xor)
  • NOT (not, ~, NOT, !)

Parentheses:

  • Configurable nesting depth (max_depth parameter)
  • Probabilistic insertion based on current state and depth limits
  • Automatic balancing ensures syntactic correctness

Special XOR Handling:

  • XOR operator (^) cannot be followed directly by NOT in Python
  • Generator maintains separate AFTER_XOR state to prevent invalid syntax
  • This constraint ensures all generated expressions are evaluable

Configuration Parameters

Generation Schema (BooleanGenerationParams)

class BooleanGenerationParams(BaseModel):
    count: int                           # Number of test cases to generate (> 0)
    length: int                          # Number of boolean terms (≥ 3)
    max_depth: int                       # Maximum bracket nesting depth (≥ 1)
    prob_open: float                     # Probability of adding open parenthesis (0-1, default: 0.4)
    prob_close: float                    # Probability of adding close parenthesis (0-1, default: 0.5)
    prob_not: float                      # Probability of adding not operator (0-1, default: 0.8)
    prob_not_after_not: float            # Probability of chaining not operators (0-1, default: 0.5)
    prob_dewhitespace: float             # Probability of removing whitespace (0-1, default: 0)
    boolean_format: BooleanFormat        # Output format for boolean values and operators

Boolean Format Options:

  • TRUE_FALSE (0): True/False, and/or/not/^
  • T_F (1): T/F, &/|/~/^
  • ON_OFF (2): ON/OFF, AND/OR/NOT/XOR
  • BINARY (3): 1/0, &&/||/!/!=
  • YES_NO (4): YES/NO, and/or/not/xor

Result Schema (BooleanTestCaseResult)

class BooleanTestCaseResult(BaseModel):
    input: str                          # The boolean expression to evaluate
    target: str                         # Correct boolean result ("True" or "False")

Example Test Cases

Simple Precedence (length=3, max_depth=0)

Input: "True and False or True"
Target: "True"
Format: TRUE_FALSE
Analysis: Tests basic operator precedence (and before or): (True and False) or True = False or True = True.

Negation Precedence (length=3, max_depth=1)

Input: "not True and False"
Target: "False"
Format: TRUE_FALSE
Analysis: NOT has highest precedence: (not True) and False = False and False = False.

Nested Parentheses (length=4, max_depth=2)

Input: "not ( ( not not True ) )"
Target: "False"
Format: TRUE_FALSE
Analysis: Multiple negation levels requiring careful evaluation: not((not(not True))) = not(not False) = not True = False.

XOR Operations (length=3, max_depth=0)

Input: "True ^ False ^ True"
Target: "False"
Format: TRUE_FALSE
Analysis: XOR is left-associative: (True ^ False) ^ True = True ^ True = False.

Alternative Format - Binary (length=4, max_depth=1)

Input: "!(1&&0)||1"
Target: "True"
Format: BINARY
Analysis: Binary notation with C-style operators: !(1 && 0) || 1 = !False || True = True || True = True.

Alternative Format - Hardware (length=3, max_depth=1)

Input: "NOT ( ON AND OFF )"
Target: "True"  
Format: ON_OFF
Analysis: Hardware-style boolean notation: NOT(ON AND OFF) = NOT(False) = True.

Complex Nested Structure (length=6, max_depth=3)

Input: "not ( True and ( False or not True ) )"
Target: "True"
Format: TRUE_FALSE
Analysis: Multiple precedence levels: not(True and (False or (not True))) = not(True and (False or False)) = not(True and False) = not False = True.

Whitespace Variation (length=4, max_depth=1, prob_dewhitespace=1.0)

Input: "True&&(False||True)"
Target: "True"
Format: BINARY
Analysis: No spaces between operators tests parsing robustness in binary format.

Cognitive Skills Tested

Core Competencies

  • Truth Table Knowledge: Understanding fundamental AND, OR, XOR, NOT operations
  • Symbolic Parsing: Correctly interpreting logical notation and operator symbols
  • Associativity: Proper left-to-right evaluation of same-precedence operators
  • Precedence Rules: Applying boolean operator precedence (not > and > or) consistently
  • Sequential Processing: Evaluating expressions step-by-step in correct logical order

Advanced Skills

  • Negation Handling: Managing multiple levels of logical negation
  • Format Recognition: Parsing different boolean notation systems
  • Working Memory: Tracking intermediate boolean values through complex nested evaluations
  • Short-Circuit Logic: Potential optimization recognition in evaluation paths

Special Implementation Notes

XOR Operator Constraints

The XOR operator (^) requires special handling due to Python syntax limitations:

  • Restriction: Cannot be immediately followed by "not" operator
  • Reason: "^ not" creates parsing ambiguity in Python's grammar
  • Solution: Generator uses separate AFTER_XOR state to prevent invalid combinations
  • Impact: Ensures all generated expressions are evaluable with Python's eval() function

This constraint reflects real-world programming considerations where certain operator combinations have syntactic limitations.

Applications

This test manifold evaluates capabilities essential for:

  • Logical Programming: Foundation for conditional logic and boolean algebra
  • Circuit Design: Understanding digital logic gate operations and combinations
  • Database Queries: Boolean conditions in WHERE clauses and filtering operations
  • Search Logic: Complex query construction with AND/OR/NOT operations
  • Formal Verification: Logical reasoning in software and hardware verification
  • Decision Trees: Boolean condition evaluation in algorithmic decision making
  • Programming Logic: Control flow conditions and boolean expression evaluation