Sort Test Manifold¶

Overview¶

The Sort test manifold evaluates a model's ability to perform alphabetical sorting of word collections while handling various text transformations and duplications. Models must process space-separated word lists, apply case-insensitive alphabetical ordering, and output sorted results in a specified format while managing cognitive load from case mutations, word duplications, and varied vocabulary sources.

Task Description¶

Models are presented with collections of unsorted words and must sort them alphabetically in a case-insensitive manner. The task requires lexicographic ordering, case normalization, and format transformation while handling distractors in the form of mixed-case words and potential duplications that increase cognitive complexity.

Key Features:

- Lexicographic Ordering: Sorting words alphabetically regardless of case
- Case Normalization: Converting all output to lowercase for consistency
- Format Transformation: Converting space-separated input to newline-separated output
- Dictionary Sampling: Using real dictionary words for realistic vocabulary
- Run-Length Control: Selecting consecutive dictionary segments for coherent word groups
- Case Mutation: Random case transformations to test case-insensitive sorting
- Word Duplication: Controlled repetition of words to increase processing complexity
- Uniqueness Validation: Ensuring generated test cases are distinct

Test Case Generation¶

Algorithm Overview¶

The generator creates challenging sorting scenarios through a systematic process:

1. Dictionary Loading: Load and pre-sort words from dictionary file for efficient sampling
2. Run-Length Sampling: Select consecutive word segments from dictionary to maintain coherence
3. Word Collection: Accumulate words until target length is reached through multiple runs
4. Duplication Application: Apply controlled word duplication based on probability
5. Case Mutation: Apply random case transformations to increase visual complexity
6. Collection Shuffling: Randomize word order to eliminate positional sorting cues
7. Target Generation: Create lowercase, alphabetically sorted target output
8. Uniqueness Checking: Ensure generated test cases haven't been seen before
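The closing steps (shuffling, target generation, and uniqueness checking) might be sketched as follows. This is a minimal illustration rather than the generator's actual code: `make_case`, `generate_unique`, the `build_words` callable that stands in for the sampling and transformation steps, and the `max_attempts` guard are all hypothetical names, and treating the prompt string as the uniqueness key is an assumption.

```python
import random


def make_case(words, rng):
    """Shuffle a prepared word list and derive the prompt and target strings."""
    shuffled = list(words)
    rng.shuffle(shuffled)  # remove positional cues
    prompt = "Input: " + " ".join(shuffled)
    # Lowercasing first and then sorting gives the case-insensitive order.
    target = "\n".join(sorted(w.lower() for w in shuffled))
    return prompt, target


def generate_unique(build_words, count, rng, max_attempts=1000):
    """Collect up to `count` cases, skipping prompts that were already produced."""
    seen = set()  # assumption: uniqueness is judged on the prompt text
    cases = []
    for _ in range(max_attempts):
        if len(cases) == count:
            break
        prompt, target = make_case(build_words(rng), rng)
        if prompt in seen:
            continue
        seen.add(prompt)
        cases.append((prompt, target))
    return cases
```

The attempt limit keeps the loop from spinning forever when the requested count exceeds the number of distinct cases the parameters can produce.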

Dictionary Sampling Strategy¶

Run-Length Approach: Instead of random individual word selection, the system uses consecutive dictionary segments:

- Coherence Benefit: Consecutive words often share semantic or morphological relationships
- Efficiency Gain: Reduces random access patterns in large dictionaries
- Controlled Variety: run_length parameter controls segment size (1 = random, larger = more coherent)
- Multiple Runs: System performs multiple sampling runs until target word count is reached

Word Accumulation Process:

1. Random Start: Select random starting position in sorted dictionary
2. Segment Extraction: Copy up to run_length consecutive words
3. Length Constraint: Respect remaining word count needed
4. Iteration: Repeat until target length is achieved
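The accumulation loop can be sketched in a few lines. The function name `sample_words` and the use of a `random.Random` instance are assumptions made for illustration; duplication and case mutation are left out so that only the run-length behaviour is visible.

```python
import random


def sample_words(dictionary, length, run_length, rng):
    """Collect `length` words by copying consecutive runs from a sorted dictionary."""
    words = []
    while len(words) < length:
        start = rng.randrange(len(dictionary))    # random start position
        remaining = length - len(words)           # words still needed
        take = min(run_length, remaining)         # never overshoot the target
        # A run that starts near the end of the dictionary may be shorter
        # than `take`; the loop simply performs another run in that case.
        words.extend(dictionary[start:start + take])
    return words


dictionary = sorted(["abandon", "ability", "able", "about", "above",
                     "absence", "zebra", "zenith", "zero", "zone"])
print(sample_words(dictionary, 6, 3, random.Random(0)))
```

With `run_length=1` every word comes from an independent random position; larger values yield blocks of alphabetical neighbours that differ only in their later characters.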

Transformation System¶

Case Mutation: Random case transformations applied with probability prob_mutation:

- lowercase: Convert entire word to lowercase
- UPPERCASE: Convert entire word to uppercase
- Title Case: Capitalize first letter, lowercase remainder
- Application: Applied to the last word in the collection, after duplication

Word Duplication: Controlled repetition applied with probability prob_duplication:

- Timing: Applied before adding each word to collection
- Effect: Creates duplicate entries that must be sorted correctly
- Cognitive Load: Tests whether models handle repeated elements properly
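One possible reading of these two transformations is sketched below. The helper names `mutate_case` and `add_word` are hypothetical, the uniform choice among the three case styles is an assumption, and the sketch does not enforce the target length.

```python
import random


def mutate_case(word, rng):
    """Return the word in one of three case styles (uniform choice is assumed)."""
    style = rng.choice(("lower", "upper", "title"))
    if style == "lower":
        return word.lower()
    if style == "upper":
        return word.upper()
    return word.capitalize()  # first letter upper, remainder lower


def add_word(words, word, prob_mutation, prob_duplication, rng):
    """Append `word`, possibly preceded by a duplicate, then maybe mutate the last entry."""
    if rng.random() < prob_duplication:
        words.append(word)  # extra copy, added before the word itself
    words.append(word)
    if rng.random() < prob_mutation:
        # Mutation touches only the last word in the collection.
        words[-1] = mutate_case(words[-1], rng)


rng = random.Random(0)
collection = []
for w in ["house", "tree", "garden"]:
    add_word(collection, w, prob_mutation=0.4, prob_duplication=0.3, rng=rng)
print(collection)
```

Under this reading a duplicated pair can end up with differing case, as with `tree` and `Tree` in the mixed-complexity example, because only the last copy is eligible for mutation.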

Output Format Transformation¶

Input Format: Space-separated words with "Input: " prefix

Input: apple Banana CHERRY dog

Target Format: Newline-separated lowercase words in alphabetical order

apple
banana
cherry
dog
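The format transformation expected from a model can be expressed as a small reference function. This is a sketch for illustration only; the name `solve` is hypothetical and not part of the manifold.

```python
def solve(prompt):
    """Turn an 'Input: ...' prompt into the newline-separated, lowercase target."""
    prefix = "Input: "
    body = prompt[len(prefix):] if prompt.startswith(prefix) else prompt
    words = body.split()                      # space-separated input
    ordered = sorted(words, key=str.lower)    # case-insensitive ordering
    return "\n".join(w.lower() for w in ordered)


print(solve("Input: apple Banana CHERRY dog"))
```

For the prompt above this prints the four lines `apple`, `banana`, `cherry`, `dog`, matching the target format.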

Configuration Parameters¶

Generation Schema (`SortGenerationParams`)¶

from pydantic import BaseModel  # assumed origin of BaseModel

class SortGenerationParams(BaseModel):
    count: int                      # Number of test cases to generate (> 0)
    length: int                     # Number of words in each test case (> 0)
    run_length: int                 # Maximum consecutive words from dictionary (> 0)
    prob_mutation: float = 0.3      # Probability of case mutation (0.0-1.0, default: 0.3)
    prob_duplication: float = 0.2   # Probability of word duplication (0.0-1.0, default: 0.2)

Result Schema (`SortTestCaseResult`)¶

class SortTestCaseResult(BaseModel):
    input: str                                  # Space-separated unsorted words with "Input: " prefix
    target: str                                 # Newline-separated sorted words (lowercase)

Example Test Cases¶

Basic Alphabetical Sorting (length=5, run_length=3, prob_mutation=0.0, prob_duplication=0.0)¶

Input: elephant dog cat bird apple

Sorting Process:

- Original: [elephant, dog, cat, bird, apple]
- Lowercase: [elephant, dog, cat, bird, apple]
- Alphabetical: [apple, bird, cat, dog, elephant]

Expected Output:

apple
bird
cat
dog
elephant

Case-Insensitive Sorting (length=4, prob_mutation=0.8)¶

Input: ZEBRA apple Banana cherry

Case Analysis:

- ZEBRA: uppercase mutation
- apple: lowercase (original)
- Banana: title case mutation
- cherry: lowercase (original)

Sorting Process:

- Case-insensitive sort: [apple, Banana, cherry, ZEBRA]
- Lowercase output: [apple, banana, cherry, zebra]

Expected Output:

apple
banana
cherry
zebra
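To see why the comparison has to ignore case, compare Python's default string ordering, which compares code points and therefore places uppercase ASCII letters before lowercase ones, with a sort keyed on the lowercased word. The variable names are illustrative only.

```python
words = ["ZEBRA", "apple", "Banana", "cherry"]

# Default ordering compares code points: 'B' and 'Z' sort before 'a'.
naive = sorted(words)

# Keying on the lowercased word gives the case-insensitive order.
folded = sorted(words, key=str.lower)

print(naive)   # ['Banana', 'ZEBRA', 'apple', 'cherry']
print(folded)  # ['apple', 'Banana', 'cherry', 'ZEBRA']
```

A model that orders by raw character values would reproduce the first list, which is the failure mode the case mutations are meant to expose.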

Word Duplication Challenge (length=6, prob_duplication=0.5)¶

Input: cat dog cat bird apple dog

Duplication Analysis:

- cat: appears twice
- dog: appears twice
- bird: appears once
- apple: appears once

Sorting Process:

- All instances sorted: [apple, bird, cat, cat, dog, dog]

Expected Output:

apple
bird
cat
cat
dog
dog
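A simple way to confirm that an output preserved every repeated word is to compare word counts. The helper below is a hypothetical check written for illustration, not part of the manifold's scoring.

```python
from collections import Counter


def keeps_duplicates(prompt_words, output_lines):
    """True when the output holds exactly the input words, with the same counts."""
    expected = Counter(w.lower() for w in prompt_words)
    return Counter(output_lines) == expected


prompt_words = "cat dog cat bird apple dog".split()
print(keeps_duplicates(prompt_words, ["apple", "bird", "cat", "cat", "dog", "dog"]))  # True
print(keeps_duplicates(prompt_words, ["apple", "bird", "cat", "dog"]))                # False
```

The second call shows the typical mistake of collapsing duplicates into a set of unique words.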

Mixed Complexity (length=8, run_length=2, prob_mutation=0.4, prob_duplication=0.3)¶

Input: HOUSE tree HOUSE garden flower Tree mountain river

Transformation Analysis:

- HOUSE: uppercase mutation, duplicated
- tree/Tree: different cases of same word
- garden: lowercase (original)
- flower: lowercase (original)
- mountain: lowercase (original)
- river: lowercase (original)

Sorting Process:

- Case-insensitive grouping: [flower, garden, HOUSE, HOUSE, mountain, river, Tree, tree]
- Lowercase output: [flower, garden, house, house, mountain, river, tree, tree]

Expected Output:

flower
garden
house
house
mountain
river
tree
tree

Run-Length Coherence (length=6, run_length=6)¶

Input: abandon ability able about above absence

Dictionary Coherence: All words from same alphabetical region (consecutive 'ab-' words)

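Sorting Process:

- The sample input is shown in dictionary order because it comes from a single run; generated inputs are shuffled before presentation
- All words share the 'ab' prefix, so ordering is decided by the third and later characters
- Final order: [abandon, ability, able, about, above, absence]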

Expected Output:

abandon
ability
able
about
above
absence

Distractor System¶

Primary Distractors: Case Mutations¶

Random case transformations that test case-insensitive sorting:

- Visual Complexity: Mixed case creates visual noise while preserving alphabetical relationships
- Case Insensitivity Test: Models must ignore case when determining sort order
- Cognitive Load: Unusual casing (ALL CAPS, Title Case) increases processing difficulty

Secondary Distractors: Word Duplications¶

Repeated words that test duplicate handling and attention:

- Duplicate Processing: Models must sort all instances of repeated words correctly
- Attention Challenge: Duplicates may cause confusion about unique vs. repeated elements
- Ordering Consistency: All instances of same word must appear together in sorted output

Strategic Distribution¶

Words are shuffled after all transformations to ensure:

- No Positional Cues: Original dictionary order is completely randomized
- Mixed Complexity: Case mutations and duplications distributed throughout input
- Cognitive Consistency: Processing difficulty maintained across entire word list

Cognitive Skills Tested¶

Lexicographic Ordering: Understanding alphabetical sequence relationships
Case Insensitivity: Ignoring case differences when comparing words
Pattern Recognition: Identifying alphabetical patterns across varied presentations
Working Memory: Maintaining sort criteria while processing word sequences
Attention to Detail: Accurate character-by-character comparison for ordering
Duplicate Handling: Correctly processing repeated elements in collections
Format Transformation: Converting between different text representations
Visual Processing: Parsing mixed-case text accurately
Systematic Processing: Applying consistent sorting rules across entire collections
Output Formatting: Generating correctly structured results

Applications¶

This test manifold evaluates capabilities essential for:

Data Organization: Sorting textual data in databases and applications
Information Retrieval: Organizing search results and directory listings
Text Processing: Alphabetizing content in documents and reports
User Interface: Implementing sort functionality in applications
Data Validation: Ensuring consistent ordering in data processing pipelines
Content Management: Organizing textual content for presentation
Search Optimization: Preparing sorted indexes for efficient searching
Quality Assurance: Validating sort implementations in software systems
Document Processing: Organizing references, glossaries, and indexes
Database Operations: Implementing ORDER BY functionality and data organization
