Sort Test Manifold¶
Overview¶
The Sort test manifold evaluates a model's ability to perform alphabetical sorting of word collections while handling various text transformations and duplications. Models must process space-separated word lists, apply case-insensitive alphabetical ordering, and output sorted results in a specified format while managing cognitive load from case mutations, word duplications, and varied vocabulary sources.
Task Description¶
Models are presented with collections of unsorted words and must sort them alphabetically in a case-insensitive manner. The task requires lexicographic ordering, case normalization, and format transformation while handling distractors in the form of mixed-case words and potential duplications that increase cognitive complexity.
Key Features:
- Lexicographic Ordering: Sorting words alphabetically regardless of case
- Case Normalization: Converting all output to lowercase for consistency
- Format Transformation: Converting space-separated input to newline-separated output
- Dictionary Sampling: Using real dictionary words for realistic vocabulary
- Run-Length Control: Selecting consecutive dictionary segments for coherent word groups
- Case Mutation: Random case transformations to test case-insensitive sorting
- Word Duplication: Controlled repetition of words to increase processing complexity
- Uniqueness Validation: Ensuring generated test cases are distinct
Test Case Generation¶
Algorithm Overview¶
The generator creates challenging sorting scenarios through a systematic process:
- Dictionary Loading: Load and pre-sort words from dictionary file for efficient sampling
- Run-Length Sampling: Select consecutive word segments from dictionary to maintain coherence
- Word Collection: Accumulate words until target length is reached through multiple runs
- Duplication Application: Apply controlled word duplication based on probability
- Case Mutation: Apply random case transformations to increase visual complexity
- Collection Shuffling: Randomize word order to eliminate positional sorting cues
- Target Generation: Create lowercase, alphabetically sorted target output
- Uniqueness Checking: Ensure generated test cases haven't been seen before
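The shuffling and uniqueness steps can be sketched as follows. This is a minimal illustration, not the generator's actual code: the function name `unique_shuffled`, the `max_attempts` retry budget, and the choice to key uniqueness on the exact shuffled word sequence are all assumptions.

```python
import random
from typing import Callable, List


def unique_shuffled(
    make_words: Callable[[random.Random], List[str]],
    count: int,
    rng: random.Random,
    max_attempts: int = 1000,  # hypothetical safety limit
) -> List[List[str]]:
    """Return up to `count` shuffled word lists, skipping any seen before."""
    seen = set()
    cases: List[List[str]] = []
    attempts = 0
    while len(cases) < count and attempts < max_attempts:
        attempts += 1
        words = list(make_words(rng))
        rng.shuffle(words)      # remove positional cues from dictionary order
        key = tuple(words)      # assumption: uniqueness = exact word sequence
        if key in seen:
            continue            # already generated, try again
        seen.add(key)
        cases.append(words)
    return cases
```

The retry budget keeps the loop from spinning forever when the requested count exceeds the number of distinct cases the parameters can produce.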
Dictionary Sampling Strategy¶
Run-Length Approach: Instead of random individual word selection, the system uses consecutive dictionary segments:
- Coherence Benefit: Consecutive words often share semantic or morphological relationships
- Efficiency Gain: Reduces random access patterns in large dictionaries
- Controlled Variety: run_length parameter controls segment size (1 = every word sampled independently, larger values = longer runs of adjacent dictionary words)
- Multiple Runs: System performs multiple sampling runs until target word count is reached
Word Accumulation Process:
1. Random Start: Select random starting position in sorted dictionary
2. Segment Extraction: Copy up to run_length consecutive words
3. Length Constraint: Respect remaining word count needed
4. Iteration: Repeat until target length is achieved
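The four steps above map onto a short loop. The sketch below assumes the dictionary is already loaded as a sorted, non-empty list; `sample_runs` is a hypothetical name, not the generator's actual function.

```python
import random
from typing import List


def sample_runs(dictionary: List[str], length: int, run_length: int,
                rng: random.Random) -> List[str]:
    """Collect `length` words by copying consecutive runs from a sorted dictionary."""
    words: List[str] = []
    while len(words) < length:                        # 4. repeat until full
        start = rng.randrange(len(dictionary))        # 1. random start position
        remaining = length - len(words)               # 3. words still needed
        take = min(run_length, remaining)
        words.extend(dictionary[start:start + take])  # 2. up to `take` consecutive words
    return words
```

A run that starts near the end of the dictionary simply comes out shorter, and the loop draws another run to make up the difference.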
Transformation System¶
Case Mutation: Random case transformations applied with prob_mutation:
- lowercase: Convert entire word to lowercase
- UPPERCASE: Convert entire word to uppercase
- Title Case: Capitalize first letter, lowercase remainder
- Application: Applied to each word in the collection after duplication
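A minimal sketch of the case mutation step. The helper name `mutate_case` is hypothetical, and picking uniformly among the three casings is an assumption; the source only states that a mutation is applied with probability prob_mutation.

```python
import random


def mutate_case(word: str, prob_mutation: float, rng: random.Random) -> str:
    """With probability prob_mutation, rewrite the word in one of three casings."""
    if rng.random() >= prob_mutation:
        return word  # left unchanged
    style = rng.choice(["lower", "upper", "title"])  # assumption: uniform choice
    if style == "lower":
        return word.lower()
    if style == "upper":
        return word.upper()
    return word[:1].upper() + word[1:].lower()  # title case
```

Whatever casing is chosen, `mutate_case(w, p, rng).lower() == w.lower()` holds, so the sorted target is unaffected by the mutation.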
Word Duplication: Controlled repetition with prob_duplication:
- Timing: Applied before adding each word to collection
- Effect: Creates duplicate entries that must be sorted correctly
- Cognitive Load: Tests whether models handle repeated elements properly
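One way the duplication step could work is sketched below. The source does not say whether a duplicate occupies a slot of its own or which earlier word is repeated, so both choices here (the duplicate replaces the incoming word, and the repeated word is picked at random from the collection so far) are assumptions, as is the name `add_with_duplication`.

```python
import random
from typing import Iterable, List


def add_with_duplication(source: Iterable[str], length: int,
                         prob_duplication: float, rng: random.Random) -> List[str]:
    """Fill a collection of up to `length` words, sometimes repeating an earlier word."""
    collection: List[str] = []
    for word in source:
        if len(collection) >= length:
            break
        # Assumption: the roll happens before the new word is added, and a
        # duplicate takes the slot instead of the new word.
        if collection and rng.random() < prob_duplication:
            collection.append(rng.choice(collection))
        else:
            collection.append(word)
    return collection
```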
Output Format Transformation¶
Input Format: Space-separated words with "Input: " prefix
Input: apple Banana CHERRY dog
Target Format: Newline-separated lowercase words in alphabetical order
apple
banana
cherry
dog
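The two formats can be produced from a word list in a few lines. This is an illustrative sketch; `render_case` is a hypothetical helper name.

```python
from typing import List, Tuple


def render_case(words: List[str]) -> Tuple[str, str]:
    """Return (input, target) strings in the formats described above."""
    prompt = "Input: " + " ".join(words)
    target = "\n".join(sorted(word.lower() for word in words))
    return prompt, target


prompt, target = render_case(["apple", "Banana", "CHERRY", "dog"])
print(prompt)  # Input: apple Banana CHERRY dog
print(target)  # apple, banana, cherry, dog on separate lines
```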
Configuration Parameters¶
Generation Schema (SortGenerationParams)¶
from pydantic import BaseModel

class SortGenerationParams(BaseModel):
    count: int               # Number of test cases to generate (> 0)
    length: int              # Number of words in each test case (> 0)
    run_length: int          # Maximum consecutive words from dictionary (> 0)
    prob_mutation: float     # Probability of case mutation (0.0-1.0, default: 0.3)
    prob_duplication: float  # Probability of word duplication (0.0-1.0, default: 0.2)
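The ranges and defaults noted in the comments are not enforced by type annotations alone. The sketch below mirrors them with a standard-library dataclass instead of the pydantic model; the class name `SortParamsSketch` is hypothetical, and treating count, length, and run_length as required fields is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SortParamsSketch:
    count: int
    length: int
    run_length: int
    prob_mutation: float = 0.3     # default taken from the schema comment
    prob_duplication: float = 0.2  # default taken from the schema comment

    def __post_init__(self) -> None:
        # Positive-integer constraints from the "(> 0)" comments.
        for name in ("count", "length", "run_length"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be > 0")
        # Probabilities must lie in the closed interval [0.0, 1.0].
        for name in ("prob_mutation", "prob_duplication"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be within [0.0, 1.0]")
```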
Result Schema (SortTestCaseResult)¶
class SortTestCaseResult(BaseModel):
    input: str   # Space-separated unsorted words with "Input: " prefix
    target: str  # Newline-separated sorted words (lowercase)
Example Test Cases¶
Basic Alphabetical Sorting (length=5, run_length=3, prob_mutation=0.0, prob_duplication=0.0)¶
Input: elephant dog cat bird apple
Sorting Process:
- Original: [elephant, dog, cat, bird, apple]
- Lowercase: [elephant, dog, cat, bird, apple]
- Alphabetical: [apple, bird, cat, dog, elephant]
Expected Output:
apple
bird
cat
dog
elephant
Case-Insensitive Sorting (length=4, prob_mutation=0.8)¶
Input: ZEBRA apple Banana cherry
Case Analysis:
- ZEBRA: uppercase mutation
- apple: lowercase (original)
- Banana: title case mutation
- cherry: lowercase (original)
Sorting Process:
- Case-insensitive sort: [apple, Banana, cherry, ZEBRA]
- Lowercase output: [apple, banana, cherry, zebra]
Expected Output:
apple
banana
cherry
zebra
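To see why case insensitivity matters for this example, compare a plain code-point sort with a case-folded one. This is an illustration in Python, not part of the test harness.

```python
words = ["ZEBRA", "apple", "Banana", "cherry"]

naive = sorted(words)                          # code-point order: uppercase sorts first
by_case_folded = sorted(words, key=str.lower)  # compare lowercase forms instead
target = [w.lower() for w in by_case_folded]

print(naive)           # ['Banana', 'ZEBRA', 'apple', 'cherry']
print(by_case_folded)  # ['apple', 'Banana', 'cherry', 'ZEBRA']
print(target)          # ['apple', 'banana', 'cherry', 'zebra']
```

The naive result places ZEBRA second, which is the kind of error the case mutations are designed to expose.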
Word Duplication Challenge (length=6, prob_duplication=0.5)¶
Input: cat dog cat bird apple dog
Duplication Analysis:
- cat: appears twice
- dog: appears twice
- bird: appears once
- apple: appears once
Sorting Process:
- All instances sorted: [apple, bird, cat, cat, dog, dog]
Expected Output:
apple
bird
cat
cat
dog
dog
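A correct answer is a permutation of the input, so every word keeps its multiplicity. The short check below illustrates this for the example and contrasts it with a deduplicating sort, one plausible failure mode; it is a sketch, not the manifold's scoring code.

```python
from collections import Counter

words = "cat dog cat bird apple dog".split()
target = sorted(w.lower() for w in words)

# Sorting only reorders: the multiset of words is unchanged.
assert Counter(target) == Counter(w.lower() for w in words)

deduplicated = sorted(set(words))  # drops the repeats, so it is too short

print(target)        # ['apple', 'bird', 'cat', 'cat', 'dog', 'dog']
print(deduplicated)  # ['apple', 'bird', 'cat', 'dog']
```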
Mixed Complexity (length=8, run_length=2, prob_mutation=0.4, prob_duplication=0.3)¶
Input: HOUSE tree HOUSE garden flower Tree mountain river
Transformation Analysis:
- HOUSE: uppercase mutation, duplicated
- tree/Tree: different cases of same word
- garden: lowercase (original)
- flower: lowercase (original)
- mountain: lowercase (original)
- river: lowercase (original)
Sorting Process:
- Case-insensitive grouping: [flower, garden, HOUSE, HOUSE, mountain, river, Tree, tree]
- Lowercase output: [flower, garden, house, house, mountain, river, tree, tree]
Expected Output:
flower
garden
house
house
mountain
river
tree
tree
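Words that differ only in case compare equal under a case-insensitive key, so their relative order in the intermediate list depends on tie-breaking. The illustration below uses Python's stable sort, which keeps tied words in input order; the difference disappears once the output is lowercased.

```python
words = "HOUSE tree HOUSE garden flower Tree mountain river".split()

stable = sorted(words, key=str.lower)  # ties keep their input order
lowered = [w.lower() for w in stable]

print(stable)
# ['flower', 'garden', 'HOUSE', 'HOUSE', 'mountain', 'river', 'tree', 'Tree']
print(lowered)
# ['flower', 'garden', 'house', 'house', 'mountain', 'river', 'tree', 'tree']
```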
Run-Length Coherence (length=6, run_length=6)¶
Input: abandon ability able about above absence
Dictionary Coherence: All words from same alphabetical region (consecutive 'ab-' words)
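Sorting Process:
- Shown here in dictionary order, as a single run of six consecutive words; the generator's shuffling step would randomize this order in an actual test case
- Final order: [abandon, ability, able, about, above, absence]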
Expected Output:
abandon
ability
able
about
above
absence
Distractor System¶
Primary Distractors: Case Mutations¶
Random case transformations that test case-insensitive sorting:
- Visual Complexity: Mixed case creates visual noise while preserving alphabetical relationships
- Case Insensitivity Test: Models must ignore case when determining sort order
- Cognitive Load: Unusual casing (ALL CAPS, Title Case) increases processing difficulty
Secondary Distractors: Word Duplications¶
Repeated words that test duplicate handling and attention:
- Duplicate Processing: Models must sort all instances of repeated words correctly
- Attention Challenge: Duplicates may cause confusion about unique vs. repeated elements
- Ordering Consistency: All instances of same word must appear together in sorted output
Strategic Distribution¶
Words are shuffled after all transformations to ensure:
- No Positional Cues: Original dictionary order is completely randomized
- Mixed Complexity: Case mutations and duplications distributed throughout input
- Cognitive Consistency: Processing difficulty maintained across entire word list
Cognitive Skills Tested¶
- Lexicographic Ordering: Understanding alphabetical sequence relationships
- Case Insensitivity: Ignoring case differences when comparing words
- Pattern Recognition: Identifying alphabetical patterns across varied presentations
- Working Memory: Maintaining sort criteria while processing word sequences
- Attention to Detail: Accurate character-by-character comparison for ordering
- Duplicate Handling: Correctly processing repeated elements in collections
- Format Transformation: Converting between different text representations
- Visual Processing: Parsing mixed-case text accurately
- Systematic Processing: Applying consistent sorting rules across entire collections
- Output Formatting: Generating correctly structured results
Applications¶
This test manifold evaluates capabilities essential for:
- Data Organization: Sorting textual data in databases and applications
- Information Retrieval: Organizing search results and directory listings
- Text Processing: Alphabetizing content in documents and reports
- User Interface: Implementing sort functionality in applications
- Data Validation: Ensuring consistent ordering in data processing pipelines
- Content Management: Organizing textual content for presentation
- Search Optimization: Preparing sorted indexes for efficient searching
- Quality Assurance: Validating sort implementations in software systems
- Document Processing: Organizing references, glossaries, and indexes
- Database Operations: Implementing ORDER BY functionality and data organization