
Sort Test Manifold

Overview

The Sort test manifold evaluates a model's ability to perform alphabetical sorting of word collections while handling various text transformations and duplications. Models must process space-separated word lists, apply case-insensitive alphabetical ordering, and output sorted results in a specified format while managing cognitive load from case mutations, word duplications, and varied vocabulary sources.

Task Description

Models are presented with collections of unsorted words and must sort them alphabetically in a case-insensitive manner. The task requires lexicographic ordering, case normalization, and format transformation while handling distractors in the form of mixed-case words and potential duplications that increase cognitive complexity.

Key Features:

  • Lexicographic Ordering: Sorting words alphabetically regardless of case
  • Case Normalization: Converting all output to lowercase for consistency
  • Format Transformation: Converting space-separated input to newline-separated output
  • Dictionary Sampling: Using real dictionary words for realistic vocabulary
  • Run-Length Control: Selecting consecutive dictionary segments for coherent word groups
  • Case Mutation: Random case transformations to test case-insensitive sorting
  • Word Duplication: Controlled repetition of words to increase processing complexity
  • Uniqueness Validation: Ensuring generated test cases are distinct

Test Case Generation

Algorithm Overview

The generator creates challenging sorting scenarios through a systematic process:

  1. Dictionary Loading: Load and pre-sort words from dictionary file for efficient sampling
  2. Run-Length Sampling: Select consecutive word segments from dictionary to maintain coherence
  3. Word Collection: Accumulate words until target length is reached through multiple runs
  4. Duplication Application: Apply controlled word duplication based on probability
  5. Case Mutation: Apply random case transformations to increase visual complexity
  6. Collection Shuffling: Randomize word order to eliminate positional sorting cues
  7. Target Generation: Create lowercase, alphabetically sorted target output
  8. Uniqueness Checking: Ensure generated test cases haven't been seen before
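
Step 8 can be sketched as a retry loop around a set of previously seen inputs. This is a minimal illustration, not the generator's actual code: `generate_unique`, `make_case`, and `max_attempts` are hypothetical names, and the sketch assumes a test case is identified by its input string.

```python
import random
from typing import Callable, List, Set, Tuple

Case = Tuple[str, str]  # (input, target)

def generate_unique(make_case: Callable[[], Case], count: int,
                    max_attempts: int = 1000) -> List[Case]:
    """Collect `count` cases whose input strings have not been seen before."""
    seen: Set[str] = set()
    cases: List[Case] = []
    attempts = 0
    while len(cases) < count:
        attempts += 1
        if attempts > max_attempts:
            raise RuntimeError("could not generate enough unique cases")
        case = make_case()
        if case[0] in seen:
            continue  # duplicate input: discard and retry
        seen.add(case[0])
        cases.append(case)
    return cases

# Toy case factory (assumption): shuffle three fixed words.
rng = random.Random(0)

def toy_case() -> Case:
    words = ["cat", "apple", "bird"]
    rng.shuffle(words)
    return "Input: " + " ".join(words), "\n".join(sorted(words))

cases = generate_unique(toy_case, count=4)
```

The attempt cap guards against parameter combinations (very short lists, tiny dictionaries) where fewer distinct cases exist than were requested.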

Dictionary Sampling Strategy

Run-Length Approach: Instead of random individual word selection, the system uses consecutive dictionary segments:

  • Coherence Benefit: Consecutive words often share semantic or morphological relationships
  • Efficiency Gain: Reduces random access patterns in large dictionaries
  • Controlled Variety: run_length parameter controls segment size (1 = random, larger = more coherent)
  • Multiple Runs: System performs multiple sampling runs until target word count is reached

Word Accumulation Process:

  1. Random Start: Select random starting position in sorted dictionary
  2. Segment Extraction: Copy up to run_length consecutive words
  3. Length Constraint: Respect remaining word count needed
  4. Iteration: Repeat until target length is achieved
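
The accumulation process above might look like the following sketch. `sample_words` and the toy dictionary are hypothetical, and the behavior at the end of the dictionary (a run is simply cut short and another run is started) is an assumption rather than something the description specifies.

```python
import random
from typing import List, Sequence

def sample_words(dictionary: Sequence[str], length: int, run_length: int,
                 rng: random.Random) -> List[str]:
    """Collect `length` words by copying consecutive runs from a sorted dictionary."""
    words: List[str] = []
    while len(words) < length:
        start = rng.randrange(len(dictionary))         # random start position
        remaining = length - len(words)                # length constraint
        take = min(run_length, remaining)
        words.extend(dictionary[start:start + take])   # slice stops at the list end
    return words

dictionary = sorted(["able", "about", "above", "cat", "dog", "egg", "fig", "gnu"])
rng = random.Random(42)
scattered = sample_words(dictionary, length=6, run_length=1, rng=rng)
coherent = sample_words(dictionary, length=6, run_length=3, rng=rng)
```

With run_length=1 every word comes from an independent position, while run_length=3 yields the six words in at least two runs of neighboring entries.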

Transformation System

Case Mutation: Random case transformations applied with prob_mutation:

  • lowercase: Convert entire word to lowercase
  • UPPERCASE: Convert entire word to uppercase
  • Title Case: Capitalize first letter, lowercase remainder
  • Application: Applied to the final word in the collection, after duplication

Word Duplication: Controlled repetition with prob_duplication:

  • Timing: Applied before adding each word to collection
  • Effect: Creates duplicate entries that must be sorted correctly
  • Cognitive Load: Tests whether models handle repeated elements properly
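
One way to read the two transformations together is sketched below. The helper names `mutate_case` and `add_word` are hypothetical, and three details are assumptions: the case form is chosen uniformly, duplication appends exactly one extra copy, and mutation touches only the last word appended.

```python
import random
from typing import List

def mutate_case(word: str, rng: random.Random) -> str:
    """Pick one of the three case forms uniformly at random (assumed)."""
    return rng.choice([word.lower(), word.upper(), word.capitalize()])

def add_word(collection: List[str], word: str, prob_duplication: float,
             prob_mutation: float, rng: random.Random) -> None:
    # Duplication is decided before the word itself is added.
    if rng.random() < prob_duplication:
        collection.append(word)
    collection.append(word)
    # Mutation is applied to the last word in the collection only.
    if rng.random() < prob_mutation:
        collection[-1] = mutate_case(collection[-1], rng)

rng = random.Random(7)
collection: List[str] = []
for w in ["house", "tree", "garden"]:
    add_word(collection, w, prob_duplication=0.5, prob_mutation=0.5, rng=rng)
```

Under this reading a duplicated word can appear in two different cases (for example tree and Tree), because only the second copy is eligible for mutation.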

Output Format Transformation

Input Format: Space-separated words with "Input: " prefix

Input: apple Banana CHERRY dog

Target Format: Newline-separated lowercase words in alphabetical order

apple
banana
cherry
dog
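
The mapping from a word list to the two strings shown above can be sketched as follows. `render_case` is a hypothetical helper name; the sketch lowercases before sorting, which gives the same result as a case-insensitive sort followed by lowercasing.

```python
from typing import List, Tuple

def render_case(words: List[str]) -> Tuple[str, str]:
    """Return (input, target) strings for an already shuffled word list."""
    input_text = "Input: " + " ".join(words)
    target_text = "\n".join(sorted(w.lower() for w in words))
    return input_text, target_text

input_text, target_text = render_case(["apple", "Banana", "CHERRY", "dog"])
print(input_text)
print(target_text)
```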

Configuration Parameters

Generation Schema (SortGenerationParams)

from pydantic import BaseModel, Field

class SortGenerationParams(BaseModel):
    count: int = Field(gt=0)                                      # Number of test cases to generate
    length: int = Field(gt=0)                                     # Number of words in each test case
    run_length: int = Field(gt=0)                                 # Maximum consecutive words from dictionary
    prob_mutation: float = Field(default=0.3, ge=0.0, le=1.0)     # Probability of case mutation
    prob_duplication: float = Field(default=0.2, ge=0.0, le=1.0)  # Probability of word duplication

Result Schema (SortTestCaseResult)

class SortTestCaseResult(BaseModel):
    input: str                                  # Space-separated unsorted words with "Input: " prefix
    target: str                                 # Newline-separated sorted words (lowercase)

Example Test Cases

Basic Alphabetical Sorting (length=5, run_length=3, prob_mutation=0.0, prob_duplication=0.0)

Input: elephant dog cat bird apple

Sorting Process:

  • Original: [elephant, dog, cat, bird, apple]
  • Lowercase: [elephant, dog, cat, bird, apple]
  • Alphabetical: [apple, bird, cat, dog, elephant]

Expected Output:

apple
bird
cat
dog
elephant

Case-Insensitive Sorting (length=4, prob_mutation=0.8)

Input: ZEBRA apple Banana cherry

Case Analysis:

  • ZEBRA: uppercase mutation
  • apple: lowercase (original)
  • Banana: title case mutation
  • cherry: lowercase (original)

Sorting Process:

  • Case-insensitive sort: [apple, Banana, cherry, ZEBRA]
  • Lowercase output: [apple, banana, cherry, zebra]

Expected Output:

apple
banana
cherry
zebra

Word Duplication Challenge (length=6, prob_duplication=0.5)

Input: cat dog cat bird apple dog

Duplication Analysis:

  • cat: appears twice
  • dog: appears twice
  • bird: appears once
  • apple: appears once

Sorting Process:

  • All instances sorted: [apple, bird, cat, cat, dog, dog]

Expected Output:

apple
bird
cat
cat
dog
dog

Mixed Complexity (length=8, run_length=2, prob_mutation=0.4, prob_duplication=0.3)

Input: HOUSE tree HOUSE garden flower Tree mountain river

Transformation Analysis:

  • HOUSE: uppercase mutation, duplicated
  • tree/Tree: different cases of same word
  • garden: lowercase (original)
  • flower: lowercase (original)
  • mountain: lowercase (original)
  • river: lowercase (original)

Sorting Process:

  • Case-insensitive grouping: [flower, garden, HOUSE, HOUSE, mountain, river, Tree, tree]
  • Lowercase output: [flower, garden, house, house, mountain, river, tree, tree]

Expected Output:

flower
garden
house
house
mountain
river
tree
tree
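
The mixed-complexity example can be reproduced with Python's built-in sort as a quick check. This is only an illustration of the expected ordering, not part of the test harness. Because Python's sort is stable, words with equal lowercase keys keep their input order, so the intermediate list places tree before Tree; the lowercase target is the same either way.

```python
words = "HOUSE tree HOUSE garden flower Tree mountain river".split()

# Case-insensitive sort: compare on the lowercase form, keep original spelling.
grouped = sorted(words, key=str.lower)
# -> ['flower', 'garden', 'HOUSE', 'HOUSE', 'mountain', 'river', 'tree', 'Tree']

# A plain sort compares code points, so uppercase words come first.
naive = sorted(words)
# -> ['HOUSE', 'HOUSE', 'Tree', 'flower', 'garden', 'mountain', 'river', 'tree']

target = "\n".join(w.lower() for w in grouped)
print(target)
```

The contrast between `grouped` and `naive` is the failure mode the case mutations are designed to expose: a case-sensitive comparison puts every uppercase-initial word ahead of the lowercase ones.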

Run-Length Coherence (length=6, run_length=6)

Input: abandon ability able about above absence

Dictionary Coherence: All words from same alphabetical region (consecutive 'ab-' words)

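Sorting Process:

  • Near-sorted by construction: all six words come from a single dictionary run (shown here in dictionary order; the shuffle step normally randomizes the input)
  • Final order: [abandon, ability, able, about, above, absence]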

Expected Output:

abandon
ability
able
about
above
absence

Distractor System

Primary Distractors: Case Mutations

Random case transformations that test case-insensitive sorting:

  • Visual Complexity: Mixed case creates visual noise while preserving alphabetical relationships
  • Case Insensitivity Test: Models must ignore case when determining sort order
  • Cognitive Load: Unusual casing (ALL CAPS, Title Case) increases processing difficulty

Secondary Distractors: Word Duplications

Repeated words that test duplicate handling and attention:

  • Duplicate Processing: Models must sort all instances of repeated words correctly
  • Attention Challenge: Duplicates may cause confusion about unique vs. repeated elements
  • Ordering Consistency: All instances of same word must appear together in sorted output

Strategic Distribution

Words are shuffled after all transformations to ensure:

  • No Positional Cues: Original dictionary order is completely randomized
  • Mixed Complexity: Case mutations and duplications distributed throughout input
  • Cognitive Consistency: Processing difficulty maintained across entire word list
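
A short sketch of the shuffle step, assuming a seeded `random.Random` instance (the seed and the word list are illustrative only). Shuffling rearranges the transformed words but leaves the multiset unchanged, so duplicates and mutated spellings end up scattered through the input without changing what has to be sorted.

```python
import random
from collections import Counter

transformed = ["house", "HOUSE", "tree", "Tree", "garden", "flower"]

rng = random.Random(3)
shuffled = transformed[:]      # copy so the pre-shuffle list stays available
rng.shuffle(shuffled)          # in-place shuffle of the copy

same_words = Counter(shuffled) == Counter(transformed)
print(shuffled, same_words)
```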

Cognitive Skills Tested

  • Lexicographic Ordering: Understanding alphabetical sequence relationships
  • Case Insensitivity: Ignoring case differences when comparing words
  • Pattern Recognition: Identifying alphabetical patterns across varied presentations
  • Working Memory: Maintaining sort criteria while processing word sequences
  • Attention to Detail: Accurate character-by-character comparison for ordering
  • Duplicate Handling: Correctly processing repeated elements in collections
  • Format Transformation: Converting between different text representations
  • Visual Processing: Parsing mixed-case text accurately
  • Systematic Processing: Applying consistent sorting rules across entire collections
  • Output Formatting: Generating correctly structured results

Applications

This test manifold evaluates capabilities essential for:

  • Data Organization: Sorting textual data in databases and applications
  • Information Retrieval: Organizing search results and directory listings
  • Text Processing: Alphabetizing content in documents and reports
  • User Interface: Implementing sort functionality in applications
  • Data Validation: Ensuring consistent ordering in data processing pipelines
  • Content Management: Organizing textual content for presentation
  • Search Optimization: Preparing sorted indexes for efficient searching
  • Quality Assurance: Validating sort implementations in software systems
  • Document Processing: Organizing references, glossaries, and indexes
  • Database Operations: Implementing ORDER BY functionality and data organization