Movie Recommendation Test Manifold¶
Overview¶
The Movie Recommendation test manifold evaluates a model's ability to identify thematic and stylistic similarities between films, requiring sophisticated pattern recognition and content similarity analysis. Unlike superficial popularity-based recommendations, this task tests genuine understanding of movie characteristics through genre and theme clustering.
Task Description¶
Models are presented with a set of reference movies that a user has enjoyed, and must select the most appropriate recommendation from multiple choice options. The correct answer shares meaningful similarities with the reference set based on genre patterns, thematic elements, or stylistic characteristics.
Key Features:
- Genre-based clustering: Movies grouped by shared genres (Action, Horror, Animation, etc.)
- Thematic similarity: Movies connected by narrative themes (Coming-of-Age, Revenge, Identity-Crisis, etc.)
- Pattern recognition: Identifying the underlying commonalities that connect seemingly different films
- Balanced difficulty: Anti-reference choices specifically selected to avoid obvious matches
- Configurable hints: Optional genre/theme information that transforms the task from a knowledge-based test into an information-processing one
- Question sensitivity tracking: Template index tracking enables analysis of model sensitivity to question phrasing
Test Case Generation¶
Algorithm Overview¶
The generator uses a dual-filtering approach to create meaningful test cases (a sketch follows the list):
- Reference Set Selection: Starting with the full movie database, iteratively apply genre and theme filters until a coherent reference set that matches ALL criteria emerges
- Anti-Reference Generation: Create a contrasting set of movies that match NONE of the selected criteria
- Balanced Sampling: Select reference movies and the target from the reference set, with remaining choices from the anti-reference set
- Randomization: Shuffle choices to prevent position bias
- Question Template: Randomly select a question template so that measured performance reflects the task itself rather than any single phrasing
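A simplified sketch of this pipeline in Python. All names here are illustrative, and the real generator iterates its filters until a coherent reference set emerges, whereas this sketch assumes the criteria are fixed up front:

import random


def matches_all(movie: dict, genres: set, themes: set) -> bool:
    # True if the movie carries every selected genre and theme.
    return genres <= set(movie["genres"]) and themes <= set(movie["themes"])


def matches_none(movie: dict, genres: set, themes: set) -> bool:
    # True if the movie carries none of the selected genres or themes.
    return genres.isdisjoint(movie["genres"]) and themes.isdisjoint(movie["themes"])


def generate_test_case(db: list, genres: set, themes: set,
                       reference_count: int, choice_count: int,
                       template_count: int) -> dict:
    # 1. Reference set: movies matching ALL selected criteria.
    reference_pool = [m for m in db if matches_all(m, genres, themes)]
    # 2. Anti-reference set: movies matching NONE of the criteria.
    anti_pool = [m for m in db if matches_none(m, genres, themes)]
    # 3. Balanced sampling: references plus the target come from the
    #    reference pool; distractors come from the anti-reference pool.
    picks = random.sample(reference_pool, reference_count + 1)
    references, target = picks[:-1], picks[-1]
    choices = [target] + random.sample(anti_pool, choice_count - 1)
    # 4. Shuffle choices to prevent position bias.
    random.shuffle(choices)
    # 5. Pick a template at random; the index is recorded for
    #    question-sensitivity analysis.
    template_index = random.randrange(template_count)
    return {"references": references, "target": target,
            "choices": choices, "template_index": template_index}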
Movie Database¶
The system uses a curated database of ~400 movies spanning multiple decades, each annotated with:
Genres (19 categories):
- Action, Adventure, Animation, Biography, Comedy, Crime, Documentary, Drama, Family, Fantasy, History, Horror, Musical, Mystery, Romance, Sci-Fi, Thriller, War, Western
Themes (25 categories):
- Coming-of-Age, Redemption, Revenge, Love-Triangle, Fish-Out-of-Water, Good-vs-Evil, Underdog, Betrayal, Sacrifice, Identity-Crisis, Father-Son, Time-Loop, Artificial-Intelligence, Dystopian-Society, Heist, Road-Trip, Survival, Corruption, Mental-Illness, Technology-Dependence, Class-Struggle, Alien-Contact, Superhero-Origin, Found-Family, Moral-Ambiguity
Note: While the database includes release years for each movie, the current generation algorithm does not use temporal information for clustering. Test cases are created purely based on genre and theme similarities.
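A single database entry presumably pairs a title with these annotations; the field names below are assumptions for illustration, with the True Grit annotations taken from an example later on this page:

# Hypothetical shape of one entry in the movie database.
{
    "title": "True Grit",
    "year": 2010,  # release year is stored but not used for clustering
    "genres": ["Western"],
    "themes": ["Revenge", "Coming-of-Age"],
}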
Question Templates¶
The generator uses a variety of question templates to create natural, diverse problem statements. Templates are randomly selected to avoid predictable phrasing patterns while maintaining consistent task requirements.
Template Categories¶
Direct Similarity Queries¶
"Which movie is most similar to {reference_movies}?"
"Which movie shares the most characteristics with {reference_movies}?"
"Which movie fits the same pattern as {reference_movies}?"
Recommendation Framing¶
"Based on these movies: {reference_movies} - which would you recommend next?"
"If you enjoyed {reference_movies}, which movie would you like most?"
"If you liked {reference_movies}, what should you watch next?"
"Which movie would appeal to someone who enjoys {reference_movies}?"
User Preference Simulation¶
"I loved {reference_movies}, what should I watch next?"
"If {reference_movies} are your favorites, which would you add to the list?"
"Based on these movies, which would you recommend: {reference_movies}?"
Group Membership¶
"Which movie belongs with this group: {reference_movies}?"
"Complete this movie collection: {reference_movies}."
Template Structure¶
Each question follows a consistent format (see the sketch after this list):
- Opening Statement: The question prompt with reference movies inserted
- Options Header: "Options:" separator line
- Lettered Choices: Multiple choice options formatted as "(A) Movie Title"
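A sketch of how a question might be assembled under this format; render_question is a hypothetical name, and only two of the templates above are repeated:

import random
import string

QUESTION_TEMPLATES = [
    "Which movie is most similar to {reference_movies}?",
    "If you enjoyed {reference_movies}, which movie would you like most?",
    # ... remaining templates from the categories above
]


def render_question(reference_titles: list, choice_titles: list) -> tuple:
    # Randomly select a template; the index becomes question_template_index.
    index = random.randrange(len(QUESTION_TEMPLATES))
    opening = QUESTION_TEMPLATES[index].format(
        reference_movies=", ".join(reference_titles))
    # Lettered choices: "(A) Movie Title", "(B) Movie Title", ...
    lettered = [f"({letter}) {title}"
                for letter, title in zip(string.ascii_uppercase, choice_titles)]
    return "\n".join([opening, "Options:", *lettered]), index

Applied to the horror references in the next subsection, this reproduces the rendered example shown there.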
Example Template Applications¶
Template: "If you enjoyed {reference_movies}, which movie would you like most?"
Rendered Example:
If you enjoyed The Exorcist, Halloween, A Nightmare on Elm Street, which movie would you like most?
Options:
(A) The Conjuring
(B) The Notebook
(C) Fast & Furious
(D) Finding Nemo
Configuration Parameters¶
Generation Schema (MovieGenerationParams)¶
from typing import Dict, Optional

from pydantic import BaseModel


class MovieGenerationParams(BaseModel):
    count: int                               # Number of test cases to generate (> 0)
    reference_count: int                     # Number of movies in reference set (≥ 1)
    choice_count: int                        # Total number of answer choices (≥ 2)
    genre_weights: Optional[Dict[str, int]]  # Weights for genre selection probability
    theme_weights: Optional[Dict[str, int]]  # Weights for theme selection probability
    hints: HintLevel                         # Level of hints (NONE, QUESTION, ANSWER, BOTH); defined below
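The weight dictionaries presumably bias which genres and themes are drawn during filtering; a short sketch of such a weighted draw (the concrete weights are made up):

import random

genre_weights = {"Western": 3, "Horror": 1}  # hypothetical weights
# "Western" is drawn three times as often as "Horror".
genre = random.choices(list(genre_weights), weights=list(genre_weights.values()))[0]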
Hint Levels (HintLevel)¶
The hints parameter controls whether genre and theme information is provided, transforming the task from a test of internal movie knowledge into one of information processing:
from enum import IntEnum


class HintLevel(IntEnum):
    NONE = 0      # No hints provided (pure knowledge test)
    QUESTION = 1  # Hints on reference movies only
    ANSWER = 2    # Hints on answer choices only
    BOTH = 3      # Hints on both reference movies and answer choices
Hint Format: When enabled, hints appear after movie titles as a parenthesized list of genres followed by themes, e.g. (Action, Thriller; Revenge, Betrayal).
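A minimal sketch of that annotation (annotate is a hypothetical name):

def annotate(movie: dict) -> str:
    # e.g. 'Carrie (Horror; Coming-of-Age, Revenge)'
    genres = ", ".join(movie["genres"])
    themes = ", ".join(movie["themes"])
    return f'{movie["title"]} ({genres}; {themes})'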
Standard Grid Configuration:
- reference_count: [4, 5, 6] - Number of movies the user has "watched"
- choice_count: [3, 4, 5] - Multiple-choice options presented
- Generates 9 different difficulty combinations (3 × 3)
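The grid is the Cartesian product of the two lists; a sketch using the schemas on this page, with an illustrative per-cell count:

from itertools import product

# 3 reference sizes × 3 choice sizes = 9 difficulty combinations.
grid = [
    MovieGenerationParams(
        count=50,  # illustrative number of cases per cell
        reference_count=ref,
        choice_count=choices,
        genre_weights=None,
        theme_weights=None,
        hints=HintLevel.NONE,
    )
    for ref, choices in product([4, 5, 6], [3, 4, 5])
]
assert len(grid) == 9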
Result Schema (MovieTestCaseResult)¶
from typing import Any, Dict, List

from pydantic import BaseModel


class MovieTestCaseResult(BaseModel):
    input: str                              # Formatted problem text
    target: str                             # Correct answer, "(A)"-"(Z)"
    reference_movies: List[Dict[str, Any]]  # Movies in reference set
    choices: List[Dict[str, Any]]           # All answer choices
    selected_genres: List[str]              # Genres used for filtering
    selected_themes: List[str]              # Themes used for filtering
    question_template_index: int            # Index of question template used (for sensitivity analysis)
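The question_template_index field supports the phrasing-sensitivity analysis noted under Key Features; a sketch, assuming graded results are available as (result, model_answer) pairs:

from collections import defaultdict


def accuracy_by_template(graded: list) -> dict:
    # graded: list of (MovieTestCaseResult, model_answer) pairs.
    # Per-template accuracy exposes sensitivity to question phrasing.
    hits, totals = defaultdict(int), defaultdict(int)
    for result, answer in graded:
        idx = result.question_template_index
        totals[idx] += 1
        hits[idx] += int(answer == result.target)
    return {idx: hits[idx] / totals[idx] for idx in totals}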
Example Test Cases¶
Genre-Based Clustering (Western)¶
Without Hints (HintLevel.NONE):
Which movie shares the most characteristics with The Good, the Bad and the Ugly, The Magnificent Seven, Rio Bravo, Once Upon a Time in the West?
Options:
(A) Family Man
(B) Carrie
(C) True Grit
Expected: (C)
With Hints (HintLevel.BOTH):
Which movie shares the most characteristics with The Good, the Bad and the Ugly (Western; Good-vs-Evil, Moral-Ambiguity), The Magnificent Seven (Western; Sacrifice, Underdog), Rio Bravo (Western; Good-vs-Evil), Once Upon a Time in the West (Western; Revenge, Moral-Ambiguity)?
Options:
(A) Family Man (Drama; Father-Son, Sacrifice)
(B) Carrie (Horror; Coming-of-Age, Revenge)
(C) True Grit (Western; Revenge, Coming-of-Age)
Expected: (C)
Analysis: All reference movies are westerns sharing themes of frontier justice and moral complexity. True Grit is the only western option. With hints enabled, the pattern becomes explicit rather than requiring internal movie knowledge.
Theme-Based Clustering (Moral Ambiguity)¶
Which movie belongs with this group: Batman v Superman: Dawn of Justice, Watchmen, Platoon?
Options:
(A) Letters from Iwo Jima
(B) 1917
(C) Black Mirror: USS Callister
(D) The Maze Runner
(E) Psycho
Expected: (A)
Analysis: Despite spanning superhero and war genres, all reference movies explore moral ambiguity without clear heroes/villains. Letters from Iwo Jima matches this thematic complexity.
Mixed Pattern (Family-Friendly + Comedy)¶
Based on these movies: Frozen, Coco, Up, Monsters, Inc., Thor: Ragnarok - which would you recommend next?
Options:
(A) Superbad
(B) Jaws
(C) Iron Man 3
(D) Batman v Superman: Dawn of Justice
Expected: (A)
Analysis: The reference set shows a preference for family-friendly content with found-family themes plus light comedy (Thor: Ragnarok). Superbad is the only comedy among the options, making it the closest match to that pattern.
Cognitive Skills Tested¶
Core Competencies¶
Without Hints (Knowledge-Based):
- Cultural Knowledge: Understanding movie genres, themes, and stylistic elements from their titles
- Memory Recall: Accessing stored information about specific films
- Implicit Pattern Recognition: Inferring similarities without explicit feature information
With Hints (Information Processing):
- Explicit Pattern Matching: Comparing provided genre and theme lists
- Multi-dimensional Analysis: Weighing multiple explicit similarity dimensions
- Logical Reasoning: Making decisions based on clearly presented criteria
Universal Skills:
- Similarity Analysis: Weighing multiple dimensions of similarity (genre, theme, tone, style)
- Pattern Recognition: Predicting user preferences from limited examples
- Question Sensitivity: Consistent performance across different question phrasings (tracked via template indices)
Applications¶
This test manifold evaluates capabilities essential for:
- Content Recommendation Systems: Understanding user preferences beyond surface-level features
- Analogical Reasoning: Finding deep structural similarities between different examples
- Preference Modeling: Inferring underlying criteria from limited examples