Movie Recommendation Test Manifold

Overview

The Movie Recommendation test manifold evaluates a model's ability to identify thematic and stylistic similarities between films, requiring sophisticated pattern recognition and content similarity analysis. Unlike superficial popularity-based recommendations, this task tests genuine understanding of movie characteristics through genre and theme clustering.

Task Description

Models are presented with a set of reference movies that a user has enjoyed, and must select the most appropriate recommendation from multiple choice options. The correct answer shares meaningful similarities with the reference set based on genre patterns, thematic elements, or stylistic characteristics.

Key Features:

  • Genre-based clustering: Movies grouped by shared genres (Action, Horror, Animation, etc.)
  • Thematic similarity: Movies connected by narrative themes (Coming-of-Age, Revenge, Identity-Crisis, etc.)
  • Pattern recognition: Identifying the underlying commonalities that connect seemingly different films
  • Balanced difficulty: Anti-reference choices specifically selected to avoid obvious matches
  • Configurable hints: Optional genre/theme annotations that shift the task from knowledge-based to information-processing
  • Question sensitivity tracking: Recording the question template index enables analysis of model sensitivity to phrasing

Test Case Generation

Algorithm Overview

The generator uses a dual-filtering approach to create meaningful test cases:

  1. Reference Set Selection: Starting with the full movie database, iteratively apply genre and theme filters until a coherent reference set that matches ALL criteria emerges
  2. Anti-Reference Generation: Create a contrasting set of movies that match NONE of the selected criteria
  3. Balanced Sampling: Select reference movies and the target from the reference set, with remaining choices from the anti-reference set
  4. Randomization: Shuffle choices to prevent position bias
  5. Question Template: Randomly select a question template to ensure broad task understanding
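
A minimal sketch of steps 1-4, assuming a movies list of annotated records (see the database description below) and, for simplicity, a single genre filter and a single theme filter; the names here are illustrative, not the actual implementation. Step 5 is sketched under Question Templates.

import random

def generate_case(movies, reference_count=4, choice_count=4):
    # Illustrative criteria selection: one genre and one theme (the real
    # generator may apply several filters iteratively).
    genre = random.choice(sorted({g for m in movies for g in m["genres"]}))
    theme = random.choice(sorted({t for m in movies for t in m["themes"]}))

    # 1. Reference set: movies matching ALL selected criteria.
    reference_pool = [m for m in movies
                      if genre in m["genres"] and theme in m["themes"]]
    # 2. Anti-reference set: movies matching NONE of the criteria.
    anti_pool = [m for m in movies
                 if genre not in m["genres"] and theme not in m["themes"]]
    if len(reference_pool) < reference_count + 1 or len(anti_pool) < choice_count - 1:
        return None  # filters too strict; retry with different criteria

    # 3. Balanced sampling: references and the target from the reference
    #    pool, remaining choices from the anti-reference pool.
    picks = random.sample(reference_pool, reference_count + 1)
    references, target = picks[:-1], picks[-1]
    choices = random.sample(anti_pool, choice_count - 1) + [target]

    # 4. Shuffle choices to prevent position bias.
    random.shuffle(choices)
    return references, choices, "(" + chr(ord("A") + choices.index(target)) + ")"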

Movie Database

The system uses a curated database of ~400 movies spanning multiple decades, each annotated with:

Genres (19 categories):

  • Action, Adventure, Animation, Biography, Comedy, Crime, Documentary, Drama, Family, Fantasy, History, Horror, Musical, Mystery, Romance, Sci-Fi, Thriller, War, Western

Themes (25 categories):

  • Coming-of-Age, Redemption, Revenge, Love-Triangle, Fish-Out-of-Water, Good-vs-Evil, Underdog, Betrayal, Sacrifice, Identity-Crisis, Father-Son, Time-Loop, Artificial-Intelligence, Dystopian-Society, Heist, Road-Trip, Survival, Corruption, Mental-Illness, Technology-Dependence, Class-Struggle, Alien-Contact, Superhero-Origin, Found-Family, Moral-Ambiguity

Note: While the database includes release years for each movie, the current generation algorithm does not use temporal information for clustering. Test cases are created purely based on genre and theme similarities.
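
For illustration, a single record might be shaped as follows; the field names are assumptions based on the annotations described above.

MOVIE_EXAMPLE = {
    "title": "Carrie",
    "year": 1976,                            # stored, but unused for clustering
    "genres": ["Horror"],                    # from the 19 genre categories
    "themes": ["Coming-of-Age", "Revenge"],  # from the 25 theme categories
}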

Question Templates

The generator uses a variety of question templates to create natural, diverse problem statements. Templates are randomly selected to avoid predictable phrasing patterns while maintaining consistent task requirements.

Template Categories

Direct Similarity Queries

"Which movie is most similar to {reference_movies}?"
"Which movie shares the most characteristics with {reference_movies}?"
"Which movie fits the same pattern as {reference_movies}?"

Recommendation Framing

"Based on these movies: {reference_movies} - which would you recommend next?"
"If you enjoyed {reference_movies}, which movie would you like most?"
"If you liked {reference_movies}, what should you watch next?"
"Which movie would appeal to someone who enjoys {reference_movies}?"

User Preference Simulation

"I loved {reference_movies}, what should I watch next?"
"If {reference_movies} are your favorites, which would you add to the list?"
"Based on these movies, which would you recommend: {reference_movies}?"

Group Membership

"Which movie belongs with this group: {reference_movies}?"
"Complete this movie collection: {reference_movies}."

Template Structure

Each question follows a consistent format:

  1. Opening Statement: The question prompt with reference movies inserted
  2. Options Header: "Options:" separator line
  3. Lettered Choices: Multiple choice options formatted as "(A) Movie Title"
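
A minimal rendering sketch consistent with this format, reusing template strings from the categories above; the function name, movie-record fields, and the returned template index are assumptions.

import random

TEMPLATES = [
    "Which movie is most similar to {reference_movies}?",
    "If you enjoyed {reference_movies}, which movie would you like most?",
    "Which movie belongs with this group: {reference_movies}?",
]

def render_question(references, choices):
    # Randomly select a template; the index is kept for
    # question-sensitivity analysis.
    idx = random.randrange(len(TEMPLATES))
    titles = ", ".join(m["title"] for m in references)
    # 1. Opening statement, 2. "Options:" header, 3. lettered choices.
    lines = [TEMPLATES[idx].format(reference_movies=titles), "", "Options:"]
    lines += [f"({chr(ord('A') + i)}) {c['title']}"
              for i, c in enumerate(choices)]
    return "\n".join(lines), idx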

Example Template Applications

Template: "If you enjoyed {reference_movies}, which movie would you like most?"

Rendered Example:

If you enjoyed The Exorcist, Halloween, A Nightmare on Elm Street, which movie would you like most?

Options:
(A) The Conjuring
(B) The Notebook
(C) Fast & Furious
(D) Finding Nemo

Configuration Parameters

Generation Schema (MovieGenerationParams)

from typing import Dict, Optional

from pydantic import BaseModel

class MovieGenerationParams(BaseModel):
    count: int                               # Number of test cases to generate (> 0)
    reference_count: int                     # Number of movies in the reference set (≥ 1)
    choice_count: int                        # Total number of answer choices (≥ 2)
    genre_weights: Optional[Dict[str, int]]  # Weights for genre selection probability
    theme_weights: Optional[Dict[str, int]]  # Weights for theme selection probability
    hints: HintLevel                         # Hint level (HintLevel is defined below)
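
For example, a parameter object biasing genre selection toward Horror might look like this; the values are illustrative, and HintLevel is defined in the next section.

params = MovieGenerationParams(
    count=10,                                  # generate 10 test cases
    reference_count=4,                         # 4 reference movies per case
    choice_count=4,                            # 4 answer choices per case
    genre_weights={"Horror": 3, "Comedy": 1},  # Horror weighted 3x over Comedy
    theme_weights=None,                        # uniform theme selection
    hints=HintLevel.BOTH,
)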

Hint Levels (HintLevel)

The hints parameter controls whether genre and theme information is provided, shifting the task from an internal-knowledge test to an information-processing one:

from enum import IntEnum

class HintLevel(IntEnum):
    NONE = 0      # No hints provided (pure knowledge test)
    QUESTION = 1  # Hints on reference movies only
    ANSWER = 2    # Hints on answer choices only
    BOTH = 3      # Hints on both reference movies and answer choices

Hint Format: When enabled, hints appear after each movie title as a parenthesized genre list and theme list separated by a semicolon, e.g. Carrie (Horror; Coming-of-Age, Revenge).
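
A sketch of decorating a title in this format (the helper name is an assumption):

def with_hints(movie):
    # Append "(Genre1, Genre2; Theme1, Theme2)" after the title.
    genres = ", ".join(movie["genres"])
    themes = ", ".join(movie["themes"])
    return f"{movie['title']} ({genres}; {themes})"

# with_hints(MOVIE_EXAMPLE) -> "Carrie (Horror; Coming-of-Age, Revenge)"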

Standard Grid Configuration:

  • reference_count: [4, 5, 6] - Number of movies the user has "watched"
  • choice_count: [3, 4, 5] - Multiple-choice options presented
  • Generates 9 different difficulty combinations (3 × 3)
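
The grid could be enumerated as follows (a sketch; count=100 and hints=HintLevel.NONE are arbitrary example values):

from itertools import product

grid = [
    MovieGenerationParams(count=100, reference_count=r, choice_count=c,
                          genre_weights=None, theme_weights=None,
                          hints=HintLevel.NONE)
    for r, c in product([4, 5, 6], [3, 4, 5])  # 3 x 3 = 9 combinations
]
assert len(grid) == 9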

Result Schema (MovieTestCaseResult)

from typing import Any, Dict, List

from pydantic import BaseModel

class MovieTestCaseResult(BaseModel):
    input: str                              # Formatted problem text
    target: str                             # Correct answer, "(A)"-"(Z)"
    reference_movies: List[Dict[str, Any]]  # Movies in the reference set
    choices: List[Dict[str, Any]]           # All answer choices
    selected_genres: List[str]              # Genres used for filtering
    selected_themes: List[str]              # Themes used for filtering
    question_template_index: int            # Question template used (for sensitivity analysis)
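
Because each result records its template index, per-template accuracy can be aggregated to quantify phrasing sensitivity. A sketch, assuming graded outcomes are available as (result, is_correct) pairs:

from collections import defaultdict

def accuracy_by_template(graded):
    # graded: iterable of (MovieTestCaseResult, bool) pairs
    hits, totals = defaultdict(int), defaultdict(int)
    for result, correct in graded:
        totals[result.question_template_index] += 1
        hits[result.question_template_index] += int(correct)
    return {idx: hits[idx] / totals[idx] for idx in totals}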

Example Test Cases

Genre-Based Clustering (Western)

Without Hints (HintLevel.NONE):

Which movie shares the most characteristics with The Good, the Bad and the Ugly, The Magnificent Seven, Rio Bravo, Once Upon a Time in the West?

Options:
(A) Family Man
(B) Carrie  
(C) True Grit

Expected: (C)

With Hints (HintLevel.BOTH):

Which movie shares the most characteristics with The Good, the Bad and the Ugly (Western; Good-vs-Evil, Moral-Ambiguity), The Magnificent Seven (Western; Sacrifice, Underdog), Rio Bravo (Western; Good-vs-Evil), Once Upon a Time in the West (Western; Revenge, Moral-Ambiguity)?

Options:
(A) Family Man (Drama; Father-Son, Sacrifice)
(B) Carrie (Horror; Coming-of-Age, Revenge)
(C) True Grit (Western; Revenge, Coming-of-Age)

Expected: (C)

Analysis: All reference movies are Westerns sharing themes of frontier justice and moral complexity. True Grit is the only Western option. With hints enabled, the pattern becomes explicit rather than requiring internal movie knowledge.

Theme-Based Clustering (Moral Ambiguity)

Which movie belongs with this group: Batman v Superman: Dawn of Justice, Watchmen, Platoon?

Options:
(A) Letters from Iwo Jima
(B) 1917
(C) Black Mirror: USS Callister
(D) The Maze Runner
(E) Psycho

Expected: (A)

Analysis: Despite spanning superhero and war genres, all reference movies explore moral ambiguity without clear heroes/villains. Letters from Iwo Jima matches this thematic complexity.

Mixed Pattern (Family-Friendly + Comedy)

Based on these movies: Frozen, Coco, Up, Monsters, Inc., Thor: Ragnarok - which would you recommend next?

Options:
(A) Superbad
(B) Jaws
(C) Iron Man 3  
(D) Batman v Superman: Dawn of Justice

Expected: (A)

Analysis: The reference set shows a preference for family-friendly content with found-family themes plus light comedy. Among the options, Superbad best matches the comedy dimension of that pattern.

Cognitive Skills Tested

Core Competencies

Without Hints (Knowledge-Based):

  • Cultural Knowledge: Recalling the genres, themes, and stylistic elements of films from their titles alone
  • Memory Recall: Accessing stored information about specific films
  • Implicit Pattern Recognition: Inferring similarities without explicit feature information

With Hints (Information Processing):

  • Explicit Pattern Matching: Comparing provided genre and theme lists
  • Multi-dimensional Analysis: Weighing multiple explicit similarity dimensions
  • Logical Reasoning: Making decisions based on clearly presented criteria

Universal Skills:

  • Similarity Analysis: Weighing multiple dimensions of similarity (genre, theme, tone, style)
  • Pattern Recognition: Predicting user preferences from limited examples
  • Question Sensitivity: Consistent performance across different question phrasings (tracked via template indices)

Applications

This test manifold evaluates capabilities essential for:

  • Content Recommendation Systems: Understanding user preferences beyond surface-level features
  • Analogical Reasoning: Finding deep structural similarities between different examples
  • Preference Modeling: Inferring underlying criteria from limited examples