Movie Recommendation Test Manifold¶
Overview¶
The Movie Recommendation test manifold evaluates a model's ability to identify thematic and stylistic similarities between films, requiring sophisticated pattern recognition and content similarity analysis. Unlike superficial popularity-based recommendations, this task tests genuine understanding of movie characteristics through genre and theme clustering.
Task Description¶
Models are presented with a set of reference movies that a user has enjoyed, and must select the most appropriate recommendation from multiple choice options. The correct answer shares meaningful similarities with the reference set based on genre patterns, thematic elements, or stylistic characteristics.
Key Features:
- Genre-based clustering: Movies grouped by shared genres (Action, Horror, Animation, etc.)
- Thematic similarity: Movies connected by narrative themes (Coming-of-Age, Revenge, Identity-Crisis, etc.)
- Pattern recognition: Identifying the underlying commonalities that connect seemingly different films
- Balanced difficulty: Anti-reference choices specifically selected to avoid obvious matches
- Configurable hints: Optional genre/theme information that transforms the task from a knowledge-based test into an information-processing one
- Question sensitivity tracking: Template index tracking enables analysis of model sensitivity to question phrasing
Test Case Generation¶
Algorithm Overview¶
The generator uses a dual-filtering approach to create meaningful test cases (a sketch follows the list):
- Reference Set Selection: Starting with the full movie database, iteratively apply genre and theme filters until a coherent reference set that matches ALL criteria emerges
- Anti-Reference Generation: Create a contrasting set of movies that match NONE of the selected criteria
- Balanced Sampling: Select reference movies and the target from the reference set, with remaining choices from the anti-reference set
- Randomization: Shuffle choices to prevent position bias
- Question Template: Randomly select a question template so that measured performance reflects the task itself rather than any single phrasing
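A simplified sketch of this pipeline in Python. All names here are illustrative, and the real generator iterates its filters until a coherent reference set emerges, whereas this sketch assumes the criteria are fixed up front:

import random


def matches_all(movie: dict, genres: set, themes: set) -> bool:
    # True if the movie carries every selected genre and theme.
    return genres <= set(movie["genres"]) and themes <= set(movie["themes"])


def matches_none(movie: dict, genres: set, themes: set) -> bool:
    # True if the movie carries none of the selected genres or themes.
    return genres.isdisjoint(movie["genres"]) and themes.isdisjoint(movie["themes"])


def generate_test_case(db: list, genres: set, themes: set,
                       reference_count: int, choice_count: int,
                       template_count: int) -> dict:
    # 1. Reference set: movies matching ALL selected criteria.
    reference_pool = [m for m in db if matches_all(m, genres, themes)]
    # 2. Anti-reference set: movies matching NONE of the criteria.
    anti_pool = [m for m in db if matches_none(m, genres, themes)]
    # 3. Balanced sampling: references plus the target come from the
    #    reference pool; distractors come from the anti-reference pool.
    picks = random.sample(reference_pool, reference_count + 1)
    references, target = picks[:-1], picks[-1]
    choices = [target] + random.sample(anti_pool, choice_count - 1)
    # 4. Shuffle choices to prevent position bias.
    random.shuffle(choices)
    # 5. Pick a template at random; the index is recorded for
    #    question-sensitivity analysis.
    template_index = random.randrange(template_count)
    return {"references": references, "target": target,
            "choices": choices, "template_index": template_index}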
Movie Database¶
The system uses a curated database of ~400 movies spanning multiple decades, each annotated with:
Genres (19 categories):
- Action, Adventure, Animation, Biography, Comedy, Crime, Documentary, Drama, Family, Fantasy, History, Horror, Musical, Mystery, Romance, Sci-Fi, Thriller, War, Western
Themes (25 categories):
- Coming-of-Age, Redemption, Revenge, Love-Triangle, Fish-Out-of-Water, Good-vs-Evil, Underdog, Betrayal, Sacrifice, Identity-Crisis, Father-Son, Time-Loop, Artificial-Intelligence, Dystopian-Society, Heist, Road-Trip, Survival, Corruption, Mental-Illness, Technology-Dependence, Class-Struggle, Alien-Contact, Superhero-Origin, Found-Family, Moral-Ambiguity
Note: While the database includes release years for each movie, the current generation algorithm does not use temporal information for clustering. Test cases are created purely based on genre and theme similarities.
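A single database entry presumably pairs a title with these annotations; the field names below are assumptions for illustration, with the True Grit annotations taken from an example later on this page:

# Hypothetical shape of one entry in the movie database.
{
    "title": "True Grit",
    "year": 2010,  # release year is stored but not used for clustering
    "genres": ["Western"],
    "themes": ["Revenge", "Coming-of-Age"],
}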
Question Templates¶
The generator uses a variety of question templates to create natural, diverse problem statements. Templates are randomly selected to avoid predictable phrasing patterns while maintaining consistent task requirements.
Template Categories¶
Direct Similarity Queries¶
"Which movie is most similar to {reference_movies}?"
"Which movie shares the most characteristics with {reference_movies}?"
"Which movie fits the same pattern as {reference_movies}?"
Recommendation Framing¶
"Based on these movies: {reference_movies} - which would you recommend next?"
"If you enjoyed {reference_movies}, which movie would you like most?"
"If you liked {reference_movies}, what should you watch next?"
"Which movie would appeal to someone who enjoys {reference_movies}?"
User Preference Simulation¶
"I loved {reference_movies}, what should I watch next?"
"If {reference_movies} are your favorites, which would you add to the list?"
"Based on these movies, which would you recommend: {reference_movies}?"
Group Membership¶
"Which movie belongs with this group: {reference_movies}?"
"Complete this movie collection: {reference_movies}."
Template Structure¶
Each question follows a consistent format (see the sketch after this list):
- Opening Statement: The question prompt with reference movies inserted
- Options Header: "Options:" separator line
- Lettered Choices: Multiple choice options formatted as "(A) Movie Title"
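A sketch of how a question might be assembled under this format; render_question is a hypothetical name, and only two of the templates above are repeated:

import random
import string

QUESTION_TEMPLATES = [
    "Which movie is most similar to {reference_movies}?",
    "If you enjoyed {reference_movies}, which movie would you like most?",
    # ... remaining templates from the categories above
]


def render_question(reference_titles: list, choice_titles: list) -> tuple:
    # Randomly select a template; the index becomes question_template_index.
    index = random.randrange(len(QUESTION_TEMPLATES))
    opening = QUESTION_TEMPLATES[index].format(
        reference_movies=", ".join(reference_titles))
    # Lettered choices: "(A) Movie Title", "(B) Movie Title", ...
    lettered = [f"({letter}) {title}"
                for letter, title in zip(string.ascii_uppercase, choice_titles)]
    return "\n".join([opening, "Options:", *lettered]), index

Applied to the horror references in the next subsection, this reproduces the rendered example shown there.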
Example Template Applications¶
Template: "If you enjoyed {reference_movies}, which movie would you like most?"
Rendered Example:
If you enjoyed The Exorcist, Halloween, A Nightmare on Elm Street, which movie would you like most?
Options:
(A) The Conjuring
(B) The Notebook
(C) Fast & Furious
(D) Finding Nemo
Configuration Parameters¶
Generation Schema (MovieGenerationParams)¶
from typing import Dict, Optional

from pydantic import BaseModel


class MovieGenerationParams(BaseModel):
    count: int                               # Number of test cases to generate (> 0)
    reference_count: int                     # Number of movies in reference set (≥ 1)
    choice_count: int                        # Total number of answer choices (≥ 2)
    genre_weights: Optional[Dict[str, int]]  # Weights for genre selection probability
    theme_weights: Optional[Dict[str, int]]  # Weights for theme selection probability
    hints: HintLevel                         # Level of hints (NONE, QUESTION, ANSWER, BOTH); defined below
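The weight dictionaries presumably bias which genres and themes are drawn during filtering; a short sketch of such a weighted draw (the concrete weights are made up):

import random

genre_weights = {"Western": 3, "Horror": 1}  # hypothetical weights
# "Western" is drawn three times as often as "Horror".
genre = random.choices(list(genre_weights), weights=list(genre_weights.values()))[0]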
Hint Levels (HintLevel)¶
The hints parameter controls whether genre and theme information is provided, transforming the task from a test of internal movie knowledge into one of information processing:
from enum import IntEnum


class HintLevel(IntEnum):
    NONE = 0      # No hints provided (pure knowledge test)
    QUESTION = 1  # Hints on reference movies only
    ANSWER = 2    # Hints on answer choices only
    BOTH = 3      # Hints on both reference movies and answer choices
Hint Format: When enabled, hints appear after movie titles as a parenthesized list of genres followed by themes, e.g. (Action, Thriller; Revenge, Betrayal).
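A minimal sketch of that annotation (annotate is a hypothetical name):

def annotate(movie: dict) -> str:
    # e.g. 'Carrie (Horror; Coming-of-Age, Revenge)'
    genres = ", ".join(movie["genres"])
    themes = ", ".join(movie["themes"])
    return f'{movie["title"]} ({genres}; {themes})'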
Standard Grid Configuration:
- reference_count: [4, 5, 6] - Number of movies the user has "watched"
- choice_count: [3, 4, 5] - Multiple-choice options presented
- Generates 9 different difficulty combinations (3 × 3)
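The grid is the Cartesian product of the two lists; a sketch using the schemas on this page, with an illustrative per-cell count:

from itertools import product

# 3 reference sizes × 3 choice sizes = 9 difficulty combinations.
grid = [
    MovieGenerationParams(
        count=50,  # illustrative number of cases per cell
        reference_count=ref,
        choice_count=choices,
        genre_weights=None,
        theme_weights=None,
        hints=HintLevel.NONE,
    )
    for ref, choices in product([4, 5, 6], [3, 4, 5])
]
assert len(grid) == 9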
Result Schema (MovieTestCaseResult)¶
from typing import Any, Dict, List

from pydantic import BaseModel


class MovieTestCaseResult(BaseModel):
    input: str                              # Formatted problem text
    target: str                             # Correct answer, "(A)"-"(Z)"
    reference_movies: List[Dict[str, Any]]  # Movies in reference set
    choices: List[Dict[str, Any]]           # All answer choices
    selected_genres: List[str]              # Genres used for filtering
    selected_themes: List[str]              # Themes used for filtering
    question_template_index: int            # Index of question template used (for sensitivity analysis)
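The question_template_index field supports the phrasing-sensitivity analysis noted under Key Features; a sketch, assuming graded results are available as (result, model_answer) pairs:

from collections import defaultdict


def accuracy_by_template(graded: list) -> dict:
    # graded: list of (MovieTestCaseResult, model_answer) pairs.
    # Per-template accuracy exposes sensitivity to question phrasing.
    hits, totals = defaultdict(int), defaultdict(int)
    for result, answer in graded:
        idx = result.question_template_index
        totals[idx] += 1
        hits[idx] += int(answer == result.target)
    return {idx: hits[idx] / totals[idx] for idx in totals}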
Example Test Cases¶
Genre-Based Clustering (Western)¶
Without Hints (HintLevel.NONE):
Which movie shares the most characteristics with The Good, the Bad and the Ugly, The Magnificent Seven, Rio Bravo, Once Upon a Time in the West?
Options:
(A) Family Man
(B) Carrie
(C) True Grit
Expected: (C)
With Hints (HintLevel.BOTH):
Which movie shares the most characteristics with The Good, the Bad and the Ugly (Western; Good-vs-Evil, Moral-Ambiguity), The Magnificent Seven (Western; Sacrifice, Underdog), Rio Bravo (Western; Good-vs-Evil), Once Upon a Time in the West (Western; Revenge, Moral-Ambiguity)?
Options:
(A) Family Man (Drama; Father-Son, Sacrifice)
(B) Carrie (Horror; Coming-of-Age, Revenge)
(C) True Grit (Western; Revenge, Coming-of-Age)
Expected: (C)
Analysis: All reference movies are westerns sharing themes of frontier justice and moral complexity. True Grit is the only western option. With hints enabled, the pattern becomes explicit rather than requiring internal movie knowledge.
Theme-Based Clustering (Moral Ambiguity)¶
Which movie belongs with this group: Batman v Superman: Dawn of Justice, Watchmen, Platoon?
Options:
(A) Letters from Iwo Jima
(B) 1917
(C) Black Mirror: USS Callister
(D) The Maze Runner
(E) Psycho
Expected: (A)
Analysis: Despite spanning superhero and war genres, all reference movies explore moral ambiguity without clear heroes/villains. Letters from Iwo Jima matches this thematic complexity.
Mixed Pattern (Family-Friendly + Comedy)¶
Based on these movies: Frozen, Coco, Up, Monsters, Inc., Thor: Ragnarok - which would you recommend next?
Options:
(A) Superbad
(B) Jaws
(C) Iron Man 3
(D) Batman v Superman: Dawn of Justice
Expected: (A)
Analysis: The reference set shows a preference for family-friendly content with found-family themes plus light comedy (Thor: Ragnarok). Superbad is the only comedy among the options, making it the closest match to that pattern.
Cognitive Skills Tested¶
Core Competencies¶
Without Hints (Knowledge-Based):
- Cultural Knowledge: Understanding movie genres, themes, and stylistic elements from their titles
- Memory Recall: Accessing stored information about specific films
- Implicit Pattern Recognition: Inferring similarities without explicit feature information
With Hints (Information Processing):
- Explicit Pattern Matching: Comparing provided genre and theme lists
- Multi-dimensional Analysis: Weighing multiple explicit similarity dimensions
- Logical Reasoning: Making decisions based on clearly presented criteria
Universal Skills:
- Similarity Analysis: Weighing multiple dimensions of similarity (genre, theme, tone, style)
- Pattern Recognition: Predicting user preferences from limited examples
- Question Sensitivity: Consistent performance across different question phrasings (tracked via template indices)
Applications¶
This test manifold evaluates capabilities essential for:
- Content Recommendation Systems: Understanding user preferences beyond surface-level features
- Analogical Reasoning: Finding deep structural similarities between different examples
- Preference Modeling: Inferring underlying criteria from limited examples