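A movie passes the Bechdel test if it satisfies 3 rules: (1) it has at least two women in it, (2) the women talk to each other, and (3) they talk about something besides a man.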
The Bechdel test originated in this comic by Alison Bechdel (image source http://dykestowatchoutfor.com/the-rule):
The data we’re going to work with today have been gathered from a variety of sources by several people. The Bechdel test ratings themselves are from www.bechdeltest.com, where the general public can rate movies according to whether they pass or fail the Bechdel test. Some additional information about the movies comes from www.the-numbers.com. These data were the basis of an article on the topic at http://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/. The data have since been added to the fivethirtyeight package for R. I took those data and scraped some additional information about the movies from imdb.com, including the MPAA rating, run time, and ratings from IMDb users. Note that this is not a random sample of movies: which movies made it into the data set was largely determined by which movies were rated by users of www.bechdeltest.com. That means any findings from your analysis in this lab are only tentative.
The following R chunk loads the data and sets the factor levels for categorical variables:
# read_csv() comes from readr; mutate() and %>% come from dplyr
library(readr)
library(dplyr)

movies <- read_csv("https://mhc-stat140-2017.github.io/data/bechdel/bechdel.csv") %>%
  mutate(
    bechdel_test = factor(
      bechdel_test,
      ordered = TRUE,
      levels = c("nowomen", "notalk", "men", "dubious", "ok")),
    bechdel_test_binary = factor(
      bechdel_test_binary,
      ordered = TRUE,
      levels = c("FAIL", "PASS")),
    mpaa_rating = factor(
      mpaa_rating,
      ordered = TRUE,
      levels = c("UNRATED", "NOT RATED", "G", "PG", "TV-PG", "PG-13", "TV-14", "R", "NC-17"))
  )
## Parsed with column specification:
## cols(
## year = col_integer(),
## title = col_character(),
## bechdel_test = col_character(),
## bechdel_test_binary = col_character(),
## budget = col_integer(),
## domgross = col_double(),
## intgross = col_double(),
## budget_2013 = col_integer(),
## domgross_2013 = col_integer(),
## intgross_2013 = col_double(),
## imdb_rating = col_double(),
## num_imdb_ratings = col_integer(),
## mpaa_rating = col_character(),
## run_time_min = col_integer()
## )
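To see what setting ordered factor levels does, here is a minimal sketch on a small made-up vector. The names ratings and ratings_fct are hypothetical and are not part of the movies data; the point is only that the levels keep the order you give them and can be compared.

```r
# Made-up ratings, not taken from the movies data
ratings <- c("PG", "R", "G", "PG-13", "PG")

# ordered = TRUE records that G < PG < PG-13 < R
ratings_fct <- factor(
  ratings,
  ordered = TRUE,
  levels = c("G", "PG", "PG-13", "R"))

levels(ratings_fct)  # levels are in the order given, not alphabetical order
ratings_fct < "R"    # comparisons are allowed because the factor is ordered
table(ratings_fct)   # counts are reported in level order
```

Without the levels argument, R would sort the levels alphabetically, which is rarely the order you want in tables and plots.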
dim(movies)
## [1] 1794 14
names(movies)
## [1] "year" "title" "bechdel_test"
## [4] "bechdel_test_binary" "budget" "domgross"
## [7] "intgross" "budget_2013" "domgross_2013"
## [10] "intgross_2013" "imdb_rating" "num_imdb_ratings"
## [13] "mpaa_rating" "run_time_min"
head(movies)
## # A tibble: 6 x 14
## year title bechdel_test bechdel_test_binary budget
## <int> <chr> <ord> <ord> <int>
## 1 2013 21 & Over notalk FAIL 13000000
## 2 2012 Dredd 3D ok PASS 45000000
## 3 2013 12 Years a Slave notalk FAIL 20000000
## 4 2013 2 Guns notalk FAIL 61000000
## 5 2013 42 men FAIL 40000000
## 6 2013 47 Ronin men FAIL 225000000
## # ... with 9 more variables: domgross <dbl>, intgross <dbl>,
## # budget_2013 <int>, domgross_2013 <int>, intgross_2013 <dbl>,
## # imdb_rating <dbl>, num_imdb_ratings <int>, mpaa_rating <ord>,
## # run_time_min <int>
str(movies)
## Classes 'tbl_df', 'tbl' and 'data.frame': 1794 obs. of 14 variables:
## $ year : int 2013 2012 2013 2013 2013 2013 2013 2013 2013 2013 ...
## $ title : chr "21 & Over" "Dredd 3D" "12 Years a Slave" "2 Guns" ...
## $ bechdel_test : Ord.factor w/ 5 levels "nowomen"<"notalk"<..: 2 5 2 2 3 3 2 5 5 2 ...
## $ bechdel_test_binary: Ord.factor w/ 2 levels "FAIL"<"PASS": 1 2 1 1 1 1 1 2 2 1 ...
## $ budget : int 13000000 45000000 20000000 61000000 40000000 225000000 92000000 12000000 13000000 130000000 ...
## $ domgross : num 25682380 13414714 53107035 75612460 95020213 ...
## $ intgross : num 4.22e+07 4.09e+07 1.59e+08 1.32e+08 9.50e+07 ...
## $ budget_2013 : int 13000000 45658735 20000000 61000000 40000000 225000000 92000000 12000000 13000000 130000000 ...
## $ domgross_2013 : int 25682380 13611086 53107035 75612460 95020213 38362475 67349198 15323921 18007317 60522097 ...
## $ intgross_2013 : num 4.22e+07 4.15e+07 1.59e+08 1.32e+08 9.50e+07 ...
## $ imdb_rating : num 5.9 7.1 8.1 6.7 7.5 6.3 5.3 7.8 5.7 4.9 ...
## $ num_imdb_ratings : int 64520 217487 501013 168308 70755 125401 175360 227378 29984 168942 ...
## $ mpaa_rating : Ord.factor w/ 9 levels "UNRATED"<"NOT RATED"<..: 8 8 8 8 6 6 8 8 6 6 ...
## $ run_time_min : int 93 95 134 109 128 128 98 123 107 100 ...
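I think most of the variables are self-explanatory, but a couple require explanation. The bechdel_test variable has five levels:
“nowomen” (the movie has fewer than two women)
“notalk” (the women don’t talk to each other)
“men” (the women only talk to each other about men)
“dubious” (it is unclear whether the movie passes)
“ok” (the movie passes the test)
The bechdel_test_binary variable has two levels:
“PASS” (bechdel_test is “ok”)
“FAIL” (bechdel_test is something other than “ok”)
# Your code goes here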
You should use the summarize() function so that the result is stored in a data frame.
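As a rough sketch of how summarize() returns a data frame, here is an example on a small made-up data set. The names toy, group, value, and toy_summary are hypothetical and have nothing to do with the movies data; you will need to pick the summary that fits the question.

```r
library(dplyr)

# Made-up data, not the movies data set
toy <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(2, 4, 1, 3, 5))

# summarize() collapses all rows into a one-row data frame,
# with one column per summary you ask for
toy_summary <- toy %>%
  summarize(
    n_rows = n(),
    mean_value = mean(value))

toy_summary
```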
# Your code goes here
Make an appropriate plot of the bechdel_test variable.
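One common choice for a single categorical variable is a bar plot. The sketch below assumes the ggplot2 package and uses a made-up data frame; toy_sizes, size, and size_plot are hypothetical names, and whether a bar plot is the most appropriate display for your variable is for you to decide.

```r
library(ggplot2)

# Made-up categorical data, not the movies data set
toy_sizes <- data.frame(
  size = factor(
    c("small", "large", "medium", "small", "small", "large"),
    ordered = TRUE,
    levels = c("small", "medium", "large")))

# geom_bar() counts the rows in each category and draws one bar per level,
# in the order of the factor levels
size_plot <- ggplot(data = toy_sizes, mapping = aes(x = size)) +
  geom_bar()

size_plot
```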
# Your code goes here
# Your code goes here
You should use the group_by() and summarize() functions so that the result is stored in a data frame.
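A minimal sketch of combining the two functions, again on made-up data: group_by() marks the groups and summarize() then computes one row per group. The names toy_scores, group, value, and toy_by_group are hypothetical and are not columns of the movies data.

```r
library(dplyr)

# Made-up data, not the movies data set
toy_scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(2, 4, 1, 3, 8))

# One row per group: the number of rows and the mean of value in each group
toy_by_group <- toy_scores %>%
  group_by(group) %>%
  summarize(
    n_rows = n(),
    mean_value = mean(value))

toy_by_group
```

The result has one row for group "a" and one for group "b", so it can be passed on to other functions like any other data frame.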
# Your code goes here
# Your code goes here
# Your code goes here
Make a few more plots of your choosing. Try different plot types and see what relationships you can find among the variables in the data set. Add new R chunks as needed.
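If you are unsure where to start, the sketch below shows two plot types on a made-up data frame: a box plot of a quantitative variable by group, and a scatter plot of two quantitative variables colored by group. It assumes the ggplot2 package, and toy_xy, group, x, y, box_plot, and scatter_plot are hypothetical names, not part of the movies data.

```r
library(ggplot2)

# Made-up data, not the movies data set
toy_xy <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2))

# Quantitative variable split by a categorical variable
box_plot <- ggplot(data = toy_xy, mapping = aes(x = group, y = y)) +
  geom_boxplot()

# Two quantitative variables, with color showing the categorical variable
scatter_plot <- ggplot(data = toy_xy, mapping = aes(x = x, y = y, color = group)) +
  geom_point()

box_plot
scatter_plot
```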