I’m not exactly a film buff, but I’ve always been slightly annoyed by the standard measure of a film’s success: comparing raw box office takings, which produces a list like this. Because of currency inflation, these rankings are biased towards more recent films, and ultimately they are pretty useless for their intended purpose.
I had an idea for an improved ranking system that adjusts box office gross figures for inflation and population changes, giving an inflation-adjusted box office gross per capita ($/capita) for each film. I think this is more useful because it measures each film against the setting of its time of release and the size of its potential market. It’s reasonable to expect that a film released in a country of 30 million people would gross less than a slightly worse film released in a country of 300 million. This method tries to adjust for that, and while it isn’t ideal, for reasons detailed below, it did yield some interesting results. Here is a graph of the top 15 films:
The rank change on the right-hand axis compares the ranking adjusted for both population and inflation with the inflation-only ranking. Interestingly, Gone with the Wind is still way out in front at around $12.50/capita, grossing nearly twice as much per capita as Snow White, which jumped up 8 places with the population adjustment. A bit of research suggests this could be because early films (GWTW was released in 1939) often had multiple releases spread over several years, which would smear the numbers under this metric and could explain the big difference. In my opinion, this method gets more reliable for more recent releases, when films have a country-wide release over a short period of time.
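To make the metric and the rank change concrete, here is a minimal Python sketch. It assumes a price-index lookup (cpi) and a population lookup (population), both keyed by year, plus a single base year in which all grosses are expressed. The Film class, its fields and the helper names are hypothetical illustrations, not the code behind the graphs, and the numbers in the example are made up. The inflation-only ranking uses the same CPI adjustment, so the rank change isolates the effect of dividing by population.

```python
from dataclasses import dataclass


@dataclass
class Film:
    """Hypothetical record for one film; the field names are illustrative."""
    title: str
    year: int
    gross: float  # nominal gross, in dollars of the release year

    def adjusted_gross(self, cpi: dict[int, float], base_year: int) -> float:
        # Convert release-year dollars to base-year dollars with a price-index ratio.
        return self.gross * cpi[base_year] / cpi[self.year]

    def gross_per_capita(self, cpi: dict[int, float],
                         population: dict[int, float], base_year: int) -> float:
        # Divide the inflation-adjusted gross by the population in the release year.
        return self.adjusted_gross(cpi, base_year) / population[self.year]


def ranks(scores: dict[str, float]) -> dict[str, int]:
    """Map each title to a 1-based rank, highest score first (ties keep input order)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {title: i + 1 for i, title in enumerate(ordered)}


def rank_changes(films: list[Film], cpi, population, base_year: int) -> dict[str, int]:
    """Places gained (+) or lost (-) going from the inflation-only to the per-capita ranking."""
    inflation_only = {f.title: f.adjusted_gross(cpi, base_year) for f in films}
    per_capita = {f.title: f.gross_per_capita(cpi, population, base_year) for f in films}
    old, new = ranks(inflation_only), ranks(per_capita)
    return {title: old[title] - new[title] for title in new}


# Made-up numbers purely for illustration, not real CPI, population or box office data.
cpi = {1950: 20.0, 2000: 160.0, 2010: 240.0}
population = {1950: 100_000_000, 2000: 300_000_000}
films = [Film("Film A", 1950, 2_000_000), Film("Film B", 2000, 30_000_000)]
print(rank_changes(films, cpi, population, base_year=2010))  # {'Film A': 1, 'Film B': -1}
```

In this toy example Film B has the larger inflation-adjusted gross, but Film A overtakes it once each gross is divided by the (smaller) population at its release. Titles are assumed to be unique, and tied scores keep their input order because Python's sort is stable.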
The rank changes also get a lot bigger the further down the rankings you go. Here are the top 200 films:
I think this is pretty cool: the films ranked very highly by the old method don’t move much under my metric, but the remaining ones show a massive variation in ranking. The $/capita data also follow a pretty nice power law, which I might look into further to see if there’s a relationship between a film’s financial success and factors such as advertising, cast, director, etc.
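The post doesn’t say how the power law was identified. Assuming it means $/capita falling off roughly as a power of rank, a quick check is a straight-line fit on a log-log scale, sketched below in pure Python. The function name fit_power_law is hypothetical, and least squares on logged data is only a rough heuristic, not a rigorous test that the data really follow a power law.

```python
import math


def fit_power_law(values: list[float]) -> tuple[float, float]:
    """Fit value ~ a * rank**(-b) by ordinary least squares on log(value) vs log(rank).

    `values` must be positive and already sorted from rank 1 downwards.
    Returns (a, b).
    """
    if len(values) < 2:
        raise ValueError("need at least two values to fit a line")
    xs = [math.log(rank) for rank in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # On log axes the line is log(value) = log(a) - b * log(rank).
    return math.exp(intercept), -slope


# Synthetic check: data generated from an exact power law should be recovered.
synthetic = [10.0 * rank ** -0.8 for rank in range(1, 201)]
a, b = fit_power_law(synthetic)
print(round(a, 6), round(b, 6))  # 10.0 0.8
```

Plotting the residuals of the fit against rank would show whether the top films (where the early multi-release effect matters most) bend away from the line.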
This started as a way to teach myself about Python objects, handling real-world data and the Plotly online plotting website, and to see if I enjoy this kind of thing. I’m finishing my PhD in 9 months or so and I don’t really know what I’ll do next, but data analysis is one possibility. I’m going to put the source code and a pastebin of the data I used in another blog post, as it might be of use to someone; when trying to put the data together, I found that some of this information is actually pretty hard to find. Anyway, let me hear any comments or questions about this post below.