Today’s blog post is once again about the visualization of movie data. Since I had already used the IMDb dataset to compare the average age of actors and actresses, I wanted to try something a bit different. One thing that I have always found cool is the visualization of movie plots (e.g. xkcd). The reason I never attempted something like this myself was that I had no idea where to get the required data. Of course, it is always possible to generate the data manually, but that is usually a tedious task that I try to avoid. Fortunately, I found a much more convenient data source while I was watching a movie on the Amazon Video app.
Its X-Ray feature shows you relevant IMDb information based on which actor is currently in the scene. The app does this using a single text file that records when each character appears in a scene. At the end of the post, I will describe how you can extract the file yourself. First, I downloaded the X-Ray file for the latest Star Wars movie. Based on this data we can compare the characters by their screen time.
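To make the screen time comparison concrete, here is a minimal sketch in R. It does not use the real X-Ray file: the `events` table, its column names (`character`, `start`, `end`), the character names and the times are all made up for illustration, and the time unit is left unspecified.

```r
# Toy stand-in for the appearance data; names, columns and times are assumptions
events <- data.frame(
  character = c("Character A", "Character B", "Character A", "Character C"),
  start = c(0, 30, 120, 200),
  end = c(60, 90, 180, 260),
  stringsAsFactors = FALSE
)

# Length of each individual appearance
events$duration <- events$end - events$start

# Total screen time per character
screen_time <- aggregate(duration ~ character, data = events, FUN = sum)

# Character with the most screen time first
screen_time <- screen_time[order(-screen_time$duration), ]
print(screen_time)
```

The same two steps (subtract, then sum per character) should carry over to the real data once the start and end times have been pulled out of the file.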
I noticed that the numbers are not always 100% accurate because some characters are only visible in parts of a scene. However, that should not be a major problem for what we are using them for in this post. Next, I used the ggplot2 package in R to plot the following Gantt chart:
We can use the X-Ray data not only to identify the scenes in which a character appears, but also to see who else appears alongside them. To visualize this information, I used Gephi, an open-source tool for plotting networks. My assumption is that the longer characters appear on-screen together, the closer their relationship is. The circle size is based on each character’s total screen time.
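One way to turn appearances into a network is to measure how long the time intervals of two characters overlap and use that as the edge weight. The following is only a sketch under assumptions, not necessarily how the graph above was built: the `events` table, its columns and its values are invented, and the `Source`, `Target` and `Weight` column names are chosen to match what Gephi’s spreadsheet import expects for an edge list.

```r
# Toy stand-in for the appearance data; names and times are made up
events <- data.frame(
  character = c("A", "B", "C", "A"),
  start = c(0, 30, 50, 100),
  end = c(60, 90, 80, 150),
  stringsAsFactors = FALSE
)

# Pair every appearance with every other appearance (Cartesian product)
pairs <- merge(events, events, by = NULL, suffixes = c("_1", "_2"))

# Keep each unordered pair of distinct characters only once
pairs <- pairs[pairs$character_1 < pairs$character_2, ]

# Overlap of two intervals: min(end) - max(start), floored at zero
pairs$overlap <- pmax(
  0,
  pmin(pairs$end_1, pairs$end_2) - pmax(pairs$start_1, pairs$start_2)
)

# Sum the overlap per pair and drop pairs that never share the screen
edges <- aggregate(overlap ~ character_1 + character_2, data = pairs, FUN = sum)
edges <- edges[edges$overlap > 0, ]
names(edges) <- c("Source", "Target", "Weight")
print(edges)
```

Saving the result with `write.csv(edges, "edges.csv", row.names = FALSE)` gives a file that can be imported into Gephi as an edge table. The pairwise comparison grows quadratically with the number of appearances, which should be fine for a single movie.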
I hope these examples show what you can do with Amazon X-Ray data relatively quickly. The best thing about this approach is that it requires only minimal manual work. So, here are Gantt charts for three other movies I enjoy:
How to Get X-Ray Data?
The X-Ray feature is based on an unencrypted JSON file which can be downloaded with the Chrome browser. Unfortunately, those files are not publicly available (signed CloudFront URL), meaning that you have to start streaming the movie before you can download the file. This also means that you are limited to the content included in your Prime subscription, or you need to rent/buy the movies in which you are interested. Nevertheless, I think it is still an interesting source, especially when you consider the alternatives.
- Start Developer Tools: Menu > Tools > Developer Tools
- Start streaming a video & close the player after a few seconds
- Select the following Developer Tools settings:
- Click on the gray record button to capture the network traffic
- Start streaming the video again
- Now the following file should appear: data.json?Expires=
- Right-click on the file > Open Link in New Tab > Save Page As…
Then you can use the jsonlite package in R to load the JSON file and do, for example, something like this:
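library(jsonlite)
library(ggplot2)
# Load the X-Ray file; flatten = TRUE turns nested objects into plain columns
data <- fromJSON("starwars.json", flatten = TRUE)
e <- data$resource$events
# Each entry of e$when holds a start and an end time; split them into two columns
when <- as.data.frame(e$when)
e$start <- as.numeric(when[1, ])
e$end <- as.numeric(when[2, ])
# One thick horizontal segment per appearance, one row per character
# (ggplot2 3.4.0 and later prefer linewidth over size for lines)
ggplot(e, aes(colour = character)) +
  geom_segment(aes(x = start, xend = end, y = character, yend = character), size = 5)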