Lord of the Rings is a fantasy novel trilogy written by J.R.R. Tolkien. To this day, it remains one of the most popular and influential works of fantasy literature and has been adapted into several films, video games, and other media. The story follows a group of characters as they embark on a quest to destroy a powerful ring that has the potential to enslave the world.
In this analysis, we will explore the distribution of speaking time in Lord of the Rings. Instead of the minutes each character speaks in the movies, we use the number of words spoken by each character in the three books as a proxy for speaking time (with some tweaking to include movie-only characters). This is partly for practical reasons (data availability) and partly because the movies are an adaptation of the three books, which makes the novels the source material. We will analyze how much each character speaks in the novel and how that speaking time is distributed among the characters.
This analysis will help us understand which characters have the most dialogue and how speaking time is distributed among the characters in the story. We will also explore patterns or trends in the distribution of speaking time and how they may relate to the overall narrative of the novel. Finally, we will look at gender representation in the novel, as the movies are often criticized for their lack of diverse representation, for example for not passing the Bechdel-Wallace Test.
The data used for this analysis will consist of two datasets.
The first dataset contains the number of words spoken by each character in all three volumes of Lord of the Rings. It was created by counting the words spoken by each character in the novels and is available in the project folder under the name WordsByCharacter.csv.
The dataset comes from the FSharpAdvent repository by MokoSan on GitHub. All information about it, as well as the original dataset, can be found here: https://github.com/MokoSan/FSharpAdvent/blob/master/Data.
The data is organized in a tabular format, with each row representing a character and the number of words they spoke in a particular chapter of the novel. The columns include the name of the film (book), the chapter, the character’s name, race and the number of words spoken by them:
| Film | Chapter | Character | Race | Words |
|---|---|---|---|---|
| The Fellowship of the Ring | 01: Prologue | Bilbo | Hobbit | 4 |
| The Fellowship of the Ring | 01: Prologue | Elrond | Elf | 5 |
| The Fellowship of the Ring | 01: Prologue | Galadriel | Elf | 460 |
The columns are defined as follows:
The second dataset contains information about the characters in Lord of the Rings, including their name, race, gender and realm.
The dataset is available in the project folder under the name InformationByCharacter.csv. I, Emily, collected and compiled the data myself, based on information from the novels and other sources, such as the LOTR-Wiki.
The data is organized in a tabular format, with each row representing a character:
| Character | Race | Gender | Realm |
|---|---|---|---|
| Aragorn | Men | Male | Gondor |
| Arwen | Elf | Female | Rivendell |
| Bilbo | Hobbit | Male | The Shire |
The columns are defined as follows:
Disclaimer: The data used in this analysis is not official, but was collected and compiled by fans of the Lord of the Rings series (and me). It may therefore contain inaccuracies or inconsistencies and should be interpreted with caution. It is still a valuable resource for analyzing speaking time in the novels.
The first step in the analysis is to import the data from the two datasets into R. As the data is split across two files, we will need to import both datasets and then merge them on the character’s name. This gives us all the relevant information about each character in one dataset, which makes it easier to analyze the distribution of speaking time.
Hint: Merge the two datasets on the character’s name only after summarizing the number of words by character (Step 2.2 of the Data Analysis).
Please note that characters can appear multiple times in the WordsByCharacter dataset, as they can speak in multiple chapters across the three books. In the InformationByCharacter dataset, each character appears only once, as it contains general information about the characters.
Please check: if the Character or Race column differs between the two datasets, the corresponding rows should be merged based on the content of dataset 1 (WordsByCharacter).
Once the data is imported, we can begin analyzing the distribution of speaking time.
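The import and the checks described above could be sketched in base R as follows. This is only a minimal sketch under assumptions: it assumes both files are comma-separated with a header row and the column names shown in the sample tables, and it writes the sample rows to temporary files so that it runs without the project folder. The two checks (duplicate characters, differing Race values) are one possible reading of what should be verified.

```r
# Temporary stand-ins for the two CSV files (assumption: comma-separated,
# with a header row). The rows are the sample rows from the tables above.
words_path <- tempfile(fileext = ".csv")
info_path  <- tempfile(fileext = ".csv")
writeLines(c(
  "Film,Chapter,Character,Race,Words",
  "The Fellowship of the Ring,01: Prologue,Bilbo,Hobbit,4",
  "The Fellowship of the Ring,01: Prologue,Elrond,Elf,5",
  "The Fellowship of the Ring,01: Prologue,Galadriel,Elf,460"
), words_path)
writeLines(c(
  "Character,Race,Gender,Realm",
  "Aragorn,Men,Male,Gondor",
  "Arwen,Elf,Female,Rivendell",
  "Bilbo,Hobbit,Male,The Shire"
), info_path)

# In the project folder, the paths would instead be "WordsByCharacter.csv"
# and "InformationByCharacter.csv".
words <- read.csv(words_path, stringsAsFactors = FALSE)
info  <- read.csv(info_path, stringsAsFactors = FALSE)

# Check 1: every character should appear only once in the information table.
stopifnot(!any(duplicated(info$Character)))

# Check 2: speakers that have no row in the information table.
missing_info <- setdiff(unique(words$Character), info$Character)
print(missing_info)

# Check 3: characters whose Race differs between the two datasets.
race_check <- merge(
  unique(words[, c("Character", "Race")]),
  info[, c("Character", "Race")],
  by = "Character",
  suffixes = c("_words", "_info")
)
race_mismatch <- race_check[race_check$Race_words != race_check$Race_info, ]
print(race_mismatch)
```

Any rows reported by the third check would be resolved in favour of WordsByCharacter, for example by dropping the Race column of the information table before merging.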
The first step is to calculate the total number of words spoken in each of the three books. For that, create a new table that includes:
Please visualize the distribution of the total number of words across the three volumes, using a pie chart similar to this:
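A minimal base R sketch of this step is shown here, assuming the column names from the sample table. The first three rows of the toy data come from that table; the last two rows are invented placeholders so that all three volumes appear. The Percent column is an assumption about what the summary table should contain.

```r
# Toy data. Rows 1-3 are taken from the sample table; rows 4-5 are invented
# placeholders and do not reflect the real dataset.
words <- data.frame(
  Film = c(rep("The Fellowship of the Ring", 3),
           "The Two Towers", "The Return of the King"),
  Character = c("Bilbo", "Elrond", "Galadriel", "Gollum", "Frodo"),
  Words = c(4, 5, 460, 300, 200),
  stringsAsFactors = FALSE
)

# Total number of words per volume.
words_by_film <- aggregate(Words ~ Film, data = words, FUN = sum)

# Share of each volume in percent (assumed to be part of the summary table).
words_by_film$Percent <- round(
  100 * words_by_film$Words / sum(words_by_film$Words), 1
)
print(words_by_film)

# Pie chart with one slice per volume.
pie(
  words_by_film$Words,
  labels = paste0(words_by_film$Film, " (", words_by_film$Percent, "%)"),
  main = "Words spoken per volume"
)
```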

The next step is to calculate the total number of words spoken by each character across all three books. For that, create a new table that summarizes the total number of words spoken by each character across all volumes.
Answer these questions:
Please visualize this distribution using a pie chart, similar to this:
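One way to summarize by character and then merge, sketched in base R under the same assumptions about column names. The second Bilbo row is an invented placeholder so that the summing is visible. The merge keeps every speaker from WordsByCharacter and takes Race from that dataset, which is one interpretation of merging "based on the content of dataset 1".

```r
# Toy data. The last Words value is an invented placeholder.
words <- data.frame(
  Character = c("Bilbo", "Elrond", "Galadriel", "Bilbo"),
  Race = c("Hobbit", "Elf", "Elf", "Hobbit"),
  Words = c(4, 5, 460, 100),
  stringsAsFactors = FALSE
)
info <- data.frame(
  Character = c("Aragorn", "Arwen", "Bilbo"),
  Race = c("Men", "Elf", "Hobbit"),
  Gender = c("Male", "Female", "Male"),
  Realm = c("Gondor", "Rivendell", "The Shire"),
  stringsAsFactors = FALSE
)

# Step 1: one row per character, keeping Race from dataset 1.
# Note: a character with inconsistent Race values would yield several rows.
words_by_character <- aggregate(Words ~ Character + Race, data = words, FUN = sum)

# Step 2: left join on the character's name. all.x = TRUE keeps speakers
# without an information row (their Gender and Realm become NA). Race is
# dropped from the information table so that dataset 1 takes precedence.
merged <- merge(
  words_by_character,
  info[, c("Character", "Gender", "Realm")],
  by = "Character",
  all.x = TRUE
)

# Step 3: most talkative characters first.
merged <- merged[order(-merged$Words), ]
print(merged)

# Pie chart with one slice per character.
pie(merged$Words, labels = merged$Character, main = "Words spoken per character")
```

With the full dataset, the pie chart will likely have many thin slices, so grouping minor characters into an "Other" slice may be worth considering.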

Next, we can analyze the distribution of speaking time based on different characteristics of the characters. Summarize the number of words spoken by characters based on gender, realm and race.
Answer these questions:
Please visualize each of these distributions using a donut chart, similar to the one above.
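The grouping and the donut chart could look roughly like this in base R. The helper names `summarise_words` and `draw_donut` are hypothetical, and the word counts in the toy table are invented. Base R has no dedicated donut function, so the sketch draws a pie chart and covers its centre with a white circle.

```r
# Toy merged table. The characters come from the sample table above;
# the word counts are invented placeholders.
merged <- data.frame(
  Character = c("Aragorn", "Arwen", "Bilbo"),
  Race = c("Men", "Elf", "Hobbit"),
  Gender = c("Male", "Female", "Male"),
  Realm = c("Gondor", "Rivendell", "The Shire"),
  Words = c(500, 100, 400),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total words and percentage share per group.
# Note: aggregate() drops rows whose group value is NA.
summarise_words <- function(data, group_col) {
  totals <- aggregate(data$Words, by = list(Group = data[[group_col]]), FUN = sum)
  names(totals)[2] <- "Words"
  totals$Percent <- round(100 * totals$Words / sum(totals$Words), 1)
  totals[order(-totals$Words), ]
}

by_gender <- summarise_words(merged, "Gender")
by_realm  <- summarise_words(merged, "Realm")
by_race   <- summarise_words(merged, "Race")
print(by_gender)

# Hypothetical helper: pie chart with a white circle over the centre.
draw_donut <- function(totals, title) {
  pie(
    totals$Words,
    labels = paste0(totals$Group, " (", totals$Percent, "%)"),
    main = title
  )
  symbols(0, 0, circles = 0.4, inches = FALSE, add = TRUE,
          bg = "white", fg = "white")
}

draw_donut(by_gender, "Words spoken by gender")
draw_donut(by_realm, "Words spoken by realm")
draw_donut(by_race, "Words spoken by race")
```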
Finally, we can analyze the speaking time of Frodo, the main character of the story.

