Poetry & Data I: Exploration

Florian Maas

By Florian Maas on April 5, 2022

Estimated Reading Time: 9 minutes

I am a really big fan of the poems by /u/poem_for_your_sprog on Reddit. For those of you who don't know this person yet; they are well known on Reddit for writing short poems in reaction to the comments of others on /r/AskReddit threads. To give you an example, one that I particularly like is the following, which was written in response to a thread full of responses from ICU workers, who despite their best efforts are not always able to save every patient they meet:

You’ll weather the wind and the rain and the rough -
And sometimes you’ll try but it won’t be enough.

You did what you could,
but it’s not up to you.

You did what you could,
and that’s all you can do.


I always find it difficult to explain why I enjoy these poems so much. Some, like the above, stand out in simplicity; six short lines that bring a message that speaks to many of us. But there's more elaborate ones, and really funny ones too. I would love to understand a little bit better what exactly it is that makes these poems so appealing to me, and to get a better grasp of the artform behind it. Sadly, I know absolutely nothing about poetry. And since I'm also not as good with words as /u/poem_for_your_sprog, I will try to understand poetry in the only way I know how: By using data.

My plan is to create a dataset of all the poems /u/poem_for_your_sprog has written, and create a number of posts that dive into these works. In this first post, I want to explore some basic statistics about these poems.

You can find the notebook with python code that was used to create this page on GitHub.

Comments per day

Let's start by simply looking at the amount of poems over time. We can see below that on the extraordinarily productive days sprog writes about 5 or 6 poems, and their productivity seems to have increased somewhat over time. The outlier of 10 comments in June is the 'Ask Me Anything', or AMA in short, where they answered questions of fellow Redditors.

Number of comments on Reddit per day by u/poem_for_your_sprog


This plot was useful to determine the AMA outlier, so lets remove the observations from that day from our dataset. However, the daily plot does not really help us in identifying a trend, so let's aggregate the data to monthly buckets to get a clearer view:

Number of comments on Reddit per month by u/poem_for_your_sprog


Not all poems by sprog have been posted on /r/AskReddit threads, they also responded to posts on some other subreddits. In total, there are 3,415 comments by /u/poem_for_your_sprog on Reddit. Out of those, 96.0% were comments within /r/AskReddit. It would be interesting to find out to which other subreddits sprog has posted their poems in the past. To gain insight in the temporal pattern of these poems and in the distribution of these poems over the subreddits simultaneously, we can create a plot with multiple timelines and denote the comments as events in a scatter plot. You can hover over the plot to read the poems.

Timeline of poems outside of /r/AskReddit by /u/poem_/for_your_sprog


The total number of subreddits that sprog has written poems on is fairly limited. It's also clear to see that in the early days there were many more comments outside of /r/AskReddit threads than there have been in the past few years.

It also seems like we have some more outliers to filter. For example, by hovering over the marker in December 2013 in the timeline of /r/worldnews we find a very long line that is clearly not part of a poem; that's a regular comment. So let's continue by analyzing the basic structure of the comments to see if we can identify a method to filter out these outliers.

Average line length

Sprog writes both poems with very short lines, as well as poems with longer ones. A histogram of the average number of characters on a line per poem should give us a better idea of the distribution:

Histogram of the average characters per line by u/poem_for_your_sprog


There are three clear outliers, which upon further inspection are not poems, or contain a lot of text alongside the poem. We will filter those out for now. Furthermore, the peak around 46 is interesting to see. My initial guess was that some rhymes are written in the rhyme scheme abab, while others are split out over shorter lines in the rhyme shape abcb defe, so the former would on average have lines twice as long as the latter. But that does not seem to hold; then the first peak should be around 23 instead of 30.

When I took a look at the rows corresponding to that peak, I found out that they all had a similar 'flow' to them. A bit of Googling made me realize that this is called poetic metre. There, I learned something about poetry already. It turns out that these are a set of poems that sprog usually writes in 'Anapestic tetrameter'. Anapestic tetrameter is a metre with four anapestical feet per line, where an anapestical foot is two unstressed syllables, followed by a stressed one. So, denoting a stressed syllable as / and an unstressed syllable as x, an anapestic tetrameter can de denoted as follows:

x x / x x / x x / x x /

However, sprog usually omits the first unstressed syllable, so it becomes:

x / x x / x x / x x /

To get a better idea, here is an example of one of these poems:

you don't need a measure of treasure to fly
to sporting success on a broom in the sky...
to eros alone in the sight of the stars...
to space on a ship that's intended for mars.
you don't need a mountain of money to go
where peter and susan await in the snow...
where planets contend and defend for a spice...
where alice adventures with hatters and mice.
you don't need a wallet of wealth and of worth
to start on a journey across middle earth...
to fight in the night with your sword and your steed.
you don't need a fund or a fortune to read.


I feel there's a lot more to learn about metre. But that seems to be a very broad topic, so let's save that for another post, and let's try to get a better feeling of the basic structure of the poems first. For example, it would be interesting to take a look at some of the poems with very short or very long line lengths to see what they look like. However, listing all the poems here would make this post quite long, so let's just plot them. In the figure below, I have put the 100 poems with the shortest average line length and the the 100 poems with the longest average line length in a plot that allows you to read the poems by hovering over the points.

100 poems with the shortest and 100 poems with the longest line length by u/poem_for_your_sprog


If you're done reading, let's move on to another histogram; the number of lines per poem.

Histogram of the number of lines per poem by u/poem_for_your_sprog


This looks like something one might expect as well; poems usually have an even amount of lines, since we usually rhyme one line with another. The most common rhyme schemes I can think of are abab, and abcb, so it also makes sense that the highest peaks are at 4, 8, 12, and 16, which are multiples of four.

The values directly following those numbers (5, 9, 13 and 17) also are relatively high. This seems to be caused by a characteristic way of breaking the last line into two lines to build some tension or to give some extra stress to the final line. An example:

you're disturbed and defiled and depraved and debased?
you're obscene and corrupt and debauched and disgraced?
you're perverse and profane and you freely confess?
and with pride, he replied with a confident
... yes.


And sometimes multiple lines in a poem are split up, or one line is split up into more than two parts, explaining the fact that the peak after each multiple of four decays until the next multiple of four. An example of this:

and when the darkness comes for me
to take me where i'm meant to be -
to guide me out beyond the black,
forever on,
and never back -
to lead me through the final door
till all that is and was before
is nothing more than dreams at night -
i will not fret.
i will not fight.


Upvotes & Awards

On Reddit, there are two ways to show appreciation for a comment. One can upvote a post, of spend some actual money to give the post an award. The total number of upvotes on poems by /u/poem_for_your_sprog is 9,104,079, a considerable amount.

Next, let's take a look at the awards that sprog has earned. Initially, there were three types of awards;

  • Silver, granting the user.. nothing.
  • Gold, granting the user Reddit Premium for a week
  • Platinum, granting the user Reddit Premium for a month

Currently, there are many more types of awards to react to a post, with names such as 'Wholesome', 'Helpful', or 'Bravo!'.

Below is a chart which shows the breakdown of the awards /u/poem_for_your_sprog has been given:

Awards received on poems by /u/poem_for_your_sprog.


That's a lot of gold!

Below are three more graphs. The first is a histogram of the number of upvotes per post, and the second is a histogram of the number of awards per post. The last graph shows the 100 most upvoted poems and the 100 poems with the most awards written by /u/poem_for_your_sprog on Reddit with the number of upvotes on the y-axis and the number of awards on the x-axis. You can hover over the points to read the poems!

Histogram of the upvotes on poems by u/poem_for_your_sprog


Histogram of the number of awards per comment by u/poem_for_your_sprog


Upvotes versus number of awards of the top poems by u/poem_for_your_sprog


Hovering over the right side of the plot, we find quite some poems that have many lines, which might lead us to believe that longer poems are awarded with silver, gold or platinum more often. The correlation plot below confirms that; the number of awards and the number of lines have a positive correlation. Also, in the plot we find two things that we might have expected to find; score and total awards received are positively correlated, and poems with many lines have in general a lower average line length, although the correlation coefficient of the latter is not very large.

Correlation plot


That's enough exploring poems for now! This helped in providing a better sense of the data by doing some initial exploration, but I think there's still a lot to find that can help me understand the art behind the poems better. In the next post I will try to dive a bit deeper into the things that make a poem a poem, such as meter or rhyme.