Using Python to find Song titles

Ram Narasimhan
6 min read · Oct 29, 2018

A friend of mine in Chennai plays in a band that sings covers. He sent along an image of a flyer advertising an upcoming performance. Tucked in the flyer was a neat little puzzle. Rather than merely list some of the songs they’d be playing, the group had come up with a nice twist. They included 30 English words, all jumbled up, which together made up the titles of songs they’d be performing. We had to decipher the song titles.

This is exactly the kind of gimmick that appeals to me!

Some of the songs were easy for me to guess, but not all. This is where the ability to write a few lines of Python comes in handy. All those hours of reading StackOverflow can have minuscule payoffs.

So, if you want to learn to leverage the power of an expressive programming language such as Python to get these sorts of tasks done, read on. I will walk you through the logic and a few lines of easy-to-follow Python code.

First, let me share those 30 words with you. See if you can figure out the songs. (These are mostly pop hits and classic rock from the ’70s and the ’80s.)

The list of words that make up 10 songs. Can you guess all the songs?

First, how would you approach this problem? Forget about whether or not you know Python. (The syntax can be looked up.) What is the logic you’d use?

Let’s say that I gave you a big database of 1 million song titles. Now, would you be able to use the database to unscramble the song titles from that jumbled-up list of words above?

The Logic we will use:

Here are the steps I ended up using. For the sake of this article, I’ve favored clarity of illustration over code efficiency.

Step 1: Using a little bit of web-scraping, get hold of a large “popular song bank”. This will serve as the corpus of titles for us to compare against.

Step 2: Take the songs in the massive database one at a time and split them into words.

Step 3: Check if all the words in any song title are present in our clue-words (‘bag-of-30-jumbled-words’). If all are present, that means that we’ve found one song! Just print it out.

(Optional Step 4: Strictly speaking, each time we “use up” some words from our bag of words, we have to remove them. But this is a fun exercise, so let’s keep it simple and ignore this detail.)
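For the curious, here is a rough sketch of what the optional Step 4 could look like. It is not part of the code I actually ran. It assumes a tiny hand-typed subset of the flyer's words and a hypothetical helper called take_title, and it uses collections.Counter so that repeated words (like the two copies of "sera") are counted properly.

```python
from collections import Counter

# A tiny hand-typed subset of the flyer's words, just for illustration.
bag = Counter(["que", "sera", "sera", "mamma", "mia"])

def take_title(bag, title):
    """Hypothetical helper: return a new bag with the title's words removed,
    or None if the bag does not hold enough copies of every word."""
    needed = Counter(title.lower().split())
    if any(bag[word] < count for word, count in needed.items()):
        return None
    # Counter subtraction drops words whose count falls to zero.
    return bag - needed

after_first = take_title(bag, "Que Sera Sera")         # uses up both copies of "sera"
second_try = take_title(after_first, "Que Sera Sera")  # no "que" or "sera" left

print(after_first)
print(second_try)
```

Once a title is "spent", its words are gone from the bag, so the same title cannot be matched twice.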

First, I create a simple Python list of the jumbled words (in lowercase) called clue_words.

clue_words = [x.lower() for x in """ We Want Champions Farewell To Mamma In Of Jambalaya Gonsalves The Have Sound Girls Air Sway Que Mia Sera Are Music Sera The Just Jamaica Fun Love is The Speedy """.split()]

Step 1: One line of code to get hold of a database of popular song titles

A quick web search turned up quite a number of song databases. I’ll use just one for illustration here, a dataset of Billboard hits from 1964 to 2015.

Pandas is a very popular Python library of data-analysis functions. One lesser-known feature is how easily it can pull data off the web: point it at a URL that holds a table or a CSV file, and it hands back a data table.

import pandas as pd
songs_url = 'https://raw.githubusercontent.com/walkerkq/musiclyrics/master/billboard_lyrics_1964-2015.csv'
songs = pd.read_csv(songs_url, encoding = "ISO-8859-1")

Using that last line of Pandas code, I read all the songs into a nice table (called a ‘DataFrame’).

With that one line of code, we now have 5100 titles. (There are dozens of other URLs to try as well.)
We now have 1000s of titles in an easy-to-use table. The column of interest is Song.

Step 2: Go through this known database of song titles, one at a time

Since our song titles are in one column of a Pandas DataFrame, we can iterate through them in a simple for loop, one row at a time. The split() method takes the full title string and ‘tokenizes’ it into individual words.

Going through our Song Database, one by one, splitting the title into ‘tokens’ (words)
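Here is a minimal sketch of that loop. To keep it runnable without a download, it assumes a three-row stand-in table that I typed by hand in place of the real one. The column name Song matches the real table, while tokenized_titles is just a name I picked for the result.

```python
import pandas as pd

# Stand-in for the real table: three titles typed in by hand.
songs = pd.DataFrame({"Song": ["love is in the air", "mamma mia", "pretty woman"]})

tokenized_titles = []
for title in songs["Song"]:
    # str() guards against missing values; lower() matches our lowercase clue words.
    songwords = str(title).lower().split()
    tokenized_titles.append(songwords)

for songwords in tokenized_titles:
    print(songwords)
```

With the real data, the same loop runs over all the titles in the Song column.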

So now we have two things to compare: our original list of 30 jumbled words, clue_words, and a large table of song titles, each broken down into a list of words.

Step 3: Check if all the words in any song title are present in our 30 clue_words

This is the workhorse step. The actual word-by-word comparison gets done here. (Make sure you understand this step properly!)

Now, if all of the words of a title (say, ['crying', 'in', 'the', 'chapel']) are present in our 30 clue_words list, we’ve found one song that the band is planning to play.

All the words of one song (Que Sera Sera) are present (True) in the jumbled words, so we’ve found one song.
Since we found one False, this song (Pretty Woman) is not in the jumbled list of words.
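Spelled out the long way, the check for those two songs might look like the sketch below. The helper name check_title is made up for this illustration; the clue words are the ones from the flyer.

```python
clue_words = """We Want Champions Farewell To Mamma In Of Jambalaya Gonsalves
The Have Sound Girls Air Sway Que Mia Sera Are Music Sera The Just Jamaica
Fun Love is The Speedy""".lower().split()

def check_title(title):
    """Hypothetical helper: one True/False per word of the title."""
    checks = []
    for word in title.lower().split():
        checks.append(word in clue_words)
    return checks

que_sera = check_title("Que Sera Sera")
pretty_woman = check_title("Pretty Woman")

print(que_sera)       # every word was found
print(pretty_woman)   # at least one word is missing
```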

We can make this check more Pythonic by collapsing it into a one-liner: [word in clue_words for word in songwords]

For each word in the song, is it present in the list called clue_words

For the complete song title to be present, we want ALL True’s. Python has a convenient function for just that, aptly called all().

Make sure that you understand the Python construct above. We are asking if all the elements of one list (song words) are present in the bigger list (clue_words). [element in big_list for element in small_list] takes each element of small_list and checks whether it is present in big_list. The result is a list of True and False values, one per element: True if present, False if not. all() returns True only if every value is True; if even one check is False, it returns False.
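A small sketch to try the construct out on its own. The lists here are made up for the illustration and are not the flyer's words.

```python
# Made-up lists, just to exercise the construct.
big_list = ["girls", "just", "want", "to", "have", "fun"]
small_list = ["have", "fun"]
other_list = ["have", "mercy"]

# The list comprehension gives one True/False per element of small_list.
as_list = [element in big_list for element in small_list]

# Passing the same expression straight to all() collapses it to a single answer.
found = all(element in big_list for element in small_list)
not_found = all(element in big_list for element in other_list)

# Careful: all() of nothing is True, so an empty title would "match".
empty_case = all(element in big_list for element in [])

print(as_list, found, not_found, empty_case)
```

That last case is worth remembering: if a title in the database is blank, it has no words to fail the check, so it is safest to skip empty titles.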

Quick Recap: In English: are all songwords present inside clue_words?

In Python: all(word in clue_words for word in songwords)

Wrap-up

When I put all the steps together and ran the code (shared here), these are some of the songs it found:

['love', 'is', 'in', 'the', 'air']
['mamma', 'mia']
['girls', 'just', 'want', 'to', 'have', 'fun']
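If you'd like to see the steps in one place, here is a compact sketch of how they might fit together. It is my reconstruction for this write-up rather than the exact script I ran. The function name find_songs and the five hand-typed sample titles are assumptions for illustration, and it looks words up in a set instead of a list, which is a small departure from the list used above.

```python
def find_songs(titles, clue_words):
    """Return the tokenized titles whose words all appear among the clue words."""
    clue_set = set(clue_words)  # set lookup instead of list lookup
    matches = []
    for title in titles:
        songwords = str(title).lower().split()
        # Skip blank titles, since all() of an empty list is True.
        if songwords and all(word in clue_set for word in songwords):
            matches.append(songwords)
    return matches

clue_words = """We Want Champions Farewell To Mamma In Of Jambalaya Gonsalves
The Have Sound Girls Air Sway Que Mia Sera Are Music Sera The Just Jamaica
Fun Love is The Speedy""".lower().split()

# Hand-typed sample; with the real table you would pass songs["Song"] instead.
sample_titles = [
    "Love Is in the Air",
    "Mamma Mia",
    "Girls Just Want to Have Fun",
    "Pretty Woman",
    "Crying in the Chapel",
]

found = find_songs(sample_titles, clue_words)
for songwords in found:
    print(songwords)
```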

Practice: If you want to try it yourself, then read in a large database of songs using Pandas, and try answering:

  1. What is the most common word used in song titles? (Is it love?)
  2. How would you use Python to find some “crazy” song titles? (Crazy words, very long or very short song titles)
  3. Which band has had the most “hit” songs?
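If you want a nudge for the first two questions, here is one possible starting point. It assumes four hand-typed titles rather than the real table, so the answers it prints say nothing about the actual Billboard data.

```python
from collections import Counter

# Hand-typed sample titles; swap in the real Song column to get real answers.
sample_titles = [
    "love is in the air",
    "mamma mia",
    "i will always love you",
    "the power of love",
]

# Question 1: count every word across all titles.
word_counts = Counter(word for title in sample_titles for word in title.split())
top_word, top_count = word_counts.most_common(1)[0]

# Question 2: one simple notion of "crazy" is the longest title by character count.
longest = max(sample_titles, key=len)

print(top_word, top_count)
print(longest)
```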

Ram Narasimhan

Data driven policies and stories, coding, Python for data analysis, Chess. @ramnarasimhan