May 16, 2020

Create Word Cloud Images with Python

Can you guess the Pokemon above? How about the Pokemon below?

I came across a fun and simple Python package called wordcloud. In this post I give a short tutorial, adapted from this example on the official site.

What’s a Word Cloud?

Word cloud generators are visualization tools used for text analysis. The images they create are composed of words in different sizes and colors: frequent words are printed in larger font sizes, whereas less frequent words are displayed smaller. The image can also be generated to copy the coloring of a base image.
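
To make the frequency idea concrete, here is a minimal sketch that counts words and scales each count to a font size. The helper names `word_frequencies` and `relative_sizes` are made up for this illustration, and the linear scaling is a simplification: the wordcloud package also takes the available space into account when it picks sizes.

```python
from collections import Counter
import re


def word_frequencies(text):
    """Count case-insensitive word occurrences in a string."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


def relative_sizes(frequencies, max_font_size=40, min_font_size=3):
    """Scale each count linearly against the most frequent word.

    Illustration only: this is not how wordcloud computes sizes internally.
    """
    top = max(frequencies.values())
    return {
        word: max(min_font_size, round(max_font_size * count / top))
        for word, count in frequencies.items()
    }


sample = "pikachu pikachu pikachu mew mew eevee"
freqs = word_frequencies(sample)
sizes = relative_sizes(freqs)
print(freqs.most_common())
print(sizes)
```

If you do care about frequencies, a dictionary like `freqs` can be handed to `WordCloud.generate_from_frequencies` instead of calling `generate` on raw text.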

In this tutorial, I will show how to render an image as a word cloud. I will not focus on word frequency (though it is easy to modify the example for that purpose). We will take a base image (Pikachu!) and generate something like this:

Install the Python package

First, you will need the wordcloud package. Use pip or conda to install it. I use wordcloud version 1.7.0 and Python 3.7 for this tutorial.

pip install wordcloud

You may need to install additional packages like matplotlib, numpy, etc. If you want to install the exact same package versions as mine, you can grab my pip requirements.txt file on GitHub and run pip install -r requirements.txt in a terminal.
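
To check which versions ended up in your environment, something like the following sketch may help. It assumes Python 3.8 or newer, because `importlib.metadata` is not in the standard library of the Python 3.7 used in this post. The helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Distribution names as published on PyPI (PIL is distributed as "Pillow").
for name in ("wordcloud", "matplotlib", "numpy", "Pillow"):
    print(name, installed_version(name) or "not installed")
```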

The Python code

I will work in a Jupyter notebook. The input files can be found on GitHub.

First, let’s import the packages.

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import os
from wordcloud import WordCloud, ImageColorGenerator

Now, we run the word cloud generator with the text given in file pokemon-names.txt and the base image file pikachu.png. These files are located in the current working directory.

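with open('pokemon-names.txt') as f:  # the file is closed after reading
    text = f.read()
# Base image as a NumPy array; used for both the mask and the colors.
orig_image = np.array(Image.open("pikachu.png"))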

The text file pokemon-names.txt contains only Pokemon names, like this:

...
Caterpie
Chansey
Charizard
Charmander
Charmeleon
Clefable
...
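
If you want to sanity-check the word list before generating the image, a small sketch like this one can help. It assumes one name per line, as in the excerpt above. The `sample` string stands in for the contents of pokemon-names.txt, and `read_names` is a hypothetical helper, not part of wordcloud.

```python
def read_names(raw_text):
    """Return the non-empty lines of raw_text with surrounding whitespace removed."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


# Stand-in for the file contents; in the notebook you would pass `text` instead.
sample = "Caterpie\nChansey\nCharizard\n\nCharmander\n"
names = read_names(sample)
print(len(names), "names, first one:", names[0])
```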

The image orig_image will be used as a mask, where every pure white pixel (with RGB code #FFFFFF) will be left white in the output image. We can use any color generator, but if we want to copy the coloring style of the base image, we need to add this line:

image_colors = ImageColorGenerator(orig_image)
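
To see which part of an image is available for words, you can compute the non-white area yourself. The sketch below mirrors the rule described above (pure white means "leave empty"). It assumes an array of shape height x width x 3 (or 4, with an alpha channel), and `drawable_pixels` is a made-up helper name, not a wordcloud function.

```python
import numpy as np


def drawable_pixels(mask):
    """Return a boolean array that is True where a pixel is not pure white.

    Assumes `mask` has shape (height, width, 3) or (height, width, 4).
    """
    rgb = mask[..., :3]  # ignore the alpha channel if there is one
    return ~np.all(rgb == 255, axis=-1)


# A tiny 2x2 stand-in image: white, yellow / black, white.
tiny = np.array(
    [
        [[255, 255, 255], [255, 221, 0]],
        [[0, 0, 0], [255, 255, 255]],
    ],
    dtype=np.uint8,
)
area = drawable_pixels(tiny)
print(area)
print("drawable fraction:", area.mean())
```

Running the same function on orig_image gives a rough idea of how much room the words will have.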

Next, we initialize the word cloud object. Since we have only a limited number of words in our text corpus (it contains the list of Pokemon Gen 1 names), we set repeat=True so that words are repeated until max_words or min_font_size is reached, and stopwords=set() so that no words from the list are ignored (just in case). Note that we are not interested in the frequency of the words. Other important parameters are max_words, max_font_size and min_font_size, which guide the generation of the image. Changing these parameters may not always change the output: even if you set max_words=10000, there might not be enough space to display that many words.

wc = WordCloud(background_color="white", max_words=1000, repeat=True,
               mask=orig_image,
               color_func=image_colors,
               # empty set: keep every word (None would apply the built-in stopword list)
               stopwords=set(),
               max_font_size=40, min_font_size=3, random_state=11013)
wc.generate(text)
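
To find out how many words actually fit, you can inspect the layout after generation. The sketch below assumes that the generated object exposes a `layout_` list whose entries look like ((word, frequency), font_size, position, orientation, color). `layout_summary` is a hypothetical helper, and `fake_wc` is only a stand-in so that the example runs without the image files.

```python
from types import SimpleNamespace


def layout_summary(wc):
    """Report how many words were placed and the range of font sizes used."""
    font_sizes = [entry[1] for entry in wc.layout_]
    if not font_sizes:
        return {"placed": 0, "smallest": None, "largest": None}
    return {
        "placed": len(font_sizes),
        "smallest": min(font_sizes),
        "largest": max(font_sizes),
    }


# Stand-in for a generated WordCloud object (assumed layout_ structure).
fake_wc = SimpleNamespace(layout_=[
    (("Pikachu", 1.0), 40, (10, 20), None, "rgb(255, 221, 0)"),
    (("Mew", 1.0), 12, (50, 60), None, "rgb(255, 221, 0)"),
    (("Eevee", 1.0), 3, (70, 15), None, "rgb(0, 0, 0)"),
])
summary = layout_summary(fake_wc)
print(summary)
```

In the notebook you would call layout_summary(wc) instead. If the count stops growing when you raise max_words, the image has run out of space.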

If you want to change the background color, the value must be an HTML-style color string. A list of named colors can be found at htmlcolorcodes.com/color-names/. If you want to define a color in rgb format, set background_color="rgb(150,150,255)", for example.
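
If your colors come as numbers, a small helper can build these strings for you. This is only a sketch: `rgb_string` and `hex_string` are names made up for this post, and I am assuming that a hex code such as "#9696ff" is accepted in the same way as the rgb form.

```python
def _check_channel(value):
    """Raise ValueError unless value is an integer between 0 and 255."""
    if not isinstance(value, int) or not 0 <= value <= 255:
        raise ValueError(f"channel value out of range: {value!r}")


def rgb_string(red, green, blue):
    """Build an HTML-style 'rgb(r,g,b)' color string."""
    for channel in (red, green, blue):
        _check_channel(channel)
    return f"rgb({red},{green},{blue})"


def hex_string(red, green, blue):
    """Build an HTML-style '#rrggbb' color string."""
    for channel in (red, green, blue):
        _check_channel(channel)
    return f"#{red:02x}{green:02x}{blue:02x}"


print(rgb_string(150, 150, 255))
print(hex_string(150, 150, 255))
```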

Let’s now display the word cloud in Jupyter and save it to the file output.png. You can try different interpolation algorithms like bicubic, gaussian, and more. I find bilinear works nicely for text.

plt.figure(figsize=(15,10))
plt.axis("off")
plt.imshow(wc, interpolation="bilinear")

# must save image before calling plt.show()    
plt.savefig('output.png',bbox_inches='tight')

# display image in Jupyter
plt.show()

There you go!

Source code

The Jupyter notebook, text file, image and pip requirements are available on GitHub.

Finally, the answers are…

In case you are still scratching your head over the Pokemon images shown at the start of this post, here are the answers: #1 is Pikachu, and #2 is Mew.

chwyean's blog