Using Tags With Multilingual Jekyll Sites

Jekyll stores tags site-wide. You need a little Ruby code if you want to maintain them per language instead, as you would on a multilingual site. In a previous post, Multilingual Web Sites with Jekyll, I described how to set up a Jekyll site that supports documents in multiple languages. One thing that was still missing was support for tags, also known as keywords.

Requirements

I create category pages - that is pages listing all posts from a particular category - manually because I want to add a description to them anyway. For tag pages I prefer to have them maintained automatically.

I also want to be able to allow tag sets for languages to overlap, so that one and the same tag could be used for documents in different languages. The tag page should, however, only list articles for one language at a time.

Using jekyll-tagging

The standard plug-in for generating tag pages seems to be jekyll-tagging. I configured it in _config.yml like this:

tag_page_layout: tag_page
tag_page_dir: tags
tag_feed_layout: tag_feed
tag_feed_dir: tags
tag_permalink_style: pretty

Note: For real usage, you also have to tell Jekyll to load the plug-in, but I will skip that step because the solution shown here works differently anyway.

The layout template _layouts/tag_page.html is used for generating the tag pages (line 1), and they should be written into /tags/ (line 2). I also wanted RSS feeds for each tag. They are configured in the same manner in lines 3 and 4.

Finally we tell the plug-in to use pretty links (line 5).

The template _layouts/tag_page.html looks like this:

---
layout: default
---
{% assign posts=site.tags[page.tag] | where: "lang", page.lang | where: "type", "posts" %}
<div class="col-md-8">
  <span>{{ site.t[page.lang].tag }}: <i class="fa fa-tag"></i>{{ page.tag }}</span>
  {% for post in posts %}
  <article class="blog-post">
    {% if post.image %}
    <div class="blog-post-image">
      <a href="{{post.url | prepend: site.baseurl}}">
        <img src="{{post.image}}" alt="{{post.image_alt}}">
      </a>
    </div>
    {% endif %}
    <div class="blog-post-body">
      <h2>
        <a href="{{post.url | prepend: site.baseurl}}">{{ post.title }}</a>
      </h2>
      <div class="post-meta">
        <span><i class="fa fa-clock-o"></i>{% include {{ page.lang }}/long_date.html param=post.date %}</span> {% if post.comments %} / <span><i
          class="fa fa-comment-o"></i> <a href="#">{{ post.comments }}</a></span>
        {% endif %}
      </div>
      <p>{{post.excerpt}}</p>
      <div class="read-more">
        <a href="{{post.url | prepend: site.baseurl}}">Continue Reading</a>
      </div>
    </div>
  </article>
  {% endfor %}
</div>

The only interesting line is line 4. The collection site.tags is filled by Jekyll. We use the document attribute tag as the lookup key into that hash and filter the result by document language and document type. Unfortunately that does not work.

The first problem is that jekyll-tagging does not know about a document attribute lang and therefore cannot set it. And it only creates one tag page for each tag. But we want one tag page for each tag and for each language that uses it.

Writing a Wrapper Around jekyll-tagging

The only solution was to write a wrapper around the plug-in. Unfortunately I had never written a line of Ruby code before. But the task looked so trivial to me that I decided to give it a try unbiased by any Ruby knowledge.

My plan was to abuse the configuration option ignored_tags of jekyll-tagging for my purposes. Instead of invoking the plug-in once, the wrapper should invoke it once for every language, each time giving the plug-in a modified configuration, in particular setting ignored_tags to the list of tags that do not occur in the current language.

Skeleton For The Plug-In

I created a file _plugins/ml_tagging.rb:

require 'jekyll/tagging'

module Jekyll
    class MultiLangTagger < Tagger

        @types = [:page, :feed]
        
        def generate(site)
            # Generate some pages.
        end
    end
end

I called my own plug-in MultiLangTagger and subclassed it from Tagger (line 4), the generator class defined by jekyll-tagging.

Line 6 looks suspicious. It was copied one to one from the original source of jekyll-tagging. Strictly speaking it does not define a class variable but an instance variable on the class object itself, and such variables are not inherited by subclasses. That is presumably why my own plug-in did not see the value set in the super class.
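The following sketch illustrates why re-declaring @types is probably necessary. It does not use Jekyll at all: BaseTagger, WrapperWithoutCopy and WrapperWithCopy are made-up stand-ins, and the accessor is an assumption about how such a variable is typically read.

```ruby
# Hypothetical stand-in for the generator class of jekyll-tagging.
class BaseTagger
  @types = [:page, :feed] # instance variable of the BaseTagger class object

  class << self
    attr_reader :types # reads @types of whichever class receives the call
  end
end

# Subclass that relies on the super class value.
class WrapperWithoutCopy < BaseTagger
end

# Subclass that re-declares the variable, as in the skeleton above.
class WrapperWithCopy < BaseTagger
  @types = [:page, :feed]
end

p BaseTagger.types         # => [:page, :feed]
p WrapperWithoutCopy.types # => nil
p WrapperWithCopy.types    # => [:page, :feed]
```

The accessor method is inherited, but it looks up @types on the subclass, where nothing has been assigned yet.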

The only method that Jekyll generator plug-ins have to implement is generate, see line 8. Its single argument is a Jekyll::Site instance that can be used to retrieve configuration, pages, posts, tags and so on.

Grouping Tags

The first task was to group the tags used by language. I modified the generate method as follows:

def generate(site)
    # Iterate over all posts and group the tags by language.
    for post in site.posts.docs do
        lang = post.data['lang']
        site.config['t'][lang]['tagnames'] = 
                {} unless site.config['t'][lang]['tagnames']
        tagnames = site.config['t'][lang]['tagnames']
        tags = post.data['tags']
        for tag in tags do
            slug = jekyll_tagging_slug(tag)
            if tagnames[slug]
                if tagnames[slug] != tag
                    raise "Tag '%{tag1}' and tag '%{tag2}' will create the same filename.  Change one of them!" % { :tag1 => tagnames[slug], :tag2 => tag }
                end
            else
                tagnames[slug] =  tag
            end
        end
    end

end

site.posts.docs is an array that contains all posts. For each post I first retrieve the language from the attribute lang. On my site it is guaranteed to be set, so there is no check whether the attribute exists or not.

My site configuration already contains a key t (as in translation) that holds string translations for each supported language. I decided to stuff the tags into that structure with a key tagnames (lines 5-6) because there is an analogous slot catnames for categories.

In line 9 I iterate over all tags for the current post. In the next line I normalize the tag into a file-system-safe form by calling the helper function jekyll_tagging_slug(). jekyll-tagging uses that function for determining the name of the output file.

For my site it is important that there is a one-to-one relationship between tags and the name of the corresponding tag page. Lines 11 to 17 enforce that. That step is not strictly necessary but rather a QA measure. I want to ensure a consistent spelling of tags.
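The grouping and the consistency check can be tried out without Jekyll. The sketch below works on plain hashes instead of Jekyll documents; slugify is a hypothetical stand-in for jekyll_tagging_slug and may normalize differently from the real helper, and the sample posts are invented.

```ruby
# Hypothetical stand-in for jekyll_tagging_slug: lowercase, and replace
# runs of other characters with a single dash.
def slugify(tag)
  tag.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-|-\z/, '')
end

# Groups tag names by language and slug. Raises if two different
# spellings within one language map to the same slug.
def group_tags(posts)
  posts.each_with_object({}) do |post, by_lang|
    tagnames = (by_lang[post[:lang]] ||= {})
    post[:tags].each do |tag|
      slug = slugify(tag)
      known = tagnames[slug]
      if known && known != tag
        raise ArgumentError,
              "Tag '#{known}' and tag '#{tag}' will create the same filename."
      end
      tagnames[slug] = tag
    end
  end
end

posts = [
  { lang: 'en', tags: ['DNS', 'System Administration'] },
  { lang: 'de', tags: ['DNS', 'Entwicklung'] },
  { lang: 'en', tags: ['Jekyll', 'DNS'] }
]

grouped = group_tags(posts)
p grouped['en'].keys # => ["dns", "system-administration", "jekyll"]
p grouped['de']      # => {"dns"=>"DNS", "entwicklung"=>"Entwicklung"}
```

The same tag may appear in several languages, but a second spelling such as 'dns' next to 'DNS' within one language raises an error.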

The result is a data structure that - translated from Ruby into YAML - would look like this in _config.yml:

t:
  en:
    tagnames:
      dns: DNS
      system-administration: "System Administration"
      jekyll: Jekyll
      development: Development
  de:
    tagnames:
      dns: DNS
      systemadministration: "Systemadministration"
      jekyll: Jekyll
      entwicklung: Entwicklung

Invoking the Super Class Generator

Now that the tags are grouped, the generate method of the super class has to be invoked, but with a tweaked configuration for each language. The following code added to the generate method does the job:

saved_tag_page_dir = site.config['tag_page_dir']
saved_tag_feed_dir = site.config['tag_feed_dir']

for lang in site.config['t'].keys
    site.config['tag_page_dir'] = '/' + lang + '/' + saved_tag_page_dir
    site.config['tag_feed_dir'] = '/' + lang + '/' + saved_tag_feed_dir
    site.config['ignored_tags'] = site.tags.keys - site.config['t'][lang]['tagnames'].values

    super
end

We need distinct output directories for each language. Otherwise tag pages for tags that are present in multiple languages would overwrite each other. In lines 1 and 2, I get the locations for the tag pages from the configuration and store a copy of the original values.

Then I iterate over all available languages (the keys of the hash slot t) and overwrite the configuration variables tag_page_dir and tag_feed_dir with the language-specific location. I kept it simple here and just prepend the language identifier to the original configuration values.

Line 7 is important. jekyll-tagging uses the configuration variable ignored_tags for suppressing tag pages for particular tags. I abuse that and temporarily fill it with an array that contains the difference between all tags and the tags used for the current language.
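To see what the per-language configuration ends up looking like, here is a small sketch on plain data. The tag lists are invented for illustration and no Jekyll objects are involved; only the directory prefix and the array difference mirror the code above.

```ruby
# Invented sample data standing in for site.tags.keys and site.config['t'].
all_tags = ['DNS', 'Jekyll', 'Development', 'Entwicklung']
tagnames = {
  'en' => { 'dns' => 'DNS', 'jekyll' => 'Jekyll', 'development' => 'Development' },
  'de' => { 'dns' => 'DNS', 'jekyll' => 'Jekyll', 'entwicklung' => 'Entwicklung' }
}
base_dir = 'tags'

# Build the overrides that would be applied before each call to super.
per_lang = tagnames.map do |lang, names|
  [lang, {
    'tag_page_dir' => "/#{lang}/#{base_dir}",
    'ignored_tags' => all_tags - names.values # tags not used in this language
  }]
end.to_h

p per_lang['en']['tag_page_dir'] # => "/en/tags"
p per_lang['en']['ignored_tags'] # => ["Entwicklung"]
p per_lang['de']['ignored_tags'] # => ["Development"]
```

Tags shared by both languages, such as DNS, are never ignored, so each language gets its own page for them in its own directory.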

Line 9 calls the super method, that is the generate method of jekyll-tagging, and lets it do its job.

At that point I ran into a showstopper. As it turned out, jekyll-tagging version 1.0.1 has a bug and the ignore mechanism does not actually work, see the bug report on GitHub for details.

Unfortunately, I did not succeed in monkey-patching the bug away, and I ended up patching the source file manually. Search for the file tagging.rb from the jekyll-tagging distribution and change the method active_tags to read as follows:

def active_tags
  return site.tags unless site.config["ignored_tags"]
  site.tags.reject { |t| site.config["ignored_tags"].include? t }
end

Alternatively, wait for the bug to be fixed upstream.
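The intended behaviour of the ignore mechanism can be checked on a plain hash. This is only a sketch of the filtering logic: the method takes its inputs as arguments instead of reading them from site, and the tag data is made up.

```ruby
# Returns the tags hash without the ignored tag names.
# With no ignore list configured, everything stays active.
def active_tags(tags, ignored)
  return tags unless ignored
  tags.reject { |name, _posts| ignored.include?(name) }
end

tags = { 'DNS' => %w[post1 post3], 'Entwicklung' => %w[post2] }

p active_tags(tags, nil).keys             # => ["DNS", "Entwicklung"]
p active_tags(tags, ['Entwicklung']).keys # => ["DNS"]
p active_tags(tags, tags.keys)            # => {}
```

The last call shows that ignoring every tag of the site leaves nothing to generate.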

One part of the solution is still missing. We have to make sure that the tag pages all have an attribute lang containing the language code. jekyll-tagging does not have hooks for injecting additional data. Therefore we do that ourselves:

for page in site.pages
    if page.data['tag']
        dir = Pathname(page.url).each_filename.to_a
        lang = page.data['lang'] = dir[0]
        description = site.config['t'][lang]['taglist']['description']
        page.data['description'] = description % { :tag => page.data['tag'] }
    end
end

Still in our generate method, we iterate over all pages. For lack of a better way to detect tag pages, we check whether the document attribute tag is present, and extract the first path name component from the document URL. Note that in order to be able to use Pathname(), you have to add a require 'pathname' to the beginning of the file!

While we are at it, we also spruce up the generated tag pages by giving them a description. The description comes from a language-specific string in _config.yml and may contain a placeholder %{tag} for the current tag. Line 6 interpolates the tag into that string.
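Both steps, extracting the language from the URL and filling in the placeholder, use only the Ruby standard library and can be tried in isolation. The URL and the description string below are invented examples, not values from my configuration.

```ruby
require 'pathname'

url = '/de/tags/dns/' # made-up URL of a generated tag page

# each_filename yields the path components without the leading slash.
components = Pathname(url).each_filename.to_a
lang = components.first

# Hypothetical description template with a named placeholder.
template = 'Alle Artikel zum Thema %{tag}'
description = template % { tag: 'DNS' }

p components  # => ["de", "tags", "dns"]
p lang        # => "de"
p description # => "Alle Artikel zum Thema DNS"
```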

Finally, we have to restore the configuration to its original values because they are used at the next invocation of our plug-in:

site.config['tag_page_dir'] = saved_tag_page_dir
site.config['tag_feed_dir'] = saved_tag_feed_dir
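A variation worth considering is to let Ruby's ensure do the restoring, so that the configuration is reset even if generation raises an error. The sketch below is not what my plug-in does; with_overrides is a hypothetical helper and the config is a plain hash.

```ruby
# Hypothetical helper: applies overrides for the duration of the block,
# then restores the previous values and removes keys that were added.
def with_overrides(config, overrides)
  saved = config.slice(*overrides.keys)
  added = overrides.keys - saved.keys
  config.merge!(overrides)
  yield config
ensure
  if saved
    added.each { |key| config.delete(key) }
    config.merge!(saved)
  end
end

config = { 'tag_page_dir' => 'tags' }

with_overrides(config, 'tag_page_dir' => '/en/tags',
                       'ignored_tags' => ['Entwicklung']) do |current|
  # Generation for one language would run here.
  puts current['tag_page_dir']
end

p config # => {"tag_page_dir"=>"tags"}
```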

At this point, our implementation is more or less functional. Our own generate method calls generate of jekyll-tagging once for each language and with a modified configuration. And we also inject two attributes lang and description into the generated pages.

More Tweaks

Jekyll obviously detects that another generator plug-in, the original one, has been loaded and insists on executing it as well, which leads to problems. In my particular site setup the Liquid template engine bails out if a page or post does not have a lang attribute. Therefore, the original plug-in must be prevented from generating pages without the configuration patched by us.

I could not find a way to prevent Jekyll from calling the super class generator. Therefore, after my plug-in has invoked the super class generator for each language, I set the list of tags to be ignored to the complete tag list of the site:

site.config['ignored_tags'] = site.tags.keys

Now the super method gets invoked but it will not generate any pages.

Another problem was that I wanted to display the count of documents for each tag in overview pages. I solved that by writing one more plug-in, this time a hook plug-in that computes these counts and saves them in the site configuration. And since I actually needed the same for categories, I precomputed the category counts as well.
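The counting itself is simple. The sketch below shows one way it could be done on plain hashes; it is not the content of precompute.rb, which additionally has to register itself as a Jekyll hook and store the result in the site configuration. The sample posts are invented.

```ruby
# Counts posts per language and tag. Posts are plain hashes here,
# not Jekyll documents.
def count_tags(posts)
  counts = Hash.new { |hash, lang| hash[lang] = Hash.new(0) }
  posts.each do |post|
    post[:tags].each { |tag| counts[post[:lang]][tag] += 1 }
  end
  counts
end

posts = [
  { lang: 'en', tags: %w[DNS Jekyll] },
  { lang: 'en', tags: %w[DNS] },
  { lang: 'de', tags: %w[DNS Entwicklung] }
]

tagcounts = count_tags(posts)
p tagcounts['en']['DNS']         # => 2
p tagcounts['de']['Entwicklung'] # => 1
p tagcounts['de']['Jekyll']      # => 0
```

Category counts could be computed the same way by iterating over the categories of each post instead of its tags.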

You can find a download link to that file below.

Questions

How can I link to a tag page?

Like this:

<a href="/{{ page.lang }}{{ tag | tag_url }}">{{ tag }}</a>

The filter tag_url is defined by jekyll-tagging and gives the URL of a tag page. We have to prepend the language.

How can I get the number of documents for a tag per language?

Like this:

{{ site.tagcounts[page.lang][tag] }}

The count is provided by the hook precompute.rb.

How can I get the number of documents in a category per language?

Likewise:

{{ site.catcounts[page.lang][category] }}

The count is provided by the hook precompute.rb.

How do I create language-specific tag clouds?

Tag clouds? Are you kidding me? It is 2016!

Downloads

You can download all source files needed below:

_plugins/ml_tagging.rb: wrapper plug-in for jekyll-tagging
_plugins/precompute.rb: hook plug-in that precomputes tag and category counts
_layouts/tag_page.html: template for tag pages
_layouts/tag_feed.xml: template for tag feeds
_includes/feed.xml: include for all feeds
_includes/tag-feeds.xml: include for listing all tag feeds for a language
