Finding Related Posts in Jekyll

LSI for Jekyll can be slow, and it did not produce satisfying results for my site. I therefore wrote my own small, heavily simplified plug-in, tailored to my needs, that does the same job quite well and quite fast.

Note! I switched from Jekyll to Qgoda a couple of years ago, so the "related" feature on this site no longer uses the technique described in this post. Qgoda has a built-in feature for finding related posts and is also a lot faster than Jekyll.

LSI stands for Latent Semantic Indexing, a technique for measuring the similarity of two texts. The Jekyll docs say that it is very slow, and I suspected it was responsible for Jekyll's poor performance when generating my site. Unfortunately, LSI was not the culprit: Jekyll is still slow, even with LSI turned off. Still, I had the feeling that the similarity between posts on my site could be measured better and faster with the information I provide myself than with math.

The idea is simple. The similarity between two posts is estimated with a point system using factors that are cheap to calculate. If two posts share a link (internal or outgoing), this counts as one point. Each common tag contributes two points. If they are in the same category, that counts as three points. Finally, if they are linked to each other, five points are added. The more points, the more similar two pages are considered.
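The point system can be sketched in a few lines of Ruby. This is a minimal illustration, not the plug-in's actual code: the `Post` struct and its field names are hypothetical, and it assumes that every shared link and every shared tag scores separately, that a shared category scores once, and that a link in either direction counts as "linked to each other".

```ruby
# Hypothetical post record; the field names are assumptions for illustration.
Post = Struct.new(:url, :tags, :categories, :links, keyword_init: true)

# Points per criterion, taken from the description above.
SHARED_LINK_POINTS = 1
SHARED_TAG_POINTS = 2
SAME_CATEGORY_POINTS = 3
DIRECT_LINK_POINTS = 5

# Returns the similarity score of two posts. Array#& yields the
# de-duplicated intersection, so repeated links or tags count once.
def similarity(a, b)
  score = 0
  score += SHARED_LINK_POINTS * (a.links & b.links).size
  score += SHARED_TAG_POINTS * (a.tags & b.tags).size
  # Assumption: sharing at least one category scores once.
  score += SAME_CATEGORY_POINTS unless (a.categories & b.categories).empty?
  # Assumption: a link in either direction is enough.
  score += DIRECT_LINK_POINTS if a.links.include?(b.url) || b.links.include?(a.url)
  score
end
```

For two posts in the same category that share one outgoing link and one tag, and where one links to the other, this gives 1 + 2 + 3 + 5 = 11 points.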

An ideal solution might be to mix the two approaches. A direct link between two posts strongly suggests that they are related, something that LSI cannot detect. On the other hand, LSI may find relationships that are not reflected in common links or categories. However, there is no simple way to retrieve the results of LSI from a Jekyll hook, and I therefore do not consider the LSI results at all.

Another factor that had to be taken into account was that my site is multilingual. The array of similar posts should only contain references to posts in the same language. I had already created a Jekyll hook that precomputes tag and category counts. I extended that hook to also calculate the similarity between two posts based on the criteria mentioned above.
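One way to keep the languages apart is to group the posts by language first and compare posts only within each group. The sketch below makes several assumptions: posts are plain hashes with hypothetical `:id` and `:lang` keys, the scoring is passed in as a block, and the wiring into a Jekyll hook is left out.

```ruby
# Builds { post_id => [related posts, best match first] }.
# Posts are only compared with other posts of the same language.
# The block receives two posts and returns their similarity score.
def related_by_language(posts)
  # Every post gets an entry, even if nothing turns out to be related.
  related = posts.to_h { |post| [post[:id], []] }

  posts.group_by { |post| post[:lang] }.each_value do |group|
    # Each unordered pair is scored once and recorded in both directions.
    group.combination(2) do |a, b|
      score = yield(a, b)
      related[a[:id]] << [score, b]
      related[b[:id]] << [score, a]
    end
  end

  # Highest score first; ties are broken by id for a stable order.
  related.transform_values do |pairs|
    pairs.sort_by { |score, post| [-score, post[:id]] }.map(&:last)
  end
end
```

Because each pair is scored only once, the scoring function is assumed to be symmetric.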

As a result, you can get the list of "similar" posts as follows:

<ul>
  {% for related in site.related[page.id] %}
    <li><a href="{{related.url}}">{{related.title | escape}}</a></li>
  {% endfor %}
</ul>

The hash site.related uses the page id as the key, and each list is limited to 10 entries. You may also want to restrict the list to posts that have a similarity index of at least one (or n) points. Both can easily be changed in the source code of the plug-in.
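The limit and the minimum score could look roughly like this. This is a sketch only: the constant and method names are hypothetical and not taken from the plug-in, and the input is assumed to be an array of `[score, post_id]` pairs for a single post.

```ruby
# Assumed defaults mirroring the text: at most 10 entries per post,
# and only posts with at least one point.
MAX_RELATED = 10
MIN_SCORE = 1

# Returns the ids of the best matches, highest score first.
def top_related(scored, limit: MAX_RELATED, min_score: MIN_SCORE)
  scored
    .select { |score, _id| score >= min_score }
    .sort_by { |score, id| [-score, id] } # ties broken by id
    .first(limit)
    .map { |_score, id| id }
end
```

Calling `top_related(scored, min_score: 0)` would keep posts without any points as well, and `limit:` changes the length of the list.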
