Search Topics

After having created a BERTopic model, you might end up with over a hundred topics. Searching through those can be quite cumbersome especially if you are searching for a specific topic. Fortunately, BERTopic allows you to search for topics using search terms. First, let's create and train a BERTopic model:

from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# Create topics
docs = fetch_20newsgroups(subset='all',  remove=('headers', 'footers', 'quotes'))['data']
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)

After having trained our model, we can use find_topics to search for topics that are similar to an input search_term. Here, we are going to be searching for topics that closely relate the search term "motor". Then, we extract the most similar topic and check the results:

>>> similar_topics, similarity = topic_model.find_topics("motor", top_n=5)
>>> topic_model.get_topic(similar_topics[0])
[('bike', 0.02275997701645559),
 ('motorcycle', 0.011391202866080292),
 ('bikes', 0.00981187573649205),
 ('dod', 0.009614623748226669),
 ('honda', 0.008247663662558535),
 ('ride', 0.0064683227888861945),
 ('harley', 0.006355502638631013),
 ('riding', 0.005766601561614182),
 ('motorcycles', 0.005596372493714447),
 ('advice', 0.005534544418830091)]

It definitely seems that a topic was found that closely matches "motor". The topic seems to be motorcycle related and therefore matches our "motor" input. You can use the similarity variable to see how similar the extracted topics are to the search term.


You can only use this method if an embedding model was supplied to BERTopic using embedding_model.