Search Topics
After creating a BERTopic model, you might end up with over a hundred topics. Searching through them can be cumbersome, especially if you are looking for a specific topic. Fortunately, BERTopic lets you search for topics using search terms. First, let's create and train a BERTopic model:
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
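# Load the 20 Newsgroups documents without headers, footers, or quoted replies
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
# Fit BERTopic and assign a topic to each document
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)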
After training our model, we can use find_topics to search for topics that are similar to an input search_term. Here, we search for topics that closely relate to the search term "motor". Then, we extract the most similar topic and check the results:
>>> similar_topics, similarity = topic_model.find_topics("motor", top_n=5)
>>> topic_model.get_topic(similar_topics[0])
[('bike', 0.02275997701645559),
('motorcycle', 0.011391202866080292),
('bikes', 0.00981187573649205),
('dod', 0.009614623748226669),
('honda', 0.008247663662558535),
('ride', 0.0064683227888861945),
('harley', 0.006355502638631013),
('riding', 0.005766601561614182),
('motorcycles', 0.005596372493714447),
('advice', 0.005534544418830091)]
It definitely seems that a topic closely matching "motor" was found. The topic is motorcycle-related and therefore matches our "motor" input. You can use the similarity variable to see how similar each extracted topic is to the search term.
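To make the idea behind the similarity scores more concrete, here is a minimal, dependency-free sketch of embedding-based topic search. It assumes that a search works by embedding the search term with the same embedding model used for the topics and ranking topics by cosine similarity to one embedding per topic. This is not BERTopic's own implementation; `find_topics_sketch`, `embed`, `topic_embeddings`, and `toy_embed` are hypothetical names, and the vectors are toy values chosen only for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_topics_sketch(
    search_term: str,
    embed: Callable[[str], Sequence[float]],
    topic_embeddings: Dict[int, Sequence[float]],
    top_n: int = 5,
) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: rank topic ids by cosine similarity to the embedded search term.

    `embed` stands in for an embedding model; `topic_embeddings` maps each topic id
    to a single vector in the same space. Returns (topic ids, similarities), best first.
    """
    query = embed(search_term)
    scored = [(topic_id, cosine(query, vec)) for topic_id, vec in topic_embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    best = scored[:top_n]
    return [topic_id for topic_id, _ in best], [score for _, score in best]


# Toy, hypothetical data: three 2-dimensional "topic embeddings".
toy_embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.7, 0.7]}


def toy_embed(text: str) -> List[float]:
    # Hypothetical stand-in for an embedding model: ignores the text entirely.
    return [1.0, 0.1]


similar_topics, similarity = find_topics_sketch("motor", toy_embed, toy_embeddings, top_n=2)
print(similar_topics)  # [0, 2]
```

The sketch also shows why an embedding model is required: without one, the search term cannot be placed in the same vector space as the topics, so there is nothing to compare it against.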
Note
You can only use this method if an embedding model was supplied to BERTopic using embedding_model.