Over the last few years, BERTopic has been applied to a wide variety of use cases and domains, from cancer research and voice perception to employee surveys and social media. This diversity makes for interesting applications, but it can quickly become overwhelming. This page is meant to demonstrate how, when, and why BERTopic is used in practice.
Below are a number of use cases that have been applied in practice. They were collected from, and written by, data professionals.
If you would like to add your use case, feel free to open a PR! You only need to update this file and add your example. You can simply copy-paste one of the existing examples and adjust it to describe your use case.
App User Feedback
"Analyzing user reviews from the App Store and Play Store helps us reveal valuable customer information, fix technical or usability issues, and constantly improve the customer experience. We use BERTopic for topic modeling and for supervised classification into predefined categories."
Tibor Fabian, Ph.D.
Lead/Master Data Scientist
"We are using BERTopic to support the analysis of employee surveys. Here, we use BERTopic to extract the topics discussed in employee responses to open-ended survey questions. To further understand how employees feel about certain topics, we combined BERTopic with sentiment analysis to identify the sentiments associated with different topics and vice versa."
Steve Quirolgico, Ph.D.
U.S. Department of Homeland Security
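Combining topics with sentiment, as described in the employee-survey example, amounts to a cross-tabulation. The sketch below is a minimal illustration, not the team's actual pipeline: it assumes each response already has a topic id (for example, the per-document list returned by BERTopic's `fit_transform`, where `-1` marks outliers) and a sentiment label from a separate sentiment model that the original text does not specify. The function name `sentiment_by_topic` and the labels are hypothetical.

```python
from collections import Counter, defaultdict


def sentiment_by_topic(topics, sentiments):
    """Return {topic_id: {sentiment_label: share}} for every non-outlier topic.

    `topics` holds one topic id per response (-1 = BERTopic outlier, skipped).
    `sentiments` holds one label per response from some sentiment model
    (the model itself is an assumption, not part of this sketch).
    """
    if len(topics) != len(sentiments):
        raise ValueError("topics and sentiments must have the same length")
    counts = defaultdict(Counter)
    for topic, sentiment in zip(topics, sentiments):
        if topic == -1:  # outlier responses belong to no topic
            continue
        counts[topic][sentiment] += 1
    # Convert raw counts into per-topic proportions.
    shares = {}
    for topic, counter in counts.items():
        total = sum(counter.values())
        shares[topic] = {label: n / total for label, n in counter.items()}
    return shares


# Made-up example: six responses, one of them an outlier.
print(sentiment_by_topic([0, 0, 1, 1, 1, -1], ["neg", "pos", "neg", "neg", "pos", "pos"]))
```

Nesting the counts the other way round (topics grouped under each sentiment label) gives the "vice versa" view mentioned in the quote.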
"A research project on voice perception that categorizes what people describe when they form first impressions after hearing someone say 'Hi'." preprint | code
"We use BERTopic to detect trending topics on social media. Our product (AIM Insights) is a social media monitoring tool, so detecting trending topics helps our clients capitalize on them in their campaigns.
We use BERTopic to group social media posts into clusters, sort the clusters by engagement to detect the ones that are trending, and then use OpenAI's GPT-3 to generate a label for each of the top clusters based on its most relevant documents. This is all done on Arabic posts using an in-house sentence embedding model."
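The cluster-ranking step in the social media example can be sketched in a few lines. This is an illustration under stated assumptions, not AIM Insights' implementation: it assumes per-post topic ids (such as those returned by BERTopic's `fit_transform`, with `-1` for outliers) and a hypothetical numeric engagement score per post, and it ranks clusters by *total* engagement, which is only one possible definition of "trending". The function name `rank_topics_by_engagement` is hypothetical, and the GPT-3 labelling step is not shown.

```python
from collections import defaultdict


def rank_topics_by_engagement(topics, engagement, top_n=3):
    """Return the top_n (topic_id, total_engagement) pairs, highest first.

    `topics` holds one topic id per post; -1 (BERTopic's outlier topic) is skipped.
    `engagement` holds one hypothetical engagement score per post.
    """
    if len(topics) != len(engagement):
        raise ValueError("topics and engagement must have the same length")
    totals = defaultdict(float)
    for topic, score in zip(topics, engagement):
        if topic == -1:  # outliers do not belong to any cluster
            continue
        totals[topic] += score
    # Sort clusters by summed engagement, largest first.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Made-up example: seven posts, one of them an outlier.
topics = [0, 1, 0, 2, -1, 1, 0]
engagement = [120, 30, 80, 10, 500, 45, 5]
print(rank_topics_by_engagement(topics, engagement, top_n=2))
# -> [(0, 205.0), (1, 75.0)]
```

Summing favours large clusters; dividing each total by the cluster size (mean engagement) would instead favour small but highly engaging clusters.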
IT Service Management
"In IT Service Management systems (e.g., ServiceNow), we receive incidents, problems, change requests, etc. We use BERTopic to categorize them into topics/clusters so that we can understand how work requests are distributed over time and plan and act accordingly."
Data Science Consultant
"We use BERTopic to evaluate P53 in ovarian cancer for researchers with computational backgrounds, who find it easier to relate artificial intelligence to advances in transformer models and unstructured medical data. The paper explores how KeyBERT, BERTopic, PyCaret, and LDA differ as keyphrase generators and topic extractors, with P53 in ovarian cancer as a use case."
PhD Student in Colon Cancer and AI
Telephone Help Line
"We analyzed 100K+ phone call memos from a telephone help line. The Help Line is open to all people, regardless of religion, culture, and origin. It follows the principles of IFOTES (International Federation Of Telephone Emergency Services). The regional offices each offer independent counseling services via telephone or online.
The phone call memos are written by hundreds of independent volunteers and come in various shapes, lengths, forms, and wordings; in addition, they are written in multiple languages. We first ran a few tests to check whether topic modeling with BERTopic works on this data. By selecting a single language (~60K data points) and using a mixed-language model, we achieved good results. BERTopic helped identify the topics within the calls and therefore showed the organization why people call them. In a workshop, we identified a few interesting topics the organization had not been aware of, for example, religious topics.
Identifying existing and newly emerging topics is crucial for the service quality of the organization. It also helps detect trends over time, which can be reported directly to public health institutions, which can then design campaigns to inform the public and help reduce certain psychological concerns. It acts as a representative psychological health barometer of the population."
Chief Executive Officer
"Recently, we wanted to evaluate our overall section structure, especially our local news section. As you can imagine, local news is quite a big part of what we do in a regional newspaper. We used BERTopic on a year's worth of local news data to explore the topics in local news and define a new section structure. The results from this analysis helped to define the new section structure, which was implemented this month. "
Intelligent Virtual Assistants
"We have been using BERTopic as an early step in our exploratory analysis for intelligent virtual assistants. It helps us get a quick read on what some of the intents may be. The results help in the design discussions with customers."
VP, AI and Automation Solutions
Electronic Health Records
"Given physician-created documents from hospitals: find themes in the text, differentiate between 'relevant' and 'irrelevant' text, and disambiguate homonyms."
Senior NLP Engineer
"BERTopic was used to determine a taxonomy of climate change risks discussed in financial news, and to compute firms' related exposure. It was used in the context of a course on climate risk modelling with NLP."
Senior Associate, Quantitative Analyst
Zero Hunger Lab
"I am a PhD student at Tilburg University, at a lab called Zero Hunger Lab, where we use data science methods to help reduce food insecurity. One key issue is classifying and predicting food insecurity in food-insecure nations. The Integrated Food Security Phase Classification (IPC) system serves this purpose. The IPC categorizes food insecurity into five phases, ranging from minimal food insecurity to famine, and serves as a guide for directing humanitarian resources to the most affected regions.
The IPC system strives to be evidence-based; however, obtaining accurate information about food insecurity in remote regions can be challenging. While weather data is available, socio-economic data, such as food prices and conflict, can be scarce or unreliable due to limited infrastructure and bureaucratic obstacles. As a result, IPC classifications and projections are often released infrequently, making it difficult to respond effectively to food insecurity in these areas.
One large source of daily updated information is local news. Thus, one can build a model that classifies or predicts IPC phases from news features obtained with NLP methods, in addition to data such as weather. Previous research shows this is possible (see https://arxiv.org/pdf/2111.15602.pdf). The authors find words related to food insecurity using semantic frame parsing, count the occurrences of these words to create features, and feed these features into a linear classifier.
We wanted to apply more advanced methods and to use local news sources (which we suspect contain more localized information). We used BERTopic on over a million articles scraped from Somali news websites. Because the articles are in both English and Somali, we used a multilingual sentence encoder (LaBSE, which outperforms newer models on Somali). The results are quite good. For example, the topics most strongly correlated with known conflict casualty data are about terrorist attacks, car bombings, etc., whereas the topics most negatively correlated with it are about peace talks. We can also get an indication of food price development and forced migration. Most importantly, we can track the development of topics relating to food insecurity over time. While topic modelling cannot replace evidence-based food insecurity assessment, it can give quick insight into a local situation when 'hard data' is lacking.
I applaud your success with BERTopic. The package is incredibly clean and easy to use, and the method works well with little parameter tuning. To me, the fact that you were able to deliver such a useful tool on your own is incredible, especially in the field of NLP, which is dominated by large organizations such as Google and Meta."
Cascha van Wanrooij
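The correlation analysis in the Zero Hunger Lab example (topic prevalence versus known conflict casualties) can be made concrete with a small sketch. This is a hedged illustration, not the lab's code: it assumes each article has a topic id and a publication period (e.g. a month string), and that casualty counts exist for the same periods. BERTopic's own `topics_over_time` method produces per-period topic frequencies; the plain-Python helpers here (`topic_share_per_period`, `pearson`) are hypothetical and only make the arithmetic explicit. Pearson correlation is also an assumption, since the text does not say which correlation measure was used.

```python
import math
from collections import Counter


def topic_share_per_period(topics, periods, topic_id, period_order):
    """Fraction of documents in each period that were assigned to `topic_id`."""
    totals = Counter(periods)
    hits = Counter(p for t, p in zip(topics, periods) if t == topic_id)
    return [hits[p] / totals[p] if totals[p] else 0.0 for p in period_order]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up example: topic 3 becomes more frequent as casualties rise.
months = ["2021-01", "2021-02", "2021-03"]
topics = [3, 0, 0, 0, 3, 3, 0, 0, 3, 3, 3, 0]
periods = ["2021-01"] * 4 + ["2021-02"] * 4 + ["2021-03"] * 4
casualties = [10, 20, 30]  # hypothetical counts per month

shares = topic_share_per_period(topics, periods, topic_id=3, period_order=months)
print(shares)  # -> [0.25, 0.5, 0.75]
print(pearson(shares, casualties))  # -> 1.0
```

In practice, one would compute this coefficient for every topic and inspect the most positively and most negatively correlated ones, as the quote describes.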
BERTopic has also been increasingly adopted in academia. Here are a few papers from a wide range of research domains with interesting applications:
- Adewunmi, M., Sharma, S. K., Sharma, N., Sushma, N. S., & Mounmo, B. (2022). Cancer Health Disparities drivers with BERTopic modelling and PyCaret Evaluation. Cancer Health Disparities, 6.
- Ebeling, R., Sáenz, C. A. C., Nobre, J. C., & Becker, K. (2022, May). Analysis of the influence of political polarization in the vaccination stance: the Brazilian COVID-19 scenario. In Proceedings of the International AAAI Conference on Web and Social Media (Vol. 16, pp. 159-170).
- Hoseini, M., Melo, P., Benevenuto, F., Feldmann, A., & Zannettou, S. (2021). On the globalization of the QAnon conspiracy theory through Telegram. arXiv preprint arXiv:2105.13020.
- Falkenberg, M., Galeazzi, A., Torricelli, M., Di Marco, N., Larosa, F., Sas, M., ... & Baronchelli, A. (2022). Growing polarization around climate change on social media. Nature Climate Change, 1-8.
- Sánchez‐Franco, M. J., & Rey‐Moreno, M. (2022). Do travelers' reviews depend on the destination? An analysis in coastal and urban peer‐to‐peer lodgings. Psychology & Marketing, 39(2), 441-459.
- Zhunis, A., Lima, G., Song, H., Han, J., & Cha, M. (2022, April). Emotion bubbles: Emotional composition of online discourse before and after the COVID-19 outbreak. In Proceedings of the ACM Web Conference 2022 (pp. 2603-2613).
- Alhaj, F., Al-Haj, A., Sharieh, A., & Jabri, R. (2022). Improving Arabic cognitive distortion classification in Twitter using BERTopic. International Journal of Advanced Computer Science and Applications, 13(1), 854-860.
Click here for a full overview of papers citing BERTopic.