
Hierarchical Topic Modeling

When tweaking your topic model, the number of topics that are generated has a large effect on the quality of the topic representations. Some topics could be merged, and understanding the effect of such a merge helps you decide which topics should and which should not be merged.

That is where hierarchical topic modeling comes in. It tries to model the possible hierarchical nature of the topics you have created to understand which topics are similar to each other. Moreover, you will have more insight into sub-topics that might exist in your data.


[Figure: the hierarchical topic modeling procedure. (1) Create a distance matrix by calculating the cosine similarity between c-TF-IDF representations of each topic. (2) Apply a linkage function of choice on the distance matrix to model the hierarchical structure of topics. (3) Re-calculate c-TF-IDF: update the c-TF-IDF representation based on the collection of documents across the merged topics.]


In BERTopic, we can approximate this potential hierarchy by making use of our topic-term matrix (c-TF-IDF matrix). This matrix contains information about the importance of every word in every topic and makes for a nice numerical representation of our topics. The smaller the distance between two c-TF-IDF representations, the more similar we assume the topics are. In practice, this process of merging topics is done through the hierarchical clustering capabilities of SciPy (the scipy.cluster.hierarchy module). It allows for several linkage methods through which we can approximate our topic hierarchy. By default, we use ward linkage, but many others are available.
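To make the idea concrete, here is a minimal sketch of the distance-then-linkage step on a toy topic-term matrix. The matrix values are invented, and using `pdist` to obtain cosine distances is our assumption for illustration; it is not necessarily how BERTopic computes them internally.

```python
import numpy as np
from scipy.cluster import hierarchy as sch
from scipy.spatial.distance import pdist

# Toy topic-term matrix: 4 topics x 4 terms (values invented for illustration).
toy_c_tf_idf = np.array([
    [0.9, 0.1, 0.0, 0.0],  # topic 0
    [0.8, 0.2, 0.0, 0.0],  # topic 1, close to topic 0
    [0.0, 0.0, 0.7, 0.3],  # topic 2
    [0.0, 0.1, 0.6, 0.4],  # topic 3, close to topic 2
])

# Pairwise cosine distances (1 - cosine similarity) in SciPy's condensed form.
distances = pdist(toy_c_tf_idf, metric="cosine")

# Ward linkage. Each row of Z records one merge:
# (cluster a, cluster b, merge distance, number of original topics in the new cluster).
Z = sch.linkage(distances, method="ward", optimal_ordering=True)

# Cut the tree so that at most two groups of topics remain.
labels = sch.fcluster(Z, t=2, criterion="maxclust")

print(Z)
print(labels)
```

Reading `Z` from top to bottom gives the order in which topics would be merged, which is the hierarchy we are after.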

Whenever we merge two topics, we can calculate the c-TF-IDF representation of the merged topic by summing the bag-of-words representations of the two topics. At each step, we assume that only those two sets of topics are merged and that all others are kept the same, regardless of their location in the hierarchy. This helps us isolate the potential effect of merging sets of topics. As a result, we can see the topic representation at each level in the tree.
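The merge step can be sketched on a toy bag-of-words matrix. The helper names `c_tf_idf` and `merge_rows` are hypothetical, the counts are invented, and the weighting used here (term frequency times log(1 + average words per topic / total term frequency)) is a simplified form that we assume for illustration; check your BERTopic version for the exact formula.

```python
import numpy as np


def c_tf_idf(counts):
    """Simplified class-based TF-IDF for a topics x terms count matrix (assumed form)."""
    tf = counts / counts.sum(axis=1, keepdims=True)   # term frequency within each topic
    avg_words = counts.sum(axis=1).mean()             # average number of words per topic
    term_freq = counts.sum(axis=0)                    # frequency of each term across topics
    idf = np.log(1 + avg_words / term_freq)
    return tf * idf


def merge_rows(counts, a, b):
    """Sum the bag-of-words rows of topics a and b; keep all other topics unchanged."""
    merged = counts[a] + counts[b]
    rest = np.delete(counts, [a, b], axis=0)
    return np.vstack([merged, rest])


# Toy bag-of-words counts: 3 topics x 4 terms (invented values).
counts = np.array([
    [10, 5, 0, 1],  # topic 0
    [8, 6, 1, 0],   # topic 1
    [0, 1, 9, 7],   # topic 2
], dtype=float)

merged_counts = merge_rows(counts, 0, 1)     # row 0 is now the merged topic
merged_weights = c_tf_idf(merged_counts)     # representation after the merge

print(merged_counts)
print(merged_weights.round(3))
```

Because only the two selected rows are summed, the weights of the merged topic can be compared directly with the weights of its two children.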

Example

To demonstrate hierarchical topic modeling with BERTopic, we use the 20 Newsgroups dataset to see how the topics that we uncover are represented in the 20 categories of documents.

First, we train a basic BERTopic model:

from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))["data"]
topic_model = BERTopic(verbose=True)
topics, probs = topic_model.fit_transform(docs)

Next, we can use our fitted BERTopic model to extract possible hierarchies from our c-TF-IDF matrix:

hierarchical_topics = topic_model.hierarchical_topics(docs)

The resulting hierarchical_topics is a dataframe that describes the merged topics, one merge per row. It answers questions such as: if you were to merge two topics, what would the topic representation of the new topic be?
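As a rough sketch of how such a dataframe can be queried, the snippet below uses a hand-built stand-in. The column names (Parent_Name, Topics, Child_Left_Name, Child_Right_Name, Distance) are what we recall BERTopic producing and should be checked against your installed version. The topic names are borrowed from the example tree on this page, the distances are invented, and `merges_containing` is a hypothetical helper.

```python
import pandas as pd

# Hand-built stand-in for the output of topic_model.hierarchical_topics(docs).
# Column names are assumed; the distances are invented for illustration only.
toy_hierarchy = pd.DataFrame(
    {
        "Parent_Name": [
            "atheists_atheism_god_atheist_argument",
            "atheists_atheism_god_moral_atheist",
        ],
        "Topics": [[21, 124], [21, 29, 124]],
        "Child_Left_Name": [
            "atheists_atheism_god_atheist_argument",
            "atheists_atheism_god_atheist_argument",
        ],
        "Child_Right_Name": [
            "br_god_exist_genetic_existence",
            "moral_morality_objective_immoral_morals",
        ],
        "Distance": [0.5, 0.9],
    }
)


def merges_containing(hierarchy, topic_id):
    """Return the rows (merges) whose merged topic set contains topic_id."""
    mask = hierarchy["Topics"].apply(lambda topics: topic_id in topics)
    return hierarchy[mask].sort_values("Distance")


# Every merge that topic 29 takes part in, smallest distance first.
print(merges_containing(toy_hierarchy, 29)[["Parent_Name", "Topics", "Distance"]])
```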

Linkage functions

When creating the potential hierarchical nature of topics, we use SciPy's ward linkage function as a default to generate the hierarchy. However, you might want to use a different linkage function for your use case, such as single, complete, average, centroid, or median. In BERTopic, you can define the linkage function yourself, as well as the distance function that you would like to use. The example below swaps in single linkage:

from scipy.cluster import hierarchy as sch
from bertopic import BERTopic
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)

# Hierarchical topics
linkage_function = lambda x: sch.linkage(x, 'single', optimal_ordering=True)
hierarchical_topics = topic_model.hierarchical_topics(docs, linkage_function=linkage_function)
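To get a feel for how the choice of linkage changes the hierarchy before running it on a full model, we can compare a few methods on a toy matrix. This is only a sketch: the matrix values are invented, and the cosine distances are computed with `pdist` as an assumption for illustration.

```python
import numpy as np
from scipy.cluster import hierarchy as sch
from scipy.spatial.distance import pdist

# Toy stand-in for a topic-term matrix: 5 topics x 4 terms (invented values).
toy_c_tf_idf = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.8, 0.2, 0.0, 0.0],
    [0.0, 0.1, 0.7, 0.2],
    [0.0, 0.0, 0.6, 0.4],
    [0.3, 0.3, 0.2, 0.2],
])

# Condensed cosine distance matrix shared by all linkage functions.
condensed = pdist(toy_c_tf_idf, metric="cosine")

# One linkage function per method, written in the same style as above.
linkage_functions = {
    "single": lambda x: sch.linkage(x, "single", optimal_ordering=True),
    "complete": lambda x: sch.linkage(x, "complete", optimal_ordering=True),
    "average": lambda x: sch.linkage(x, "average", optimal_ordering=True),
    "ward": lambda x: sch.linkage(x, "ward", optimal_ordering=True),
}

linkages = {name: fn(condensed) for name, fn in linkage_functions.items()}

for name, Z in linkages.items():
    # Column 2 of a linkage matrix holds the distance at which each merge happens.
    print(name, np.round(Z[:, 2], 3))
```

Note that SciPy documents the centroid, median, and ward methods as correctly defined only for Euclidean distances, so results with other distance functions should be read with some care.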

Visualizations

To visualize these results, we can start by running a familiar function, namely topic_model.visualize_hierarchy:

topic_model.visualize_hierarchy(hierarchical_topics=hierarchical_topics)

If you hover over the black circles, you will see the topic representation at that level of the hierarchy. These representations help you understand the effect of merging certain topics. Some might be logical to merge whilst others might not. Moreover, we can now see which sub-topics can be found within certain larger themes.

Although this gives a nice overview of the potential hierarchy, hovering over all black circles can be tiresome. Instead, we can use topic_model.get_topic_tree to create a text-based representation of this hierarchy. Although the general structure is more difficult to view, we can see better which topics could be logically merged:

>>> tree = topic_model.get_topic_tree(hierarchical_topics)
>>> print(tree)
.
└─atheists_atheism_god_moral_atheist
     ├─atheists_atheism_god_atheist_argument
     │    ├─■──atheists_atheism_god_atheist_argument ── Topic: 21
     │    └─■──br_god_exist_genetic_existence ── Topic: 124
     └─■──moral_morality_objective_immoral_morals ── Topic: 29

The full tree:

  .
  ├─people_armenian_said_god_armenians
  │    ├─god_jesus_jehovah_lord_christ
  │    │    ├─god_jesus_jehovah_lord_christ
  │    │    │    ├─jehovah_lord_mormon_mcconkie_god
  │    │    │    │    ├─■──ra_satan_thou_god_lucifer ── Topic: 94
  │    │    │    │    └─■──jehovah_lord_mormon_mcconkie_unto ── Topic: 78
  │    │    │    └─jesus_mary_god_hell_sin
  │    │    │         ├─jesus_hell_god_eternal_heaven
  │    │    │         │    ├─hell_jesus_eternal_god_heaven
  │    │    │         │    │    ├─■──jesus_tomb_disciples_resurrection_john ── Topic: 69
  │    │    │         │    │    └─■──hell_eternal_god_jesus_heaven ── Topic: 53
  │    │    │         │    └─■──aaron_baptism_sin_law_god ── Topic: 89
  │    │    │         └─■──mary_sin_maria_priest_conception ── Topic: 56
  │    │    └─■──marriage_married_marry_ceremony_marriages ── Topic: 110
  │    └─people_armenian_armenians_said_mr
  │         ├─people_armenian_armenians_said_israel
  │         │    ├─god_homosexual_homosexuality_atheists_sex
  │         │    │    ├─homosexual_homosexuality_sex_gay_homosexuals
  │         │    │    │    ├─■──kinsey_sex_gay_men_sexual ── Topic: 44
  │         │    │    │    └─homosexuality_homosexual_sin_homosexuals_gay
  │         │    │    │         ├─■──gay_homosexual_homosexuals_sexual_cramer ── Topic: 50
  │         │    │    │         └─■──homosexuality_homosexual_sin_paul_sex ── Topic: 27
  │         │    │    └─god_atheists_atheism_moral_atheist
  │         │    │         ├─islam_quran_judas_islamic_book
  │         │    │         │    ├─■──jim_context_challenges_articles_quote ── Topic: 36
  │         │    │         │    └─islam_quran_judas_islamic_book
  │         │    │         │         ├─■──islam_quran_islamic_rushdie_muslims ── Topic: 31
  │         │    │         │         └─■──judas_scripture_bible_books_greek ── Topic: 33
  │         │    │         └─atheists_atheism_god_moral_atheist
  │         │    │              ├─atheists_atheism_god_atheist_argument
  │         │    │              │    ├─■──atheists_atheism_god_atheist_argument ── Topic: 21
  │         │    │              │    └─■──br_god_exist_genetic_existence ── Topic: 124
  │         │    │              └─■──moral_morality_objective_immoral_morals ── Topic: 29
  │         │    └─armenian_armenians_people_israel_said
  │         │         ├─armenian_armenians_israel_people_jews
  │         │         │    ├─tax_rights_government_income_taxes
  │         │         │    │    ├─■──rights_right_slavery_slaves_residence ── Topic: 106
  │         │         │    │    └─tax_government_taxes_income_libertarians
  │         │         │    │         ├─■──government_libertarians_libertarian_regulation_party ── Topic: 58
  │         │         │    │         └─■──tax_taxes_income_billion_deficit ── Topic: 41
  │         │         │    └─armenian_armenians_israel_people_jews
  │         │         │         ├─gun_guns_militia_firearms_amendment
  │         │         │         │    ├─■──blacks_penalty_death_cruel_punishment ── Topic: 55
  │         │         │         │    └─■──gun_guns_militia_firearms_amendment ── Topic: 7
  │         │         │         └─armenian_armenians_israel_jews_turkish
  │         │         │              ├─■──israel_israeli_jews_arab_jewish ── Topic: 4
  │         │         │              └─■──armenian_armenians_turkish_armenia_azerbaijan ── Topic: 15
  │         │         └─stephanopoulos_president_mr_myers_ms
  │         │              ├─■──serbs_muslims_stephanopoulos_mr_bosnia ── Topic: 35
  │         │              └─■──myers_stephanopoulos_president_ms_mr ── Topic: 87
  │         └─batf_fbi_koresh_compound_gas
  │              ├─■──reno_workers_janet_clinton_waco ── Topic: 77
  │              └─batf_fbi_koresh_gas_compound
  │                   ├─batf_koresh_fbi_warrant_compound
  │                   │    ├─■──batf_warrant_raid_compound_fbi ── Topic: 42
  │                   │    └─■──koresh_batf_fbi_children_compound ── Topic: 61
  │                   └─■──fbi_gas_tear_bds_building ── Topic: 23
  └─use_like_just_dont_new
      ├─game_team_year_games_like
      │    ├─game_team_games_25_year
      │    │    ├─game_team_games_25_season
      │    │    │    ├─window_printer_use_problem_mhz
      │    │    │    │    ├─mhz_wire_simms_wiring_battery
      │    │    │    │    │    ├─simms_mhz_battery_cpu_heat
      │    │    │    │    │    │    ├─simms_pds_simm_vram_lc
      │    │    │    │    │    │    │    ├─■──pds_nubus_lc_slot_card ── Topic: 119
      │    │    │    │    │    │    │    └─■──simms_simm_vram_meg_dram ── Topic: 32
      │    │    │    │    │    │    └─mhz_battery_cpu_heat_speed
      │    │    │    │    │    │         ├─mhz_cpu_speed_heat_fan
      │    │    │    │    │    │         │    ├─mhz_cpu_speed_heat_fan
      │    │    │    │    │    │         │    │    ├─■──fan_cpu_heat_sink_fans ── Topic: 92
      │    │    │    │    │    │         │    │    └─■──mhz_speed_cpu_fpu_clock ── Topic: 22
      │    │    │    │    │    │         │    └─■──monitor_turn_power_computer_electricity ── Topic: 91
      │    │    │    │    │    │         └─battery_batteries_concrete_duo_discharge
      │    │    │    │    │    │              ├─■──duo_battery_apple_230_problem ── Topic: 121
      │    │    │    │    │    │              └─■──battery_batteries_concrete_discharge_temperature ── Topic: 75
      │    │    │    │    │    └─wire_wiring_ground_neutral_outlets
      │    │    │    │    │         ├─wire_wiring_ground_neutral_outlets
      │    │    │    │    │         │    ├─wire_wiring_ground_neutral_outlets
      │    │    │    │    │         │    │    ├─■──leds_uv_blue_light_boards ── Topic: 66
      │    │    │    │    │         │    │    └─■──wire_wiring_ground_neutral_outlets ── Topic: 120
      │    │    │    │    │         │    └─scope_scopes_phone_dial_number
      │    │    │    │    │         │         ├─■──dial_number_phone_line_output ── Topic: 93
      │    │    │    │    │         │         └─■──scope_scopes_motorola_generator_oscilloscope ── Topic: 113
      │    │    │    │    │         └─celp_dsp_sampling_antenna_digital
      │    │    │    │    │              ├─■──antenna_antennas_receiver_cable_transmitter ── Topic: 70
      │    │    │    │    │              └─■──celp_dsp_sampling_speech_voice ── Topic: 52
      │    │    │    │    └─window_printer_xv_mouse_windows
      │    │    │    │         ├─window_xv_error_widget_problem
      │    │    │    │         │    ├─error_symbol_undefined_xterm_rx
      │    │    │    │         │    │    ├─■──symbol_error_undefined_doug_parse ── Topic: 63
      │    │    │    │         │    │    └─■──rx_remote_server_xdm_xterm ── Topic: 45
      │    │    │    │         │    └─window_xv_widget_application_expose
      │    │    │    │         │         ├─window_widget_expose_application_event
      │    │    │    │         │         │    ├─■──gc_mydisplay_draw_gxxor_drawing ── Topic: 103
      │    │    │    │         │         │    └─■──window_widget_application_expose_event ── Topic: 25
      │    │    │    │         │         └─xv_den_polygon_points_algorithm
      │    │    │    │         │              ├─■──den_polygon_points_algorithm_polygons ── Topic: 28
      │    │    │    │         │              └─■──xv_24bit_image_bit_images ── Topic: 57
      │    │    │    │         └─printer_fonts_print_mouse_postscript
      │    │    │    │              ├─printer_fonts_print_font_deskjet
      │    │    │    │              │    ├─■──scanner_logitech_grayscale_ocr_scanman ── Topic: 108
      │    │    │    │              │    └─printer_fonts_print_font_deskjet
      │    │    │    │              │         ├─■──printer_print_deskjet_hp_ink ── Topic: 18
      │    │    │    │              │         └─■──fonts_font_truetype_tt_atm ── Topic: 49
      │    │    │    │              └─mouse_ghostscript_midi_driver_postscript
      │    │    │    │                   ├─ghostscript_midi_postscript_files_file
      │    │    │    │                   │    ├─■──ghostscript_postscript_pageview_ghostview_dsc ── Topic: 104
      │    │    │    │                   │    └─midi_sound_file_windows_driver
      │    │    │    │                   │         ├─■──location_mar_file_host_rwrr ── Topic: 83
      │    │    │    │                   │         └─■──midi_sound_driver_blaster_soundblaster ── Topic: 98
      │    │    │    │                   └─■──mouse_driver_mice_ball_problem ── Topic: 68
      │    │    │    └─game_team_games_25_season
      │    │    │         ├─1st_sale_condition_comics_hulk
      │    │    │         │    ├─sale_condition_offer_asking_cd
      │    │    │         │    │    ├─condition_stereo_amp_speakers_asking
      │    │    │         │    │    │    ├─■──miles_car_amfm_toyota_cassette ── Topic: 62
      │    │    │         │    │    │    └─■──amp_speakers_condition_stereo_audio ── Topic: 24
      │    │    │         │    │    └─games_sale_pom_cds_shipping
      │    │    │         │    │         ├─pom_cds_sale_shipping_cd
      │    │    │         │    │         │    ├─■──size_shipping_sale_condition_mattress ── Topic: 100
      │    │    │         │    │         │    └─■──pom_cds_cd_sale_picture ── Topic: 37
      │    │    │         │    │         └─■──games_game_snes_sega_genesis ── Topic: 40
      │    │    │         │    └─1st_hulk_comics_art_appears
      │    │    │         │         ├─1st_hulk_comics_art_appears
      │    │    │         │         │    ├─lens_tape_camera_backup_lenses
      │    │    │         │         │    │    ├─■──tape_backup_tapes_drive_4mm ── Topic: 107
      │    │    │         │         │    │    └─■──lens_camera_lenses_zoom_pouch ── Topic: 114
      │    │    │         │         │    └─1st_hulk_comics_art_appears
      │    │    │         │         │         ├─■──1st_hulk_comics_art_appears ── Topic: 105
      │    │    │         │         │         └─■──books_book_cover_trek_chemistry ── Topic: 125
      │    │    │         │         └─tickets_hotel_ticket_voucher_package
      │    │    │         │              ├─■──hotel_voucher_package_vacation_room ── Topic: 74
      │    │    │         │              └─■──tickets_ticket_june_airlines_july ── Topic: 84
      │    │    │         └─game_team_games_season_hockey
      │    │    │              ├─game_hockey_team_25_550
      │    │    │              │    ├─■──espn_pt_pts_game_la ── Topic: 17
      │    │    │              │    └─■──team_25_game_hockey_550 ── Topic: 2
      │    │    │              └─■──year_game_hit_baseball_players ── Topic: 0
      │    │    └─bike_car_greek_insurance_msg
      │    │         ├─car_bike_insurance_cars_engine
      │    │         │    ├─car_insurance_cars_radar_engine
      │    │         │    │    ├─insurance_health_private_care_canada
      │    │         │    │    │    ├─■──insurance_health_private_care_canada ── Topic: 99
      │    │         │    │    │    └─■──insurance_car_accident_rates_sue ── Topic: 82
      │    │         │    │    └─car_cars_radar_engine_detector
      │    │         │    │         ├─car_radar_cars_detector_engine
      │    │         │    │         │    ├─■──radar_detector_detectors_ka_alarm ── Topic: 39
      │    │         │    │         │    └─car_cars_mustang_ford_engine
      │    │         │    │         │         ├─■──clutch_shift_shifting_transmission_gear ── Topic: 88
      │    │         │    │         │         └─■──car_cars_mustang_ford_v8 ── Topic: 14
      │    │         │    │         └─oil_diesel_odometer_diesels_car
      │    │         │    │              ├─odometer_oil_sensor_car_drain
      │    │         │    │              │    ├─■──odometer_sensor_speedo_gauge_mileage ── Topic: 96
      │    │         │    │              │    └─■──oil_drain_car_leaks_taillights ── Topic: 102
      │    │         │    │              └─■──diesel_diesels_emissions_fuel_oil ── Topic: 79
      │    │         │    └─bike_riding_ride_bikes_motorcycle
      │    │         │         ├─bike_ride_riding_bikes_lane
      │    │         │         │    ├─■──bike_ride_riding_lane_car ── Topic: 11
      │    │         │         │    └─■──bike_bikes_miles_honda_motorcycle ── Topic: 19
      │    │         │         └─■──countersteering_bike_motorcycle_rear_shaft ── Topic: 46
      │    │         └─greek_msg_kuwait_greece_water
      │    │              ├─greek_msg_kuwait_greece_water
      │    │              │    ├─greek_msg_kuwait_greece_dog
      │    │              │    │    ├─greek_msg_kuwait_greece_dog
      │    │              │    │    │    ├─greek_kuwait_greece_turkish_greeks
      │    │              │    │    │    │    ├─■──greek_greece_turkish_greeks_cyprus ── Topic: 71
      │    │              │    │    │    │    └─■──kuwait_iraq_iran_gulf_arabia ── Topic: 76
      │    │              │    │    │    └─msg_dog_drugs_drug_food
      │    │              │    │    │         ├─dog_dogs_cooper_trial_weaver
      │    │              │    │    │         │    ├─■──clinton_bush_quayle_reagan_panicking ── Topic: 101
      │    │              │    │    │         │    └─dog_dogs_cooper_trial_weaver
      │    │              │    │    │         │         ├─■──cooper_trial_weaver_spence_witnesses ── Topic: 90
      │    │              │    │    │         │         └─■──dog_dogs_bike_trained_springer ── Topic: 67
      │    │              │    │    │         └─msg_drugs_drug_food_chinese
      │    │              │    │    │              ├─■──msg_food_chinese_foods_taste ── Topic: 30
      │    │              │    │    │              └─■──drugs_drug_marijuana_cocaine_alcohol ── Topic: 72
      │    │              │    │    └─water_theory_universe_science_larsons
      │    │              │    │         ├─water_nuclear_cooling_steam_dept
      │    │              │    │         │    ├─■──rocketry_rockets_engines_nuclear_plutonium ── Topic: 115
      │    │              │    │         │    └─water_cooling_steam_dept_plants
      │    │              │    │         │         ├─■──water_dept_phd_environmental_atmospheric ── Topic: 97
      │    │              │    │         │         └─■──cooling_water_steam_towers_plants ── Topic: 109
      │    │              │    │         └─theory_universe_larsons_larson_science
      │    │              │    │              ├─■──theory_universe_larsons_larson_science ── Topic: 54
      │    │              │    │              └─■──oort_cloud_grbs_gamma_burst ── Topic: 80
      │    │              │    └─helmet_kirlian_photography_lock_wax
      │    │              │         ├─helmet_kirlian_photography_leaf_mask
      │    │              │         │    ├─kirlian_photography_leaf_pictures_deleted
      │    │              │         │    │    ├─deleted_joke_stuff_maddi_nickname
      │    │              │         │    │    │    ├─■──joke_maddi_nickname_nicknames_frank ── Topic: 43
      │    │              │         │    │    │    └─■──deleted_stuff_bookstore_joke_motto ── Topic: 81
      │    │              │         │    │    └─■──kirlian_photography_leaf_pictures_aura ── Topic: 85
      │    │              │         │    └─helmet_mask_liner_foam_cb
      │    │              │         │         ├─■──helmet_liner_foam_cb_helmets ── Topic: 112
      │    │              │         │         └─■──mask_goalies_77_santore_tl ── Topic: 123
      │    │              │         └─lock_wax_paint_plastic_ear
      │    │              │              ├─■──lock_cable_locks_bike_600 ── Topic: 117
      │    │              │              └─wax_paint_ear_plastic_skin
      │    │              │                   ├─■──wax_paint_plastic_scratches_solvent ── Topic: 65
      │    │              │                   └─■──ear_wax_skin_greasy_acne ── Topic: 116
      │    │              └─m4_mp_14_mw_mo
      │    │                   ├─m4_mp_14_mw_mo
      │    │                   │    ├─■──m4_mp_14_mw_mo ── Topic: 111
      │    │                   │    └─■──test_ensign_nameless_deane_deanebinahccbrandeisedu ── Topic: 118
      │    │                   └─■──ites_cheek_hello_hi_ken ── Topic: 3
      │    └─space_medical_health_disease_cancer
      │         ├─medical_health_disease_cancer_patients
      │         │    ├─■──cancer_centers_center_medical_research ── Topic: 122
      │         │    └─health_medical_disease_patients_hiv
      │         │         ├─patients_medical_disease_candida_health
      │         │         │    ├─■──candida_yeast_infection_gonorrhea_infections ── Topic: 48
      │         │         │    └─patients_disease_cancer_medical_doctor
      │         │         │         ├─■──hiv_medical_cancer_patients_doctor ── Topic: 34
      │         │         │         └─■──pain_drug_patients_disease_diet ── Topic: 26
      │         │         └─■──health_newsgroup_tobacco_vote_votes ── Topic: 9
      │         └─space_launch_nasa_shuttle_orbit
      │              ├─space_moon_station_nasa_launch
      │              │    ├─■──sky_advertising_billboard_billboards_space ── Topic: 59
      │              │    └─■──space_station_moon_redesign_nasa ── Topic: 16
      │              └─space_mission_hst_launch_orbit
      │                   ├─space_launch_nasa_orbit_propulsion
      │                   │    ├─■──space_launch_nasa_propulsion_astronaut ── Topic: 47
      │                   │    └─■──orbit_km_jupiter_probe_earth ── Topic: 86
      │                   └─■──hst_mission_shuttle_orbit_arrays ── Topic: 60
      └─drive_file_key_windows_use
          ├─key_file_jpeg_encryption_image
          │    ├─key_encryption_clipper_chip_keys
          │    │    ├─■──key_clipper_encryption_chip_keys ── Topic: 1
          │    │    └─■──entry_file_ripem_entries_key ── Topic: 73
          │    └─jpeg_image_file_gif_images
          │         ├─motif_graphics_ftp_available_3d
          │         │    ├─motif_graphics_openwindows_ftp_available
          │         │    │    ├─■──openwindows_motif_xview_windows_mouse ── Topic: 20
          │         │    │    └─■──graphics_widget_ray_3d_available ── Topic: 95
          │         │    └─■──3d_machines_version_comments_contact ── Topic: 38
          │         └─jpeg_image_gif_images_format
          │              ├─■──gopher_ftp_files_stuffit_images ── Topic: 51
          │              └─■──jpeg_image_gif_format_images ── Topic: 13
          └─drive_db_card_scsi_windows
              ├─db_windows_dos_mov_os2
              │    ├─■──copy_protection_program_software_disk ── Topic: 64
              │    └─■──db_windows_dos_mov_os2 ── Topic: 8
              └─drive_card_scsi_drives_ide
                      ├─drive_scsi_drives_ide_disk
                      │    ├─■──drive_scsi_drives_ide_disk ── Topic: 6
                      │    └─■──meg_sale_ram_drive_shipping ── Topic: 12
                      └─card_modem_monitor_video_drivers
                          ├─■──card_monitor_video_drivers_vga ── Topic: 5
                          └─■──modem_port_serial_irq_com ── Topic: 10

Merge topics

After seeing the potential hierarchy of your topics, you might want to merge specific topics. For example, if topic 1 is 1_space_launch_moon_nasa and topic 2 is 2_spacecraft_solar_space_orbit, it might make sense to merge those two topics as they are quite similar in meaning. In BERTopic, you can use .merge_topics to manually select and merge those topics. Doing so will update their topic representation, which in turn updates the entire model:

topics_to_merge = [1, 2]
topic_model.merge_topics(docs, topics_to_merge)

If you have several groups of topics you want to merge, create a list of lists instead:

topics_to_merge = [[1, 2],
                   [3, 4]]
topic_model.merge_topics(docs, topics_to_merge)
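To picture what a list of lists means for the per-document topic assignments, here is a deliberately simplified sketch. The helper `merge_labels` is hypothetical and only relabels documents so that every topic in a group takes the group's first id. BERTopic itself also recalculates the topic representations and may renumber the topics afterwards, so this should not be read as its actual implementation.

```python
def merge_labels(topics, groups):
    """Map every topic id in a group to the first id of that group (a simplification)."""
    mapping = {topic: group[0] for group in groups for topic in group}
    # Topics that are not part of any group, including the outlier topic -1, stay as they are.
    return [mapping.get(topic, topic) for topic in topics]


# Toy per-document topic assignments (invented for illustration).
document_topics = [1, 2, 3, 4, 5, -1, 2]
topics_to_merge = [[1, 2],
                   [3, 4]]

print(merge_labels(document_topics, topics_to_merge))
# [1, 1, 3, 3, 5, -1, 1]
```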