MaximalMarginalRelevance
Bases: BaseRepresentation
Calculate Maximal Marginal Relevance (MMR) between candidate keywords and the document.
MMR considers the similarity of keywords/keyphrases with the document, along with the similarity of already selected keywords and keyphrases. This results in a selection of keywords that are relevant to the document while remaining diverse among themselves.
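The selection loop described above can be sketched in plain Python. This is a minimal illustration, not the code in `_mmr.py`: the helper names `cosine` and `mmr_select`, the toy 2-D embeddings, and the scoring rule `(1 - diversity) * relevance - diversity * redundancy` (a common way to write MMR) are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mmr_select(doc_embedding, word_embeddings, words, diversity=0.1, top_n=10):
    """Greedy MMR sketch (hypothetical helper, not part of BERTopic)."""
    if not words:
        return []

    # Relevance: similarity of every candidate word to the document.
    doc_sim = [cosine(w, doc_embedding) for w in word_embeddings]

    # Start with the single most relevant word.
    selected = [max(range(len(words)), key=doc_sim.__getitem__)]
    candidates = [i for i in range(len(words)) if i != selected[0]]

    while candidates and len(selected) < top_n:

        def score(i):
            # Redundancy: highest similarity to any word already selected.
            redundancy = max(
                cosine(word_embeddings[i], word_embeddings[j]) for j in selected
            )
            return (1 - diversity) * doc_sim[i] - diversity * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return [words[i] for i in selected]


# Toy data (made up for illustration): "car" and "cars" are near-duplicates.
words = ["car", "cars", "engine"]
embeddings = [[1.0, 0.1], [1.0, 0.12], [0.6, 0.8]]
doc = [1.0, 0.0]

low = mmr_select(doc, embeddings, words, diversity=0.0, top_n=2)
high = mmr_select(doc, embeddings, words, diversity=0.7, top_n=2)
print(low)   # ['car', 'cars']
print(high)  # ['car', 'engine']
```

With `diversity=0.0` the near-duplicate "cars" is picked because only relevance counts. With `diversity=0.7` the penalty for resembling the already selected "car" outweighs its relevance, so "engine" is chosen instead.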
Parameters:
Name | Type | Description | Default |
---|---|---|---|
diversity | float | How diverse the selected keywords/keyphrases are. Values range from 0 to 1, with 0 being not diverse at all and 1 being most diverse. | 0.1 |
top_n_words | int | The number of keywords/keyphrases to return. | 10 |
Usage:
from bertopic.representation import MaximalMarginalRelevance
from bertopic import BERTopic
# Create your representation model
representation_model = MaximalMarginalRelevance(diversity=0.3)
# Use the representation model in BERTopic on top of the default pipeline
topic_model = BERTopic(representation_model=representation_model)
Source code in bertopic\representation\_mmr.py
extract_topics(topic_model, documents, c_tf_idf, topics)
Extract topic representations
Parameters:
Name | Type | Description | Default |
---|---|---|---|
topic_model | | The BERTopic model | required |
documents | DataFrame | Not used | required |
c_tf_idf | csr_matrix | Not used | required |
topics | Mapping[str, List[Tuple[str, float]]] | The candidate topics as calculated with c-TF-IDF | required |
Returns:
Name | Type | Description |
---|---|---|
updated_topics | Mapping[str, List[Tuple[str, float]]] | Updated topic representations |
Source code in bertopic\representation\_mmr.py