Max Sum Distance
Calculate Max Sum Distance for extraction of keywords.
We take the 2 x top_n words/phrases most similar to the document. Then, from all top_n-sized combinations of those 2 x top_n words, we extract the combination whose members are the least similar to each other by cosine similarity.
This is O(n^2) and therefore not advised for a large top_n.
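The selection described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the names `max_sum_distance_sketch` and `cosine` are hypothetical, the shortlist size is taken from `nr_candidates` (the description uses 2 x top_n), and returning each keyword's similarity to the document as its score is an assumption. Inputs may be any sequences of floats, including 1-D NumPy arrays.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def max_sum_distance_sketch(
    doc_embedding: Sequence[float],
    word_embeddings: Sequence[Sequence[float]],
    words: List[str],
    top_n: int,
    nr_candidates: int,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: pick the top_n mutually least similar candidates."""
    # Similarity of every candidate to the document.
    doc_sim = [cosine(doc_embedding, emb) for emb in word_embeddings]

    # Shortlist the nr_candidates candidates most similar to the document.
    order = sorted(range(len(words)), key=lambda i: doc_sim[i], reverse=True)
    shortlist = order[:nr_candidates]
    if len(shortlist) < top_n:
        raise ValueError("need at least top_n candidates in the shortlist")

    # Score every top_n-sized subset by its summed pairwise similarity
    # and keep the subset with the smallest sum.
    best_combo = None
    best_score = float("inf")
    for combo in combinations(shortlist, top_n):
        score = sum(
            cosine(word_embeddings[i], word_embeddings[j])
            for i, j in combinations(combo, 2)
        )
        if score < best_score:
            best_combo, best_score = combo, score

    # Assumption: report each keyword with its similarity to the document.
    return [(words[i], doc_sim[i]) for i in best_combo]
```

Every top_n-sized subset of the shortlist is scored, so the work grows quickly: choosing 5 out of 10 candidates means 252 subsets, while choosing 10 out of 20 means 184,756.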
Parameters:
Name | Type | Description | Default |
---|---|---|---|
doc_embedding | ndarray | The document embeddings | required |
word_embeddings | ndarray | The embeddings of the selected candidate keywords/phrases | required |
words | List[str] | The selected candidate keywords/keyphrases | required |
top_n | int | The number of keywords/keyphrases to return | required |
nr_candidates | int | The number of candidates to consider | required |
Returns:
Type | Description |
---|---|
List[Tuple[str, float]] | The selected keywords/keyphrases with their distances |
Source code in keybert\_maxsum.py