Changelog
Version 0.9.0
Release date: 5 February, 2025
Model2Vec
You can use Model2Vec for blazingly fast embedding as follows:
from keybert import KeyBERT
from model2vec import StaticModel
embedding_model = StaticModel.from_pretrained("minishlab/potion-base-8M")
kw_model = KeyBERT(embedding_model)
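As a rough usage sketch (not from the release notes; the document below is a placeholder), keywords can then be extracted as usual:
# Extract keywords with the Model2Vec-backed model created above
doc = "KeyBERT is a minimal keyword extraction technique that leverages BERT embeddings."
keywords = kw_model.extract_keywords(doc)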
Light-weight KeyBERT
You can now install a light-weight KeyBERT with:
pip install keybert --no-deps scikit-learn model2vec
Fixes:
- Add Model2Vec & light-weight installation in #253
- Add Text Generation Inference with JSON output by @joaomsimoes in #235
- Update pre-commit hooks by @afuetterer in #237
- Set up lint job using pre-commit/action by @afuetterer in #238
Version 0.8.5
Release date: 14 June, 2024
- Use the batch_size parameter with keybert.backend.SentenceTransformerBackend by @adhadse in #210
- Add system_prompt param to LLMs by @lucafirefox in #214
- Update OpenAI API response by @lucafirefox in #213
- Drop support for python 3.6 and 3.7 by @afuetterer in #230
- Bump github actions versions by @afuetterer in #228
- Switch from setup.py to pyproject.toml by @afuetterer in #231
Version 0.8.4
Release date: 15 February, 2024
- Update default Cohere model to command by @sam-frampton in #194
- Fix KeyLLM failing when no GPU is available by @igor-pechersky in #201
- Fix AttributeError: 'tuple' object has no attribute 'page_content' in LangChain in #199
Version 0.8.3
Release date: 29 November, 2023
- Fix support for openai>=1
You can now use it as follows:
import openai
from keybert.llm import OpenAI
from keybert import KeyLLM
# Create your LLM
client = openai.OpenAI(api_key=MY_API_KEY)
llm = OpenAI(client)
# Load it in KeyLLM
kw_model = KeyLLM(llm)
Version 0.8.2
Release date: 29 September, 2023
- Fixed CUDA error when using pre-calculated embeddings with KeyBERT + KeyLLM
Version 0.8.1
Release date: 29 September, 2023
- Remove unnecessary print statements
Version 0.8.0
Release date: 29 September, 2023
Highlights:
- Use KeyLLM to leverage LLMs for extracting keywords
- Use it either with or without candidate keywords generated through KeyBERT
- Multiple LLMs are integrated: OpenAI, Cohere, LangChain, HF, and LiteLLM
import openai
from keybert.llm import OpenAI
from keybert import KeyLLM
# Create your LLM
openai.api_key = "sk-..."
llm = OpenAI()
# Load it in KeyLLM
kw_model = KeyLLM(llm)
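A minimal follow-up sketch (not part of the original release notes): extracting keywords from a small list of documents with the KeyLLM model created above.
# Documents to extract keywords from (placeholder example)
docs = ["The website mentions that it only takes a couple of days to deliver your order."]
# Ask the LLM for keywords per document
keywords = kw_model.extract_keywords(docs)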
See here for full documentation on use cases of KeyLLM
and here for the implemented Large Language Models.
Fixes:
- Enable Guided KeyBERT for seed keywords differing among docs by @shengbo-ma in #152
Version 0.7.0
Release date: 3 November, 2022
Highlights:
- Cleaned up documentation and added several visual representations of the algorithm (excluding MMR / MaxSum)
- Added a function to extract and pass word and document embeddings, which should make fine-tuning much faster
from keybert import KeyBERT
kw_model = KeyBERT()
# Prepare embeddings
doc_embeddings, word_embeddings = kw_model.extract_embeddings(docs)
# Extract keywords without needing to re-calculate embeddings
keywords = kw_model.extract_keywords(docs, doc_embeddings=doc_embeddings, word_embeddings=word_embeddings)
Do note that the parameters passed to .extract_embeddings for creating the vectorizer should be exactly the same as those in .extract_keywords.
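As a hedged illustration of that note (the vectorizer settings below are examples, not taken from the release notes), the same vectorizer can be passed to both calls:
from sklearn.feature_extraction.text import CountVectorizer
from keybert import KeyBERT

docs = ["Supervised learning maps training examples to labels.",
        "Transfer learning adapts a pretrained model to a new task."]
kw_model = KeyBERT()

# The exact same vectorizer settings must be used in both calls
vectorizer = CountVectorizer(ngram_range=(1, 2), stop_words="english")
doc_embeddings, word_embeddings = kw_model.extract_embeddings(docs, vectorizer=vectorizer)
keywords = kw_model.extract_keywords(
    docs, vectorizer=vectorizer,
    doc_embeddings=doc_embeddings, word_embeddings=word_embeddings
)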
Fixes:
- Redundant documentation was removed by @mabhay3420 in #123
- Fixed Gensim backend not working after v4 migration (#71)
- Fixed candidates not working (#122)
Version 0.6.0
Release date: 25 July, 2022
Highlights:
- Major speedup, up to 2x to 5x when passing multiple documents (for MMR and MaxSum) compared to single documents
- Same results whether passing a single document or multiple documents
- MMR and MaxSum now work when passing a single document or multiple documents
- Improved documentation
- Added 🤗 Hugging Face Transformers
from keybert import KeyBERT
from transformers.pipelines import pipeline
hf_model = pipeline("feature-extraction", model="distilbert-base-cased")
kw_model = KeyBERT(model=hf_model)
- Highlighting support for Chinese texts
  - Now uses the CountVectorizer for creating the tokens
  - This should also improve the highlighting for most applications and higher n-grams
NOTE: Although highlighting for Chinese texts is improved, since I am not familiar with the Chinese language there is a good chance it is not yet as optimized as for other languages. Any feedback with respect to this is highly appreciated!
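A hedged sketch of what Chinese highlighting can look like in practice (jieba is an assumed, separately installed tokenizer and is not part of KeyBERT; the document is a placeholder):
import jieba
from sklearn.feature_extraction.text import CountVectorizer
from keybert import KeyBERT

def tokenize_zh(text):
    # Whitespace tokenization does not apply to Chinese, so segment with jieba
    return jieba.lcut(text)

vectorizer = CountVectorizer(tokenizer=tokenize_zh)
kw_model = KeyBERT()
doc = "自然语言处理是人工智能的一个重要方向。"
keywords = kw_model.extract_keywords(doc, vectorizer=vectorizer, highlight=True)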
Fixes:
- Fix typo in ReadMe by @priyanshul-govil in #117
- Add missing optional dependencies (gensim, use, and spacy) by @yusuke1997 in #114
Version 0.5.1
Release date: 31 March, 2022
- Added a page about leveraging CountVectorizer and KeyphraseVectorizers (a short sketch follows this list)
  - Shoutout to @TimSchopf for creating and optimizing the package!
  - The KeyphraseVectorizers package can be found here
- Fixed Max Sum Similarity returning incorrect similarities #92
- Thanks to @kunihik0 for the PR!
- Fixed out of bounds condition in MMR
- Thanks to @artmatsak for the PR!
- Started styling with Flake8 and Black (which was long overdue)
- Added pre-commit to make following through a bit easier with styling
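A hedged sketch of combining the two packages (the KeyphraseCountVectorizer class below comes from the keyphrase-vectorizers package mentioned above and is assumed to be installed; the document is a placeholder):
from keyphrase_vectorizers import KeyphraseCountVectorizer
from keybert import KeyBERT

docs = ["Supervised learning is the machine learning task of learning a function that maps an input to an output."]
kw_model = KeyBERT()
# Let the vectorizer propose grammatically plausible keyphrase candidates
keywords = kw_model.extract_keywords(docs, vectorizer=KeyphraseCountVectorizer())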
Version 0.5.0
Release date: 28 September, 2021
Highlights:
- Added Guided KeyBERT (a short usage sketch follows this list)
- kw_model.extract_keywords(doc, seed_keywords=seed_keywords)
- Thanks to @zolekode for the inspiration!
- Use the newest all-* models from SBERT
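As a minimal sketch of guided extraction (the document and seed terms are placeholders, not from the release notes):
from keybert import KeyBERT

kw_model = KeyBERT()
doc = "Supervised learning maps training examples to labels using an optimization procedure."
# Seed keywords nudge the extraction towards a domain of interest
seed_keywords = ["information", "learning"]
keywords = kw_model.extract_keywords(doc, seed_keywords=seed_keywords)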
Miscellaneous:
- Added instructions in the FAQ to extract keywords from Chinese documents
- Fix typo in ReadMe by @koaning in #51
Version 0.4.0
Release date: 23 June, 2021
Highlights:
- Highlight a document's keywords with:
keywords = kw_model.extract_keywords(doc, highlight=True)
- Use paraphrase-MiniLM-L6-v2 as the default embedder, which gives great results!
Miscellaneous:
- Update Flair dependencies
- Added FAQ
Version 0.3.0
Release date: 10 May, 2021
The two main features are candidate keywords and several backends to use instead of Flair and SentenceTransformers!
Highlights:
- Use candidate words instead of extracting those from the documents (#25); a short sketch follows this list
KeyBERT().extract_keywords(doc, candidates)
- Spacy, Gensim, USE, and Custom Backends were added (see documentation here)
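A hedged sketch with an explicit candidate list (the document and candidates are placeholders):
from keybert import KeyBERT

doc = "Transfer learning adapts a pretrained language model to a new task."
# Only the supplied candidates are considered as potential keywords
candidates = ["transfer learning", "language model", "pretrained", "fine-tuning"]
keywords = KeyBERT().extract_keywords(doc, candidates=candidates)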
Fixes:
- Improved imports
- Fix encoding error when locally installing KeyBERT (#30)
Miscellaneous:
Version 0.2.0
Release date: 9 Feb, 2021
Highlights:
- Add similarity scores to the output
- Add Flair as a possible back-end
- Update documentation + improved testing
Version 0.1.2
Release date: 28 Oct, 2020
Added Max Sum Similarity as an option to diversify your results.
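A minimal sketch of how this option is typically enabled (parameter values here are illustrative):
from keybert import KeyBERT

kw_model = KeyBERT()
doc = "Deep learning models learn hierarchical representations from raw data."
# Select top_n keywords out of the nr_candidates most similar candidates,
# maximizing the distance between the selected keywords
keywords = kw_model.extract_keywords(doc, use_maxsum=True, nr_candidates=20, top_n=5)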
Version 0.1.0
Release date: 27 Oct, 2020
This first release includes keyword/keyphrase extraction using BERT and simple cosine similarity. There is also an option to use Maximal Marginal Relevance to select the candidate keywords/keyphrases.
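A minimal sketch of the Maximal Marginal Relevance option (the diversity value is illustrative, not prescribed by the release notes):
from keybert import KeyBERT

kw_model = KeyBERT()
doc = "Maximal Marginal Relevance balances relevance to the document with diversity among the selected keywords."
# Higher diversity trades some relevance for less redundant keywords
keywords = kw_model.extract_keywords(doc, use_mmr=True, diversity=0.7)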