TextGeneration
Bases: BaseLLM
Text2Text or text generation with transformers.
NOTE: The resulting keywords are expected to be separated by commas, so any change to the prompt must still produce comma-separated keywords.
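To illustrate why the comma convention matters, here is a minimal sketch of how a comma-separated response could be turned into a list of keywords. The helper name `parse_keywords` is hypothetical and is not part of KeyBERT; the library's own parsing may differ.

```python
from typing import List


def parse_keywords(generated_text: str) -> List[str]:
    """Split a comma-separated model response into clean keywords.

    Hypothetical helper for illustration only; not KeyBERT's internal code.
    """
    # Split on commas, trim surrounding whitespace, and drop empty fragments
    return [part.strip() for part in generated_text.split(",") if part.strip()]


# A response that follows the comma convention parses cleanly
print(parse_keywords("delivery, website,  shipping time ,"))
```

If a custom prompt leads the model to answer with a numbered list or full sentences instead, a split like this would no longer yield usable keywords, which is the risk the note above describes.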
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | Union[str, pipeline] | A transformers pipeline that should be initialized as "text-generation" for gpt-like models or "text2text-generation" for T5-like models. For example, | required |
prompt | str | The prompt to be used in the model. If no prompt is given, | None |
pipeline_kwargs | Mapping[str, Any] | Kwargs that you can pass to the transformers.pipeline when it is called. | {} |
random_state | int | A random state to be passed to | 42 |
verbose | bool | Set this to True if you want to see a progress bar for the keyword extraction. | False |
Usage:
To use a gpt-like model:
from transformers import pipeline
from keybert.llm import TextGeneration
from keybert import KeyLLM
# Create your LLM
generator = pipeline('text-generation', model='gpt2')
llm = TextGeneration(generator)
# Load it in KeyLLM
kw_model = KeyLLM(llm)
# Extract keywords
document = "The website mentions that it only takes a couple of days to deliver but I still have not received mine."
keywords = kw_model.extract_keywords(document)
You can use a custom prompt and decide where the document should be inserted with the [DOCUMENT] tag:
from transformers import pipeline
from keybert.llm import TextGeneration
prompt = "I have the following document '[DOCUMENT]'. Please give me the keywords that are present in this document and separate them with commas:"
# Create your LLM and pass the custom prompt
generator = pipeline('text2text-generation', model='google/flan-t5-base')
llm = TextGeneration(generator, prompt=prompt)
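The role of the [DOCUMENT] tag can be sketched as a plain string substitution. This is a minimal illustration, assuming the tag is replaced verbatim by the document text; the function `fill_prompt` and its guard against a missing tag are hypothetical and do not describe documented library behavior.

```python
def fill_prompt(prompt: str, document: str) -> str:
    """Insert the document at the position of the [DOCUMENT] tag.

    Hypothetical helper for illustration only; not part of KeyBERT.
    """
    # Guard added for this sketch: without the tag the document would be lost
    if "[DOCUMENT]" not in prompt:
        raise ValueError("The prompt must contain the [DOCUMENT] tag.")
    return prompt.replace("[DOCUMENT]", document)


prompt = (
    "I have the following document '[DOCUMENT]'. "
    "Please give me the keywords that are present in this document "
    "and separate them with commas:"
)
# The filled prompt is what would be sent to the generation pipeline
print(fill_prompt(prompt, "My order has not arrived yet."))
```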
Source code in keybert\llm\_textgeneration.py
extract_keywords(documents, candidate_keywords=None)
Extract keywords from the given documents.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
documents | List[str] | The documents to extract keywords from. | required |
candidate_keywords | List[List[str]] | A list of candidate keywords that the LLM will refine. For example, it will create a nicer representation of the candidate keywords, remove redundant keywords, or shorten them depending on the input prompt. | None |
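The List[List[str]] type suggests one list of candidates per document. The sketch below shows that shape and a simple length check, assuming candidates are matched to documents by position; that alignment is inferred from the type, and the helper `check_candidates` is hypothetical.

```python
from typing import List, Optional


def check_candidates(
    documents: List[str],
    candidate_keywords: Optional[List[List[str]]] = None,
) -> bool:
    """Return True if candidates are absent or there is one list per document.

    Hypothetical helper for illustration only; not part of KeyBERT.
    """
    # No candidates is valid: it matches the documented default of None
    if candidate_keywords is None:
        return True
    # Assumption: the i-th candidate list belongs to the i-th document
    return len(candidate_keywords) == len(documents)


documents = [
    "The website mentions that it only takes a couple of days to deliver.",
    "The battery no longer holds a charge after a month of use.",
]
candidate_keywords = [
    ["delivery", "website", "couple of days"],
    ["battery", "charge", "month of use"],
]
print(check_candidates(documents, candidate_keywords))
```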
Returns:
Name | Type | Description |
---|---|---|
all_keywords | | All keywords for each document |