OpenAI

Bases: BaseLLM

Using the OpenAI API to extract keywords.

The default method is openai.Completion if chat=False, so the prompts will also need to follow a completion task. If you are looking for more interactive chats, use chat=True with model=gpt-3.5-turbo.

For an overview see: https://platform.openai.com/docs/models

NOTE: The resulting keywords are expected to be separated by commas, so any changes to the prompt must ensure that the generated keywords remain comma-separated.
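For example, the raw completion is turned into a keyword list by splitting on commas and stripping whitespace, mirroring the parsing in the source further down (the raw string here is illustrative):

```python
# Split the raw LLM output on commas and strip whitespace around each keyword.
raw_output = "meat, beef, eat, eating, emissions, steak"
keywords = [keyword.strip() for keyword in raw_output.split(",")]
# keywords is now ['meat', 'beef', 'eat', 'eating', 'emissions', 'steak']
```

A prompt that makes the model emit anything other than a flat, comma-separated list (e.g. a numbered list) will therefore produce malformed keywords.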

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `client` | | An `openai.OpenAI` client. | required |
| `model` | `str` | Model to use within OpenAI. NOTE: If a `gpt-3.5-turbo` model is used, make sure to set `chat` to `True`. | `'gpt-3.5-turbo-instruct'` |
| `generator_kwargs` | `Mapping[str, Any]` | Kwargs passed to `openai.Completion.create` for fine-tuning the output. | `{}` |
| `prompt` | `str` | The prompt to be used in the model. If no prompt is given, `self.default_prompt_` is used instead. NOTE: Use `"[DOCUMENT]"` in the prompt to decide where the document should be inserted. | `None` |
| `system_prompt` | `str` | The message that sets the behavior of the assistant. It is typically used to provide high-level instructions for the conversation. | `'You are a helpful assistant.'` |
| `delay_in_seconds` | `float` | The delay in seconds between consecutive prompts in order to prevent RateLimitErrors. | `None` |
| `exponential_backoff` | `bool` | Retry requests with a random exponential backoff. A short sleep is used when a rate limit error is hit, then the request is retried. The sleep length increases on each error, up to 10 unsuccessful requests. If `True`, overrides `delay_in_seconds`. | `False` |
| `chat` | `bool` | Set this to `True` if a chat model is used. Generally, this means GPT-3.5 or higher. See: https://platform.openai.com/docs/models/gpt-3-5 | `False` |
| `verbose` | `bool` | Set this to `True` to see a progress bar during keyword extraction. | `False` |
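The exponential backoff behavior described above can be sketched as follows. This is a generic illustration of the pattern, not KeyBERT's internal helper; `RuntimeError` stands in for `openai.RateLimitError`, and all names here are illustrative:

```python
import random
import time

def retry_with_backoff(request_fn, max_retries=10, base_delay=1.0):
    """Retry request_fn with random exponential backoff: sleep after a
    rate limit error, retry, and grow the sleep (with jitter) until the
    call succeeds or max_retries attempts have failed."""
    delay = base_delay
    for _ in range(max_retries):
        try:
            return request_fn()
        except RuntimeError:  # stand-in for openai.RateLimitError
            time.sleep(delay)
            delay *= 2 * (1 + random.random())  # exponential growth plus jitter
    raise RuntimeError(f"Request still failing after {max_retries} attempts")
```

In the actual class, setting `exponential_backoff=True` routes requests through an equivalent helper and overrides `delay_in_seconds`.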

Usage:

To use this, you will need to install the openai package first:

pip install openai

Then, get yourself an API key and use OpenAI's API as follows:

```python
import openai
from keybert.llm import OpenAI
from keybert import KeyLLM

# Create your LLM
client = openai.OpenAI(api_key=MY_API_KEY)
llm = OpenAI(client)

# Load it in KeyLLM
kw_model = KeyLLM(llm)

# Extract keywords
document = "The website mentions that it only takes a couple of days to deliver but I still have not received mine."
keywords = kw_model.extract_keywords(document)
```

You can also use a custom prompt:

```python
prompt = "I have the following document: [DOCUMENT] \nThis document contains the following keywords separated by commas: '"
llm = OpenAI(client, prompt=prompt, delay_in_seconds=5)
```

If you want to use OpenAI's ChatGPT model:

```python
llm = OpenAI(client, model="gpt-3.5-turbo", delay_in_seconds=10, chat=True)
```
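With chat=True, each prompt is sent as a chat messages list with the system prompt first, mirroring how the source below builds the request (the prompt text here is illustrative):

```python
system_prompt = "You are a helpful assistant."
prompt = "I have the following document: the food was great!\nKeywords:"

# The request body for chat models pairs the system prompt with the user prompt.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": prompt},
]
```

The `system_prompt` parameter lets you steer this first message without touching the per-document prompt.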
Source code in `keybert/llm/_openai.py`
class OpenAI(BaseLLM):
    r"""Using the OpenAI API to extract keywords.

    The default method is `openai.Completion` if `chat=False`.
    The prompts will also need to follow a completion task. If you
    are looking for more interactive chats, use `chat=True`
    with `model=gpt-3.5-turbo`.

    For an overview see:
    https://platform.openai.com/docs/models

    NOTE: The resulting keywords are expected to be separated by commas so
    any changes to the prompt will have to make sure that the resulting
    keywords are comma-separated.

    Arguments:
        client: An `openai.OpenAI` client
        model: Model to use within OpenAI, defaults to `"gpt-3.5-turbo-instruct"`.
               NOTE: If a `gpt-3.5-turbo` model is used, make sure to set
               `chat` to True.
        generator_kwargs: Kwargs passed to `openai.Completion.create`
                          for fine-tuning the output.
        prompt: The prompt to be used in the model. If no prompt is given,
                `self.default_prompt_` is used instead.
                NOTE: Use `"[DOCUMENT]"` in the prompt
                to decide where the document needs to be inserted
        system_prompt: The message that sets the behavior of the assistant.
                       It's typically used to provide high-level instructions
                       for the conversation.
        delay_in_seconds: The delay in seconds between consecutive prompts
                          in order to prevent RateLimitErrors.
        exponential_backoff: Retry requests with a random exponential backoff.
                             A short sleep is used when a rate limit error is hit,
                             then the request is retried. Increase the sleep length
                             if errors are hit until 10 unsuccessful requests.
                             If True, overrides `delay_in_seconds`.
        chat: Set this to True if a chat model is used. Generally, this means GPT-3.5 or higher.
              See: https://platform.openai.com/docs/models/gpt-3-5
        verbose: Set this to True if you want to see a progress bar for the
                 keyword extraction.

    Usage:

    To use this, you will need to install the openai package first:

    `pip install openai`

    Then, get yourself an API key and use OpenAI's API as follows:

    ```python
    import openai
    from keybert.llm import OpenAI
    from keybert import KeyLLM

    # Create your LLM
    client = openai.OpenAI(api_key=MY_API_KEY)
    llm = OpenAI(client)

    # Load it in KeyLLM
    kw_model = KeyLLM(llm)

    # Extract keywords
    document = "The website mentions that it only takes a couple of days to deliver but I still have not received mine."
    keywords = kw_model.extract_keywords(document)
    ```

    You can also use a custom prompt:

    ```python
    prompt = "I have the following document: [DOCUMENT] \nThis document contains the following keywords separated by commas: '"
    llm = OpenAI(client, prompt=prompt, delay_in_seconds=5)
    ```

    If you want to use OpenAI's ChatGPT model:

    ```python
    llm = OpenAI(client, model="gpt-3.5-turbo", delay_in_seconds=10, chat=True)
    ```
    """

    def __init__(
        self,
        client,
        model: str = "gpt-3.5-turbo-instruct",
        prompt: str = None,
        system_prompt: str = "You are a helpful assistant.",
        generator_kwargs: Mapping[str, Any] = {},
        delay_in_seconds: float = None,
        exponential_backoff: bool = False,
        chat: bool = False,
        verbose: bool = False,
    ):
        self.client = client
        self.model = model

        if prompt is None:
            self.prompt = DEFAULT_CHAT_PROMPT if chat else DEFAULT_PROMPT
        else:
            self.prompt = prompt

        self.system_prompt = system_prompt
        self.default_prompt_ = DEFAULT_CHAT_PROMPT if chat else DEFAULT_PROMPT
        self.delay_in_seconds = delay_in_seconds
        self.exponential_backoff = exponential_backoff
        self.chat = chat
        self.verbose = verbose

        self.generator_kwargs = generator_kwargs
        if self.generator_kwargs.get("model"):
            self.model = generator_kwargs.get("model")
        if self.generator_kwargs.get("prompt"):
            del self.generator_kwargs["prompt"]
        if not self.generator_kwargs.get("stop") and not chat:
            self.generator_kwargs["stop"] = "\n"

    def extract_keywords(self, documents: List[str], candidate_keywords: List[List[str]] = None):
        """Extract keywords.

        Arguments:
            documents: The documents to extract keywords from
            candidate_keywords: A list of candidate keywords that the LLM will fine-tune
                        For example, it will create a nicer representation of
                        the candidate keywords, remove redundant keywords, or
                        shorten them depending on the input prompt.

        Returns:
            all_keywords: All keywords for each document
        """
        all_keywords = []
        candidate_keywords = process_candidate_keywords(documents, candidate_keywords)

        for document, candidates in tqdm(zip(documents, candidate_keywords), disable=not self.verbose):
            prompt = self.prompt.replace("[DOCUMENT]", document)
            if candidates is not None:
                prompt = prompt.replace("[CANDIDATES]", ", ".join(candidates))

            # Delay
            if self.delay_in_seconds:
                time.sleep(self.delay_in_seconds)

            # Use a chat model
            if self.chat:
                messages = [{"role": "system", "content": self.system_prompt}, {"role": "user", "content": prompt}]
                kwargs = {"model": self.model, "messages": messages, **self.generator_kwargs}
                if self.exponential_backoff:
                    response = chat_completions_with_backoff(self.client, **kwargs)
                else:
                    response = self.client.chat.completions.create(**kwargs)
                keywords = response.choices[0].message.content.strip()

            # Use a non-chat model
            else:
                if self.exponential_backoff:
                    response = completions_with_backoff(
                        self.client, model=self.model, prompt=prompt, **self.generator_kwargs
                    )
                else:
                    response = self.client.completions.create(model=self.model, prompt=prompt, **self.generator_kwargs)
                keywords = response.choices[0].text.strip()
            keywords = [keyword.strip() for keyword in keywords.split(",")]
            all_keywords.append(keywords)

        return all_keywords

extract_keywords(documents, candidate_keywords=None)

Extract keywords.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `documents` | `List[str]` | The documents to extract keywords from. | required |
| `candidate_keywords` | `List[List[str]]` | A list of candidate keywords that the LLM will fine-tune. For example, it will create a nicer representation of the candidate keywords, remove redundant keywords, or shorten them depending on the input prompt. | `None` |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `all_keywords` | | All keywords for each document |
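Before each request, the prompt template is filled in per document: `[DOCUMENT]` is replaced with the text and, when candidate keywords are passed, `[CANDIDATES]` with a comma-separated list of them. A sketch of that substitution, with an illustrative template and candidates:

```python
# Sketch of how the prompt template is filled in per document.
template = (
    "I have the following document: [DOCUMENT]\n"
    "Improve these candidate keywords: [CANDIDATES]"
)
document = "The website mentions that it only takes a couple of days to deliver."
candidates = ["deliver", "websites", "shipping time"]

prompt = template.replace("[DOCUMENT]", document)
prompt = prompt.replace("[CANDIDATES]", ", ".join(candidates))
```

This is plain string replacement, so a template that omits `[CANDIDATES]` simply ignores any candidate keywords passed in.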

Source code in `keybert/llm/_openai.py`
def extract_keywords(self, documents: List[str], candidate_keywords: List[List[str]] = None):
    """Extract keywords.

    Arguments:
        documents: The documents to extract keywords from
        candidate_keywords: A list of candidate keywords that the LLM will fine-tune
                    For example, it will create a nicer representation of
                    the candidate keywords, remove redundant keywords, or
                    shorten them depending on the input prompt.

    Returns:
        all_keywords: All keywords for each document
    """
    all_keywords = []
    candidate_keywords = process_candidate_keywords(documents, candidate_keywords)

    for document, candidates in tqdm(zip(documents, candidate_keywords), disable=not self.verbose):
        prompt = self.prompt.replace("[DOCUMENT]", document)
        if candidates is not None:
            prompt = prompt.replace("[CANDIDATES]", ", ".join(candidates))

        # Delay
        if self.delay_in_seconds:
            time.sleep(self.delay_in_seconds)

        # Use a chat model
        if self.chat:
            messages = [{"role": "system", "content": self.system_prompt}, {"role": "user", "content": prompt}]
            kwargs = {"model": self.model, "messages": messages, **self.generator_kwargs}
            if self.exponential_backoff:
                response = chat_completions_with_backoff(self.client, **kwargs)
            else:
                response = self.client.chat.completions.create(**kwargs)
            keywords = response.choices[0].message.content.strip()

        # Use a non-chat model
        else:
            if self.exponential_backoff:
                response = completions_with_backoff(
                    self.client, model=self.model, prompt=prompt, **self.generator_kwargs
                )
            else:
                response = self.client.completions.create(model=self.model, prompt=prompt, **self.generator_kwargs)
            keywords = response.choices[0].text.strip()
        keywords = [keyword.strip() for keyword in keywords.split(",")]
        all_keywords.append(keywords)

    return all_keywords