
LiteLLM

Bases: BaseLLM

Extract keywords using LiteLLM, which calls many LLM APIs (Anthropic, Huggingface, Cohere, TogetherAI, Azure, OpenAI, etc.) through the OpenAI format.

NOTE: The model's response is split on commas, so any change to the prompt must still make the model return comma-separated keywords.
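As a rough illustration of why the separator matters, the sketch below mirrors the split-and-strip step that appears in `extract_keywords` further down. The helper name `parse_keywords` is hypothetical and not part of KeyBERT.

```python
from typing import List


def parse_keywords(response_text: str) -> List[str]:
    """Split a comma-separated LLM reply into a list of stripped keywords."""
    return [keyword.strip() for keyword in response_text.strip().split(",")]


# A comma-separated reply yields one entry per keyword.
well_formed = parse_keywords(" delivery, website, not received ")
print(well_formed)

# A reply that uses another separator is not split at all.
badly_formed = parse_keywords("delivery; website; not received")
print(badly_formed)
```

If a custom prompt leads the model to answer with semicolons, bullets, or full sentences, each document ends up with a single long "keyword", as in the second call.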

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | str | Model to use within LiteLLM; defaults to OpenAI's "gpt-3.5-turbo". | 'gpt-3.5-turbo' |
| generator_kwargs | Mapping[str, Any] | Keyword arguments passed to litellm.completion to control the generated output. | {} |
| prompt | str | The prompt to be used in the model. If no prompt is given, self.default_prompt_ is used instead. NOTE: Use "[DOCUMENT]" in the prompt to mark where the document is inserted. | None |
| system_prompt | str | The message that sets the behavior of the assistant. It is typically used to provide high-level instructions for the conversation. | 'You are a helpful assistant.' |
| delay_in_seconds | float | The delay in seconds between consecutive prompts, used to avoid RateLimitErrors. | None |
| verbose | bool | Set this to True to show a progress bar during keyword extraction. | False |
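To make the roles of `model`, `system_prompt`, and `generator_kwargs` more concrete, here is a minimal sketch of how the request appears to be assembled, based on the source listing further down. `build_request` is a hypothetical helper written for illustration, and the `temperature` and `max_tokens` values are only example settings.

```python
from typing import Any, Dict, Mapping


def build_request(
    model: str,
    system_prompt: str,
    user_prompt: str,
    generator_kwargs: Mapping[str, Any],
) -> Dict[str, Any]:
    """Assemble chat-style keyword arguments (illustrative helper, not KeyBERT API)."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
    # generator_kwargs is unpacked last, so its keys win over the ones before it.
    return {"model": model, "messages": messages, **generator_kwargs}


request = build_request(
    model="gpt-3.5-turbo",
    system_prompt="You are a helpful assistant.",
    user_prompt="Extract keywords from: ...",
    generator_kwargs={"temperature": 0.0, "max_tokens": 64},
)
print(sorted(request))
```

In this sketch a `"model"` entry inside `generator_kwargs` would replace the `model` argument, which matches the override visible in `__init__`.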

Usage:

Let's use OpenAI as an example:

```python
import os
from keybert.llm import LiteLLM
from keybert import KeyLLM

# Select LLM
os.environ["OPENAI_API_KEY"] = "sk-..."
llm = LiteLLM("gpt-3.5-turbo")

# Load it in KeyLLM
kw_model = KeyLLM(llm)

# Extract keywords
document = "The website mentions that it only takes a couple of days to deliver but I still have not received mine."
keywords = kw_model.extract_keywords(document)
```

You can also use a custom prompt:

```python
prompt = "I have the following document: [DOCUMENT] \nThis document contains the following keywords separated by commas: '"
llm = LiteLLM("gpt-3.5-turbo", prompt=prompt)
```

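
The placeholder handling can be sketched without calling any API. The snippet below repeats the two `str.replace` calls found in `extract_keywords`; `fill_prompt` and the template text are hypothetical and only serve as an illustration.

```python
from typing import List, Optional


def fill_prompt(template: str, document: str, candidates: Optional[List[str]] = None) -> str:
    """Insert the document, and optionally candidate keywords, into a prompt template."""
    prompt = template.replace("[DOCUMENT]", document)
    if candidates is not None:
        prompt = prompt.replace("[CANDIDATES]", ", ".join(candidates))
    return prompt


# Example template (an assumption, not the library's default prompt).
template = (
    "I have the following document: [DOCUMENT]\n"
    "With the following candidate keywords: [CANDIDATES]\n"
    "Return the final keywords separated by commas:"
)

filled = fill_prompt(template, "My parcel has not arrived.", ["parcel", "delivery"])
print(filled)
```

A template without `"[DOCUMENT]"` is left unchanged by the first replacement, so the model would never see the document.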
Source code in keybert\llm\_litellm.py
class LiteLLM(BaseLLM):
    r"""Extract keywords using LiteLLM to call any LLM API using OpenAI format
    such as Anthropic, Huggingface, Cohere, TogetherAI, Azure, OpenAI, etc.

    NOTE: The resulting keywords are expected to be separated by commas so
    any changes to the prompt will have to make sure that the resulting
    keywords are comma-separated.

    Arguments:
        model: Model to use within LiteLLM, defaults to OpenAI's `"gpt-3.5-turbo"`.
        generator_kwargs: Kwargs passed to `litellm.completion`
                          for fine-tuning the output.
        prompt: The prompt to be used in the model. If no prompt is given,
                `self.default_prompt_` is used instead.
                NOTE: Use `"[DOCUMENT]"` in the prompt
                to decide where the document needs to be inserted
        system_prompt: The message that sets the behavior of the assistant.
                       It's typically used to provide high-level instructions
                       for the conversation.
        delay_in_seconds: The delay in seconds between consecutive prompts
                          in order to prevent RateLimitErrors.
        verbose: Set this to True if you want to see a progress bar for the
                 keyword extraction.

    Usage:

    Let's use OpenAI as an example:

    ```python
    import os
    from keybert.llm import LiteLLM
    from keybert import KeyLLM

    # Select LLM
    os.environ["OPENAI_API_KEY"] = "sk-..."
    llm = LiteLLM("gpt-3.5-turbo")

    # Load it in KeyLLM
    kw_model = KeyLLM(llm)

    # Extract keywords
    document = "The website mentions that it only takes a couple of days to deliver but I still have not received mine."
    keywords = kw_model.extract_keywords(document)
    ```

    You can also use a custom prompt:

    ```python
    prompt = "I have the following document: [DOCUMENT] \nThis document contains the following keywords separated by commas: '"
    llm = LiteLLM("gpt-3.5-turbo", prompt=prompt)
    ```
    """

    def __init__(
        self,
        model: str = "gpt-3.5-turbo",
        prompt: str = None,
        system_prompt: str = "You are a helpful assistant.",
        generator_kwargs: Mapping[str, Any] = {},
        delay_in_seconds: float = None,
        verbose: bool = False,
    ):
        self.model = model

        if prompt is None:
            self.prompt = DEFAULT_PROMPT
        else:
            self.prompt = prompt

        self.system_prompt = system_prompt
        self.default_prompt_ = DEFAULT_PROMPT
        self.delay_in_seconds = delay_in_seconds
        self.verbose = verbose

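        # Copy the mapping so the caller's object and the shared default are never mutated
        self.generator_kwargs = dict(generator_kwargs)
        if self.generator_kwargs.get("model"):
            self.model = self.generator_kwargs.get("model")
        # The prompt is sent through `messages`, so drop any "prompt" entry
        self.generator_kwargs.pop("prompt", None)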

    def extract_keywords(self, documents: List[str], candidate_keywords: List[List[str]] = None):
        """Extract topics.

        Arguments:
            documents: The documents to extract keywords from
            candidate_keywords: A list of candidate keywords that the LLM will fine-tune
                        For example, it will create a nicer representation of
                        the candidate keywords, remove redundant keywords, or
                        shorten them depending on the input prompt.

        Returns:
            all_keywords: All keywords for each document
        """
        all_keywords = []
        candidate_keywords = process_candidate_keywords(documents, candidate_keywords)

        for document, candidates in tqdm(zip(documents, candidate_keywords), disable=not self.verbose):
            prompt = self.prompt.replace("[DOCUMENT]", document)
            if candidates is not None:
                prompt = prompt.replace("[CANDIDATES]", ", ".join(candidates))

            # Delay
            if self.delay_in_seconds:
                time.sleep(self.delay_in_seconds)

            # Use a chat model
            messages = [{"role": "system", "content": self.system_prompt}, {"role": "user", "content": prompt}]
            kwargs = {"model": self.model, "messages": messages, **self.generator_kwargs}

            response = completion(**kwargs)
            keywords = response["choices"][0]["message"]["content"].strip()
            keywords = [keyword.strip() for keyword in keywords.split(",")]
            all_keywords.append(keywords)

        return all_keywords

extract_keywords(documents, candidate_keywords=None)

Extract keywords from the given documents.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| documents | List[str] | The documents to extract keywords from. | required |
| candidate_keywords | List[List[str]] | A list of candidate keywords for each document that the LLM will refine. For example, it will create a nicer representation of the candidate keywords, remove redundant keywords, or shorten them, depending on the input prompt. | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| all_keywords | | All keywords for each document |
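
The shape of the return value can be checked offline with a stand-in for the API call. In the sketch below, `fake_completion` is a hypothetical stub that returns a canned reply in the OpenAI response format, and `extract_keywords_sketch` is a simplified imitation of the loop in the source listing, not the library method itself.

```python
from typing import Any, Dict, List


def fake_completion(**kwargs: Any) -> Dict[str, Any]:
    """Stand-in for the real completion call; always returns the same canned reply."""
    return {"choices": [{"message": {"content": " delivery, website , order "}}]}


def extract_keywords_sketch(documents: List[str], prompt: str) -> List[List[str]]:
    """Simplified imitation of the extraction loop, using the stub above."""
    all_keywords = []
    for document in documents:
        messages = [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt.replace("[DOCUMENT]", document)},
        ]
        response = fake_completion(model="gpt-3.5-turbo", messages=messages)
        content = response["choices"][0]["message"]["content"].strip()
        all_keywords.append([keyword.strip() for keyword in content.split(",")])
    return all_keywords


docs = ["First document.", "Second document."]
result = extract_keywords_sketch(docs, "Keywords for: [DOCUMENT]")
print(result)
```

The result holds one inner list of keywords per input document, in the same order as the input.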

Source code in keybert\llm\_litellm.py
def extract_keywords(self, documents: List[str], candidate_keywords: List[List[str]] = None):
    """Extract topics.

    Arguments:
        documents: The documents to extract keywords from
        candidate_keywords: A list of candidate keywords that the LLM will fine-tune
                    For example, it will create a nicer representation of
                    the candidate keywords, remove redundant keywords, or
                    shorten them depending on the input prompt.

    Returns:
        all_keywords: All keywords for each document
    """
    all_keywords = []
    candidate_keywords = process_candidate_keywords(documents, candidate_keywords)

    for document, candidates in tqdm(zip(documents, candidate_keywords), disable=not self.verbose):
        prompt = self.prompt.replace("[DOCUMENT]", document)
        if candidates is not None:
            prompt = prompt.replace("[CANDIDATES]", ", ".join(candidates))

        # Delay
        if self.delay_in_seconds:
            time.sleep(self.delay_in_seconds)

        # Use a chat model
        messages = [{"role": "system", "content": self.system_prompt}, {"role": "user", "content": prompt}]
        kwargs = {"model": self.model, "messages": messages, **self.generator_kwargs}

        response = completion(**kwargs)
        keywords = response["choices"][0]["message"]["content"].strip()
        keywords = [keyword.strip() for keyword in keywords.split(",")]
        all_keywords.append(keywords)

    return all_keywords