Preparation
• Log in to CometAPI and click "ADD API key" on the API Keys page to get your token key (sk-xxxxx) and base URL (https://api.cometapi.com).
Prerequisites
• Access to a CometAPI account, with an API key generated from the API Keys page.
• Jupyter Notebook or a Python environment for running the examples (optional, but recommended for interactive testing).
Step 1: Install LiteLLM
Install the LiteLLM library using pip. This is a one-time setup.
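For example, from your terminal:

```bash
pip install litellm
```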
Step 2: Set Up Your API Key
You need to paste the key you just got from CometAPI to authenticate requests. Set it as an environment variable (recommended for security) or pass it directly in your code. Here's an example in Python:
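A minimal sketch of both approaches. Note that the environment variable name COMETAPI_KEY is an assumption here; check the LiteLLM provider docs for the exact name your version expects.

```python
import os

# Option A (recommended): set the key as an environment variable.
# The variable name COMETAPI_KEY is assumed; confirm it in the
# LiteLLM provider docs for CometAPI.
os.environ["COMETAPI_KEY"] = "sk-xxxxx"

# Option B: keep the key in a plain variable and pass it explicitly
# to each call (see Method 2 in Step 3).
api_key = "sk-xxxxx"
```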
Note: Using the environment variable is safer, as it avoids hardcoding sensitive information in your scripts.
Step 3: Make a Basic Completion Call
Use LiteLLM's completion function to send messages to a CometAPI model. You can specify models like cometapi/gpt-5 or cometapi/gpt-4o.
Method 1: Use the environment variable for the API key (recommended).
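A minimal sketch, assuming the environment variable from Step 2 is already set:

```python
from litellm import completion

# LiteLLM picks up the CometAPI key from the environment,
# so no key appears in the code itself.
response = completion(
    model="cometapi/gpt-4o",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
)

print(response.choices[0].message.content)
```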
Method 2: Pass the API key explicitly.
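A sketch of the explicit variant; here the key is passed via the api_key argument, which takes precedence over the environment:

```python
from litellm import completion

response = completion(
    model="cometapi/gpt-4o",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    api_key="sk-xxxxx",  # the token key from the Preparation step
)

print(response.choices[0].message.content)
```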
The code will print the model's responses, e.g.:
I'm doing well, thank you! How about you?
Hello! I'm doing great, thanks for asking. How can I assist you today?
This sends a simple user message and retrieves the model's completion. You can customize the messages array for more complex conversations (e.g., by adding system prompts or multi-turn chats), as in the sketch below.
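For example, a sketch of a multi-turn conversation with a system prompt (the roles follow the usual OpenAI-style message schema):

```python
from litellm import completion

messages = [
    {"role": "system", "content": "You are a concise, helpful assistant."},
    {"role": "user", "content": "What is LiteLLM?"},
    {"role": "assistant", "content": "LiteLLM is a unified client for many LLM providers."},
    {"role": "user", "content": "And how do I point it at CometAPI?"},
]

response = completion(model="cometapi/gpt-4o", messages=messages)
print(response.choices[0].message.content)
```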
Step 4: Asynchronous and Streaming Calls
For non-blocking or real-time applications, use LiteLLM's acompletion function for asynchronous calls. This is useful with Python's asyncio for handling concurrency. You can also enable streaming to receive responses in chunks (e.g., for live chat interfaces).
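A minimal sketch matching the behavior described in the notes below (it assumes the COMETAPI_KEY environment variable from Step 2 is set):

```python
import asyncio
from litellm import acompletion

async def test_async_streaming():
    print("Testing asynchronous completion with streaming")
    try:
        # With stream=True, acompletion yields the response as chunks.
        response = await acompletion(
            model="cometapi/gpt-4o",
            messages=[{"role": "user", "content": "Hello, how are you?"}],
            stream=True,
        )
        print("Response object:", response)
        async for chunk in response:
            print("Chunk:", chunk)
    except Exception as e:
        # Catch and print errors (invalid key, network issues, etc.) for debugging.
        print("Error:", e)

# In a script; in a Jupyter Notebook, use `await test_async_streaming()` instead.
asyncio.run(test_async_streaming())
```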
• acompletion is the asynchronous version of completion.
• stream=True enables streaming, where the response is yielded in real-time chunks.
• Use asyncio to run the function (e.g., in a Jupyter Notebook with await, or via asyncio.run() in scripts).
• If an error occurs, it's caught and printed for debugging.
You'll see the response object and individual chunks printed, e.g.:
Testing asynchronous completion with streaming
Response object: <async_generator object acompletion at 0x...>
Chunk: {'choices': [{'delta': {'content': 'Hello'}, 'index': 0}]}
Chunk: {'choices': [{'delta': {'content': '!'}, 'index': 0}]}
... (full response streamed in parts)
Additional Tips
Supported Models: CometAPI models follow the format cometapi/<model-name>, e.g., cometapi/gpt-5, cometapi/gpt-4o, cometapi/chatgpt-4o-latest. Check the CometAPI documentation for the latest models.
Error Handling: Always wrap calls in try-except blocks to handle issues like invalid keys or network errors, as shown in the sketch after this list (which also demonstrates the tuning parameters below).
Advanced Features: LiteLLM supports parameters like temperature, max_tokens, and top_p for fine-tuning responses. Add them to the completion or acompletion calls, e.g., completion(..., temperature=0.7).
Security: Never commit your API key to version control. Use environment variables or secret managers.
Troubleshooting: If you encounter issues, ensure your API key is valid and check LiteLLM's logs. For more details, refer to the LiteLLM documentation or the CometAPI docs.
Rate Limits and Costs: Monitor your API usage in the CometAPI console.
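Putting the error-handling and tuning tips together, a minimal sketch (the parameter values are illustrative, not recommendations):

```python
from litellm import completion

try:
    response = completion(
        model="cometapi/gpt-4o",
        messages=[{"role": "user", "content": "Summarize LiteLLM in one sentence."}],
        temperature=0.7,  # sampling randomness
        max_tokens=256,   # cap on response length
        top_p=0.9,        # nucleus-sampling threshold
    )
    print(response.choices[0].message.content)
except Exception as e:
    # Invalid keys, rate limits, and network errors all surface here.
    print("Request failed:", e)
```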