Llama Memo
Sources
- Ollama: easy to use.
- Hugging Face: provides a tokenizer through its Python API.
Ollama
Installation
- Download the app from the website.
- Install the app.
- Run `ollama run llama3` in a terminal; it will install Llama 3 (8B by default, taking up 4.7 GB of storage).
- Then we can query Llama 3 in the terminal.
- To exit: `Ctrl + d` or `/bye`.
- To enter again: `ollama run llama3`.
- Once we open the app, the server will run at http://localhost:11434.
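Before sending any queries, it can help to confirm that the server is actually listening. A minimal sketch using only the standard library; it assumes the default port 11434, and treating an HTTP 200 from the root path as "up" is an observed Ollama behavior, not a documented guarantee:

```python
# Sketch: check whether the local Ollama server is reachable.
# Returns False on any connection error instead of raising.
import urllib.error
import urllib.request


def server_is_up(base_url="http://localhost:11434", timeout=2):
    """Return True if something answers at base_url with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


print("Ollama server up:", server_is_up())
```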
Using the Command Line
My snippet in Termius:

```bash
source ~/.bash_profile
conda activate rlbasic
ollama run llama3
```
Using Python
Way 1: Server

```python
import requests
import json

default_url = "http://127.0.0.1:11434/api/generate"


def query_model(text, server_url=default_url, use_stream=True):
    response = requests.post(
        server_url, json={"model": "llama3", "prompt": text}, stream=use_stream
    )
    results = ""
    for line in response.iter_lines():
        if line:
            json_response = json.loads(line.decode("utf-8"))
            response_i = json_response.get("response", "")
            if use_stream:
                print("Response_i:", response_i)
            results += response_i
            if json_response.get("done", False):
                break
    print("Response:", results)


while True:
    user_input = input("Input: ")
    if user_input.lower() == "quit":
        break
    query_model(user_input)
```
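With streaming on, `/api/generate` returns newline-delimited JSON objects, each carrying a `response` fragment and a `done` flag, and the loop above concatenates the fragments. The accumulation logic can be sketched on its own with simulated lines (the field names match the loop above; the text itself is made up):

```python
import json

# Simulated NDJSON lines, shaped like streaming /api/generate output.
raw_lines = [
    b'{"model":"llama3","response":"Hello","done":false}',
    b'{"model":"llama3","response":" world","done":false}',
    b'{"model":"llama3","response":"","done":true}',
]


def collect(lines):
    """Concatenate the "response" fields until "done" is true."""
    out = ""
    for line in lines:
        obj = json.loads(line.decode("utf-8"))
        out += obj.get("response", "")
        if obj.get("done", False):
            break
    return out


print(collect(raw_lines))  # → Hello world
```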
Way 2: Python API

```bash
pip install ollama
```

```python
import ollama

response = ollama.chat(
    model="llama3",
    messages=[
        {
            "role": "user",
            "content": "Why is the sky blue?",
        },
    ],
)
print(response["message"]["content"])
```
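Each `ollama.chat` call is stateless, so for a multi-turn conversation the whole message history must be passed in `messages` every time. A small hypothetical helper for building that list (`add_turn` is my own name, not part of the `ollama` package):

```python
# Hypothetical helper for multi-turn chats: ollama.chat expects the whole
# conversation in the messages list on every call.
def add_turn(messages, role, content):
    """Append one {"role", "content"} message and return the list."""
    messages.append({"role": role, "content": content})
    return messages


history = []
add_turn(history, "user", "Why is the sky blue?")
add_turn(history, "assistant", "Mainly Rayleigh scattering of sunlight.")
add_turn(history, "user", "Summarize that in one sentence.")
print(len(history))  # → 3
```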
```python
# Streaming
import ollama

stream = ollama.chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
```
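Each streamed chunk carries the same `{"message": {"content": ...}}` shape as a full response, so the pieces can be joined to recover the whole reply. A self-contained sketch with simulated chunks (not real model output):

```python
# Sketch: join streamed chat chunks back into the full reply.
# The chunk shape mirrors ollama.chat(stream=True); these chunks are made up.
chunks = [
    {"message": {"content": "The sky is blue "}},
    {"message": {"content": "because air scatters "}},
    {"message": {"content": "blue light the most."}},
]

full_reply = "".join(chunk["message"]["content"] for chunk in chunks)
print(full_reply)  # → The sky is blue because air scatters blue light the most.
```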
Hugging Face
Download
- Fill in a form on the website.
- We will then receive an email containing a URL.
- Clone the repo.
- Run the `download.sh` script.
- Paste the URL when prompted.
- Specify which model weights to download.
This post is licensed under CC BY 4.0 by the author.