Llama Memo

Sources

  1. Ollama: easy to use.
  2. Hugging Face: provides a tokenizer through its Python API.

Ollama

Installation

  1. Download the app from the website.
  2. Install the app.
  3. Run ollama run llama3 in a terminal. This pulls the default Llama 3 8B model, which takes about 4.7 GB of storage.
  4. Then we can query Llama 3 in the terminal.
    1. To exit: Ctrl + d or /bye
    2. To enter again: ollama run llama3

Once the app is open, the server runs at http://localhost:11434.
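
To check that the server is actually up, we can query its REST API directly. A minimal sketch, assuming the default port and that at least one model has already been pulled:

import requests

# /api/tags lists the models that have been pulled locally.
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])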

Using the Command Line

My snippet in Termius:

source ~/.bash_profile
conda activate rlbasic
ollama run llama3

Using Python

Way 1: Server

[Official Guide]

import requests
import json

default_url = "http://127.0.0.1:11434/api/generate"


def query_model(text, server_url=default_url, use_stream=True):
    # /api/generate streams newline-delimited JSON objects by default;
    # stream=True tells requests not to buffer the whole body.
    response = requests.post(
        server_url, json={"model": "llama3", "prompt": text}, stream=use_stream
    )

    results = ""
    for line in response.iter_lines():
        if line:
            json_response = json.loads(line.decode("utf-8"))
            response_i = json_response.get("response", "")
            if use_stream:
                print("Response_i:", response_i)
            results += response_i
            # The final chunk carries "done": true.
            if json_response.get("done", False):
                break

    print("Response:", results)


while True:
    user_input = input("Input: ")
    if user_input.lower() == "quit":
        break
    query_model(user_input)

Way 2: Python API

[Official Guide]

pip install ollama
import ollama

response = ollama.chat(
    model="llama3",
    messages=[
        {
            "role": "user",
            "content": "Why is the sky blue?",
        },
    ],
)
print(response["message"]["content"])
# Streaming
import ollama

stream = ollama.chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
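
The messages argument follows the usual chat format, so a system prompt and earlier turns can be passed the same way. A minimal sketch (the prompts here are just placeholders):

import ollama

# A conversation is a list of {"role", "content"} dicts;
# "system", "user", and "assistant" roles are all accepted.
response = ollama.chat(
    model="llama3",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Why is the sky blue?"},
        {"role": "assistant", "content": "Because of Rayleigh scattering."},
        {"role": "user", "content": "Explain that in one sentence."},
    ],
)
print(response["message"]["content"])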

Hugging Face

Download

  1. Fill in a form on the website.
    1. We will then receive an email containing a download URL.
  2. Clone the repo.
    1. Run the download.sh script.
    2. Copy and paste the URL when prompted.
  3. Specify which model weights to download.
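
Once access is granted, the tokenizer can also be loaded through the transformers library rather than the raw download. A minimal sketch, assuming the gated meta-llama/Meta-Llama-3-8B repo and a Hugging Face account that has been approved for it (log in first with huggingface-cli login):

from transformers import AutoTokenizer

# Downloads the tokenizer files from the gated repo on first use.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

ids = tokenizer.encode("Why is the sky blue?")
print(ids)
print(tokenizer.decode(ids))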