Llama Memo
Sources
- Ollama: easy to use.
- Hugging Face: provides a tokenizer through its Python API.
Ollama
Installation
- Download the app from the website.
- Install the app.
- Run `ollama run llama3` in a terminal; it will install Llama 3 (8B by default, taking up 4.7 GB of storage).
- Then we can query Llama 3 in the terminal.
- To exit: `Ctrl + d` or `/bye`.
- To enter again: `ollama run llama3`.
- Once we open the app, the server will run at http://localhost:11434.
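Before sending any queries, it can help to confirm that the server is actually listening. A minimal sketch using only the standard library; it assumes the default port 11434, and treating an HTTP 200 from the root path as "up" is an observed Ollama behavior, not a documented guarantee:

```python
# Sketch: check whether the local Ollama server is reachable.
# Returns False on any connection error instead of raising.
import urllib.error
import urllib.request


def server_is_up(base_url="http://localhost:11434", timeout=2):
    """Return True if something answers at base_url with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


print("Ollama server up:", server_is_up())
```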
Using the Command Line
My snippet in Termius:

```bash
source ~/.bash_profile
conda activate rlbasic
ollama run llama3
```
Using Python
Way 1: Server

```python
import requests
import json

default_url = "http://127.0.0.1:11434/api/generate"


def query_model(text, server_url=default_url, use_stream=True):
    response = requests.post(
        server_url, json={"model": "llama3", "prompt": text}, stream=use_stream
    )
    results = ""
    for line in response.iter_lines():
        if line:
            json_response = json.loads(line.decode("utf-8"))
            response_i = json_response.get("response", "")
            if use_stream:
                print("Response_i:", response_i)
            results += response_i
            if json_response.get("done", False):
                break
    print("Response:", results)


while True:
    user_input = input("Input: ")
    if user_input.lower() == "quit":
        break
    query_model(user_input)
```
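With streaming on, `/api/generate` returns newline-delimited JSON objects, each carrying a `response` fragment and a `done` flag, and the loop above concatenates the fragments. The accumulation logic can be sketched on its own with simulated lines (the field names match the loop above; the text itself is made up):

```python
import json

# Simulated NDJSON lines, shaped like streaming /api/generate output.
raw_lines = [
    b'{"model":"llama3","response":"Hello","done":false}',
    b'{"model":"llama3","response":" world","done":false}',
    b'{"model":"llama3","response":"","done":true}',
]


def collect(lines):
    """Concatenate the "response" fields until "done" is true."""
    out = ""
    for line in lines:
        obj = json.loads(line.decode("utf-8"))
        out += obj.get("response", "")
        if obj.get("done", False):
            break
    return out


print(collect(raw_lines))  # → Hello world
```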
Way 2: Python API

```bash
pip install ollama
```

```python
import ollama

response = ollama.chat(
    model="llama3",
    messages=[
        {
            "role": "user",
            "content": "Why is the sky blue?",
        },
    ],
)
print(response["message"]["content"])
```
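Each `ollama.chat` call is stateless, so for a multi-turn conversation the whole message history must be passed in `messages` every time. A small hypothetical helper for building that list (`add_turn` is my own name, not part of the `ollama` package):

```python
# Hypothetical helper for multi-turn chats: ollama.chat expects the whole
# conversation in the messages list on every call.
def add_turn(messages, role, content):
    """Append one {"role", "content"} message and return the list."""
    messages.append({"role": role, "content": content})
    return messages


history = []
add_turn(history, "user", "Why is the sky blue?")
add_turn(history, "assistant", "Mainly Rayleigh scattering of sunlight.")
add_turn(history, "user", "Summarize that in one sentence.")
print(len(history))  # → 3
```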
```python
# Streaming
import ollama

stream = ollama.chat(
    model='llama3',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
```
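Each streamed chunk carries the same `{"message": {"content": ...}}` shape as a full response, so the pieces can be joined to recover the whole reply. A self-contained sketch with simulated chunks (not real model output):

```python
# Sketch: join streamed chat chunks back into the full reply.
# The chunk shape mirrors ollama.chat(stream=True); these chunks are made up.
chunks = [
    {"message": {"content": "The sky is blue "}},
    {"message": {"content": "because air scatters "}},
    {"message": {"content": "blue light the most."}},
]

full_reply = "".join(chunk["message"]["content"] for chunk in chunks)
print(full_reply)  # → The sky is blue because air scatters blue light the most.
```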
Hugging Face
Download
- Fill in a form on the website.
- We will then receive an email containing a URL.
- Clone the repo.
- Run the `download.sh` script.
- Paste the URL when prompted.
- Specify which model weights to download.
This post is licensed under CC BY 4.0 by the author.