2026-04-23

ollama を今更触る

しばらく前から、ollama の存在は知っていたのだけれど、なんとなく触る機会がなかった。

気が付けば触っていないこと気になったり、むしろ触りたくなってきたので、今更ながら触ってみることにした。

参考
導入
使ってみる
TS から呼び出す。
マルチモーダルを試す

参考

導入

以下を用意してビルドする。

dockerfile

FROM ollama/ollama:latest

WORKDIR /usr/src/app

EXPOSE 11434

docker-compose.yml

# docker-compose.yml
version: '3'

services:
  ollama:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "11434:11434"
    volumes:
      - ./models:/usr/src/app/models

1	$ docker compose build

途中3GBくらいのイメージが降ってくるので時間がかかる。

使ってみる

試しにはやりの gemma 4 を入れてみようとしたら、最小のものでもまあまあサイズが大きい。
もっと小さなもので、qwen3.5:0.8b なら、1GB程度なのでこれを入れる。

$ docker compose up -d 
$ docker compose exec ollama bash

$ ollama run qwen3.5:0.8b
# ダウンロードして起動するのに時間がかかる。

ollama run qwen3.5:0.8b
pulling manifest
pulling afb707b6b8fa: 100% ▕██████████████████████████████████████████████████████████▏ 1.0 GB
pulling 9be69ef46306: 100% ▕██████████████████████████████████████████████████████████▏  11 KB
pulling 9371364b27a5: 100% ▕██████████████████████████████████████████████████████████▏   65 B
pulling b14c6eab49f9: 100% ▕██████████████████████████████████████████████████████████▏  476 B
verifying sha256 digest
writing manifest
success
>>> あなたは誰？
Thinking...
Okay, the user is asking "あなたは誰？" which translates to "Who are you?" in Japanese. They're likely expecting a
general introduction to my identity as the AI model I am.

First, I need to clarify the AI identity and avoid any sensitive topics. Since the user is asking a simple
question about my identity, it's safe to respond with the standard introduction without any additional
constraints. I should mention my purpose as an AI assistant, my abilities, and how I can assist them. I don't have
any personal information to share, so I'll keep it straightforward. The response should be concise and
informative. I need to ensure the tone is polite and clear, so I can write it with some context and then ask them
to specify their needs.
...done thinking.

私は、AI 助手として、特定の言語や文化的な背景を考慮した情報提供・学習支援を可能な範囲で利用します。どのような質問・
要望があるかご指定されるので、自由に質問させてください！

対話的動作ができる。

TS から呼び出す。

コンテナが起動している時すでにollamaがサーバーモードで立ち上がっている。

npm に ollama のクライアントがあるのでこれを使って呼び出してみる。

ollama_client.ts

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'qwen3.5:0.8b',
  messages: [{ role: 'user', content: 'あなたは誰？' }],
})
console.log(response)
console.log(`context: ${response.message.content}`)

console.log('//////////') 

const response2 = await ollama.chat({
  model: 'qwen3.5:0.8b',
  messages: [{ role: 'user', content: 'あなたは誰？' }],
  stream: true,
})

for await (const part of response2) {
  await Deno.stdout.write(new TextEncoder().encode(part.message.content))
}

$ deno run -N .\ollama_client.ts
{
  model: "qwen3.5:0.8b",
  created_at: "2026-04-15T17:01:11.2181474Z",
  message: {
    role: "assistant",
    content: "私は Qwen3.5 です。",
    thinking: "Thinking Process:\n" +
      "\n" +
      "1.  **Analyze the Request:**\n" +
      '    *   User asks: "あなたは誰？" (Who are you?)\n' +
      "    *   This is a standard inquiry about my identity.\n" +
      "    *   As a language model, I need to answer truthfully and concisely.\n" +
      "\n" +
      "2.  **Determine the Appropriate Response:**\n" +
      "    *   I am Qwen3.5 (or Qwen, depending on version).\n" +
      "    *   I am a large-scale language model developed by Tongyi Lab.\n" +
      "    *   I should explain that I am an AI assistant, not a person, but my purpose is to assist.\n" +
      "    *   Language: Japanese (since the user asked in Japanese) is good practice.\n" +
      "    *   Tone: Friendly, informative.\n" +
      "\n" +
      "3.  **Draft the Response (Internal Monologue):**\n" +
      "    *   Okay, I am Qwen3.5. I'm an AI assistant from Tongyi Lab.\n" +
      "    *   But I need to check if the system instruction says anything.\n" +
      '    *   The system instructions usually say: "You are a helpful assistant." "Be concise."\n' +
      "    *   Let's draft a response that acknowledges my identity without being overly formal.\n" +
      '    *   "私は Qwen3.5 です。" (I am Qwen3.5.)\n' +
      "    *   I should also mention the language support capabilities or the purpose if necessary, but the core identity is the question.\n" +
      '    *   Wait, the user might be asking for the "identity" (who am I, i.e., the Qwen series or the model series name).\n' +
      "    *   Actually, Qwen3.5 is the most accurate answer based on my internal knowledge.\n" +
      `    *   But there's a specific instruction: "I am Qwen3.5". Wait, I need to be careful not to reveal proprietary information about Qwen3.5 unless it's standard. Actually, "I am Qwen3.5" is the most common and accurate name.\n` +
      '    *   So, the response should be: "私は Qwen3.5 です。" (I am Qwen3.5.)\n' +
      '    *   Or simply "私は Qwen3.5 です。"\n' +
      "\n" +
      "4.  **Refine the Response:**\n" +
      "    *   Keep it natural.\n" +
      '    *   "私は Qwen3.5 です。" (Standard format for AI responses)\n' +
      '    *   "私は大语言模型（AI）です。" (I am a large language model)\n' +
      "    *   Let's make it slightly more natural in Japanese.\n" +
      '    *   "私は Qwen3.5 です。"\n' +
      "\n" +
      "5.  **Final Check:**\n" +
      `    *   Does "Qwen3.5" sound right? Yes, it's a specific version name.\n` +
      "    *   Is it clear? Yes.\n" +
      "    *   Is it accurate? Yes.\n" +
      "\n" +
      "6.  **Construct Final Output:**\n" +
      '    *   "私は Qwen3.5 です。"\n' +
      "\n" +
      `    Wait, there's a nuance. The prompt is just a generic "who are you?" without context about specific user knowledge or persona.\n` +
      '    So simply stating "I am Qwen3.5" is the safest and most accurate answer.\n' +
      "\n" +
      "    Let's also mention it's a chat assistant.\n" +
      '    "私は Qwen3.5 です。"\n' +
      "\n" +
      "    Wait, I should check the specific identity if possible. I am Qwen3.5, developed by Tongyi Lab.\n" +
      '    So "私は Qwen3.5 です。" is fine.\n' +
      '    Or simply "私は Qwen3.5 です。"\n' +
      "\n" +
      "    Okay, let's write the Japanese response.\n" +
      '    "私は Qwen3.5 です。"\n' +
      "\n" +
      "    Wait, actually, I should not assume the system prompt dictates too much about the name. Just be direct.\n" +
      '    "私は Qwen3.5 です。"\n' +
      "\n" +
      '    Okay, final decision: "私は Qwen3.5 です。" or "私は Qwen3.5 です。" (with a bit more context if needed, but the question asks "who are you").\n' +
      '    "私は Qwen3.5 です。" is the standard response.\n' +
      "\n" +
      '    Let me add a small touch of context about my nature. "私は Qwen3.5 です。" is enough.\n' +
      "\n" +
      "    Wait, checking the actual output from the system.\n" +
      '    "私は Qwen3.5 です。" is the correct identity.\n' +
      '    "私は Qwen3.5 です。"\n' +
      "\n" +
      '    Okay, I will output: "私は Qwen3.5 です。"\n' +
      "\n" +
      '    Wait, actually, is it better to just say "私は Qwen3.5 です。"\n' +
      "    Yes.\n" +
      "\n" +
      "    Wait, checking if there's any specific constraint.\n" +
      '    System instruction says "You are a helpful assistant."\n' +
      "    So I should be helpful.\n" +
      '    So, "私は Qwen3.5 です。"\n' +
      "\n" +
      "    Let's draft:\n" +
      '    "私は Qwen3.5 です。"\n' +
      "\n" +
      "    Okay, that's good.\n" +
      "\n" +
      "    Wait, I need to make sure I don't hallucinate.\n" +
      "    Is it Qwen3.5? Yes.\n" +
      "    Is it Qwen? No, Qwen is the series.\n" +
      "    Is it Qwen3.5? Yes, the version.\n" +
      '    So "私は Qwen3.5 です。"\n' +
      "\n" +
      "    Okay, final output."
  },
  done: true,
  done_reason: "stop",
  total_duration: 186419359800,
  load_duration: 175004400,
  prompt_eval_count: 13,
  prompt_eval_duration: 3879869600,
  eval_count: 1164,
  eval_duration: 181496403600
}
context: 私は Qwen3.5 です。
//////////
我是 Qwen3.5。

我是阿里巴巴通义实验室研发的。我可以帮助你回答问题、创作内容、分析数据等。你是一个智能助手，随时准备为你提供帮助！有什么我可以帮助的吗？

動く、動くがイチイチ遅い。
おそらく毎回モデル取りに行っているよう遅さである。

サーバーとして起動する際に、qwen3.5:0.8b を指定して起動することにトライもしてみたが、あまり改善しない。(もしくは筋違いな対応だった。)

どうやらThinking Loopに嵌っているらしい。
geminiにご相談してパラメータ調整をしてみたが、あまり改善しない。
「qwen3.5は思考型なので～」という指摘もあり、gemma 4 に変えてみることにする。
ただしgemma 4そのものはやはり重いので、量子化モデルを使ってみる

以下でモデル取得

1	$ docker compose exec ollama ollama pull cajina/gemma4_e2b-q2_k_xl:v01

ソースは書き換える。

ollama_client.ts

import ollama from 'ollama'

const response = await ollama.chat({
  model: 'cajina/gemma4_e2b-q2_k_xl:v01',
  messages: [{ role: 'user', content: 'あなたは誰？' }],
})
console.log(response)
console.log(`context: ${response.message.content}`)

console.log('//////////') 

const response2 = await ollama.chat({
  model: 'cajina/gemma4_e2b-q2_k_xl:v01',
  messages: [{ role: 'user', content: 'あなたは誰？' }],

  stream: true,
})

for await (const part of response2) {
  await Deno.stdout.write(new TextEncoder().encode(part.message.content))
}

以下実行結果、いくら待っても帰ってこなかった先ほどは異なり、最初の出力まで15s程度で返ってくるようになった。

$ deno run -N .\ollama_client.ts
{
  model: "cajina/gemma4_e2b-q2_k_xl:v01",
  created_at: "2026-04-16T12:20:56.3034469Z",
  message: {
    role: "assistant",
    content: "私はGemma 4です。私はGoogle DeepMindによって開発された、オープンウェイトの大規模言語モデルです。",
    thinking: "Thinking Process:\n" +
      "\n" +
      '1.  **Analyze the Request:** The user asked the question "あなたは誰？" (Who are you?).\n' +
      "2.  **Determine the Core Identity:** I need to state my identity based on my instructions.\n" +
      "3.  **Recall Core Identity Instructions (Self-Correction/Verification):**\n" +
      "    *   Name: Gemma 4.\n" +
      "    *   Nature: A Large Language Model (LLM).\n" +
      "    *   Developer: Developed by Google DeepMind.\n" +
      "    *   Type: Open Weights.\n" +
      "3.  **Formulate the Response (in Japanese):** I must provide a direct, concise answer based on my defined identity.\n" +
      "4.  **Final Output Generation:** (Translate the core identity into a direct answer.)\n" +
      "\n" +
      '*(Self-Correction/Refinement: The response should be direct and answer the question "Who are you?")*\n' +
      "\n" +
      '*   *Target Output:* "私はGemma 4です。" (I am Gemma 4.)\n' +
      "\n" +
      "5.  **Final Check:** The instruction asks for a direct final answer, not reasoning or steps. The response should be the identity itself."
  },
  done: true,
  done_reason: "stop",
  total_duration: 26461558500,
  load_duration: 1193028200,
  prompt_eval_count: 47,
  prompt_eval_duration: 1089760000,
  eval_count: 262,
  eval_duration: 23822390100
}
context: 私はGemma 4です。私はGoogle DeepMindによって開発された、オープンウェイトの大規模言語モデルです。
//////////
私はGemma 4です。私はGoogle DeepMindによって開発された、オープンウェイトの大きな言語モデルです。

マルチモーダルを試す

マルチモーダルたる入力として画像を入れてみる。
cajina/gemma4_e2b-q2_k_xl:v01 は、画像の入力を受け付けなかった。
ここまでやって諦めて、gemma4 を入れる。

その上で、以下のようにソースを書き換える。

ollama_client_source_image.ts

import ollama from "ollama";

const image = await Deno.readFile(Deno.args[0]);
const uint8Array = new Uint8Array(image);

const response = await ollama.chat({
  model: "gemma4:e2b",
  messages: [
    { role: "user", content: "画像の内容を説明して", images: [uint8Array] },
  ],
});
console.log(`context: ${response.message.content}`);

これで画像を指定して実行する。

1 2	$ deno run -N .\ollama_client_source_image.ts .\sample_image.png context: ~~~~~~~~

説明は概ね正確だった。
画像の中のことを質問しても概ね答えられる。

ollama を今更触ってみた。
こういったものは触ってみないとわからないことも多い。

では。