Calling Ollama from a Go Application

Elliot Forbes · Mar 7, 2026 · 7 min read

In this tutorial, we’re going to explore how to interact with Ollama directly from a Go application. Ollama runs a local REST API that lets you generate text, handle chat conversations, create embeddings, and more. The best part? It’s super straightforward to work with from Go.

By the end of this guide, you’ll know how to make requests to Ollama, handle streaming responses, build a reusable client, and even create a simple chatbot API. Let’s dive in.

Prerequisites

Before we get started, you’ll need:

  • Ollama installed on your machine (grab it from ollama.ai)
  • At least one model pulled (try ollama pull mistral or ollama pull llama2)
  • A working Go development environment (version 1.16+)
  • Basic familiarity with Go, HTTP requests, and JSON

Make sure Ollama is running before you try any of the examples. By default, it listens on http://localhost:11434.

Understanding Ollama’s API Endpoints

Ollama exposes several key endpoints that we’ll work with:

  • /api/generate - Generate text based on a prompt (supports streaming)
  • /api/chat - Handle multi-turn conversations with message history
  • /api/embeddings - Create vector embeddings for text
  • /api/tags - List all pulled models

For most of our examples, we’ll focus on /api/generate and /api/chat since those are the most commonly used.

Making Your First Request

Let’s start simple. Here’s how to make a basic generation request to Ollama using Go’s standard net/http package:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Prepare the request payload
	payload := map[string]interface{}{
		"model":  "mistral",
		"prompt": "What is the capital of France?",
		"stream": false,
	}

	payloadBytes, _ := json.Marshal(payload)

	// Make the POST request
	resp, err := http.Post(
		"http://localhost:11434/api/generate",
		"application/json",
		bytes.NewBuffer(payloadBytes),
	)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	defer resp.Body.Close()

	// Read and parse the response
	body, _ := io.ReadAll(resp.Body)
	var result map[string]interface{}
	json.Unmarshal(body, &result)

	fmt.Println("Response:", result["response"])
}

This works, but there’s something important to know about Ollama: by default, it streams responses. Let’s handle that.

Handling Streaming Responses

When you don’t set "stream": false, Ollama sends back NDJSON (newline-delimited JSON): each line is a separate JSON object. This lets you surface partial results as soon as they’re generated instead of waiting for the full completion.

Here’s how to handle streaming:

package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	payload := map[string]interface{}{
		"model":  "mistral",
		"prompt": "Write a short poem about Go programming",
		"stream": true, // Enable streaming
	}

	payloadBytes, _ := json.Marshal(payload)

	resp, err := http.Post(
		"http://localhost:11434/api/generate",
		"application/json",
		bytes.NewBuffer(payloadBytes),
	)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	defer resp.Body.Close()

	// Read the streaming response line by line
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		var result map[string]interface{}
		json.Unmarshal(scanner.Bytes(), &result)

		// Print each chunk of text as it arrives
		if text, ok := result["response"].(string); ok {
			fmt.Print(text)
		}
	}
	fmt.Println() // Final newline
}

This approach lets you display results to the user as they come in, which feels much more responsive.

Building a Chat Application

For multi-turn conversations, Ollama provides the /api/chat endpoint. Ollama itself is stateless between requests, so you send the full list of messages with roles (user, assistant, system) each time, and the model uses that history as context:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ChatRequest struct {
	Model    string    `json:"model"`
	Messages []Message `json:"messages"`
	Stream   bool      `json:"stream"`
}

func chat(messages []Message) string {
	req := ChatRequest{
		Model:    "mistral",
		Messages: messages,
		Stream:   false,
	}

	payload, _ := json.Marshal(req)

	resp, err := http.Post(
		"http://localhost:11434/api/chat",
		"application/json",
		bytes.NewBuffer(payload),
	)
	if err != nil {
		fmt.Println("Error:", err)
		return ""
	}
	defer resp.Body.Close()

	var result map[string]interface{}
	json.NewDecoder(resp.Body).Decode(&result)

	// Use comma-ok assertions so a malformed response can't panic
	message, ok := result["message"].(map[string]interface{})
	if !ok {
		return ""
	}
	content, _ := message["content"].(string)
	return content
}

func main() {
	messages := []Message{
		{Role: "user", Content: "What's your favorite programming language?"},
	}

	response := chat(messages)
	fmt.Println("Assistant:", response)

	// Continue the conversation
	messages = append(messages, Message{Role: "assistant", Content: response})
	messages = append(messages, Message{Role: "user", Content: "Why do you prefer it?"})

	response = chat(messages)
	fmt.Println("Assistant:", response)
}

Notice how we keep appending to the messages slice? That’s what gives the model context about the previous conversation.

Creating a Reusable Ollama Client

Writing the same HTTP logic repeatedly gets tedious. Let’s build a proper client struct:

package main

import (
	"bufio"
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

type OllamaClient struct {
	baseURL string
	client  *http.Client
}

type GenerateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

type GenerateResponse struct {
	Response string `json:"response"`
	Done     bool   `json:"done"`
}

func NewOllamaClient(baseURL string) *OllamaClient {
	return &OllamaClient{
		baseURL: baseURL,
		client: &http.Client{
			Timeout: 5 * time.Minute,
		},
	}
}

func (c *OllamaClient) Generate(ctx context.Context, model, prompt string) (string, error) {
	req := GenerateRequest{
		Model:  model,
		Prompt: prompt,
		Stream: false,
	}

	payload, _ := json.Marshal(req)

	httpReq, _ := http.NewRequestWithContext(
		ctx,
		"POST",
		c.baseURL+"/api/generate",
		bytes.NewBuffer(payload),
	)
	httpReq.Header.Set("Content-Type", "application/json")

	resp, err := c.client.Do(httpReq)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var result GenerateResponse
	json.NewDecoder(resp.Body).Decode(&result)

	return result.Response, nil
}

func (c *OllamaClient) GenerateStream(ctx context.Context, model, prompt string, callback func(string)) error {
	req := GenerateRequest{
		Model:  model,
		Prompt: prompt,
		Stream: true,
	}

	payload, _ := json.Marshal(req)

	httpReq, _ := http.NewRequestWithContext(
		ctx,
		"POST",
		c.baseURL+"/api/generate",
		bytes.NewBuffer(payload),
	)
	httpReq.Header.Set("Content-Type", "application/json")

	resp, err := c.client.Do(httpReq)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		var result GenerateResponse
		json.Unmarshal(scanner.Bytes(), &result)
		callback(result.Response)
	}

	return scanner.Err()
}

func main() {
	client := NewOllamaClient("http://localhost:11434")

	// Simple generation
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
	defer cancel()

	response, err := client.Generate(ctx, "mistral", "Explain quantum computing briefly")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(response)

	// Streaming with callback
	client.GenerateStream(ctx, "mistral", "Count to 5", func(chunk string) {
		fmt.Print(chunk)
	})
}

This client wraps the HTTP complexity and gives us a clean API to work with.

Using the Official ollama-go Library

If you prefer not to roll your own, there’s an official Go client library. Install it with:

go get github.com/ollama/ollama/api

Then use it like this:

package main

import (
	"context"
	"fmt"

	"github.com/ollama/ollama/api"
)

func main() {
	// ClientFromEnvironment reads OLLAMA_HOST (default http://localhost:11434)
	client, err := api.ClientFromEnvironment()
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	stream := false
	req := &api.GenerateRequest{
		Model:  "mistral",
		Prompt: "Hello, how are you?",
		Stream: &stream, // Stream is a *bool in this package
	}

	// Generate delivers responses through a callback rather than a return value
	err = client.Generate(context.Background(), req, func(resp api.GenerateResponse) error {
		fmt.Println(resp.Response)
		return nil
	})
	if err != nil {
		fmt.Println("Error:", err)
	}
}

Much simpler if you’re comfortable with external dependencies.

Handling Errors and Timeouts

Long-running generations can take time. Use context.WithTimeout to avoid hanging forever:

package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

func generateWithTimeout(model, prompt string, timeout time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	payload := map[string]interface{}{
		"model":  model,
		"prompt": prompt,
		"stream": false,
	}

	payloadBytes, _ := json.Marshal(payload)

	req, _ := http.NewRequestWithContext(
		ctx,
		"POST",
		"http://localhost:11434/api/generate",
		bytes.NewBuffer(payloadBytes),
	)
	req.Header.Set("Content-Type", "application/json")

	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		if ctx.Err() == context.DeadlineExceeded {
			return "", fmt.Errorf("generation timed out after %v", timeout)
		}
		return "", err
	}
	defer resp.Body.Close()

	var result map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		return "", err
	}

	// Comma-ok assertion so an unexpected payload returns "" instead of panicking
	text, _ := result["response"].(string)
	return text, nil
}
}

func main() {
	response, err := generateWithTimeout("mistral", "Tell me a joke", 30*time.Second)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println(response)
}

This approach prevents your application from hanging if Ollama becomes unresponsive.

Building a Chatbot API Server

Let’s tie everything together and create a simple HTTP server that wraps Ollama as a chatbot API:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
)

type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type ChatRequest struct {
	Message string `json:"message"`
}

type ChatResponse struct {
	Reply string `json:"reply"`
}

var (
	conversationHistory []Message
	mu                  sync.Mutex
)

func chatHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
		return
	}

	var req ChatRequest
	json.NewDecoder(r.Body).Decode(&req)

	mu.Lock()
	conversationHistory = append(conversationHistory, Message{
		Role:    "user",
		Content: req.Message,
	})
	// Snapshot the history while holding the lock so concurrent requests can't race on it
	history := make([]Message, len(conversationHistory))
	copy(history, conversationHistory)
	mu.Unlock()

	// Build the chat request with full history
	chatReq := map[string]interface{}{
		"model":    "mistral",
		"messages": history,
		"stream":   false,
	}

	payloadBytes, _ := json.Marshal(chatReq)

	resp, err := http.Post(
		"http://localhost:11434/api/chat",
		"application/json",
		bytes.NewBuffer(payloadBytes),
	)
	if err != nil {
		http.Error(w, "Ollama error", http.StatusInternalServerError)
		return
	}
	defer resp.Body.Close()

	var ollamaResp map[string]interface{}
	json.NewDecoder(resp.Body).Decode(&ollamaResp)

	message, ok := ollamaResp["message"].(map[string]interface{})
	if !ok {
		http.Error(w, "Unexpected response from Ollama", http.StatusInternalServerError)
		return
	}
	assistantResponse, _ := message["content"].(string)

	mu.Lock()
	conversationHistory = append(conversationHistory, Message{
		Role:    "assistant",
		Content: assistantResponse,
	})
	mu.Unlock()

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(ChatResponse{Reply: assistantResponse})
}

func main() {
	http.HandleFunc("/chat", chatHandler)
	fmt.Println("Server running on :8080")
	http.ListenAndServe(":8080", nil)
}

You can test this with:

curl -X POST http://localhost:8080/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello! What can you do?"}'

The server maintains conversation history, so follow-up questions will include context.

Wrapping Up

You now know how to work with Ollama from Go. We covered the basics of making requests, handling streaming responses, building a reusable client, and creating a practical API server.

From here, you could add features like model selection, persistent conversation storage, rate limiting, or integration with your existing Go applications. The possibilities are endless.

Remember to always handle timeouts and errors gracefully, and you’ll have a solid foundation for building AI-powered Go applications.

Happy coding!