RAG Chat Application Using Ollama

RAG Chat Application: A local, AI-powered chat system that retrieves relevant health tips from your documents and generates context-aware answers. It uses PostgreSQL with the pgvector extension for semantic (vector similarity) search over your health knowledge base.

How To Run Locally:

Make sure Docker is running locally, then run these commands in a terminal:

1) ./start-ollama.sh
2) docker compose build
3) docker compose up -d
4) mvn spring-boot:run

Endpoints:

Health check: http://localhost:8087/mysbragapp/actuator/health
Healthy tips chat API: http://localhost:8087/mysbragapp/api/rag_chat/healthy_tips
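A quick way to confirm the application is up is to call the health endpoint. The Java sketch below uses the JDK's `java.net.http.HttpClient` and assumes Spring Boot Actuator's default JSON body (`{"status":"UP"}`); the class and method names are illustrative. The request format of the `healthy_tips` endpoint is not documented here, so it is not shown.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class HealthCheck {
    static final String HEALTH_URL = "http://localhost:8087/mysbragapp/actuator/health";

    // Actuator reports {"status":"UP"} when the application is healthy; whitespace is ignored.
    static boolean isUp(String body) {
        return body.replaceAll("\\s", "").contains("\"status\":\"UP\"");
    }

    public static void main(String[] args) throws InterruptedException {
        HttpRequest req = HttpRequest.newBuilder(URI.create(HEALTH_URL)).GET().build();
        try {
            HttpResponse<String> res =
                HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println(res.statusCode() + " up=" + isUp(res.body()));
        } catch (java.io.IOException e) {
            // Connection refused etc.: the app (or its port) is not reachable yet.
            System.out.println("Application not reachable: " + e.getMessage());
        }
    }
}
```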

1. Ollama

Ollama is a local LLM (Large Language Model) server that runs models on your machine. Think of it as a local AI API server, similar to the OpenAI API, except that everything runs locally instead of in the cloud.

Key Features:

Hosts multiple models (chat and embedding models).
Provides endpoints like /api/chat (text generation) and /api/embed (vector embeddings).
Can be run via Docker or installed natively.
Handles model execution on your local CPU/GPU.
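Both endpoints can be called with plain HTTP. This is a minimal sketch, assuming Ollama listens on its default address `http://localhost:11434` and using Ollama's request fields (`model` and `input` for `/api/embed`; `model`, `messages` and `stream` for `/api/chat`). The helper names are hypothetical; inside the Spring Boot app, Spring AI builds these calls for you.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class OllamaRequests {
    // Ollama's default local address; change it if your instance listens elsewhere.
    static final String BASE_URL = "http://localhost:11434";

    // Minimal JSON string escaping (quotes, backslashes, control characters).
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    // Body for POST /api/embed: one model name and one input text.
    static String embedBody(String model, String input) {
        return "{\"model\":" + jsonString(model) + ",\"input\":" + jsonString(input) + "}";
    }

    // Body for POST /api/chat: a single user message, streaming disabled.
    static String chatBody(String model, String userMessage) {
        return "{\"model\":" + jsonString(model)
            + ",\"messages\":[{\"role\":\"user\",\"content\":" + jsonString(userMessage) + "}]"
            + ",\"stream\":false}";
    }

    static HttpRequest post(String path, String body) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + path))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest embed = post("/api/embed", embedBody("nomic-embed-text", "Drink water regularly."));
        HttpRequest chat = post("/api/chat", chatBody("mistral", "Give me one sleep tip."));
        System.out.println(embed.method() + " " + embed.uri());
        System.out.println(chat.method() + " " + chat.uri());
    }
}
```

Sending them is then `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; the embed response carries the vectors in an `embeddings` array.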

2. `nomic-embed-text` (Embedding Model)

Converts text into vector embeddings for semantic search.
Use in RAG App: Called via /api/embed to create embeddings for document chunks.
Vector dimension: 768 (configure this explicitly if the model cannot report its dimension at startup).
Memory footprint: ~1–1.5 GB; recommended for local setups.
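Because the `vector(n)` column must match the model's output size, it can help to check the dimension before inserting. This hypothetical helper rejects a mismatched embedding with the same wording pgvector uses, and otherwise formats it as pgvector's text literal (`[x1,x2,...]`).

```java
import java.util.StringJoiner;

class EmbeddingDims {
    // nomic-embed-text produces 768-dimensional vectors; the column must be vector(768).
    static final int NOMIC_EMBED_TEXT_DIM = 768;

    static String toPgVectorLiteral(float[] embedding, int expectedDim) {
        if (embedding.length != expectedDim) {
            throw new IllegalArgumentException(
                "expected " + expectedDim + " dimensions, not " + embedding.length);
        }
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float x : embedding) joiner.add(Float.toString(x));
        return joiner.toString();
    }

    public static void main(String[] args) {
        float[] ok = new float[NOMIC_EMBED_TEXT_DIM];
        System.out.println("literal length: " + toPgVectorLiteral(ok, NOMIC_EMBED_TEXT_DIM).length());
        try {
            toPgVectorLiteral(new float[1024], NOMIC_EMBED_TEXT_DIM);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // expected 768 dimensions, not 1024
        }
    }
}
```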

3. `mistral` (Chat / LLM Model)

Generates context-aware responses based on prompts.
Use in RAG App: Called via /api/chat to answer user queries using the document chunks retrieved from the vector store.
Memory requirement: Typically 4–6 GB or more depending on model size.
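The chat model only sees what is in the prompt, so the retrieved chunks have to be included explicitly. A minimal sketch of one way to assemble such a prompt; the instruction wording and the `[n]` numbering are assumptions, not this app's actual template.

```java
import java.util.List;

class RagPrompt {
    // Builds one user prompt: instruction, numbered context chunks, then the question.
    static String build(String question, List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        sb.append("Answer the question using only the health tips below. ")
          .append("If they do not contain the answer, say that you don't know.\n\n");
        for (int i = 0; i < chunks.size(); i++) {
            sb.append('[').append(i + 1).append("] ").append(chunks.get(i)).append('\n');
        }
        sb.append("\nQuestion: ").append(question);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> retrieved = List.of(
            "Drink a glass of water after waking up.",
            "Aim for 7 to 9 hours of sleep.");
        System.out.println(build("How much should I sleep?", retrieved));
    }
}
```

Restricting the model to the supplied tips is what keeps answers grounded in the knowledge base rather than in the model's general training data.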

4. How They Work Together

| Component | Model | Endpoint | Purpose |
|---|---|---|---|
| Embedding | `nomic-embed-text` | `/api/embed` | Convert text chunks into vectors for storage in pgvector |
| Chat / LLM | `mistral` | `/api/chat` | Generate answers from the document chunks retrieved from the vector store |
| Vector Store | pgvector in PostgreSQL | N/A | Store embeddings and run similarity search over them |

Analogy:

Embeddings → “index” your documents like a search engine
Chat → “answer” your questions using the indexed documents
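To illustrate what the vector store does, the sketch below ranks chunks by cosine similarity in memory. It is a stand-in for pgvector's search (which uses the `<=>` cosine distance operator in SQL), with toy 3-dimensional vectors in place of real 768-dimensional `nomic-embed-text` embeddings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class InMemoryVectorSearch {
    record Chunk(String text, float[] embedding) {}

    // Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means same direction.
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) return 0;
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns the k chunks most similar to the query, best first.
    static List<Chunk> topK(float[] query, List<Chunk> store, int k) {
        List<Chunk> sorted = new ArrayList<>(store);
        sorted.sort(Comparator.comparingDouble(
            (Chunk c) -> cosineSimilarity(query, c.embedding())).reversed());
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    public static void main(String[] args) {
        List<Chunk> store = List.of(
            new Chunk("Stay hydrated", new float[]{0.9f, 0.1f, 0.0f}),
            new Chunk("Sleep 7-9 hours", new float[]{0.0f, 0.2f, 0.9f}),
            new Chunk("Walk daily", new float[]{0.1f, 0.9f, 0.1f}));
        float[] query = {1.0f, 0.0f, 0.1f}; // pretend embedding of the user's question
        for (Chunk c : topK(query, store, 2)) System.out.println(c.text());
    }
}
```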

5. Common Issues & Solutions (Local RAG)

| # | Error / Issue | Context | Root Cause | Recommended Solution |
|---|---|---|---|---|
| 1 | `model "mxbai-embed-large" not found (404)` | Startup / embedding | Spring AI's default Ollama embedding model (`mxbai-embed-large`) has not been pulled | Set the embedding model: `spring.ai.ollama.embedding.model=nomic-embed-text` |
| 2 | `model requires more system memory (4.5 GiB)` | Document ingestion | Embedding model too large for available RAM | Use a smaller embedding model (`nomic-embed-text`) or increase Docker/Ollama memory |
| 3 | `expected 1024 dimensions, not 768` | PgVector insert | Table column was created as `vector(1024)`, but the embeddings have 768 dimensions | Drop and recreate the table, or alter the column to `vector(768)` |
| 4 | `model requires more system memory (4.5 GiB)` | Chat API | Chat model too large for available RAM | Switch to a smaller chat model (`llama3.2:3b`, `phi3:mini`) |
| 5 | Repeated `Retry count: 1,2,3…` | Runtime logs | Spring AI retrying on non-recoverable errors | Disable retries for local use: `spring.ai.retry.enabled=false` |
| 6 | Confusion about where model memory is allocated | Troubleshooting | Ollama uses native system/Docker RAM, not the JVM heap | Do not tune the JVM heap; adjust the Ollama model size or Docker memory |
| 7 | `docker exec ... ollama pull ...` fails | Model pull | Ollama container not running | Verify how Ollama runs (Docker or native) and pull the model accordingly |
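Issues 1 and 7 both come down to a model that Ollama does not have locally. A minimal sketch that asks Ollama which models are installed (`GET /api/tags`) and reports the missing ones; it assumes the default port and extracts the `name` fields with a regular expression instead of a JSON library, which is enough for a quick check but is not a robust parser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OllamaModelCheck {
    private static final Pattern NAME = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]+)\"");

    // Returns the required models not present in the /api/tags response.
    // A bare name such as "mistral" also matches the installed "mistral:latest".
    static List<String> missingModels(String tagsJson, List<String> required) {
        Set<String> installed = new HashSet<>();
        Matcher m = NAME.matcher(tagsJson);
        while (m.find()) installed.add(m.group(1));
        List<String> missing = new ArrayList<>();
        for (String model : required) {
            if (!installed.contains(model) && !installed.contains(model + ":latest")) {
                missing.add(model);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws InterruptedException {
        HttpRequest req = HttpRequest.newBuilder(URI.create("http://localhost:11434/api/tags")).GET().build();
        try {
            String body = HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString()).body();
            System.out.println("Missing: " + missingModels(body, List.of("nomic-embed-text", "mistral")));
        } catch (java.io.IOException e) {
            System.out.println("Ollama not reachable: " + e.getMessage());
        }
    }
}
```

Any model reported as missing can then be pulled with `ollama pull <model>` in whichever runtime (Docker or native) actually serves the API.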

6. Model & Vector Dimension Reference

| Embedding Model | Vector Dimension | Approx. RAM | Notes |
|---|---|---|---|
| `mxbai-embed-large` | 1024 | ~4.5 GB | Too large for the available local laptop memory (~3.7 GB) |
| `nomic-embed-text` | 768 | ~1–1.5 GB | Recommended for local use; fits the PgVector `vector(768)` schema |
| `text-embedding-3-large` (OpenAI) | 3072 | N/A (cloud API) | Cloud API; not available through local Ollama |
| `text-embedding-3-small` (OpenAI) | 1536 | N/A (cloud API) | Smaller and less accurate than `-large`; cloud API, not available through local Ollama |

PgVector Table Mapping:

| Embedding Model | PgVector column type |
|---|---|
| `nomic-embed-text` | `vector(768)` |
| `mxbai-embed-large` | `vector(1024)` |
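When switching embedding models, the column type has to follow. This hypothetical helper generates SQL for the "alter the column" fix. The table and column names are assumptions (Spring AI's `PgVectorStore` defaults to `vector_store` and `embedding`), and it truncates the table first because rows stored with the old dimension cannot be converted, so documents must be re-ingested afterwards.

```java
import java.util.Map;

class PgVectorSchema {
    // Output dimension of each embedding model.
    static final Map<String, Integer> DIMENSIONS = Map.of(
        "nomic-embed-text", 768,
        "mxbai-embed-large", 1024);

    // Table and column names are trusted configuration values here, not user input.
    static String alterColumnSql(String table, String column, String model) {
        Integer dim = DIMENSIONS.get(model);
        if (dim == null) throw new IllegalArgumentException("unknown embedding model: " + model);
        // Existing rows have the old dimension, so they are removed before changing the type.
        return "TRUNCATE TABLE " + table + ";\n"
             + "ALTER TABLE " + table + " ALTER COLUMN " + column + " TYPE vector(" + dim + ");";
    }

    public static void main(String[] args) {
        System.out.println(alterColumnSql("vector_store", "embedding", "nomic-embed-text"));
    }
}
```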

7. Observed Runtime Errors

| Error Type / Exception | HTTP / System Code | Context | Root Cause | Recommended Fix |
|---|---|---|---|---|
| `ResourceAccessException` | N/A | POST `/api/embed` | Ollama not running | Start Ollama or check the port (Ollama's default is 11434) |
| `ConnectException` / `ClosedChannelException` | N/A | POST `/api/embed` | Network/socket issue | Ensure Ollama is reachable from the Spring Boot app (host, port, Docker networking) |
| `NonTransientAiException` | 404 | Embedding call | Model `mxbai-embed-large` has not been pulled | Pull the model or switch to `nomic-embed-text` |
| `TransientAiException` | 500 | Embedding call | Model memory exceeds available RAM | Increase system/Docker memory or use a lighter model |
| `WARN` fallback | N/A | `PgVectorStore` init | Could not get the embedding dimensions from the model | Check the embedding API and model configuration |
| `ApplicationContext` startup failure | N/A | Spring Boot init | Embedding model failure | Fix Ollama model/API availability |
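The connection errors above usually mean the app tried to reach Ollama before it was ready. A sketch of a simple wait loop, assuming Ollama's default address; it treats HTTP 200 from `GET /` as "up" and any I/O failure (refused connection, timeout) as "not yet". The attempt count and delay are arbitrary example values.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.BooleanSupplier;

class WaitForOllama {
    // Calls probe up to maxAttempts times, sleeping between failures; true once it succeeds.
    static boolean waitUntil(BooleanSupplier probe, int maxAttempts, Duration delay) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (probe.getAsBoolean()) return true;
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    // One probe: GET {baseUrl}/ and check for HTTP 200.
    static boolean ollamaReachable(HttpClient client, String baseUrl) {
        HttpRequest req = HttpRequest.newBuilder(URI.create(baseUrl + "/"))
            .timeout(Duration.ofSeconds(2)).GET().build();
        try {
            return client.send(req, HttpResponse.BodyHandlers.discarding()).statusCode() == 200;
        } catch (java.io.IOException e) {
            return false; // connection refused, timeouts and other I/O failures
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build();
        boolean up = waitUntil(() -> ollamaReachable(client, "http://localhost:11434"), 10, Duration.ofSeconds(3));
        System.out.println(up ? "Ollama is reachable" : "Ollama did not respond; start it or check the port");
    }
}
```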
