Getting started with Simba on your local system
This guide walks you through installing and running Simba on your local system using pip, Git, or Docker.
You can choose the method that suits you best: if you simply want to use the SDK, we recommend the pip installation method; if you want more control over the source code, we recommend installing the full system from Git; and if you want a prebuilt solution, we recommend Docker.
1
Install simba-core
simba-core is the PyPI package that contains the server logic and API. The server must be running before you can use the SDK.
pip install simba-core
To install the dependencies faster, we recommend using uv:
pip install uv
uv pip install simba-core
2
Create a config.yaml file
The config.yaml file is one of the most important pieces of this setup: it configures the embedding model, vector store type, retrieval strategy, database, the Celery worker used for parsing, and the LLM you are using.
Go to your project root and modify config.yaml; you can use the example below as a starting point.
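The exact configuration schema depends on your Simba version, so the key names below are assumptions; the values, however, mirror the settings echoed in the startup log later in this guide (OpenAI gpt-4o as the LLM, HuggingFace BAAI/bge-base-en-v1.5 embeddings, FAISS vector store, LiteDB database, hybrid retrieval with top-k 5):

```yaml
# Hedged sketch of config.yaml — key names are assumptions; the values
# match the settings echoed in the server startup log shown below.
llm:
  provider: openai
  model_name: gpt-4o

embedding:
  provider: huggingface
  model_name: BAAI/bge-base-en-v1.5
  device: mps        # use "cpu" or "cuda" on non-Apple hardware

vector_store:
  provider: faiss

database:
  provider: litedb

retrieval:
  method: hybrid
  k: 5
```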
Now that you have your .env and config.yaml in place, run the following command:
simba server
This starts the server at http://localhost:8000. You will see log output like the following in the console:
Starting Simba server...
INFO:     Started server process [62940]
INFO:     Waiting for application startup.
2025-03-12 16:42:50 - simba.__main__ - INFO - ==================================================
2025-03-12 16:42:50 - simba.__main__ - INFO - Starting SIMBA Application
2025-03-12 16:42:50 - simba.__main__ - INFO - ==================================================
2025-03-12 16:42:50 - simba.__main__ - INFO - Project Name: Simba
2025-03-12 16:42:50 - simba.__main__ - INFO - Version: 1.0.0
2025-03-12 16:42:50 - simba.__main__ - INFO - LLM Provider: openai
2025-03-12 16:42:50 - simba.__main__ - INFO - LLM Model: gpt-4o
2025-03-12 16:42:50 - simba.__main__ - INFO - Embedding Provider: huggingface
2025-03-12 16:42:50 - simba.__main__ - INFO - Embedding Model: BAAI/bge-base-en-v1.5
2025-03-12 16:42:50 - simba.__main__ - INFO - Embedding Device: mps
2025-03-12 16:42:50 - simba.__main__ - INFO - Vector Store Provider: faiss
2025-03-12 16:42:50 - simba.__main__ - INFO - Database Provider: litedb
2025-03-12 16:42:50 - simba.__main__ - INFO - Retrieval Method: hybrid
2025-03-12 16:42:50 - simba.__main__ - INFO - Retrieval Top-K: 5
2025-03-12 16:42:50 - simba.__main__ - INFO - Base Directory: /Users/mac/Documents/simba
2025-03-12 16:42:50 - simba.__main__ - INFO - Upload Directory: /Users/mac/Documents/simba/uploads
2025-03-12 16:42:50 - simba.__main__ - INFO - Vector Store Directory: /Users/mac/Documents/simba/vector_stores
2025-03-12 16:42:50 - simba.__main__ - INFO - ==================================================
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
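Once the server is up, you can sanity-check it from Python using only the standard library. This is a minimal sketch; the root path `/` is an assumption, and any HTTP response (even a 404) proves the process is listening:

```python
# Minimal reachability check for the local Simba server.
# Any HTTP response (including 404) means the server is up;
# only a connection failure means it is not.
import urllib.request
import urllib.error

def server_is_up(url: str = "http://localhost:8000/", timeout: float = 5.0) -> bool:
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True  # the server answered, just not with a 200
    except (urllib.error.URLError, OSError):
        return False
```

If this returns False, check the console where you ran `simba server` for startup errors.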
5
Install SDK
You can now install the SDK to start using Simba in local mode:
pip install simba-client
6
Basic usage
from simba_sdk import SimbaClient

client = SimbaClient(api_url="http://localhost:8000")

# Upload a document and keep its id
document = client.documents.create(file_path="path/to/your/document.pdf")
document_id = document[0]["id"]

# Parse it synchronously with the docling parser
parsing_result = client.parser.parse_document(document_id, parser="docling", sync=True)

# Retrieve relevant chunks for a query
retrieval_results = client.retriever.retrieve(query="your-query")

for result in retrieval_results["documents"]:
    print(f"Content: {result['page_content']}")
    print(f"Metadata: {result['metadata']['source']}")
    print("====" * 10)
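Building on the calls above, here is a hedged sketch of batch ingestion: it reuses the same `documents.create` and `parser.parse_document` calls on a `SimbaClient` instance constructed as in the basic-usage example; the file paths are placeholders.

```python
# Hedged sketch: ingest several files with the SDK calls shown above.
# `client` is a SimbaClient as constructed in the basic-usage example;
# the file paths you pass in are placeholders.
def ingest_documents(client, file_paths, parser="docling"):
    """Upload and parse each file synchronously; return the new document ids."""
    doc_ids = []
    for path in file_paths:
        document = client.documents.create(file_path=path)
        doc_id = document[0]["id"]
        client.parser.parse_document(doc_id, parser=parser, sync=True)
        doc_ids.append(doc_id)
    return doc_ids
```

With `sync=True` each document is fully parsed before the next upload starts, which keeps the sketch simple at the cost of throughput.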
If you installed from source, make sure the Simba Python environment is activated; if you are using the VS Code IDE, also update your Python interpreter accordingly.
If you want to enable document parsers, you need to start a Celery worker instance; this is required to run the docling parser. Celery requires Redis. To start Redis, open a terminal and run:
redis-server
Once Redis is running, open a new terminal and start the Celery worker.