# Options
To adjust these options, edit `~/.config/chat-script/chat-script.ini`.
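chat-script.ini is a plain INI file. As a rough sketch (the section names here are assumed to mirror the headings below; check your installed file for the exact names), it is organized like this:

```ini
# ~/.config/chat-script/chat-script.ini  (assumed layout)

# Gradio app settings
[App]

# Model, sampling, and retrieval settings
[Chain]

# Document loading and embedding settings
[Embeddings]

# Streaming, history, and moderation settings
[Response]
```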
## App
Option | Description | Default |
---|---|---|
share | Whether to create a publicly shareable link for the Gradio app | False |
server_name | IP address the local app is served on | 127.0.0.1 |
server_port | Port the local app is served on | 7860 |
inbrowser | Whether to automatically open the Gradio app in a new tab of the default browser | True |
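As an illustration, an [App] section using the defaults above might look like the following; the [App] section name is assumed from the heading, so adjust it to match your actual chat-script.ini:

```ini
[App]
# Keep the app local-only; set share = True to create a public Gradio link
share = False
server_name = 127.0.0.1
server_port = 7860
# Open the app in a new browser tab automatically
inbrowser = True
```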
## Chain
Option | Description | Default |
---|---|---|
embeddings_model | Name of the Ollama model used to generate embeddings | mxbai-embed-large |
chat_model | Name of the Ollama model used to generate responses | mistral |
moderation_model | Name of the Ollama model used to moderate queries | llama-guard3:1b |
embeddings_url | URL of the Ollama instance used to generate embeddings | http://localhost:11434 |
chat_url | URL of the Ollama instance used to generate responses | http://localhost:11434 |
moderation_url | URL of the Ollama instance used to moderate queries | http://localhost:11434 |
show_progress | Whether to display embeddings model batch progress | False |
keep_alive | How long the model will stay loaded into memory | 5m |
temperature | The temperature of the chat model. Increasing the temperature will make the model answer more creatively | 0.6 |
top_k | Reduces the probability of generating nonsense. A higher value gives more diverse answers, while a lower value is more conservative | 30 |
top_p | Works together with top_k. A higher value leads to more diverse text, while a lower value generates more conservative text | 0.7 |
collection_name | Name of local document collection | rag-chroma |
top_n_results | Number of documents to return | 3 |
rag_fusion | Whether to enable RAG-fusion, an advanced RAG technique that may improve semantic search relevance | True |
num_queries | Number of synthetic queries to generate for RAG-fusion | 2 |
top_n_results_fusion | Maximum number of documents to return for RAG-fusion (a maximum, since the unique union of results is taken) | 2 |
embeddings_gpu | Whether to use the GPU when generating embeddings (on devices with <8GB VRAM, setting to False can reduce latency) | True |
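A sketch of the corresponding [Chain] section with a subset of the options above (the section name is assumed; values are the documented defaults):

```ini
[Chain]
chat_model = mistral
embeddings_model = mxbai-embed-large
moderation_model = llama-guard3:1b
# All three can point at the same local Ollama instance
chat_url = http://localhost:11434
embeddings_url = http://localhost:11434
moderation_url = http://localhost:11434
temperature = 0.6
keep_alive = 5m
# RAG-fusion settings described in the table above
rag_fusion = True
num_queries = 2
top_n_results_fusion = 2
```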
## Embeddings
Option | Description | Default |
---|---|---|
embeddings_model | Name of the Ollama model used to generate embeddings | mxbai-embed-large |
embeddings_url | URL of the Ollama instance used to generate embeddings | http://localhost:11434 |
show_progress | Whether to display document loading and embeddings model batch progress | True |
collection_name | Name of local document collection | rag-chroma |
use_multithreading | Whether to enable CPU multithreading for loading documents | True |
chunk_size | Number of tokens in each split document chunk | 250 |
chunk_overlap | Number of tokens shared between consecutive split document chunks | 50 |
batch_size | Maximum number of split documents in each embeddings batch | 41666 |
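A possible [Embeddings] section with the documented defaults (section name assumed):

```ini
[Embeddings]
embeddings_model = mxbai-embed-large
embeddings_url = http://localhost:11434
show_progress = True
collection_name = rag-chroma
use_multithreading = True
# 250-token chunks, with 50 tokens shared between consecutive chunks
chunk_size = 250
chunk_overlap = 50
# Upper bound on split documents per embeddings batch
batch_size = 41666
```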
## Response
Option | Description | Default |
---|---|---|
context_stream_delay | Time in seconds to delay each streamed token of non-LLM text (sources, moderation notice) | 0.075 |
max_history | Maximum number of previous user messages to include as context | 2 |
print_state | Whether to print app state for each query. Includes: IP address, chat history, and context | True |
moderate | Whether to moderate user queries before allowing responses. The IP address and offending query are printed even if print_state is False (allowing privacy-preserving moderation) | False |
moderate_alert | Whether to display a system alert when an unsafe query is received (Linux only) | False |
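For example, a [Response] section (section name assumed) that enables the privacy-preserving moderation described above, i.e. moderation on with general state printing off:

```ini
[Response]
# 0.075 s delay per streamed token of non-LLM text (sources, moderation notice)
context_stream_delay = 0.075
max_history = 2
# Keep general state printing off for privacy...
print_state = False
# ...but still moderate queries; the IP address and offending query are
# printed even with print_state off
moderate = True
# System alerts for unsafe queries are Linux-only; leave disabled elsewhere
moderate_alert = False
```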