AI RAG with Yugabyte
- Kyle Hailey

One of the more compelling applications of AI today is in frontline customer support.
Virtually every company that sells a product or service has some form of customer support. And almost universally, support teams operate under budget constraints—whether in time, staffing, or cost. What if we could offer better support for less cost? That’s where AI, and more specifically Retrieval-Augmented Generation (RAG), enters the picture.
But how do we get our own support data into the AI?
That’s the real question.
Support content typically lives across many places: public documentation, internal knowledge bases, Slack threads, support tickets, and more. To leverage this content effectively, we need to vectorize it—convert it into embeddings that preserve semantic meaning and can be searched efficiently.
And for many companies, keeping this data in-house is non-negotiable. Hosting your own vector database ensures control, privacy, and security—key requirements for enterprise adoption.
Here’s how we can do it:
We transform all our support content into vector embeddings and store them in a PostgreSQL-compatible vector database like YugabyteDB, which supports hybrid transactional and vector workloads. Then, when a user asks a question, we generate an embedding for that question, search our internal vector store for semantically similar content, and pass those results as context to the LLM to generate accurate, relevant responses.
This is the essence of RAG.
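To make the retrieval step concrete, here is a toy sketch (not the actual pipeline used later in this post) that ranks a few hand-made 4-dimensional vectors by cosine similarity against a "question" vector. Real OpenAI embeddings have 1536 dimensions, but the ranking idea is the same:

import math

# Toy vector store: hand-made 4-d vectors standing in for real 1536-d embeddings.
store = {
    "reset-password doc": [0.9, 0.1, 0.0, 0.2],
    "billing FAQ":        [0.1, 0.8, 0.3, 0.0],
    "API rate limits":    [0.0, 0.2, 0.9, 0.4],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

# Pretend embedding of the user's question ("How do I reset my password?")
question_vector = [0.85, 0.15, 0.05, 0.1]

# Rank stored chunks by similarity; the top matches become the LLM's context.
for name, vec in sorted(store.items(),
                        key=lambda kv: cosine_similarity(question_vector, kv[1]),
                        reverse=True):
    print(f"{name}: {cosine_similarity(question_vector, vec):.3f}")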
A Practical Example
Below is a simple example that:
1. Loads a directory of files (./data/paul_graham)
2. Splits them into chunks
3. Vectorizes the chunks using OpenAI embeddings
4. Stores each chunk’s ID, text, and vector in a Yugabyte table called vectors
This exact same setup could be used for real support content—internal docs, chat logs, email threads, etc.—instead of sample essays.
All you need is an OpenAI API key (exported as OPENAI_API_KEY), and a running YugabyteDB instance with vector support enabled.
Once loaded, you can query the vector table with any user question, retrieve the most relevant matches, and use them to feed a large language model like GPT-4.
NOTE:
This same RAG (Retrieval-Augmented Generation) approach isn’t limited to just “Ask Your Support Knowledge Base” scenarios. It can also be applied to a wide range of enterprise use cases, including:
• Semantic search
• Recommendations (for products, services, advice, and more)
• Personalization
• Fraud detection
RAG is a versatile framework that can enhance decision-making and user experience across many business domains.
GitHub repo at: https://github.com/kyle-hailey/rag_example/blob/main/README.md
I created a Yugabyte database here: https://cloud.yugabyte.com/
using the latest available version, 2.25.1, enabled the vector extension, and created a table for the vector embeddings:
CREATE EXTENSION vector;
CREATE TABLE vectors (
id TEXT PRIMARY KEY,
article_text TEXT,
embedding VECTOR(1536)
);
CREATE INDEX NONCONCURRENTLY ON vectors USING ybhnsw (embedding vector_cosine_ops);
see: Yugabyte Vector Docs
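As a quick sanity check of the index and operator, once the table has been populated (insert.py further down does that), the kind of cosine-distance query the ybhnsw index is built to accelerate looks like this. This is just a sketch: it reuses one stored embedding as the query vector and the same connection string as the scripts below.

import psycopg2

conn = psycopg2.connect("postgresql://yugabyte:password@127.0.0.1:5433/yugabyte")
cur = conn.cursor()

# <=> is cosine distance (matching vector_cosine_ops); smaller means more similar.
cur.execute("""
    SELECT id, left(article_text, 40) AS snippet,
           embedding <=> (SELECT embedding FROM vectors LIMIT 1) AS distance
    FROM vectors
    ORDER BY distance
    LIMIT 3;
""")
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()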
You can connect to the database from the web console, or you can install psql (or another tool like DBeaver).
I installed psql on my Mac with "brew install postgresql@15"
then connected with
"psql -h hostname -p 5433 -U yugabyte"
On EC2, I installed it with "sudo dnf install -y postgresql15-server postgresql15-contrib"
I used Python 3.9
Setup
python3.9 -m venv aiblog
source aiblog/bin/activate
cd aiblog
pip install llama-index
pip install psycopg2
export OPENAI_API_KEY='your openAI key'
# in ./aiblog/data I have a file about "paul_graham"
# you could put any textual data that you want to
# supplement the LLM retrieval with in ./aiblog/data
# I only needed the llama-index and psycopg2 packages, but you
# can install my full environment with requirements.txt from
# the github repo
# https://github.com/kyle-hailey/rag_example
pip install -r requirements.txt
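Before editing the scripts, a quick sanity check I like to run (my addition, not part of the original setup) confirms the packages import and the API key is visible to Python:

import os
import psycopg2                                      # just checking it imports
from llama_index.core import SimpleDirectoryReader   # just checking it imports

assert os.getenv("OPENAI_API_KEY"), "export OPENAI_API_KEY before running insert.py"
print("environment looks good")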
vi insert.py
Python program to store documentation as embeddings, i.e. vectorize the text data and store it in a vector column in the database for later lookup.
SimpleDirectoryReader(), used to read the data, handles all sorts of formats: text, PDF, Word, JPEG, MP3, etc.
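For example, if the data directory holds a mix of file types and you only want some of them picked up, SimpleDirectoryReader can filter by extension. A small sketch (the extensions and path here are just placeholders):

from llama_index.core import SimpleDirectoryReader

# Load only Markdown and PDF files from ./data, descending into subdirectories.
reader = SimpleDirectoryReader(
    input_dir="./data",
    required_exts=[".md", ".pdf"],
    recursive=True,
)
documents = reader.load_data()
print(f"loaded {len(documents)} documents")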
insert.py
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.schema import Document
import openai
import psycopg2

connection_string = "postgresql://yugabyte:password@127.0.0.1:5433/yugabyte"

# Try to connect and give feedback
try:
    conn = psycopg2.connect(connection_string)
    cursor = conn.cursor()
    print("✅ Successfully connected to the database.\n")
except Exception as e:
    print("❌ Failed to connect to the database.")
    print("Error:", e)
    exit(1)

# Load documents and create index
print("📄 Loading documents...")
documents = SimpleDirectoryReader("./data").load_data()
print(f"📦 Loaded {len(documents)} documents.\n")

print("🔍 Vectorizing documents...")
index = VectorStoreIndex.from_documents(documents)
print("✅ Vectorization complete.\n")

# Insert documents with clean feedback
for doc_id, doc in index.docstore.docs.items():
    embedding = index._embed_model.get_text_embedding(doc.text)
    embedding_str = "[" + ",".join(map(str, embedding)) + "]"

    insert_sql = """
        INSERT INTO vectors (id, article_text, embedding)
        VALUES (%s, %s, %s)
        ON CONFLICT (id) DO NOTHING;
    """
    try:
        cursor.execute(insert_sql, (doc_id, doc.text, embedding_str))
        conn.commit()
        text_snippet = doc.text[:40].replace("\n", " ").strip()
        print(f"📥 {len(doc.text):4d} chars | \"{text_snippet}\" | { [round(v, 4) for v in embedding[:5]] }")
    except Exception as e:
        print(f"❌ Failed to insert row: {e}")

print("\n🎉 Done inserting all data.")
cursor.close()
conn.close()
Results from running the program
$ python insert.py
✅ Successfully connected to the database.
📄 Loading documents...
📦 Loaded 1 documents.
🔍 Vectorizing documents...
✅ Vectorization complete.
📥 4170 chars | "What I Worked On February 2021 Before" | [0.0041,
📥 4325 chars | "All that seemed left for philosophy were" | [0.0197,
📥 4193 chars | "Its brokenness did, as so often happens," | [0.0065,
📥 4339 chars | "If he even knew about the strange classe" | [-0.0068,
📥 4291 chars | "The students and faculty in the painting" | [-0.0073,
📥 4329 chars | "I wanted to go back to RISD, but I was n" | [0.0019,
📥 4261 chars | "But alas it was more like the Accademia" | [0.0065,
📥 4293 chars | "After I moved to New York I became her d" | [-0.0001,
📥 4319 chars | "Now we felt like we were really onto som" | [-0.0179,
📥 4258 chars | "In its time, the editor was one of the b" | [-0.0091,
📥 4181 chars | "A company with just a handful of employe" | [0.0008,
📥 4244 chars | "I stuck it out for a few more months, th" | [0.0073,
📥 4292 chars | "But about halfway through the summer I r" | [0.0034,
📥 4456 chars | "One of the most conspicuous patterns I'v" | [-0.0037,
📥 4454 chars | "Horrified at the prospect of having my i" | [0.0007,
📥 4235 chars | "We'd use the building I owned in Cambrid" | [0.0128,
📥 4128 chars | "It was originally meant to be a news agg" | [0.0031,
📥 4161 chars | "It had already eaten Arc, and was in the" | [0.0125,
📥 4381 chars | "Then in March 2015 I started working on" | [-0.0092,
📥 4352 chars | "I remember taking the boys to the coast" | [0.0182,
📥 4472 chars | "But when the software is an online store" | [-0.0007,
📥 1805 chars | "[17] Another problem with HN was a bizar" | [0.005,
🎉 Done inserting all data.
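Before moving on to questions, it is worth confirming the rows actually landed in the table. A quick check (my addition, not one of the original scripts):

import psycopg2

conn = psycopg2.connect("postgresql://yugabyte:password@127.0.0.1:5433/yugabyte")
cur = conn.cursor()

cur.execute("SELECT count(*) FROM vectors;")
print("rows in vectors:", cur.fetchone()[0])

cur.execute("SELECT id, left(article_text, 40) FROM vectors LIMIT 3;")
for row_id, snippet in cur.fetchall():
    print(row_id, "|", snippet)

cur.close()
conn.close()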
Program to leverage the embeddings when asking the LLM questions:
• Your question is converted to a vector
• The vector is used to retrieve relevant chunks from the database
• Those chunks are passed to GPT-4
• GPT-4 answers your question using that context
question.py
import psycopg2
import openai
import os
from llama_index.embeddings.openai import OpenAIEmbedding

# --- Setup ---
openai.api_key = os.getenv("OPENAI_API_KEY")
embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
client = openai.OpenAI()

connection_string = "postgresql://yugabyte:password@127.0.0.1:5433/yugabyte"

def ask_question(question, top_k=7):
    # Connect to DB
    conn = psycopg2.connect(connection_string)
    cursor = conn.cursor()

    # Get embedding of question
    query_embedding = embed_model.get_query_embedding(question)
    embedding_str = "[" + ",".join(map(str, query_embedding)) + "]"

    # Vector search
    search_sql = """
        SELECT id, article_text, embedding <=> %s AS distance
        FROM vectors
        ORDER BY embedding <=> %s
        LIMIT %s;
    """
    cursor.execute(search_sql, (embedding_str, embedding_str, top_k))
    results = cursor.fetchall()
    cursor.close()
    conn.close()

    if not results:
        return "No matching documents found."

    # Print the first 40 characters of each retrieved chunk
    print("\n🔍 Retrieved context snippets:")
    for _, text, distance in results:
        print(f"- {text[:40]!r} (distance: {distance:.4f})")

    # Build context
    context = "\n\n".join([f"{text}" for (_, text, _) in results])

    # Ask GPT
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that answers questions using the provided context."},
            {"role": "user", "content": f"Context:\n{context}"},
            {"role": "user", "content": f"Question: {question}"}
        ],
        temperature=0.3,
    )
    return response.choices[0].message.content.strip()

# --- Interactive Loop ---
if __name__ == "__main__":
    try:
        print("Ask me a question (press Ctrl+C to quit):\n")
        while True:
            question = input("❓ Your question: ").strip()
            if not question:
                continue
            answer = ask_question(question)
            print("\n💡 Answer:\n" + answer + "\n")
    except KeyboardInterrupt:
        print("\nGoodbye! 👋")
% python question.py
Ask me a question (press Ctrl+C to quit):
❓ Your question: tell me about paul graham
🔍 Retrieved context snippets:
- 'Over the next several years Paul Graham' (distance: 0.1471)
- 'The article is about Paul Graham\n\nWhat ' (distance: 0.1513)
- 'Paul Graham certainly did. So at the end' (distance: 0.1523)
- 'They either lived long ago or were myste' (distance: 0.1530)
- 'But the most important thing Paul Graha' (distance: 0.1583)
- 'You can do something similar on a map of' (distance: 0.1621)
- 'When Paul Graham was dealing with some ' (distance: 0.1628)
💡 Answer:
Paul Graham is a writer, programmer, and entrepreneur. He has written numerous essays on various topics, some of which were reprinted as a book titled "Hackers & Painters". He has also worked on spam filters and has a passion for painting. He was known for hosting dinners for a group of friends every Thursday night, teaching him how to cook for groups.
Before college, Graham mainly focused on writing and programming. He wrote short stories and tried programming on the IBM 1401. He later got a microcomputer and started programming more seriously, writing simple games and a word processor. In college, he initially planned to study philosophy but switched to AI.
Graham also worked on a new dialect of Lisp, called Arc, and gave a talk at a Lisp conference about how they'd used Lisp at Viaweb. This talk gained significant attention online, leading him to realize the potential of online essays.
In 2003, Graham met Jessica Livingston at a party. She was in charge of marketing at a Boston investment bank and later compiled a book of interviews with startup founders. In 2005, Graham and Livingston, along with Robert and Trevor, decided to start their own investment firm, which became Y Combinator.
Graham also worked on several different projects, including the development of the programming language Arc and the creation of the online platform Hacker News. In 2012, he decided to hand over Y Combinator to Sam Altman and retire.