Building a RAG Tool in Ruby 4: What Actually Happened

One of my priorities this quarter was running a few AI pilot experiments. This was one of them.

When I mentioned the project to a teammate, he said, “You should write this up.” So here we are.

Others on our team had already been exploring embeddings, vector databases, and RAG. I’d been watching from the sidelines… until it was time to roll up my sleeves and build something myself.

The Problem

At Planet Argon, we manage several client projects. We live in Jira (I know… I know…). We keep decisions in Confluence. We ship code from GitHub. Over years a lot of institutional knowledge piles up across those systems… past bugs, old tradeoffs, and the “we tried that once” stories.

The problem is that nobody remembers all of it. A new ticket comes in: “users can’t export reports to PDF”. Somewhere in Jira there’s a ticket from eight months ago where we debugged a Safari-specific PDF export issue. Of course it was Safari. Somewhere in Confluence there’s a permissions matrix that’s suddenly relevant. If you weren’t assigned to the project back then, you would never know to look.

So we start over. We ask clarifying questions from scratch. We search Slack to see if anyone has asked something like this before. Tickets go into development with vague acceptance criteria, and the back-and-forth that should have happened before coding shows up during code review or QA on staging instead.

A vague ticket is a polite way to ask engineers to guess. Guessing can be expensive.

I wanted to build something that could surface that historical context automatically. Point it at a ticket and get suggested clarifying questions grounded in what we actually “remember” about this project.

Why Ruby, Why Minimal Dependencies

Ruby is what our team loves working in. If we were going to learn embeddings, vector search, and LLM integration, I wanted everything around those ideas to feel familiar.

I also wanted to keep the dependency footprint deliberately small. This is an internal tool for a small team. Every Ruby gem you add is a gem you maintain. I’ve watched too many internal tools rot after someone pulled in thirty dependencies for a weekend project, then nobody wanted to deal with the upgrade treadmill six months later.

Rather than listing the full Gemfile here, I’ll call out the gems that aren’t obvious. You already know thor, faraday, nokogiri, and concurrent-ruby. The interesting ones:

  • ruby-openai: handles both embedding generation and LLM completions. One gem, two jobs.
  • pinecone and chroma-db: Pinecone for the shared production index, Chroma for local development via Docker. More on this below.
  • mcp: Model Context Protocol server for Claude Code integration. This came later, and it changed everything.
  • The tty-* family: not necessary… but nicer when you’re watching a 20-minute ingestion run.

Beyond that, Ruby’s standard library handled most of the rest. The instinct to reach for a gem is strong, but for most things the stdlib is genuinely sufficient.

Why Not a Server

Early on I made a decision that shaped the whole architecture: no running HTTP server with an endpoint (at least, not yet).

A server is a commitment. Hosting. VPNs. Monitoring. Security reviews. Someone eventually asking, “who owns this?”. For an internal experiment that might not pan out, that felt like a lot of ceremony up front.

So I built it as a CLI tool. Each engineer runs it locally on their own machine. The only shared infrastructure is Pinecone, a cloud-hosted vector database. Everyone gets API keys to the same Pinecone index, but each client’s data lives in its own namespace. Engineers use their own Atlassian and GitHub API tokens when they want to run an ingestion.

Related: the Internal Tooling Maturity Ladder is an approach I’ve been exploring for our internal tools. The idea is to start with the simplest possible implementation (a script that solves the problem for one person), then evolve it through stages of maturity (CLI tool, shared server, versioned gem) as the need becomes clearer and the team is ready to invest more.

Here’s what the environment setup looks like:

# .env: each engineer has their own copy
# OpenAI (for embeddings and analysis)
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o-mini

# Atlassian (shared instance, individual tokens)
ATLASSIAN_BASE_URL=https://planetargon.atlassian.net
ATLASSIAN_EMAIL=you@planetargon.com
ATLASSIAN_API_TOKEN=ATATT3x...

# GitHub (individual tokens)
GITHUB_TOKEN=ghp_...

# Pinecone (shared index, namespaces isolate client data)
PINECONE_API_KEY=...
PINECONE_INDEX_NAME=clarion

This kept the experiment low-stakes. No deployment pipeline, no server to maintain, no VPN to configure. If it didn’t work out, there was nothing to decommission. Engineers pull updates from the main branch, run bundle install, and they’re current. It’ll likely become a proper gem we version at some point, but for now the simplicity of “pull main and go” is working fine.

What the CLI Looks Like

We called it Clarion. A clarion is a signal… felt right for a tool whose whole job is to surface things worth paying attention to.

The entrypoint is dead simple:

#!/usr/bin/env ruby
require "bundler/setup"
require_relative "../lib/clarion"

Clarion::CLI.start(ARGV)

Here’s the help output:

$ bin/clarion help

Commands:
  clarion analyze TICKET_ID    # Analyze a Jira ticket and suggest clarifications
  clarion help [COMMAND]       # Describe available commands or one specific command
  clarion ingest SUBCOMMAND    # Ingest data from various sources
  clarion ingest_all CLIENT    # Ingest Jira, Confluence, and GitHub data for a client
  clarion mcp                  # Start MCP server (for Claude Code integration)

$ bin/clarion help ingest

Commands:
  clarion ingest confluence    # Ingest Confluence pages for a specific space
  clarion ingest github        # Ingest GitHub repository data
  clarion ingest help          # Describe subcommands or one specific subcommand
  clarion ingest jira          # Ingest Jira tickets for a specific project

The CLI: Thor Subcommands

Thor for the CLI, nothing exotic. Here’s the skeleton:

module Clarion
  class CLI < Thor
    desc "analyze TICKET_ID", "Analyze a Jira ticket and suggest clarifications"
    option :verbose, type: :boolean, desc: "Enable verbose output"
    def analyze(ticket_id)
      validate_ticket_id!(ticket_id)
      analyzer = Clarion::Analyzer.new(ticket_id, verbose: options[:verbose])
      puts analyzer.analyze
    end

    desc "ingest_all CLIENT", "Ingest Jira, Confluence, and GitHub data for a client"
    option :limit, type: :numeric, default: 100
    option :parallel, type: :boolean, default: true
    def ingest_all(client_name)
      # Looks up client config, dispatches to parallel ingestion
    end

    desc "mcp", "Start MCP server (for Claude Code integration)"
    option :namespace, type: :string, desc: "Client namespace (auto-detected if omitted)"
    def mcp
      Clarion::McpServer.new(namespace: options[:namespace]).run
    end

    # Nested subcommand for individual ingestion
    desc "ingest SUBCOMMAND", "Ingest data from various sources"
    subcommand "ingest", Ingest

    private

    def validate_ticket_id!(ticket_id)
      return if ticket_id =~ /^[A-Z]+-\d+$/
      raise Thor::Error, "Invalid ticket ID format. Expected: PROJECT-123"
    end
  end
end

The Ingest subcommand is its own Thor class, giving us scoped commands for each data source:

# Analyze a ticket
$ bin/clarion analyze WR-123

# Ingest everything for a client (parallel by default)
$ bin/clarion ingest_all waystar --limit=500

# Or ingest individual sources
$ bin/clarion ingest jira --namespace=waystar --project=WR --limit=500
$ bin/clarion ingest confluence --namespace=waystar --space=WR
$ bin/clarion ingest github --namespace=waystar --repo=planetargon/waystar-web

# Start an MCP server for Claude Code
$ bin/clarion mcp --namespace=waystar

Every ingest command requires explicit --namespace and source-specific scoping flags (--project, --space, --repo). This is deliberate. Operations should never run without explicit client scope.

Client Configuration

Each client maps to a namespace, a Jira project, a Confluence space, and optionally GitHub repos:

# config/clients.yml
clients:
  waystar:
    namespace: waystar
    jira_project: WR
    confluence_space: WR
    vector_store: pinecone
    github_repos:
      - planetargon/waystar-web
      - planetargon/waystar-api

  piedpiper:
    namespace: piedpiper
    jira_project: PP
    confluence_space: PP
    vector_store: pinecone
    github_repos:
      - planetargon/piedpiper-app

  pierpoint:
    namespace: pierpoint
    jira_project: PPC
    confluence_space: PPC
    vector_store: chroma    # Local Chroma for testing

Note the per-client vector_store setting. One client can use Pinecone (shared, cloud-hosted) while another uses Chroma (local Docker instance) for development. The tool doesn’t care. The vector store abstraction handles it.
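To make that concrete, here’s a minimal sketch of resolving a client’s backend from a config shaped like the one above. The YAML is inlined here purely for illustration; the real file lives at config/clients.yml.

```ruby
require "yaml"

# Illustrative only: resolve a client's vector store backend from a
# clients.yml-shaped config, defaulting to Pinecone when unspecified.
yaml = <<~YML
  clients:
    pierpoint:
      namespace: pierpoint
      vector_store: chroma
YML

config  = YAML.safe_load(yaml)
client  = config.dig("clients", "pierpoint")
backend = client.fetch("vector_store", "pinecone")
# backend => "chroma"
```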

Embeddings: Simpler Than I Expected, Until They Weren’t

The ruby-openai gem makes the embedding call straightforward:

EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIMENSION = 1536

def generate_embedding(text)
  return Array.new(EMBEDDING_DIMENSION, 0.0) if text.nil? || text.strip.empty?

  response = @openai.embeddings(
    parameters: {
      model: EMBEDDING_MODEL,
      input: text.strip
    }
  )

  response["data"][0]["embedding"]
end

One thing I didn’t appreciate initially is that every embedding call costs money and adds latency. My early version used search, which takes a text string and internally calls OpenAI to generate an embedding before querying Pinecone:

# Before: each search() call generates its own embedding internally
similar  = @vector_store.search(query_text, filter: { source: "jira" })
docs     = @vector_store.search(query_text, filter: { source: ["confluence", "github"] })
resolved = @vector_store.search(query_text, filter: resolved_filter)

That’s three sequential calls to OpenAI’s embedding API for the exact same text, followed by three sequential calls to Pinecone. Six network round-trips, all in series.

Looking at the search method, you can see why. It generates a fresh embedding every time:

def search(query, filter: nil, top_k: 10)
  query_embedding = generate_embedding(query)  # Hits OpenAI every call
  search_by_vector(query_embedding, filter: filter, top_k: top_k)
end

The fix was two things at once: generate the embedding once, then pass that vector directly to search_by_vector (which skips the embedding step). Then run those three Pinecone queries concurrently:

# After: one embedding, three parallel vector searches
query_vector = @search.embed(query_text)

similar_thread  = Thread.new { @search.search_by_vector(query_vector, filter: { source: "jira" }) }
docs_thread     = Thread.new { @search.search_by_vector(query_vector, filter: { source: ["confluence", "github"] }) }
resolved_thread = Thread.new { @search.search_by_vector(query_vector, filter: resolved_filter) }

similar, docs, resolved = similar_thread.value, docs_thread.value, resolved_thread.value

The OpenAI embedding calls went from 3 to 1. The Pinecone queries stayed at 3 but now run concurrently instead of sequentially. Two wins from a small refactor.

I also learned about truncation the hard way. Some Jira tickets are enormous… long comment threads, embedded images described in markup, and extensive acceptance criteria. The embedding model has a token limit. We now truncate text at 30,000 characters before sending it for embedding.

Would’ve been nice to learn that from documentation rather than from a production error. Oh well.
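The guard itself is tiny. Something like this (the constant and method name here are illustrative, not our actual code; the 30,000-character cap is the real number we settled on):

```ruby
# Cap text length before sending it to the embedding API, so enormous
# tickets don't blow past the model's token limit.
MAX_EMBED_CHARS = 30_000

def truncate_for_embedding(text)
  return text if text.length <= MAX_EMBED_CHARS

  text[0, MAX_EMBED_CHARS]
end
```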

The Vector Store Abstraction

I didn’t want to be locked into a single vector database, especially early on when I wasn’t sure which one would work best for us. So I built a simple abstraction layer. It’s a factory that returns different backends behind the same interface:

class VectorStore
  def self.new(namespace:, backend: nil)
    backend ||= ENV.fetch("VECTOR_STORE_BACKEND", "memory")
    case backend.downcase
    when "pinecone" then VectorStores::Pinecone.new(namespace: namespace)
    when "chroma"   then VectorStores::Chroma.new(namespace: namespace)
    when "memory"   then VectorStores::Memory.new(namespace: namespace)
    else raise ArgumentError, "Unknown vector store backend: #{backend}"
    end
  end
end

All three backends implement the same base contract:

module VectorStores
  class Base
    attr_reader :namespace

    def initialize(namespace: nil)
      @namespace = namespace
    end

    def upsert(documents)
      raise NotImplementedError, "#{self.class}#upsert must be implemented"
    end

    def search(query, filter: nil, top_k: 10)
      raise NotImplementedError, "#{self.class}#search must be implemented"
    end

    def search_by_vector(vector, filter: nil, top_k: 10)
      raise NotImplementedError, "#{self.class}#search_by_vector must be implemented"
    end

    def embed(text)
      raise NotImplementedError, "#{self.class}#embed must be implemented"
    end

    def delete_all(namespace: nil)
      raise NotImplementedError, "#{self.class}#delete_all must be implemented"
    end

    def stats
      raise NotImplementedError, "#{self.class}#stats must be implemented"
    end
  end
end

Callers just use upsert, search, search_by_vector, and stats. They never know or care whether they’re talking to Pinecone, Chroma, or an in-memory hash.

The Pinecone backend stores document text inside the metadata (Pinecone doesn’t have a native text field), then strips it back out on retrieval:

# During upsert: embed text into metadata
metadata = (doc[:metadata] || {}).merge(text: doc[:text])
{ id: doc[:id], values: embedding, metadata: metadata }

# During search: extract text back out, unescape newlines
matches.map do |match|
  result = match.dup
  if result["metadata"] && result["metadata"]["text"]
    text = result["metadata"]["text"]
    result["text"] = text.is_a?(String) ? text.gsub('\\n', "\n") : text
    result["metadata"] = result["metadata"].except("text")
  end
  result
end

This paid off quickly. We started with the in-memory backend (pure Ruby cosine similarity, persists to a JSON file) just to prove the concept worked at all. Then Chroma for local development. You can run it in Docker. No cloud account needed. Then Pinecone for the shared production dataset that the whole team can access.
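For the curious, the in-memory backend’s scoring really is just cosine similarity, which is a few lines of plain Ruby. A sketch (not the actual backend code):

```ruby
# Cosine similarity between two embedding vectors: dot product divided by
# the product of the magnitudes. Degenerate (zero) vectors score 0.0.
def cosine_similarity(a, b)
  dot   = a.zip(b).sum { |x, y| x * y }
  mag_a = Math.sqrt(a.sum { |x| x * x })
  mag_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if mag_a.zero? || mag_b.zero?

  dot / (mag_a * mag_b)
end
```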

Ingesting Messy Real-World Data

This is where things got messy.

Jira: Flattening the Ticket

Each Jira ticket gets transformed into a document with an ID, a text blob, and structured metadata:

def transform(ticket)
  key = ticket["key"]
  fields = ticket["fields"] || {}

  {
    id: "jira_#{@namespace}_#{key}",   # e.g., "jira_waystar_WR-123"
    text: build_text(key, fields),
    metadata: build_metadata(key, fields)
  }
end

The text blob concatenates everything meaningful about the ticket: the key, summary, description, comments (with author tags), labels, parent/subtask relationships, and any embedded Confluence links.

Jira uses something called Atlassian Document Format (ADF) for ticket descriptions and comments. It’s not HTML. It’s not Markdown. It’s a deeply nested JSON tree with node types like paragraph, bulletList, taskItem, mention, inlineCard, and emoji. I had to write a recursive parser to walk that tree and flatten it into plain text:

class AdfParser
  def extract_text(adf_doc)
    return "" unless adf_doc.is_a?(Hash)
    extract_blocks(adf_doc).join(" ").strip
  end

  private

  def extract_blocks(adf_doc)
    return [] unless adf_doc["content"].is_a?(Array)
    adf_doc["content"].map { |node| format_block(node) }
  end

  def format_block(node)
    return "" unless node.is_a?(Hash)

    case node["type"]
    when "taskList" then format_task_list(node)
    when "bulletList", "orderedList" then format_list(node)
    else extract_from_node(node)
    end
  end

  def extract_from_node(node)
    case node["type"]
    when "text"      then node["text"] || ""
    when "hardBreak" then "\n"
    when "mention"   then "@#{node.dig('attrs', 'text') || 'user'}"
    when "emoji"     then node.dig("attrs", "shortName") || ""
    when "inlineCard", "blockCard" then node.dig("attrs", "url") || ""
    else inline_text(node)
    end
  end

  # format_task_list, format_list, and inline_text omitted here for brevity
end

Not complex, but the kind of thing you don’t anticipate until you see your first embedding full of raw JSON nodes. Thankfully, we can task Claude Code with figuring out some of this chaos.

Comment authors matter. We tag each Jira comment as [Team] or [Client] based on the commenter’s email domain:

def determine_author_type(email)
  return "" if email.nil? || email.empty?

  if email.include?("@planetargon.com")
    "[Team]"
  else
    "[Client]"
  end
end

This matters more than I thought it would. The LLM can distinguish between internal engineering discussion and client-facing conversation when generating suggested questions.

GitHub: PRs, Issues, Docs, and Code

We pull READMEs, docs, PRs, issues, and source files. Honestly, the source code has been the least useful of the bunch. PRs and issues have the “why”… the discussion, the tradeoffs, the things that almost shipped but didn’t. Source files have the “what,” but without the surrounding conversation the embedding doesn’t give you much you couldn’t get from grep.

Batch Uploads and Deterministic IDs

Documents get uploaded to the vector store in batches of 20. Errors in one batch don’t abort subsequent batches:

class BatchUploader
  BATCH_SIZE = 20

  def initialize(vector_store)
    @vector_store = vector_store
    @processed_count = 0
    @error_count = 0
  end

  def upload(documents)
    documents.each_slice(BATCH_SIZE) do |batch|
      @vector_store.upsert(batch)
      @processed_count += batch.length
    rescue StandardError => e
      warn "Batch upload failed: #{e.message}"
      @error_count += batch.length
    end
  end
end

Every document gets a deterministic ID based on its source: jira_waystar_WR-123, confluence_waystar_12345_chunk_2, or github_waystar_waystar-web_pr_47. This means re-running ingestion overwrites old documents instead of creating duplicates. Engineers can re-ingest anytime without polluting the dataset.
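A hypothetical helper showing the scheme (the real code builds these inline per source):

```ruby
# Deterministic document IDs: the same source document always produces the
# same ID, so re-running ingestion upserts in place instead of duplicating.
def doc_id(source, namespace, *parts)
  [source, namespace, *parts].join("_")
end

doc_id("jira", "waystar", "WR-123")
# => "jira_waystar_WR-123"
doc_id("confluence", "waystar", "12345", "chunk_2")
# => "confluence_waystar_12345_chunk_2"
```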

When a Jira ticket updates, the next ingestion run replaces the old embedding with the new one. Same with Confluence pages and GitHub content. The vector store stays in sync with reality without complex change detection or deletion logic.

One gap we haven’t closed yet: deletion. When a Jira ticket gets deleted or a Confluence page is removed, the old embedding stays in the index. We have no cleanup strategy right now. It hasn’t caused real problems, but it’s on the list.

The tradeoff: someone needs to remember to run ingestion periodically. But the simplicity is worth it.

Parallel Ingestion with concurrent-ruby

When ingesting all sources for a client, the tool uses concurrent-ruby to run Jira, Confluence, and GitHub ingestions in parallel:

pool = Concurrent::FixedThreadPool.new(3)

futures = []
futures << Concurrent::Future.execute(executor: pool) { ingest_jira }
futures << Concurrent::Future.execute(executor: pool) { ingest_confluence }
github_repos.each do |repo|
  futures << Concurrent::Future.execute(executor: pool) { ingest_github(repo) }
end

# Wait for all to complete
futures.each(&:wait)

Thread-safe state tracking uses Concurrent::Hash:

@results = Concurrent::Hash.new
@timings = Concurrent::Hash.new
@status = Concurrent::Hash.new

After completion, the tool calculates time saved versus sequential execution and reports a speedup factor. In practice, parallel ingestion typically finishes in about 60% of the time sequential would take, since the API calls to Jira, Confluence, and GitHub can overlap.

Running an ingestion looks like this:

$ bin/clarion ingest_all waystar --limit=500

════════════════════════════════════════════════════════════
                  COMBINED DATA INGESTION
════════════════════════════════════════════════════════════

ℹ Client: waystar
ℹ Namespace: waystar
ℹ Vector store: pinecone
ℹ Jira project: WR
ℹ Confluence space: WR
ℹ GitHub repos: planetargon/waystar-web
ℹ Limit: 500 items per source
ℹ Mode: Parallel

  ✓ Jira (WR)                     Complete (487/500 processed)
  ✓ Confluence (WR)               Complete (245/500 processed)
  ✓ GitHub: waystar-web           Complete (498/500 processed)

════════════════════════════════════════════════════════════
                    INGESTION RESULTS
════════════════════════════════════════════════════════════

ℹ ✓ Jira: 487 processed, 0 errors (45.2s)
ℹ ✓ Confluence: 245 processed, 0 errors (38.1s)
ℹ ✓ Github Waystar Web: 498 processed, 0 errors (52.7s)

════════════════════════════════════════════════════════════
                   PERFORMANCE SUMMARY
════════════════════════════════════════════════════════════

ℹ Total documents processed: 1230
✓ Total time: 58.3s
ℹ Time saved vs sequential: 77.7s (2.3x speedup)

✓ Client 'waystar' is ready for analysis!
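The performance summary is simple arithmetic over the per-source timings. Using the numbers from the run above:

```ruby
# Sequential time would be the sum of per-source durations; parallel time
# is the wall-clock of the whole run. Speedup is the ratio.
timings    = { jira: 45.2, confluence: 38.1, github: 52.7 }  # seconds
sequential = timings.values.sum   # ≈ 136.0 seconds
parallel   = 58.3                 # measured wall-clock

saved   = (sequential - parallel).round(1)  # => 77.7
speedup = (sequential / parallel).round(1)  # => 2.3
```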

Retrieval and Re-Ranking

Raw cosine similarity gets you most of the way there, but not all the way. The vector search returns the 40 most similar Jira tickets, and some of them are similar for the wrong reasons… same boilerplate language, same component name, but not actually useful context.

The context builder generates one embedding, then runs three concurrent searches. Similar tickets. Resolved tickets filtered by component. Documentation from Confluence and GitHub.

def gather_all_context(ticket, ticket_id, current_key, created_time)
  query = @search.build_query(ticket)
  query_vector = @search.embed(query)

  similar_thread = Thread.new do
    results = @search.search_by_vector(query_vector, filter: { source: "jira" }, top_k: 40)
    score_and_limit_results(results, ticket, current_key, created_time, 16)
  end

  resolved_thread = Thread.new do
    results = @search.search_by_vector(query_vector, filter: resolved_filter, top_k: 12)
    format_resolved_tickets(results)
  end

  docs_thread = Thread.new do
    results = @search.search_by_vector(query_vector, filter: { source: ["confluence", "github"] }, top_k: 32)
    process_and_limit_docs(results, ticket, ticket_id, created_time, 16)
  end

  {
    similar_tickets:  similar_thread.value,
    related_resolved: resolved_thread.value,
    documentation:    docs_thread.value
  }
end

After retrieval, I added two scoring tweaks that made a noticeable difference:

Relationship boost. If a retrieved ticket is a parent or subtask of the ticket being analyzed, its score gets a 1.5x multiplier:

def apply_relationship_boost(ticket_data, relationship_type)
  ticket_data[:relationship] = relationship_type
  ticket_data[:score] *= 1.5
end

Temporal decay. Tickets created more than 7 days before the analyzed ticket get a 0.7x multiplier. More than 30 days, 0.3x:

def age_adjustment_params(days_before)
  return [0.3, "Created #{days_before} days before ticket"] if days_before > 30
  return [0.7, nil] if days_before > 7
  [nil, nil]
end

These aren’t machine learning models. They’re just multipliers applied after retrieval. I was surprised how much difference they made. A few lines of Ruby math moved the output from “interesting but noisy” to something I’d actually act on.
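To make the effect concrete, here’s the arithmetic on a hypothetical retrieved ticket (the similarity score is invented for illustration; the multipliers are the real ones from above):

```ruby
# A subtask created 40 days before the analyzed ticket, raw similarity 0.82:
score = 0.82
score *= 1.5   # relationship boost (parent/subtask)
score *= 0.3   # temporal decay (> 30 days before the analyzed ticket)
score.round(3) # => 0.369
```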

It’s still early days; I expect we’ll need to tweak this a bunch as we see more real-world queries and get feedback from engineers.

The Prompting Side

Two things surprised me here.

First: structured JSON output. Huge deal. We set response_format: { type: "json_object" } on the LLM call, which means the response is always valid JSON. No regex parsing, no hoping the model follows your format instructions. The response comes back with a defined structure:

{
  "ticket_type": "feature",
  "clarity_assessment": "needs_clarification",
  "clarifying_questions": [
    {
      "question": "The question to ask the client",
      "rationale": "Why this matters for implementation",
      "reference": "WR-892: similar issue last quarter"
    }
  ],
  "suggested_acceptance_criteria": [
    "User can export all report types to PDF",
    "Export completes within 30 seconds",
    "Error message displays if export fails"
  ],
  "potential_edge_cases": [
    "Special characters in report data",
    "Very large reports (>10,000 rows)"
  ],
  "implementation_notes": "Brief notes on approach"
}

Once you have reliable structure, everything downstream gets simpler.
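A sketch of what the call looks like with the ruby-openai gem. The helper names are ours for illustration, and the prompt contents are elided:

```ruby
require "json"

# Build JSON-mode chat parameters for ruby-openai's client.chat call.
def analysis_parameters(system_prompt, ticket_context)
  {
    model: ENV.fetch("OPENAI_MODEL", "gpt-4o-mini"),
    response_format: { type: "json_object" },  # response is always valid JSON
    messages: [
      { role: "system", content: system_prompt },
      { role: "user", content: ticket_context }
    ]
  }
end

# Parse the reply straight into a Hash; no regex scraping needed.
def parse_analysis(response)
  JSON.parse(response.dig("choices", 0, "message", "content"))
end

# Usage: parse_analysis(client.chat(parameters: analysis_parameters(sys, ctx)))
```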

Second: the prompt is where your actual communication style lives. Ours doesn’t just say “generate clarifying questions”… it encodes how we actually talk to clients:

Instead of asking open-ended technical questions, frame them as confirmations:

“It sounds like this needs to work in Chrome. Should we also make sure it works in Safari and Firefox?”

Rather than:

“What browsers need to be supported?”

The prompt covers dozens of specific communication scenarios. A few examples from the actual prompt file:

When clients apologize for not being technical:

“No need to apologize. You’re describing exactly what we need to know. The ‘what’s broken’ is your expertise; the ‘why it’s broken’ is ours.”

When scope is creeping:

“There’s a lot of good stuff here. To make sure nothing gets lost, would it help to break this into separate tickets? That way we can track the export fix and the new filter feature independently.”

When clients describe workarounds they’re using:

“Good thinking on the CSV workaround. That’ll keep things moving. We’ll fix the PDF export so you don’t have to keep doing that extra step.”

When something is working as designed:

“So it turns out the system is doing what it was originally built to do, but I hear you that it’s not what you need it to do. Want us to write up a feature request to change this behavior?”

The prompt took real care to get right, but not because it was technically hard. As a client-services company we put a lot of thought into our communication style… how we ask questions, how we handle scope creep, how we respond when something is working as designed but not as the client expected. Baking that into the prompt mattered. We also have engineers from different regions and backgrounds, and anything that helps guide everyone toward a consistent, confident tone with clients is worth the effort.

This is the part that makes it ours and not just another RAG wrapper. The vector search finds the history. The prompt makes it sound like us.

We also maintain two separate prompt files. prompts/analyzer_default.md is for open tickets (“what’s unclear?”). prompts/analyzer_completed.md is for closed tickets (retrospective analysis). The tool detects the ticket’s status and selects the right prompt automatically. It’s a small touch, but it means the output is always contextually appropriate.
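The selection logic is nothing fancy. A sketch (the prompt paths are the real ones; which status names count as “closed” is an assumption here):

```ruby
# Choose the analyzer prompt based on ticket status. Closed tickets get the
# retrospective prompt; everything else gets the default.
CLOSED_STATUSES = ["Done", "Closed", "Resolved"].freeze

def prompt_path(status)
  if CLOSED_STATUSES.include?(status)
    "prompts/analyzer_completed.md"
  else
    "prompts/analyzer_default.md"
  end
end
```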

I expected engineers to be the primary users. The first people to actually build it into their workflow were our PMs. They’re in tickets all day, working with clients on acceptance criteria… and having suggested clarifying questions grounded in project history turns out to be directly useful for that conversation, not just for development. I should have seen that coming.

The MCP Surprise

I didn’t expect this part to become the most useful thing in the whole project.

The tool started as a CLI experiment. Run bin/clarion analyze WR-123 in your terminal, get output, copy what’s useful. It worked, but there was friction: you had to jump out of your editor, keep the Jira ticket open somewhere else, and remember the command syntax.

Having spent a bunch of time recently in Claude Code, I wondered… could we bring this analysis directly into the editor? I think it took me less than two hours to go from “I wonder if this could be an MCP server” to “oh wow, it’s actually working”.

I quickly found the mcp gem, which implements Anthropic’s Model Context Protocol. MCP lets you expose a tool as a server that Claude Code can call directly. Here’s what the server looks like:

class McpServer
  def initialize(namespace: nil, working_directory: Dir.pwd)
    @namespace = namespace
    @working_directory = working_directory
    @client = resolve_client
  end

  def run
    server = build_server
    transport = MCP::Server::Transports::StdioTransport.new(server)
    transport.open
  end

  private

  def build_server
    MCP::Server.new(
      name: "clarion",
      version: Clarion::VERSION,
      tools: [Mcp::AnalyzeTool.build(@client)]
    )
  end
end

The MCP tool itself is built dynamically. The tool description is baked in with the client’s namespace and ticket prefix at startup time, so Claude Code knows exactly what it can do:

module AnalyzeTool
  def self.build(client)
    tool = Class.new(MCP::Tool) do
      tool_name "analyze_ticket"
      description "Analyze a Jira ticket and suggest clarifying questions " \
                  "and acceptance criteria. Scoped to client '#{client.namespace}' " \
                  "(ticket prefix: #{client.ticket_prefix})."

      input_schema(
        properties: {
          ticket_key: {
            type: "string",
            description: "Jira ticket ID (e.g., #{client.ticket_prefix}-123)"
          }
        },
        required: ["ticket_key"]
      )
    end
    # ... wire up call, validation, and analysis methods
    tool
  end
end

Each MCP server instance is scoped to a single client namespace. When an engineer is working in a client’s repository, they drop a small JSON config file at the repo root:

{
  "mcpServers": {
    "clarion": {
      "command": "/path/to/clarion/bin/clarion-mcp",
      "args": ["--namespace=waystar"]
    }
  }
}

Getting set up isn’t quite drag-and-drop. A new engineer needs to clone the repo, run bundle install, configure their own .env with API keys for OpenAI, Atlassian, GitHub, and Pinecone, then drop the right config file into whichever client repo they’re working in. Our team is used to juggling .env files across multiple projects, so this didn’t feel like a big ask… but it’s worth knowing upfront if you’re thinking about something similar.

The bin/clarion-mcp wrapper is a one-liner. It sets the working directory, then delegates:

#!/bin/bash
cd "$(dirname "$0")/.."
exec bundle exec ruby -Ilib bin/clarion mcp "$@"

Now they can ask Claude Code to “analyze WR-123” and get the full analysis inline. Clarifying questions. Suggested acceptance criteria. Edge cases. Implementation notes. All without leaving their editor.

Auto-detection from git remote. If the client’s repo is configured in clients.yml with its github_repos, you can even skip the --namespace flag. The server shells out to git remote get-url origin, parses the owner/repo slug, and looks it up automatically.
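The slug parsing is a one-regex job that handles both SSH and HTTPS remotes. A sketch (method name is illustrative, not the actual implementation):

```ruby
# Extract the "owner/repo" slug from a git remote URL so it can be matched
# against the github_repos entries in clients.yml.
def repo_slug(remote_url)
  remote_url[%r{github\.com[:/]([^/\s]+/[^/\s]+?)(?:\.git)?\s*\z}, 1]
end

repo_slug("git@github.com:planetargon/waystar-web.git")
# => "planetargon/waystar-web"
repo_slug("https://github.com/planetargon/waystar-web")
# => "planetargon/waystar-web"
```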

One gotcha worth mentioning: TTY output breaks MCP’s stdio transport. All those nice spinners and progress bars and colored output that make the CLI experience polished? They corrupt the MCP response stream. I had to suppress stdout during MCP calls:

def run_analysis(key)
  config = AnalyzerConfig.build(key, result_formatter: PlainTextFormatter.new)
  original_stdout = $stdout
  $stdout = File.open(File::NULL, "w")
  begin
    Analyzer.new(config).analyze
  ensure
    $stdout.close
    $stdout = original_stdout
  end
end

Small thing, but it would have been confusing to debug without knowing to look for it. We also have a separate PlainTextFormatter that outputs clean text for MCP, versus the ResultFormatter that uses colored boxes and unicode for the CLI.

Where It Gets Really Interesting: MCP in Combination

Clarion as an MCP server is useful on its own. But the thing that got me excited was running it alongside other MCP servers in the same Claude Code session.

Our engineers can have Clarion (our embedded project history), the Atlassian MCP (live read/write access to Jira and Confluence), and the GitHub MCP all connected at once.

The combination we’ve found most useful: analyze a ticket with Clarion, review the suggested clarifying questions, adjust the wording, then use the Atlassian MCP to post the comment directly on the Jira ticket. The whole loop closes without leaving Claude Code.

Cross-source research has been useful too. Ask “what do we know about how authentication works in this project?” and get results from Jira tickets where auth bugs were fixed, Confluence pages documenting the auth flow, and GitHub PRs where the auth code was changed. All from one query, scoped to that client. With GitHub MCP connected, you can then check whether the docs still match the actual code.

The underlying idea is that before anyone starts building, the ticket should be clear. Clarion sits at that boundary. The suggested questions aren’t generic… they’re informed by the specific history of this project. “Last time we did a PDF export on this project, Safari caused problems” is more useful than “have you considered browser compatibility?”.

Multi-Tenant Scoping: The Hard Constraint

One constraint that shaped everything: Planet Argon uses a single Atlassian account across most of our client projects (some clients own their own Atlassian accounts). Same Jira instance, same Confluence instance, one set of API credentials.

That means data isolation has to be enforced in our code, not by infrastructure boundaries. Every operation requires an explicit client namespace. The vector store uses that namespace to partition data. One Pinecone index. Many isolated namespaces. Ticket IDs are validated against the expected prefix before any analysis runs.
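As a rough sketch of what a namespace-scoped query looks like against Pinecone’s REST data-plane API (the index host, helper names, and enforcement wrapper are my assumptions; the key point is that the namespace is mandatory, not optional):

```ruby
require "json"
require "net/http"
require "uri"

# Build the query payload. Making namespace a required keyword argument
# means a query with no client scope fails loudly instead of silently
# searching the whole index.
def build_query(namespace:, vector:, top_k: 5)
  raise ArgumentError, "namespace is required" if namespace.to_s.empty?

  {
    namespace: namespace,   # partitions one index into per-client slices
    vector: vector,
    topK: top_k,
    includeMetadata: true
  }
end

# Send it to the index's data-plane host (host value is illustrative).
def query_pinecone(host, api_key, body)
  uri = URI("https://#{host}/query")
  req = Net::HTTP::Post.new(uri, "Api-Key" => api_key,
                                 "Content-Type" => "application/json")
  req.body = JSON.generate(body)
  Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
end
```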

Granted, our engineers can still see every client’s data in their Atlassian account, but the tool itself is always scoped to one client per run. That’s the important part.

The validation happens at multiple layers. In the CLI:

def validate_ticket_id!(ticket_id)
  # Same PROJECT-123 shape the MCP layer checks below.
  return if ticket_id.match?(/^[A-Z]+-\d+$/)
  raise Thor::Error, "Invalid ticket ID format. Expected: PROJECT-123"
end

And again in the MCP tool, where it also checks the prefix matches the scoped client:

def validate_ticket_prefix!(key)
  unless key.match?(/^[A-Z]+-\d+$/)
    raise ArgumentError, "Invalid ticket ID format: #{key}. Expected: PROJECT-123"
  end

  prefix = key.split("-").first
  return if prefix == scoped_client.ticket_prefix

  raise ClientScopeError,
        "Ticket #{key} does not belong to client " \
        "'#{scoped_client.namespace}' (expected prefix: #{scoped_client.ticket_prefix})"
end

If you’re working in the waystar namespace and try to analyze PP-123, you get a clear error: "Ticket PP-123 does not belong to client 'waystar' (expected prefix: WR)". Not results from the wrong client.

It’s a simple system. Namespaces and prefix checks. Again, engineers technically have access to all clients’ data in Atlassian, but the tool enforces discipline. You have to be intentional about which client’s context you’re working in. We don’t want someone accidentally running an analysis against the wrong client’s project and making assumptions based on irrelevant history.

What’s Next

A few people on the team are using it regularly. I’m hoping that spreads in the coming weeks. Whether this becomes a genuine workflow change or an interesting experiment we eventually deprioritize… I genuinely don’t know yet. I’ll write a follow-up when there’s something real to report.

We’re using gpt-4o-mini for now because it was the easiest thing to get running and the output has been good enough that we haven’t felt pressure to switch.

Atlassian is building AI features into Jira and Confluence, and some of that will overlap with what we’ve built. But Atlassian’s tooling only knows about what’s inside Atlassian. It can’t see GitHub repos, PR histories, or how past implementations actually played out in code. Our tool bridges that gap… context across all three systems, shaped by how we work.

Our team is also experimenting more with LLM-assisted code generation. But this tool sits deliberately upstream of that. It’s about the collaboration layer. Making sure what we’re about to build is well-understood before anyone writes code. A perfectly generated pull request against a vague ticket is still a miss.

We’ll probably open source this eventually, but the codebase is full of references to real client projects in tests and config. Scrubbing that is on the list. Not the priority right now.

If you’re thinking about building something like this… just start. Ruby has what you need. The gems are there. It’s more approachable than it looks from the outside.

p.s. Oh, and did you notice I never mentioned Ruby 4 anywhere in this post? That’s because there’s nothing to mention. It just worked.

Hi, I'm Robby.

Robby Russell

I run Planet Argon, where we help organizations keep their Ruby on Rails apps maintainable—so they don't have to start over. I created Oh My Zsh to make developers more efficient and host both the On Rails and Maintainable.fm podcasts to explore what it takes to build software that lasts.