Gold Miner is an app I created to transform interesting conversations we have at thoughtbot into blog posts. The articles generated are part of the This week in #dev series, and today I’ll talk about some of the technical details of the app, like how we use artificial intelligence, async Ruby, and other interesting patterns.
The MVP
The first step was to classify what I thought were “interesting messages”. We share a lot on our public Slack channels, so I decided to search for messages containing “tip” or “TIL”. To allow people to hand-pick particular messages, I also fetched anything that received a :rupee-gold: emoji reaction.

I created a `MessagesQuery` class to help me build a message search query like:
```ruby
interesting_messages = MessagesQuery
  .new
  .on_channel("dev")
  .sent_after("2023-04-12")
```
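The article doesn’t show `MessagesQuery` itself, but based on its usage it could be an immutable builder over Slack’s search modifiers (`in:`, `after:`, `has:`). Here’s a hypothetical sketch; all method bodies are assumptions:

```ruby
# Hypothetical sketch of MessagesQuery as an immutable query builder.
# Each chained call returns a new instance instead of mutating state.
class MessagesQuery
  def initialize(terms = [])
    @terms = terms
  end

  def on_channel(channel)
    with_term("in:##{channel}")
  end

  def sent_after(date)
    with_term("after:#{date}")
  end

  def with_topic(topic)
    with_term(topic)
  end

  def with_reaction(emoji)
    with_term("has::#{emoji}:")
  end

  # The Slack client would ultimately need the query as a string.
  def to_s
    @terms.join(" ")
  end

  private

  def with_term(term)
    self.class.new(@terms + [term])
  end
end
```

With that, `MessagesQuery.new.on_channel("dev").sent_after("2023-04-12").with_topic("TIL").to_s` produces a Slack-style search string like `in:#dev after:2023-04-12 TIL`.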
Due to some limitations of the Slack API, I had to fetch the messages in three different requests. It was a bit slow, but not too bad:
```ruby
def search_interesting_messages
  til_messages = @slack.search_messages(
    query: interesting_messages.with_topic("TIL")
  )
  tip_messages = @slack.search_messages(
    query: interesting_messages.with_topic("tip")
  )
  hand_picked_messages = @slack.search_messages(
    query: interesting_messages.with_reaction("rupee-gold")
  )

  til_messages + tip_messages + hand_picked_messages
end
```
After that, I would grab those messages, extract the text, author, and permalink, and format them as a very simple Markdown file. Then I’d manually go through each message, read it, summarize it, choose tags, think of a title, and publish the article.
I’m Too Lazy For This
For a while, that was it. I’d run the script, manually create the article, and open a PR to our blog repo. That was just too much work! My developer brain was begging for automation. I immediately thought of using an LLM to summarize the messages, generate titles, and extract topics! OpenAI had several APIs available, so it was an easy choice for me.
Before I started, I didn’t want to tie the app to a particular vendor, so I developed the concept of a `BlogPost::Writer`. The `BlogPost` class would delegate all that manual work I used to do to a `writer` object it would receive on initialization. That’s a case of the Strategy pattern (with an immutable strategy).
Here’s an example of how it generates a highlight from a message:
```ruby
class BlogPost
  def initialize(messages, writer:)
    @messages = messages
    @writer = writer
  end

  def highlight_from(message)
    <<~MARKDOWN
      ## #{@writer.give_title_to(message)}

      #{@writer.summarize(message)}
    MARKDOWN
  end
end
```
The `writer` is now the one responsible for generating a title and a summary, and for extracting relevant topics from a message. Since Ruby doesn’t have interfaces, I decided to codify that protocol in a shared RSpec example:
```ruby
RSpec.shared_examples "a blog post writer" do
  it { expect(writer_instance).to respond_to(:extract_topics_from).with(1).argument }
  it { expect(writer_instance).to respond_to(:give_title_to).with(1).argument }
  it { expect(writer_instance).to respond_to(:summarize).with(1).argument }
end
```
The old behavior, i.e., returning a message as is (not summarized), was moved to a writer class called `BlogPost::SimpleWriter`.
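The article doesn’t list `SimpleWriter`’s code, but given the protocol above, a minimal version could look like this. The method bodies here are assumptions that merely satisfy the writer protocol without doing any summarizing:

```ruby
# Hypothetical take on BlogPost::SimpleWriter; the real implementation
# isn't shown in the article, so these bodies are guesses.
class BlogPost
  class SimpleWriter
    # No AI here: the "summary" is just the original message text.
    def summarize(message)
      message[:text]
    end

    # A simple placeholder title based on the message author.
    def give_title_to(message)
      "A tip from #{message[:author]}"
    end

    # Without an LLM, we can only surface topics we already know about.
    def extract_topics_from(message)
      message.fetch(:topics, [])
    end
  end
end
```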
Artificial Intelligence To The Rescue
Now that I had a blog post `writer` protocol, I could create a new writer backed by OpenAI. I used the `ruby-openai` gem and implemented the interface in no time.
This class is quite simple because ChatGPT itself is doing all the heavy lifting. One detail I added was a fallback to the `SimpleWriter` if the call to the OpenAI API fails for some reason. It enables the app to keep running even if ChatGPT is down or one of the requests can’t complete.
Here’s how it extracts topics from a message:
```ruby
def extract_topics_from(message)
  topics_json = ask_openai <<~PROMPT
    Extract the 3 most relevant topics, if possible in one word,
    from this text as a single parseable JSON array: #{message[:text]}
  PROMPT

  if (topics = try_parse_json(topics_json))
    topics
  else
    # If we can't parse the JSON, fall back to the simple writer
    fallback_topics_for(message)
  end
end
```
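The `try_parse_json` helper isn’t shown in the article. One plausible implementation (an assumption on my part) swallows parse errors and returns `nil`, so the caller can decide to fall back:

```ruby
require "json"

# Hypothetical helper: returns the parsed array, or nil when the LLM's
# response isn't the parseable JSON array we asked for.
def try_parse_json(json)
  parsed = JSON.parse(json)
  parsed.is_a?(Array) ? parsed : nil
rescue JSON::ParserError, TypeError
  nil
end
```

Returning `nil` instead of raising keeps the happy path and the fallback path in one `if`, which reads nicely in `extract_topics_from`.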
Boom! Now, I let AI do the hard work for me. I still have to read the messages to check if the content is correct, but I don’t have to think about titles, summaries, or topics anymore.
Also, because we support multiple strategies, I could create new writers for any other tools like Google’s Bard AI or even a self-hosted AI like Dolly.
Wait For It
Everything was working fine, and I shipped many editions of This week in #dev, but there was one problem with the OpenAI writer: it was slow. For each Slack message, we had to issue three API requests (summary, title, and topics), and network calls are slow. On top of that, all the requests were made sequentially, so it could take anywhere between 20 and 60 seconds to generate a blog post with four messages. That was even worse during peak hours for the ChatGPT API (not to mention the Slack API calls).
Those observations are key, though: this app was IO-bound, spending most of its time waiting for HTTP requests to complete. A perfect candidate for async Ruby.
The method for generating a highlight from a message, for instance, now looks like this:
```ruby
def highlight_from(message)
  title_task = Async { @writer.give_title_to(message) }
  summary_task = Async { @writer.summarize(message) }

  <<~MARKDOWN
    ## #{title_task.wait}

    #{summary_task.wait}
  MARKDOWN
end
```
It’s cool that none of the other code had to change, and because I added all the `async` infrastructure to the `BlogPost` class, every writer now runs asynchronously! I even added tests to ensure all the writer calls run concurrently. While at it, I also made the Slack API calls async, so the app searches messages in parallel.
The total time was reduced to less than a fourth of what it was before, a massive win!
Other Goodies
There are a few other minor things I did in this app that are worth mentioning:
- Monads: I used the dry-monads gem to handle errors gracefully. It helped me to structure the code in a railway-oriented way, which I find much easier to maintain than exceptions (in particular, when dealing with Threads).
- Dependency injection: I did a fair amount of dependency injection in this app. Because I was doing TDD, it made testing much easier, especially when dealing with code that interacts with external services.
- Zeitwerk: I used the Zeitwerk gem to load all the app code. It avoids all those manual `require`s and keeps the code organized in the same way the files are arranged in the file system (like we do in Rails apps). All that for a single `Zeitwerk::Loader.for_gem.setup` call? Love it!
- App setup: I created a `bin/setup` script to install all the dependencies and set up the app. It helps new developers get started quickly and is a nice form of documentation.
Next steps
There’s still a lot of room for improvement in Gold Miner, but since I’m the only user, I’ve been taking it slow. One thing I’d like to add is having Gold Miner automatically open a PR against our blog repo, with each of the message authors added as reviewers. Some parts of the code could be better encapsulated and organized, but it has been good enough for me so far, so I didn’t bother.