<p><b>When TSearch2 Met AJAX</b></p>
<p>Robby Russell, Robby on Rails. Posted 2005-08-21; last updated 2006-09-05.</p>
<p>Last night a local <span class="caps">PDX</span>.rb-ist asked about full text searching in <a href="http://www.postgresql.org">PostgreSQL</a>. I pointed him to <a href="http://rubyurl.com/4g6">TSearch2</a>, a nice little add-on that handles full text searching with indexing, ranking, highlighting, etc. To my knowledge, it’s the closest thing to a Google-like search that you can get with PostgreSQL. Some people in #postgresql (irc.freenode.net) said that you can build custom functions that will allow you to quote content and do other fun stuff within your search string. We can discuss that another time.</p>
<p>After thinking it over, I thought, “why not put ajax on top of a full text search and see what it can do?”</p>
<p>The first question was where I was going to get a bunch of content to search through that would be somewhat meaningful to the public if I decided to put it up as a demo page. The RubyOnRails mailing list came to mind. After seeing that I couldn’t download the full archive from the Rails mailman page (at least not that I could tell), I decided to just import my Maildir for that mailing list.</p>
<p>This added another initial step. What would be a good way to import the ~13,000 emails that I had in the folder?</p>
<p>I knew that, worst case, I could find a module on <span class="caps">CPAN</span> and build a Perl script to import it, since I didn’t see anything in the standard Ruby library. Then I found <a href="http://rubyurl.com/kwO">TMail</a>. Someone said that they think ActionMailer uses TMail as well.</p>
<p>The resulting quick and dirty script became:</p>
<div class="typocode"><pre><code class="typocode_ruby "><span class="comment">#!/usr/bin/env ruby</span>
<span class="ident">require</span> <span class="punct">'</span><span class="string">rubygems</span><span class="punct">'</span>
<span class="ident">require</span> <span class="punct">'</span><span class="string">tmail</span><span class="punct">'</span>
<span class="ident">require</span> <span class="punct">'</span><span class="string">postgres</span><span class="punct">'</span>
<span class="ident">require</span> <span class="punct">'</span><span class="string">dbi</span><span class="punct">'</span>
<span class="ident">conn</span> <span class="punct">=</span> <span class="constant">DBI</span><span class="punct">.</span><span class="ident">connect</span><span class="punct">("</span><span class="string">DBI:Pg:database=rails_mailinglist;host=localhost;port=5403</span><span class="punct">",</span> <span class="punct">"</span><span class="string">username</span><span class="punct">",</span> <span class="punct">"</span><span class="string">password</span><span class="punct">"</span> <span class="punct">)</span>
<span class="constant">MAILBOX</span> <span class="punct">=</span> <span class="punct">"</span><span class="string">.MailingLists.Ruby.RubyOnRails</span><span class="punct">"</span>
<span class="ident">sql</span> <span class="punct">=</span> <span class="punct">"</span><span class="string">INSERT INTO archives (sender, recipient, subject, body) VALUES (?,?,?,?)</span><span class="punct">"</span>
<span class="attribute">@sth</span> <span class="punct">=</span> <span class="ident">conn</span><span class="punct">.</span><span class="ident">prepare</span><span class="punct">(</span><span class="ident">sql</span><span class="punct">)</span>
<span class="ident">box</span> <span class="punct">=</span> <span class="constant">TMail</span><span class="punct">::</span><span class="constant">Maildir</span><span class="punct">.</span><span class="ident">new</span><span class="punct">(</span><span class="constant">MAILBOX</span><span class="punct">)</span>
<span class="ident">box</span><span class="punct">.</span><span class="ident">each</span> <span class="keyword">do</span> <span class="punct">|</span><span class="ident">port</span><span class="punct">|</span>
  <span class="ident">mail</span> <span class="punct">=</span> <span class="constant">TMail</span><span class="punct">::</span><span class="constant">Mail</span><span class="punct">.</span><span class="ident">new</span><span class="punct">(</span><span class="ident">port</span><span class="punct">)</span>
  <span class="ident">p</span> <span class="ident">mail</span><span class="punct">.</span><span class="ident">subject</span>
  <span class="attribute">@sth</span><span class="punct">.</span><span class="ident">execute</span><span class="punct">(</span><span class="ident">mail</span><span class="punct">.</span><span class="ident">from</span><span class="punct">,</span> <span class="ident">mail</span><span class="punct">.</span><span class="ident">to</span><span class="punct">,</span> <span class="ident">mail</span><span class="punct">.</span><span class="ident">subject</span><span class="punct">,</span> <span class="ident">mail</span><span class="punct">.</span><span class="ident">body</span><span class="punct">)</span>
<span class="keyword">end</span>
<span class="ident">exit</span>
</code></pre></div>
<p>Not rocket science. :-)</p>
<p>Okay, so I let that start running through the mailing list emails that I have, and opened up another tab in <a href="http://iterm.sourceforge.net/">iTerm</a> and typed our friend, <code>rails archives</code> followed by <code>cd archives</code>. The next step was to modify the <code>config/database.yml</code> file.</p>
<p>(you all know how to do that, right?)</p>
<p>Okay, you should still be with me…so far.</p>
<p>After I got my database settings in place, I ran <code>./script/generate scaffold Archive</code> and watched it create my new files to play with.</p>
<p>I ran <code>./script/server</code> and was looking at the first several emails in my RubyOnRails mailing list folder. I noticed that the first one was the confirmation email from the day that I signed up for the mailing list, <b>Mon, 24 Jan 2005 16:00:14 +0000 (GMT)</b>. So I deleted that email and the ‘welcome to..’ one so that no one would see my mailman password/confirm info. ;-)</p>
<p><b>Installation</b></p>
<p>Rails had no problem with the data, so I headed over to the <a href="http://rubyurl.com/4g6">Tsearch2</a> site and looked for some installation information. I followed <a href="http://rubyurl.com/Crk">this walkthrough</a>.</p>
<p><b>Database Structure</b></p>
<p>For this example, I kept it pretty simple for the database structure. I believe the create script was:</p>
<code>
<pre>
CREATE TABLE archives (
  id        SERIAL PRIMARY KEY,
  sender    VARCHAR(255),
  recipient VARCHAR(255),
  subject   VARCHAR(255),
  body      TEXT
);
</pre>
</code>
<p>The rest was basically following through with those steps and building the triggers and functions around the <code>subject</code> and <code>body</code> fields in the table.</p>
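<p>For readers who have not seen that setup, here is a rough sketch of what those steps usually look like with the tsearch2 contrib module. It is not the exact script used for this demo. The <code>idxfti</code> column name comes from the query in this post; the index name, the trigger name, and the <code>TSEARCH2_SETUP</code> constant are made up for illustration, and the statements assume tsearch2 is already installed in the database with its <code>default</code> configuration.</p>

```ruby
# Assumed setup statements, following the usual tsearch2 contrib pattern:
# add a tsvector column, fill it, index it, and keep it current with the
# tsearch2() trigger function. Index and trigger names are hypothetical.
TSEARCH2_SETUP = [
  "ALTER TABLE archives ADD COLUMN idxfti tsvector",
  "UPDATE archives SET idxfti = to_tsvector('default', " +
    "coalesce(subject, '') || ' ' || coalesce(body, ''))",
  "CREATE INDEX archives_idxfti_idx ON archives USING gist (idxfti)",
  "CREATE TRIGGER archives_tsvector_update " +
    "BEFORE INSERT OR UPDATE ON archives " +
    "FOR EACH ROW EXECUTE PROCEDURE tsearch2(idxfti, subject, body)"
]

# Print the statements so they can be reviewed or piped into psql.
TSEARCH2_SETUP.each { |statement| puts statement + ";" }
```

<p>The statements are kept as plain strings so they can be checked by hand before being run against the database.</p>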
<p>To use the tsearch2 functionality, I used <code>find_by_sql</code> rather than using just <code>find</code>.</p>
<code>
<pre>
@archives = Archive.find_by_sql(["SELECT id, headline(body, q) AS headline, body, rank(idxfti, q) AS rank, sender, subject FROM archives, to_tsquery(?) AS q WHERE idxfti @@ q ORDER BY rank(idxfti, q) DESC LIMIT 100", @str])
</pre>
</code>
<p>The <code>@str</code> variable is a value that I build from the string(s) that the user types in the search field. <b>Tsearch2</b>’s <code>to_tsquery</code> requires an operator between terms, so I separate each term with a pipe (<code>|</code>), which means OR. I also put a few checks on the string that <span class="caps">AJAX</span> passes to the method in my controller. (I’ll let you take the time to figure out how to get <span class="caps">AJAX</span> in Rails working and watching a text field… it’s not hard to find info on Google.) :-)</p>
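<p>A minimal sketch of that kind of check might look like the following. This is not the code from the demo; <code>build_tsquery</code> is a hypothetical helper name, and the rule of keeping only ASCII letters and digits is an assumption made to keep stray operators and quotes out of the query.</p>

```ruby
# Hypothetical helper: turn raw search-box input into a string that
# to_tsquery will accept, with the terms OR-ed together by pipes.
# Anything that is not an ASCII letter or digit is treated as a separator,
# so punctuation, quotes, and non-ASCII characters are dropped.
def build_tsquery(input)
  terms = input.to_s.downcase.scan(/[a-z0-9]+/)
  return nil if terms.empty?   # nothing searchable, caller should skip the query
  terms.join(" | ")
end

puts build_tsquery("Rails & AJAX!")   # prints: rails | ajax
```

<p>Returning <code>nil</code> for empty input lets the controller skip the database call entirely while the search field is blank.</p>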
<p><b>The end result?</b></p>
<p>I will warn you that this doesn’t work in all browsers. Some IE people said they had issues, and I spent enough time tinkering with it to just settle with this for now. :-)</p>
<p>I present… <a href="http://railslist.robbyonrails.com/archives/">fulltext searching with PostgreSQL on Rails</a>.</p>
<p>There are approximately 13,000 emails in the system, so I limited the number of results shown to 100.</p>
<p><b>My Thoughts</b></p>
<p>Well, it was an interesting concept. I’m not a big fan of live searching; it doesn’t really seem to buy us much when working with this sort of data. I do find live auto-completion to be quite useful though. It’s not practical to have <span class="caps">AJAX</span> peg the database every second as I type for new content, and it’s obvious that a database with that much content is not going to respond as snappily as you would hope. However, I decided to compare the speed to searching in Thunderbird and Evolution. From my sophisticated benchmarking suite (my imaginary stop watch)...</p>
<p><b><span class="caps">AJAX</span> won!</b></p>
<p>Okay, I should be fair and say <b>Tsearch2 won</b>, as it is doing all the heavy lifting.</p>
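<p>On the point about not pegging the database on every keystroke, one possible way to soften it on the server side is to ignore very short input and remember answers to queries that were already run. The sketch below is only an illustration and is not part of the demo; the <code>SearchGate</code> class, its three-character minimum, and the unbounded in-memory cache are all assumptions.</p>

```ruby
# Hypothetical guard in front of the real search. It skips queries that
# are too short to be useful and reuses results for repeated queries.
# The cache is a plain Hash that never expires, which is only reasonable
# for a small demo.
class SearchGate
  def initialize(min_length = 3, &search)
    @min_length = min_length
    @search = search      # block that performs the real (expensive) search
    @cache = {}
  end

  def results_for(input)
    key = input.to_s.strip.downcase
    return [] if key.length < @min_length
    @cache[key] ||= @search.call(key)
  end
end

calls = 0
gate = SearchGate.new { |query| calls += 1; ["result for #{query}"] }
gate.results_for("ra")       # too short, the block is not called
gate.results_for("rails")    # runs the block once
gate.results_for("Rails ")   # same normalized key, served from the cache
puts calls                   # prints: 1
```

<p>In a Rails controller the block would wrap the <code>find_by_sql</code> call, so each distinct search string hits PostgreSQL at most once.</p>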
<p>Enjoy!</p>