<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="/stylesheets/rss.css" type="text/css"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
  <channel>
    <title>Robby on Rails: Tag _why</title>
    <link>http://www.robbyonrails.com/articles/tag/_why</link>
    <language>en-us</language>
    <ttl>40</ttl>
    <description>thoughts.sort_by{|t| t[:topic]}.collect </description>
    <item>
      <title>Get to Know a Gem: Hpricot</title>
      <description>&lt;p&gt;In this new series, &lt;em&gt;Get to Know a Gem&lt;/em&gt;, we&amp;#8217;re going to take a look at &lt;a href="http://code.whytheluckystiff.net/hpricot/"&gt;hpricot&lt;/a&gt;.&lt;/p&gt;


	&lt;h2&gt;What is Hpricot?&lt;/h2&gt;


	&lt;p&gt;WhyTheLuckyStiff &lt;a href="http://redhanded.hobix.com/inspect/hpricot01.html"&gt;released Hpricot in July of 2006&lt;/a&gt; in an effort to bring fast &lt;span class="caps"&gt;HTML&lt;/span&gt; parsing to the masses. It&amp;#8217;s currently unknown what prompted it, but my guess would be that Why is secretly scraping all the pages on the internet that archive the future. To speed it up, Why has written the Hpricot scanner in C, to be much faster than the other options available in Ruby.&lt;/p&gt;


	&lt;h2&gt;Installation&lt;/h2&gt;


	&lt;p&gt;As with most gems, installation is very simple.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;
$ sudo gem install hpricot
Password:
Need to update 23 gems from http://gems.rubyforge.org
.......................
complete
Select which gem to install for your platform (powerpc-darwin8.7.0)
 1. hpricot 0.5 (ruby)
 2. hpricot 0.5 (mswin32)
 3. hpricot 0.4 (mswin32)
 4. hpricot 0.4 (ruby)
 5. Cancel installation
&amp;gt; 1
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Great, let&amp;#8217;s now play with it!&lt;/p&gt;


	&lt;h2&gt;Usage&lt;/h2&gt;


	&lt;p&gt;In this first example, we&amp;#8217;re going to use Hpricot to parse a web page through the Open-URI library. For this, we&amp;#8217;ll need to require a few libs.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;
require 'rubygems'
require 'hpricot'
require 'open-uri'
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Now that the libraries are loaded, we can create a new Hpricot object. In this example, we&amp;#8217;ll load the &lt;a href="http://www.planetargon.com/about.html"&gt;&lt;span class="caps"&gt;PLANET ARGON&lt;/span&gt; About page&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
# Open the PLANET ARGON about page
page = Hpricot( open( 'http://www.planetargon.com/about.html' ) )    
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Great, let&amp;#8217;s have some parsing fun. Let&amp;#8217;s search for every &lt;code&gt;div&lt;/code&gt; with a class name of &lt;code&gt;team&lt;/code&gt;. Hpricot will return an array of the elements that match your search.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;
page.search( "//div[@class='team']" ).size 
=&amp;gt; 7    
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Great, this is a good sign that I need to add several people to the website. :-)&lt;/p&gt;


	&lt;p&gt;If we want to peek at the first instance of this class, we can do:&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;
page.search( "//div[@class='team']" ).first
=&amp;gt; {elem &amp;lt;div class="team"&amp;gt; "\n" {elem &amp;lt;div class="team_name"&amp;gt; {elem &amp;lt;strong&amp;gt; "Robby Russell" &amp;lt;/strong&amp;gt;} ", Founder &amp;amp;#38; Executive Director" &amp;lt;/div&amp;gt;}    ....SNIP
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;You&amp;#8217;ll notice that there is a &amp;lt;strong&amp;gt; element within the results. We can search deeper into the tree to pull it out.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;
page.search( "//div[@class='team']" ).first.search( "//strong" )
=&amp;gt; #&amp;lt;Hpricot::Elements[{elem &amp;lt;strong&amp;gt; "Robby Russell" &amp;lt;/strong&amp;gt;}]&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Hpricot provides a method named &lt;code&gt;inner_html&lt;/code&gt;, which will return the contents within the element.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;
page.search( "//div[@class='team']" ).first.search( "//strong" ).inner_html
=&amp;gt; "Robby Russell" 
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;Let&amp;#8217;s now iterate through each of the elements and output all of the team member names.&lt;/p&gt;


&lt;pre&gt;&lt;code&gt;
# search for each team member div and iterate through them
page.search( "//div[@class='team']" ).each do |team|
  puts team.search( "//strong").inner_html
end    

Robby Russell
Allison Beckwith
Brian Ford
Nicole Fritz
Alain Bloch
Audrey Eschright
Gary Blessington
&lt;/code&gt;&lt;/pre&gt;

	&lt;p&gt;So, there you have it: a quick and basic introduction to using Hpricot for parsing &lt;span class="caps"&gt;HTML&lt;/span&gt; content. Hpricot can also parse &lt;span class="caps"&gt;XML&lt;/span&gt;, and it accepts &lt;span class="caps"&gt;CSS&lt;/span&gt; selectors as well as XPath expressions. For more examples, please visit the &lt;a href="http://code.whytheluckystiff.net/hpricot/wiki/HpricotBasics"&gt;HpricotBasics&lt;/a&gt; page.&lt;/p&gt;


	&lt;h2&gt;Final Thoughts&lt;/h2&gt;


	&lt;p&gt;I&amp;#8217;m going to guess that Why built this for &lt;a href="http://hoodwink.d"&gt;hoodwink.d&lt;/a&gt;, which I&amp;#8217;ve been a regular user of for a &lt;a href="http://www.robbyonrails.com/articles/2005/08/23/boys-from-the-hoodwink-d"&gt;long time&lt;/a&gt;. I hadn&amp;#8217;t spent much time with the &lt;a href="http://www.w3.org/TR/xpath"&gt;XPath syntax&lt;/a&gt; before, and playing around with Hpricot has given me a much better understanding of it.&lt;/p&gt;


	&lt;p&gt;As mentioned at the beginning of this post, I am going to make &lt;em&gt;Get to Know a Gem&lt;/em&gt; a regular feature on my blog. If you know of a lesser-known gem that needs some attention, please &lt;a href="mailto:suggestions@robbyonrails.com"&gt;send a suggestion to me&lt;/a&gt;.&lt;/p&gt;


	&lt;p&gt;Until next time&amp;#8230;&lt;/p&gt;
</description>
      <pubDate>Tue, 13 Feb 2007 08:48:00 -0600</pubDate>
      <guid isPermaLink="false">urn:uuid:5fe64d73-34ca-4619-90ad-4657c497533f</guid>
      <author>Robby Russell</author>
      <link>http://www.robbyonrails.com/articles/2007/02/13/get-to-know-a-gem-hpricot</link>
      <category>Ruby</category>
      <category>Programming</category>
      <category>hpricot</category>
      <category>xpath</category>
      <category>_why</category>
      <category>rubygems</category>
      <category>open</category>
      <category>uri</category>
    </item>
    <item>
      <title>Try Ruby</title>
      <description>&lt;p&gt;I was lucky enough to see this when it was in the alpha-beta-try-that-again stage.&lt;/p&gt;


	&lt;h2&gt;&lt;a href="http://tryruby.hobix.com"&gt;Try Ruby&lt;/a&gt;&lt;/h2&gt;


	&lt;p&gt;That&amp;#8217;s right&amp;#8230; &lt;a href="http://redhanded.hobix.com/"&gt;_why&lt;/a&gt; has done it again. You might know him as that d00d who made &lt;a href="http://hoodwink.d"&gt;hoodwink.d&lt;/a&gt; or that weird0 who made that &lt;a href="http://poignantguide.net/ruby"&gt;poignant guide&lt;/a&gt;.&lt;/p&gt;


	&lt;p&gt;Have friends who are skeptical of Ruby??? Tell them to&amp;#8230; &lt;a href="http://tryruby.hobix.com"&gt;Try Ruby&lt;/a&gt;.&lt;/p&gt;


	&lt;p&gt;Have your parents had a chance to &lt;a href="http://tryruby.hobix.com"&gt;try ruby&lt;/a&gt;?&lt;/p&gt;


	&lt;p&gt;Maybe that weird uncle of yours needs to &lt;a href="http://tryruby.hobix.com"&gt;try ruby&lt;/a&gt;.&lt;/p&gt;


	&lt;p&gt;That guy who sneezed on you on the bus ride&amp;#8230; tell him to &lt;a href="http://tryruby.hobix.com"&gt;try ruby&lt;/a&gt;.&lt;/p&gt;


	&lt;p&gt;I might get some stickers printed and litter Portland with &lt;a href="http://tryruby.hobix.com"&gt;Try Ruby&lt;/a&gt; stickers&amp;#8230;&lt;/p&gt;


	&lt;p&gt;...the revolution begins&amp;#8230; (again)&lt;/p&gt;
</description>
      <pubDate>Tue, 29 Nov 2005 09:18:00 -0600</pubDate>
      <guid isPermaLink="false">urn:uuid:228d7dba5161d1aca505f7880e459576</guid>
      <author>Robby Russell</author>
      <link>http://www.robbyonrails.com/articles/2005/11/29/try-ruby</link>
      <category>Ruby</category>
      <category>Programming</category>
      <category>_why</category>
      <category>ruby</category>
    </item>
    <item>
      <title>Why’s (Poignant) Guide to Ruby in PDF form!</title>
      <description>&lt;p&gt;The famous &lt;a href="http://poignantguide.net/ruby/"&gt;Why’s (Poignant) Guide to Ruby&lt;/a&gt; has been released as a nicely &lt;a href="http://redhanded.hobix.com/inspect/caughtInMyFiltersTheBestPoignantPdfToDate.html"&gt;formatted &lt;span class="caps"&gt;PDF&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;


	&lt;p&gt;Thanks to &lt;strong&gt;Leon Spencer&lt;/strong&gt; for providing the world with &lt;a href="http://poignantguide.net/ruby/whys-poignant-guide-to-ruby.pdf"&gt;this &lt;span class="caps"&gt;PDF&lt;/span&gt;&lt;/a&gt;.&lt;/p&gt;


	&lt;p&gt;&lt;strong&gt;leon++&lt;/strong&gt;&lt;/p&gt;
</description>
      <pubDate>Tue, 13 Sep 2005 06:57:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:88abbdc5ebbc82b335422d753eb78123</guid>
      <author>Robby Russell</author>
      <link>http://www.robbyonrails.com/articles/2005/09/13/why%E2%80%99s-poignant-guide-to-ruby-in-pdf-form</link>
      <category>Ruby</category>
      <category>Programming</category>
      <category>ruby</category>
      <category>book</category>
      <category>_why</category>
    </item>
    <item>
      <title>Boys from the Hoodwink.d</title>
      <description>&lt;p&gt;&lt;img src="http://www.robbyrussell.com/albums/Desktops/hoodwink_d_1.jpg" /&gt;&lt;/p&gt;


	&lt;p&gt;&lt;a href="http://hoodwink.d/"&gt;hoodwink.d&lt;/a&gt; is going to change the world&amp;#8230; or at least how we talk about you when you&amp;#8217;re not looking. :-)&lt;/p&gt;


	&lt;p&gt;&lt;img src="http://www.robbyrussell.com/albums/Desktops/hoodwink_d_2.jpg" /&gt;&lt;/p&gt;


	&lt;p&gt;&lt;strong&gt;hint: _why&lt;/strong&gt;&lt;/p&gt;


	&lt;p&gt;&lt;a href="http://www.robbyrussell.com/albums/Desktops/hoodwink_d_3.jpg"&gt;&lt;img src="http://www.robbyrussell.com/albums/Desktops/hoodwink_d_3.thumb.jpg" /&gt;&lt;/a&gt;
(click to view large version)&lt;/p&gt;
</description>
      <pubDate>Tue, 23 Aug 2005 12:43:00 -0500</pubDate>
      <guid isPermaLink="false">urn:uuid:fe6baed1865c5293f03a18a40af4bd8a</guid>
      <author>Robby Russell</author>
      <link>http://www.robbyonrails.com/articles/2005/08/23/boys-from-the-hoodwink-d</link>
      <category>Ruby</category>
      <category>Programming</category>
      <category>hoodwink</category>
      <category>_why</category>
    </item>
  </channel>
</rss>
