Additions to daily_sports.yaml and a patch to Core

A FeedUpgrader for Daily Sports was added the other day. I extended it, together with Core, so that it can pull out just the Hanshin Tigers news.

First, the patch to daily_sports.yaml. To use it, register http://www.daily.co.jp/tigers/index.shtml with Subscription::Config.

 Index: svn/plagger/assets/plugins/filter-entryfulltext/daily_sports.yaml
 ===================================================================
 --- svn/plagger/assets/plugins/filter-entryfulltext/daily_sports.yaml  (revision 560)
 +++ svn/plagger/assets/plugins/filter-entryfulltext/daily_sports.yaml  (working copy)
 @@ -1,5 +1,8 @@
  # upgrade Google News Sports feeds
  author: Tatsuhiko Miyagawa
 +custom_feed_handle: http://www\.daily\.co\.jp/tigers/
 +custom_feed_follow_link: /baseball/\d{4}/\d\d/\d\d/\d{6}.shtml
 +custom_feed_search_area: bar_tigers_peper.gif.*?\)((?:<table).*?(?:table>))(
  handle: http://www\.daily\.co\.jp/.*?\.shtml
  extract: <!-- google_ad_section_start=s1 -->\s*(<.*?)<!-- google_ad_section_end=s1 -->
  extract_capture: body

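In the patch above, custom_feed_search_area is the part I added this time. It specifies the region of the page from which links are collected. Without it, the same URL gets picked up more than once.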
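To make the effect concrete, here is a minimal sketch of the idea in plain Perl. The sample HTML, the `collect_links` helper, and the simplified area pattern are all made up for illustration, and links are pulled out with a plain regex rather than the HTML::TokeParser that CustomFeed::Simple uses.

```perl
use strict;
use warnings;

# Hypothetical page: a headline table plus a sidebar that repeats one link.
my $html = <<'HTML';
<img src="bar_tigers_peper.gif">
<table>
<a href="/baseball/2006/04/10/000001.shtml">Story A</a>
<a href="/baseball/2006/04/10/000002.shtml">Story B</a>
</table>
<div class="sidebar">
<a href="/baseball/2006/04/10/000001.shtml">Story A (again)</a>
</div>
HTML

# Return every href in $content that matches $follow, in page order.
# If $area is given and matches, only its first capture group is searched.
sub collect_links {
    my ($content, $follow, $area) = @_;
    if (defined $area && $content =~ /$area/s) {
        $content = $1;
    }
    my @links;
    while ($content =~ /<a\s+href="([^"]+)"/g) {
        my $href = $1;    # copy before the next match resets $1
        push @links, $href if $href =~ /$follow/;
    }
    return @links;
}

# Patterns are plain strings, as they would be when read from YAML.
my $follow = '/baseball/\d{4}/\d\d/\d\d/\d{6}\.shtml';
my $area   = 'bar_tigers_peper\.gif.*?(<table.*?</table>)';

my @all    = collect_links($html, $follow);
my @scoped = collect_links($html, $follow, $area);
print scalar(@all), " links without the area, ", scalar(@scoped), " with it\n";
```

With this sample the unscoped run returns three links, one of them a repeat, and the scoped run returns only the two inside the table.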

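Besides this approach, I can think of two other ways to avoid picking up the same URL: reject a URL from its second occurrence onward, or keep overwriting the entry up to the URL's last occurrence.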
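For comparison, those two alternatives could look roughly like this. It is only a sketch with a hypothetical link list, it is not code from Plagger, and "last occurrence wins" is my reading of the overwrite idea.

```perl
use strict;
use warnings;

# Hypothetical links in page order; the first URL appears twice.
my @found = (
    [ '/baseball/2006/04/10/000001.shtml', 'Story A' ],
    [ '/baseball/2006/04/10/000002.shtml', 'Story B' ],
    [ '/baseball/2006/04/10/000001.shtml', 'Story A (sidebar)' ],
);

# Option 1: reject a URL from its second occurrence onward.
my %seen;
my @first_wins = grep { !$seen{ $_->[0] }++ } @found;

# Option 2: let later occurrences overwrite earlier ones, while keeping
# the position at which each URL first appeared.
my (%latest, @order);
for my $link (@found) {
    my ($url, $title) = @$link;
    push @order, $url unless exists $latest{$url};
    $latest{$url} = $title;
}
my @last_wins = map { [ $_, $latest{$_} ] } @order;

printf "%s => %s\n", @$_ for @first_wins;
print "--\n";
printf "%s => %s\n", @$_ for @last_wins;
```

Both variants end up with two entries. They differ only in which title survives for the duplicated URL, which is why restricting the search area seemed the cleaner fix to me.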

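With this approach, I think entries should be picked up reliably as long as the HTML structure (the template?) does not change. It could also be pointed at a blog or similar site to pull in its RecentItem list*1, so I think other pages (FeedUpgraders) would benefit as well.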

The patch to Core follows.

 Index: svn/plagger/lib/Plagger/Plugin/Filter/EntryFullText.pm
 ===================================================================
 --- svn/plagger/lib/Plagger/Plugin/Filter/EntryFullText.pm (revision 560)
 +++ svn/plagger/lib/Plagger/Plugin/Filter/EntryFullText.pm (working copy)
 @@ -88,6 +88,7 @@
      my $handler = first { $_->custom_feed_handle($args) } @{ $self->{plugins} };
      if ($handler) {
          $args->{match} = $handler->custom_feed_follow_link;
 +        $args->{area}  = $handler->custom_feed_search_area;
          return $self->Plagger::Plugin::CustomFeed::Simple::aggregate($context, $args);
      }
  }
 @@ -158,6 +159,7 @@
  sub new { bless {}, shift }
  sub custom_feed_handle { 0 }
  sub custom_feed_follow_link { }
 +sub custom_feed_search_area { }
  sub handle_force { 0 }
  sub handle { 0 }
 @@ -196,6 +198,10 @@
      $_[0]->{custom_feed_follow_link};
  }
 +sub custom_feed_search_area {
 +    $_[0]->{custom_feed_search_area};
 +}
 +
  sub handle_force {
      my($self, $args) = @_;
      $self->{handle_force}

 Index: svn/plagger/lib/Plagger/Plugin/CustomFeed/Simple.pm
 ===================================================================
 --- svn/plagger/lib/Plagger/Plugin/CustomFeed/Simple.pm    (revision 560)
 +++ svn/plagger/lib/Plagger/Plugin/CustomFeed/Simple.pm    (working copy)
 @@ -20,6 +20,9 @@
      if (my $match = $args->{feed}->meta->{follow_link}) {
          $args->{match} = $match;
 +        if(my $area = $args->{feed}->meta->{search_area}){  # meta key name "search_area" is an assumption
 +            $args->{area}  = $area;
 +        }
          return $self->aggregate($context, $args);
      }
 @@ -49,6 +52,9 @@
      my $re = $args->{match};
 +    if($content =~ /$args->{area}/s){
 +       $content = $1;
 +    }
      my $parser = HTML::TokeParser->new(\$content);
      while (my $token = $parser->get_tag('a')) {
          next unless $token->[0] eq 'S' || $token->[1]->{href} =~ /$re/;

I don't know Perl very well, so there may well be a smarter way to solve this.
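One possible refinement, offered as a sketch rather than a tested change to Plagger: the match in the Simple.pm hunk runs even when no search area is configured. In Perl an empty pattern reuses the last successful pattern, and `$1` is only meaningful when the pattern has a capture group, so a small guard may be safer. The helper name `narrow_content` is hypothetical.

```perl
use strict;
use warnings;

# Narrow $content to the first capture group of $area when possible.
# Falls back to the full content if no area is configured, if the
# pattern does not match, or if it captured nothing.
sub narrow_content {
    my ($content, $area) = @_;
    return $content unless defined $area && length $area;
    if ($content =~ /$area/s && defined $1) {
        return $1;
    }
    return $content;
}

my $html = '<p>nav</p><table><a href="/x">x</a></table>';
print narrow_content($html, undef), "\n";                  # whole page
print narrow_content($html, '(<table.*?</table>)'), "\n";  # table only
print narrow_content($html, 'no-such-marker(.*)'), "\n";   # whole page
```

Inside aggregate this would replace the three added lines with something like `$content = narrow_content($content, $args->{area});`, again assuming the surrounding code stays as shown in the diff.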

*1: Though most blogs probably distribute RSS anyway.