VoxのQotDを取得するCustomFeed::Script

いちいちVoxにログインしなくてもQotDがわかるようにPlagger用のCF::Scriptにしてみました。といっても、そのうちVoxでFeed出しそうですが。とりあえずWWW::MechanizeとHTML::TreeBuilder::XPathの練習ということで。

追記)IRCでmiyagawaさんよりhttp://questions.vox.com/library/posts/atom.xmlhttp://questions-jp.vox.com/library/posts/atom.xmlがあると教えてもらいました。

WWW::Mechanize面白いです。WWW::BugMeNotと合わせれば、(ry


このまま実行するとHTML/PullParser.pmが


Parsing of undecoded UTF-8 will give garbage when decoding entities

なんてエラーを表示しますので、気になる人は`/user/bin/perl'で実行してください。

assets/plugins/CustomFeed-Script/vox_QotD.pl

 #!/usr/bin/perl -w

use utf8;
use strict;
use WWW::Mechanize;
use HTML::TreeBuilder::XPath;
use Encode;
use YAML;
use DateTime::Format::W3CDTF;

my $email = 'Mail address';
my $password = 'password';

my $url = 'http://vox.com/';

my $mech = WWW::Mechanize->new();
$mech->agent_alias('Windows IE 6');

$mech->get($url);
$mech->follow_link(url_regex => qr!signin!);

$mech->form_number(0);
$mech->field(username => $email);
$mech->field(password => $password);
$mech->click();

$mech->follow_link(url_regex => qr/http:\/\/www\.vox\.com\/home/);

my $tree = HTML::TreeBuilder::XPath->new;
my $html = decode('utf-8', $mech->content());
$tree->parse($html);
$tree->eof;

my $feed = {
title => 'Vox - QotD',
link => $url,
};

my @questions = $tree->findnodes(q(//div[@class="question"]|//div[@class="question hidden"]));

for (my $i = 0; $i < @questions; $i++) {
my $text = $tree->findnodes(q(//div[@class="question-text"]))->[$i];
my $xid = $tree->findnodes(q(//input[@name="xid"]))->[$i]->attr_get_i('value');

(my $title = $text->as_text )=~ s!^(.*)(提案:|Submitted by).*$!$1!;
my $date = DateTime->today(time_zone => 'Asia/Tokyo',);
push @{$feed->{entry}}, {
title => $title,
body => $text->as_HTML,
link => $url.'compose/?xid='.$xid.'&rule=qotd',
date => DateTime::Format::W3CDTF->format_datetime($date),
}
}

binmode STDOUT, ":utf8";
print YAML::Dump $feed;

config.yaml

 global:

plugins:
- module: CustomFeed::Script
- module: Subscription::Config
config:
feed:
- script:/path/to/assets/plugins/CustomFeed-Script/vox_QotD.pl
- module: Filter::Rule
rule:
module: Deduped
compare_body: 1
- module: Filter::BreakEntriesToFeeds
config:
use_entry_title: 1
- module: Publish::Gmail
config:
mailfrom: mail@address