<p>motivation and design of NNTP::Portal (http://enetdown.org//dot-plan/posts/2011/07/02/motivation_and_design_of_nntp-portal/), by zaki. Published 2011-07-02T08:32:52Z; updated 2013-08-20T17:02:03Z.</p>
<p>I have been recently working on and
<a href="http://www.catb.org/~esr/jargon/html/D/dogfood.html">dogfooding</a> a
project that I call <a href="http://enetdown.org/git/?p=nntp-portal_dogfood">NNTP::Portal</a>. It is
meant to be a way to merge the "oldskool" world of newsreaders with
content retrieved from other sources, mainly the <a href="http://www.w3.org/">World Wide
Web</a>.</p>
<h1 id="background">Background</h1>
<p>Usenet is a network, originally developed in the early 1980s, that provides
a distributed system for the delivery of messages, called articles,
between servers that carry these messages (collectively called a news
feed) in hierarchical groups based on topics, called newsgroups. Much of
the jargon and many of the behaviours of Internet communication
originated with or were developed by users of this network. Users of Usenet are
able to read and reply to articles on Usenet using software called
newsreaders. You can see an example of both the messages and interfaces
from the early days of Usenet <a href="http://olduse.net/">here</a>. It is also
worth noting that much of the software that runs the Internet today was
first announced via a Usenet
<a href="http://www.google.com/googlegroups/archive_announce_20.html">posting</a>.</p>
<p>Usenet's distributed nature allows for a mostly free rein in terms of
content, so there are inevitably problems that arise from having a flood
of articles, many of which could be spam. Newsreaders have developed the
means to address this need by adding features that can filter out
unimportant messages; this is usually accomplished through the use of
<a href="https://secure.wikimedia.org/wikipedia/en/wiki/Scorefile">scoring</a> or
<a href="https://secure.wikimedia.org/wikipedia/en/wiki/Kill_file">kill files</a>.
There are some <a href="http://dx.doi.org/10.1145/192844.192905">other methods</a>
that have been developed, but they are not used as widely.</p>
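<p>As a rough illustration of how scoring and kill files work, here is a minimal sketch in Python. It is not taken from any particular newsreader: the <code>Rule</code> class, the <code>KILL_THRESHOLD</code> value, and the sample addresses are all hypothetical. Each rule adds points when its pattern matches a header, and articles whose total falls to the threshold or below are hidden.</p>

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    header: str   # header to inspect, e.g. "Subject"
    pattern: str  # regular expression matched against the header value
    score: int    # points added to the article when the pattern matches


# Hypothetical cut-off: articles scoring at or below this are "killed".
KILL_THRESHOLD = -1000


def score_article(headers, rules):
    """Sum the scores of every rule whose pattern matches its header."""
    total = 0
    for rule in rules:
        value = headers.get(rule.header, "")
        if re.search(rule.pattern, value, re.IGNORECASE):
            total += rule.score
    return total


def visible(articles, rules):
    """Return only the articles that score above the kill threshold."""
    return [a for a in articles if score_article(a, rules) > KILL_THRESHOLD]


rules = [
    Rule("Subject", r"make money fast", -9999),    # kill obvious spam
    Rule("From", r"friend@example\.org", 100),     # boost a known author
]
articles = [
    {"From": "spammer@example.com", "Subject": "MAKE MONEY FAST!!!"},
    {"From": "friend@example.org", "Subject": "Re: POE question"},
]
for article in visible(articles, rules):
    print(article["Subject"])
```

<p>A real scorefile usually supports more than this (expiry dates, per-group rules, thread-wide scores), but the core idea is the same weighted pattern match.</p>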
<p>The Network News Transfer Protocol (NNTP) is a standardised protocol developed
in the mid-1980s to facilitate the communication of Usenet traffic both
between peering servers and between a server and a client. It is
described in <a href="http://tools.ietf.org/html/rfc977">RFC 977</a> and has been
<a href="http://tools.ietf.org/html/rfc3977">updated</a> with extensions over the
years.</p>
<h1 id="motivation">Motivation</h1>
<p>Today, in the year 2011, the major protocol used online is by far
<a href="http://tools.ietf.org/html/rfc2616">HTTP</a>, which is the means of
transport for data on the World Wide Web. As you may be aware, this
data is typically encoded in the HyperText Markup Language, or
<a href="http://www.w3.org/html/">HTML</a>, along with several other technologies
which determine how a Web browser will display the data.</p>
<p>Today's Web sites are giving users more ways to
participate in the generation and consumption of media: pages now let
users interact through uploads, collaborative
editing, ratings, recommendations, and comments. However, due to the
client-server nature of HTTP, these interactions generally stay on one
Web service and the only way to access this data is by going to that Web
site. As a response to these incompatibilities between sites, Web
feed<a href="http://enetdown.org//tag/nntp::portal/#fn:aka_newsfeed" id="fnref:aka_newsfeed" class="footnote">1</a> formats and <a href="http://www.programmableweb.com/">APIs</a>
have been used to get different Web services to talk to each
other<a href="http://enetdown.org//tag/nntp::portal/#fn:semantic" id="fnref:semantic" class="footnote">2</a>. Many people have called these tools the pipes of the
Web, in reference to Unix pipes, because they allow you to create a
chain of transformations to get from one set of data to another.</p>
<p>This is great, but many of the efforts I have seen that go through
the work of moving data around are about either showing a cute
visualisation or providing a way of bringing that data to either the
desktop or a mobile device. Each of these efforts has a different
interface as well as different levels of interaction. Few have any
extensibility to speak of and can only do a fixed action with the data.
These are useful, but severely limiting when it comes to what can be
done with a computer. A lack of a standard interface is not only
confusing for users (it takes time to navigate and learn a new
interface), but also negligent of accessibility needs. Missing
extensibility means that any data not exposed via a feed/API will remain
inside the browser or application.</p>
<p>Let us jump back for a moment. Now, in the early days of computing, the
fastest way to interact with a computer was through the keyboard. This
is still true today for certain kinds of operations, specifically those
involving text. As such, newsreaders were (and still are) largely
keyboard-driven, allowing users to select and scan through large numbers
of articles efficiently. Typically, these newsreaders had a way of
interacting with their environment, through a built-in scripting
language or pipes. This way, by extending the reach of newsreaders, one
could build a toolkit that worked exactly how you wanted it to. For
example, one could write a script that, at a single keystroke, would grab
a source code listing out of an article, compile it, and place it in the
executable path. You could have such a script for every key on your
keyboard. In the end, you are able to create an interface that matches
the way <em>you</em> work.</p>
<p>This is what I want to bring to the Web. I'm not the <a href="http://enetdown.org//dot-plan/posts/2011/07/02/motivation_and_design_of_nntp-portal/#relatedwork">only
one</a>.</p>
<h1 id="design">Design</h1>
<p>I decided to work with NNTP because:</p>
<ol>
<li>It is standardised, so I need to write neither a specification
nor a client.</li>
<li>The standard is <a href="http://www.eyrie.org/~eagle/nntp/">simple</a> to
implement.</li>
<li><p>Newsreaders have a history of working with large conversations and
modern newsreaders have excellent threading support. There is also
the very much needed ability to mark messages as read so that they
are out of the way. Another action I find lacking on Web sites is
the ability to postpone a message, i.e. to save a reply so that I can
return to it later. I have lost many lengthy replies because I
accidentally switched to another page.</p>
<p>Newsreaders have these features and many more that facilitate two-way
communication.</p></li>
<li>It uses the <a href="http://tools.ietf.org/html/rfc5322">Internet Message Format</a>,
which has the advantage of being human-readable and, through
<a href="http://tools.ietf.org/html/rfc2045">MIME</a>, extra data can be
embedded in the message if necessary.</li>
</ol>
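<p>To make the last point concrete, here is a small sketch of a human-readable message carrying extra machine-readable data as a MIME part. It uses Python's standard <code>email</code> package purely for illustration (NNTP::Portal itself is written in Perl and uses Mail::Box). The newsgroup name, Message-ID, and the idea of attaching the source record as JSON are assumptions, not something the project is known to do.</p>

```python
from email import message_from_string, policy
from email.message import EmailMessage


def build_article(subject, author, body, extra_json):
    """Build an article with a plain-text body and a JSON attachment."""
    msg = EmailMessage()
    msg["From"] = author
    msg["Newsgroups"] = "portal.example"       # hypothetical group name
    msg["Subject"] = subject
    msg["Message-ID"] = "<1@portal.example>"   # hypothetical identifier
    msg.set_content(body)                      # the human-readable text/plain part
    # Adding an attachment turns the message into multipart/mixed.
    msg.add_attachment(
        extra_json.encode("utf-8"),
        maintype="application",
        subtype="json",
        filename="source.json",
    )
    return msg


article = build_article(
    "Hello",
    "Example Author <author@example.org>",
    "Body text",
    '{"id": 1}',
)
wire_text = article.as_string()  # headers and parts stay readable as plain text

# A client can parse the same text back and pick out either part.
parsed = message_from_string(wire_text, policy=policy.default)
text_part = parsed.get_body(preferencelist=("plain",))
attachments = list(parsed.iter_attachments())
print(text_part.get_content().strip())
print(attachments[0].get_content_type())
```

<p>A newsreader that knows nothing about the attachment still shows the text part, which is the property that makes the format attractive here.</p>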
<h2 id="tools">Tools</h2>
<dl>
<dt><a href="http://www.perl.org/">Perl</a></dt>
<dd>
A scripting language that has an extensive library of modules
(<a href="http://www.cpan.org/">CPAN</a>), as well as powerful text-processing
tools (most notably, regular expressions).
</dd>
<dt><a href="http://moose.perl.org/">Moose</a></dt>
<dd>
An object framework for Perl that simplifies working with attributes,
roles, and meta-objects.
</dd>
<dt><a href="http://poe.perl.org/">POE</a></dt>
<dd>
A framework for event-driven programming. It abstracts away much of the code
that is common in network programming. I wish I had known about it when I
was first <a href="http://www.kohala.com/start/unpv12e.html">learning</a> the socket
API.
</dd>
<dt><a href="http://dbi.perl.org/">DBI</a></dt>
<dd>
A database abstraction library. It provides a consistent set of function
calls whenever you need to connect, query, and retrieve data from a
database.
</dd>
<dt><a href="http://www.sqlite.org/">SQLite</a></dt>
<dd>
A small SQL database that requires <em>no</em> configuration. I chose SQLite as my
first database backend for this very reason, but I may test other databases
such as BerkeleyDB.
</dd>
<dt><a href="http://p3rl.org/Mail::Box">Mail::Box</a></dt>
<dd>
A module for manipulating message headers and bodies.
</dd>
</dl>
<h2 id="serverlayout">Server layout</h2>
<p>Right now, there is not much to the server. All it does is take NNTP commands
and periodically request new messages from plugins. I am currently playing
with the idea of using roles to implement extra features for each database
backend, such as the <code>OVERVIEW</code> capability which sends a set of message headers
to the client in a tab-delimited format.</p>
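<p>For readers unfamiliar with the tab-delimited overview format, the following Python sketch builds one such line. It follows the default field order described in RFC 3977 (article number, Subject, From, Date, Message-ID, References, byte count, line count) as I understand it, and it is not the server's actual Perl code. The function names are hypothetical, and the byte count is simplified to cover the body only rather than the whole article.</p>

```python
# Header fields that follow the article number, in overview order.
OVERVIEW_FIELDS = ("Subject", "From", "Date", "Message-ID", "References")


def clean_field(value):
    """Undo header folding and replace tabs so the field fits in one column."""
    return value.replace("\r\n", "").replace("\t", " ")


def overview_line(number, headers, body):
    """Return one tab-delimited overview line for an article."""
    fields = [str(number)]
    fields += [clean_field(headers.get(name, "")) for name in OVERVIEW_FIELDS]
    # Simplification: count only the body's bytes, not the full article.
    fields.append(str(len(body.encode("utf-8"))))
    fields.append(str(len(body.splitlines())))
    return "\t".join(fields)


line = overview_line(
    1,
    {
        "Subject": "Hi\tthere",
        "From": "author@example.org",
        "Date": "Sat, 02 Jul 2011 08:32:52 +0000",
        "Message-ID": "<1@portal.example>",
    },
    "one\ntwo\n",
)
print(line)
```

<p>Missing headers (here, References) become empty columns, so a client can always split the line on tabs and index the fields by position.</p>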
<p>The current server layout is shown below</p>
<p><a href="http://enetdown.org//dot-plan/posts/2011/07/02/gfx/current_server/"><img src="http://enetdown.org//dot-plan/posts/2011/07/02/gfx/current_server.svg" width="400" alt="Current server layout diagram" class="img" /></a></p>
<p>As you can see, it is rather simple, but it works. The only issue is that the
updating is not realtime. One reason for this is that POE uses a
single-threaded model. So, future versions will split the server into
separate threads that will use the message database concurrently.</p>
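<p>One way such concurrent access could look is sketched below in Python with the standard <code>sqlite3</code> and <code>threading</code> modules. This is only an illustration under stated assumptions: the table layout, function names, and group name are invented, and the real server would use Perl and DBI. Each thread opens its own connection, and writers use <code>BEGIN IMMEDIATE</code> with a busy timeout so that they wait for each other instead of failing.</p>

```python
import os
import sqlite3
import tempfile
import threading


def init_db(path):
    """Create the (hypothetical) message table once, before threads start."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS message ("
        "id INTEGER PRIMARY KEY, newsgroup TEXT NOT NULL, raw TEXT NOT NULL)"
    )
    conn.commit()
    conn.close()


def store_messages(path, newsgroup, bodies):
    """Writer: insert a batch of messages in a single transaction."""
    # isolation_level=None lets us manage transactions explicitly.
    conn = sqlite3.connect(path, timeout=5.0, isolation_level=None)
    try:
        conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
        conn.executemany(
            "INSERT INTO message (newsgroup, raw) VALUES (?, ?)",
            [(newsgroup, body) for body in bodies],
        )
        conn.execute("COMMIT")
    finally:
        conn.close()


def count_messages(path, newsgroup):
    """Reader: count the stored messages for one group."""
    conn = sqlite3.connect(path, timeout=5.0)
    try:
        row = conn.execute(
            "SELECT COUNT(*) FROM message WHERE newsgroup = ?", (newsgroup,)
        ).fetchone()
        return row[0]
    finally:
        conn.close()


def demo(workers=3):
    """Run several writer threads against one database file."""
    path = os.path.join(tempfile.mkdtemp(), "portal.db")
    init_db(path)
    threads = [
        threading.Thread(
            target=store_messages,
            args=(path, "portal.example", ["first", "second"]),
        )
        for _ in range(workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return count_messages(path, "portal.example")


print(demo())
```

<p>SQLite serialises writers, so this design suits a server with a few pollers and many readers; a heavier write load might be a reason to test other backends.</p>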
<p>The future server layout design so far is shown below</p>
<p><a href="http://enetdown.org//dot-plan/posts/2011/07/02/gfx/future_server/"><img src="http://enetdown.org//dot-plan/posts/2011/07/02/gfx/future_server.svg" width="400" alt="Future server layout diagram" class="img" /></a></p>
<p>The purpose of the RPC server is to have a way for a user to query the
state of plugins and retrieve specific information (e.g. notifications)
that cannot be obtained over NNTP alone (that is, out-of-band).</p>
<p>The idea behind the job queue is that each plugin will be able to post jobs for
plugin-specific workers to process. These jobs will most likely be
scheduled so that they can poll for updates optimally (the specifics of
which I have not worked out yet).</p>
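<p>Since the scheduling details are still open, the following Python sketch shows only one simple possibility: a priority queue ordered by run time, with each job re-posted at a fixed interval after it runs. The <code>JobQueue</code> class, <code>run_worker</code> function, plugin names, and fixed-interval policy are all assumptions made for illustration, not the planned design.</p>

```python
import heapq
import itertools


class JobQueue:
    """Jobs ordered by the time at which they should next run."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal run times

    def post(self, run_at, plugin, action):
        """Schedule an action for a plugin at the given time (in seconds)."""
        heapq.heappush(self._heap, (run_at, next(self._counter), plugin, action))

    def due(self, now):
        """Pop and return every job whose run time has arrived, earliest first."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            _, _, plugin, action = heapq.heappop(self._heap)
            ready.append((plugin, action))
        return ready


def run_worker(queue, now, handlers, interval):
    """Run all due jobs, then re-post each one `interval` seconds later."""
    results = []
    for plugin, action in queue.due(now):
        results.append(handlers[plugin](action))
        queue.post(now + interval, plugin, action)
    return results


queue = JobQueue()
queue.post(10, "facebook", "poll")
queue.post(5, "feeds", "poll")  # "feeds" is a made-up second plugin

handlers = {
    "facebook": lambda action: "facebook:" + action,
    "feeds": lambda action: "feeds:" + action,
}
print(run_worker(queue, 5, handlers, interval=60))
```

<p>A smarter scheduler could adjust the interval per plugin, for example backing off when a source rarely changes, without altering the queue itself.</p>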
<p>Two parts of the server design that I have not yet figured out are the
message updating and the article posting mechanisms. By message
updating, I mean when the message changes in the original source. Normal
NNTP articles are meant to have immutable bodies, so I will need to work
out the best way to present these changes to the user as well as how
the plugins will handle them. The other issue, article posting, has many
parts to it, including whether the user has permission to post and how
to indicate this without wasting the user's time.</p>
<h1 id="results">Results</h1>
<p>Well, I have to show what the program's output looks like, so here it
is in the slrn newsreader:</p>
<p><a href="http://enetdown.org//dot-plan/posts/2011/07/02/gfx/slrn_portal.png"><img src="http://enetdown.org//dot-plan/posts/2011/07/02/motivation_and_design_of_nntp-portal/400x-slrn_portal.png" width="399" height="192" alt="Screenshot of slrn showing Facebook posts" class="img" /></a></p>
<p>Currently, the database uses 1.6 KiB/message. I have also noticed that
the Graph API cannot retrieve messages from certain people. This is a
<a href="http://forum.developers.facebook.net/viewtopic.php?id=98782">permissions
problem</a>,
so I will need to work on finding a way around it<a href="http://enetdown.org//tag/nntp::portal/#fn:ack_jesus" id="fnref:ack_jesus" class="footnote">3</a>. I would
also like to tag people in posts on Facebook, but the API does not have
a way of doing this, as far as I can tell.</p>
<h1 id="why">Why‽</h1>
<p>Well, I wanted to do some software engineering and I've recently
discovered how much I like writing network software (this is the third
server I've written in the past eight months and the first non-trivial
one). Also, I've been really frustrated with the trends towards moving
applications to the Web. One has to ask, how many of these applications
can you really trust? How many let you see the source to the backend so
that you can evaluate security? Few do<a href="http://enetdown.org//tag/nntp::portal/#fn:open_webservice" id="fnref:open_webservice" class="footnote">4</a>. In addition,
one rarely gets to pull all the data from these services and sometimes
the services have put in place terms of service that are
hostile<a href="http://enetdown.org//tag/nntp::portal/#fn:hostility_example" id="fnref:hostility_example" class="footnote">5</a> to third-party users. I want to see how far this
project will take me.</p>
<p>I may even try to develop my own distributed service on top of NNTP. It
will certainly be more lightweight and extensible than the current
webservice offerings.</p>
<h1 id="relatedwork">Related work</h1>
<p>The following are browser-based tools that make extending web sites in
the browser easy (if you know JavaScript):</p>
<ul>
<li><a href="https://wiki.mozilla.org/Labs/Ubiquity">Ubiquity</a></li>
<li><a href="http://userscripts.org/">Userscripts</a></li>
</ul>
<p>And outside the browser:</p>
<ul>
<li><a href="http://www.getmiro.com/">Miro</a></li>
<li><a href="http://scraperwiki.com/">ScraperWiki</a></li>
<li><a href="http://pipes.yahoo.com/">Yahoo! Pipes</a></li>
</ul>
<p>Organizations:</p>
<ul>
<li><a href="http://dataportability.org/">The DataPortability Project</a></li>
<li><a href="http://linkeddata.org/">Linked Data</a></li>
<li><a href="http://ostatus.org/">OStatus</a></li>
<li><a href="http://www.openwebfoundation.org/">Open Web Foundation</a></li>
<li><a href="http://www.opensocial.org/">OpenSocial</a></li>
<li><a href="http://surfraw.alioth.debian.org/">surfraw</a> :-)</li>
</ul>
<div class="footnotes">
<hr />
<ol>
<li id="fn:aka_newsfeed"><p>Confusingly, these are also called newsfeeds. The
similarity does not end there, as clients that pull Web feeds are
also called news readers. Furthermore, it is entirely
<a href="http://gwene.org/">possible</a> to use an NNTP newsreader to read
Web feeds.<a href="http://enetdown.org//tag/nntp::portal/#fnref:aka_newsfeed" class="reversefootnote"> ↩</a></p></li>
<li id="fn:semantic"><p>There is also work on the <a href="http://semanticweb.org/">Semantic Web</a>
and <a href="http://microformats.org/">microformats</a>, but these are not as widely used
today.<a href="http://enetdown.org//tag/nntp::portal/#fnref:semantic" class="reversefootnote"> ↩</a></p></li>
<li id="fn:ack_jesus"><p>Thanks to <a href="https://twitter.com/j3sus_h">@j3sus_h</a> for
pointing out the source of the problem.<a href="http://enetdown.org//tag/nntp::portal/#fnref:ack_jesus" class="reversefootnote"> ↩</a></p></li>
<li id="fn:open_webservice"><p>Unless it is under a license such as the
<a href="http://www.gnu.org/licenses/why-affero-gpl.html">AGPL</a>.<a href="http://enetdown.org//tag/nntp::portal/#fnref:open_webservice" class="reversefootnote"> ↩</a></p></li>
<li id="fn:hostility_example"><p>See
<a href="https://secure.wikimedia.org/wikipedia/en/wiki/CDDB#History">CDDB</a> and
<a href="http://www.wired.com/epicenter/2011/03/twitter-third-party-clients/">Twitter</a>.<a href="http://enetdown.org//tag/nntp::portal/#fnref:hostility_example" class="reversefootnote"> ↩</a></p></li>
</ol>
</div>