Welcome to my homepage. The following is a list of posts from all the subsections of my blog (listed in the sidebar). To see more, go to the archives for each section.

When I saw the CPAN PR Challenge come up in my feed, I signed up immediately. I love giving back to FOSS, and this challenge would push me to make contributions outside of the software I usually work on.

For January, I was assigned Clone. I looked at the reverse dependencies and saw 181 packages. I immediately thought to myself, "I'd better be careful. I don't want to break things." With that many dependents, testing is very important for this package.

I e-mailed the maintainer of Clone, Breno G. de Oliveira (garu), about my assignment and he shot me back a lengthy e-mail with all the things I could do. Some of them were easy, such as:

  • fixing typos,
  • adding continuous integration with Travis-CI and code coverage with Coveralls, and adding badges for each.

Others were a bit more involved:

  • benchmarking against other packages such as Clone::PP and Storable,
  • adding more tests for different types of Perl variables,
  • going through the bug queue and fixing the open tickets.

I went for the easy ones first. I knew that adding the Travis-CI integration was just a matter of creating a .travis.yml file, but what actually goes in that file can vary quite a bit. I had noticed that haarg had created a set of helper scripts that can grab various pre-built Perl versions and run the tests against them all.

I cloned the Clone repository and copied over the example .travis.yml:

language: perl
perl:
  - "5.8"                     # normal preinstalled perl
  - "5.8.4"                   # installs perl 5.8.4
  - "5.8.4-thr"               # installs perl 5.8.4 with threading
  - "5.20"                    # installs latest perl 5.20 (if not already available)
  - "blead"                   # install perl from git
matrix:
  include:
    - perl: 5.18
      env: COVERAGE=1         # enables coverage+coveralls reporting
  allow_failures:
    - perl: "blead"           # ignore failures for blead perl
before_install:
  - git clone git://github.com/travis-perl/helpers ~/travis-perl-helpers
  - source ~/travis-perl-helpers/init
  - build-perl
  - perl -V
  - build-dist
  - cd $BUILD_DIR             # $BUILD_DIR is set by the build-dist command
install:
  - cpan-install --deps       # installs prereqs, including recommends
  - cpan-install --coverage   # installs coverage prereqs, if enabled
before_script:
  - coverage-setup
script:
  - prove -l -j$((SYSTEM_CORES + 1)) $(test-dirs)   # parallel testing
after_success:
  - coverage-report

and enabled my fork of Clone in the Travis-CI and Coveralls settings.

After pushing this, the tests ran, but I kept seeing n/a code coverage on Coveralls. I was very confused because the code coverage was working just fine locally. I jumped on IRC and chatted with haarg. He pointed out that I was using prove -l as in the example, but since Clone is a compiled module, I needed to use prove -b (-l puts lib/ on @INC and so only exercises the pure-Perl source tree, while -b uses the built blib/, which is where the compiled XS code ends up).

Oh. Silly me! I had been using prove -b locally, but never changed the .travis.yml file. That serves me right for copying and pasting without looking! Something good came out of it though: this ticket for Test::Harness has suggestions that will help catch this error if anyone else makes the same mistake.

haarg also pointed me to an even simpler .travis.yml file that he was working on, which just had the line

  - eval $(curl https://travis-perl.github.io/init) --auto

and a list of the Perl versions to test. I used that and ran it through Travis-CI and everything just worked!

Now all I had to do was grab the HTML for the badges and put them in the POD and Markdown. I went to the Travis-CI and Coveralls pages and copied the Markdown for those badges, then went to http://badge.fury.io/for/pl and entered Clone to get a version badge for Clone on CPAN.

I then made a few grammar fixes and converted the POD into Markdown for the README and I was done!

The pull request with my changes is at https://github.com/garu/Clone/pull/4 and my changes are in Clone v0.38.

Badges for Clone on GitHub

Badges for Clone on CPAN

Posted Sun Jan 25 04:36:09 2015 Tags:

A while back, I wrote Unicode::Number which was based on libuninum. This is a library that can convert numbers written in various languages to integers and vice versa. I also wrote a library to install libuninum automatically, Alien::Uninum, with the help of Alien::Base.

This all worked quite well, but I wanted to go a step further. libuninum can store its numbers using the GNU Multiple Precision Arithmetic Library (libgmp), which allows converting to and from arbitrarily long numbers. To use this feature, the computer must have libgmp installed.

So I thought to myself, why not write Alien::GMP and install it myself?

Well, Alien::GMP already exists and is authored by Richard Simões, but it bundles an old version of libgmp (v5.0.4). Alien::GMP should be able to download the latest version and install that.

So I created an issue to point out that it needed updates. That led to me getting co-maintainership on the package.

I went ahead and pointed Alien::GMP to the download page for the source code, but it needed HTTPS: https://gmplib.org/download/gmp/. Alien::Base didn't have HTTPS support, so I added it.

I could finally get back to Alien::GMP.

I cleared out the original code and made Alien::GMP inherit from Alien::Base.
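
For flavour, here is roughly what that kind of setup looks like: the runtime Alien::GMP module becomes little more than a subclass of Alien::Base (use parent 'Alien::Base'), and a Build.PL based on Alien::Base::ModuleBuild takes care of fetching and building libgmp. This is a minimal sketch, not the actual code from the repository; the repository details and the tarball pattern below are illustrative guesses.

# Build.PL (sketch): Alien::Base::ModuleBuild probes for a system libgmp and,
# failing that, downloads, builds, and installs its own copy.
use strict;
use warnings;
use Alien::Base::ModuleBuild;

my $builder = Alien::Base::ModuleBuild->new(
    module_name      => 'Alien::GMP',
    dist_version     => '0.01',
    alien_name       => 'gmp',                     # probed via pkg-config
    alien_repository => {
        protocol => 'https',                       # relies on the HTTPS support added to Alien::Base
        host     => 'gmplib.org',
        location => '/download/gmp/',
        pattern  => qr/^gmp-([\d.]+)\.tar\.bz2$/,  # exact archive format is a guess
    },
);
$builder->create_build_script;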

I also added support for using the module with Inline so that it is easy to compile other code against libgmp. I just needed to change the tests that look for gmp.h and libgmp.so, and the module was good to go. The overall changes can be seen at https://github.com/zmughal/p5-Alien-GMP/compare/zmughal:v0.0.6...v0.0.6_01 and a new dev release of Alien::GMP is at https://metacpan.org/release/ZMUGHAL/Alien-GMP-v0.0.6_01.
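
With that in place, compiling against libgmp from Inline::C can look something like the following. This is only a sketch of one way to wire it up, using the cflags and libs methods that Alien::Base provides; the tiny C function is my own example.

use strict;
use warnings;
use Alien::GMP;

# Feed the compiler and linker flags discovered by Alien::GMP to Inline::C.
use Inline C => Config =>
    INC  => Alien::GMP->cflags,   # e.g. -I/path/to/gmp/include
    LIBS => Alien::GMP->libs;     # e.g. -L/path/to/gmp/lib -lgmp

use Inline C => <<'END_C';
#include <gmp.h>

/* Return GMP's version string to show that the header and library were found. */
char* gmp_version_string() {
    return (char*)gmp_version;
}
END_C

print gmp_version_string(), "\n";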

Posted Sun Jan 25 03:47:56 2015 Tags:

Love R? Love Perl? Well, I've got a nice little present for you this echo $(calendar) 1 season! Now you can pass data in and out of R as easily as

use v5.16;
use Statistics::NiceR;
use Data::Frame::Rlike;
Moo::Role->apply_roles_to_package( q|Data::Frame|, qw(Data::Frame::Role::Rlike) );

my $r = Statistics::NiceR->new;
my $iris = $r->get('iris');

say "Subset of Iris data set";
say $iris->subset( sub { # like a SQL WHERE clause
                  ( $_->('Sepal.Length') > 6.0 )
                & ( $_->('Petal.Width')  < 2   )
        })->select_rows(0, 34); # grab the first and last rows

which outputs

Subset of Iris data set
      Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
 51   7             3.2          4.7           1.4          versicolor
 147  6.3           2.5          5             1.9          virginica

This is possible due to Statistics::NiceR and Data::Frame.

Statistics::NiceR is a C-level binding to the R interpreter that exposes all of R's functions as if they were Perl functions. It handles all the magic data conversion in the background so that you don't have to think about it.
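
For example, calling into R can look something like this. It is a sketch of the idea only; the particular R functions used here are my own choice, not examples from the module's documentation.

use v5.16;
use Statistics::NiceR;

my $r = Statistics::NiceR->new;

# R's rnorm() and mean() are dispatched as if they were Perl methods on $r,
# with the data conversion handled behind the scenes.
my $x = $r->rnorm(100);            # 100 standard-normal draws
say "mean: ", $r->mean($x);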

Data::Frame is a container for PDL typed arrays that lets you think in terms of tabular data just like R's data.frames. It even prints out a table using Text::Table::Tiny. To support categorical data just like R's factor variables, it has a PDL subclass that keeps track of the levels of the data.
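
Building one by hand looks roughly like this (a sketch based on the module's documented constructor; the column names and values are mine):

use v5.16;
use PDL;
use Data::Frame;

my $df = Data::Frame->new(
    columns => [
        sepal_length => pdl( 5.1, 7.0, 6.3 ),
        species      => [qw( setosa versicolor virginica )],   # a character column alongside the numeric PDL column
    ],
);
say $df;    # pretty-printed as a table via Text::Table::Tiny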

It's an early release, so there may still be some kinks to work out, but give it a try and be sure to ping me if something is wrong.

Many thanks to the folks in #inline for helping out with the very cool Inline::Module so that this code could hit CPAN (ingy++, ether++). You should definitely check it out as an alternative to writing XS.

There are already several interfaces to R on CPAN, but this is the first one that embeds R and provides a flexible data conversion mechanism. Hope you enjoy using it!

In my last post, I talked about how I found that my social media use has become rather unwieldy. I posted a link to it on Facebook (of course) and got plenty of great discussion out of it. I have much more to say on this theme, so here's another post.

I decided to write a little about what the Internet was like about 15 years ago when I started using it (or at least how I remember it). I split it off from this article because I was starting to meander. I make references to it in the following text, but you don't need to read it to understand what I'm discussing. Caveat lector.

[Information intensifies]

Information overload is one of those perennial topics that everyone always seems to be worrying about, but nobody does anything about. Clay Shirky explains how it will only get worse unless we think about it differently in his talk titled It's Not Information Overload. It's Filter Failure.

To summarise, the phenomenon of information overload follows naturally from how cheaply we can disseminate information with the Internet. In the past, we had editors that had an economic incentive to be the gatekeepers of public discourse (dead trees cost money). I like the analogy of the Great Conversation where different authors respond to the thoughts of others by referencing previous works. The editorial process behind this is not very egalitarian and we have missed out on many ideas that were ahead of their time, but it does keep the quality high. Shirky argues that the problem of information overload is not going to be solved until we start thinking about information flow. Both the problems of privacy and quality can be addressed in this framework.

That the talk was given in 2008 and we still don't see powerful filter features everywhere indicates that we haven't changed our mindset. Perhaps most people don't want that change? I'll admit, it is a lot of work. Managing filters is a constant struggle because not only does the content change, but so does your idea of relevance. And many times those criteria aren't even constant over the course of a single day!

But ignoring the problem doesn't make it go away. The problem of information overload is at its root one of attention scarcity1. We simply can't look deeply at everything. When using a search engine, people rarely go past the seventh page of results or so — beyond that, abandon all hope. Endless feeds are even worse: not only are there always new posts to look at, each of the posts can cover various topics, which only makes selecting where to divert our focus harder. When an entertaining post is next to something that demands more contemplation, the entertaining post will win. Deep understanding is difficult and requires patience and dedication.

Most people aren't prepared for that kind of work. But there are professionals who are trained to deal with large amounts of data: librarians. Librarians have to select new books for a collection, organise them according to a system that makes sense for their patrons, and be able to guide patrons towards good sources. This is called curation — which brings me to the first topic I want to talk about.


When I say curation, I mean the deliberate selection of information that is meant to be shared with a specific audience. This is an editorial process — in fact, every person that shares a link to an online community or posts to their social network is curating information.

The reasons for sharing content can be different for each person, and not everyone is in the audience for a particular item. This can lead to clutter — people don't want to see things they aren't interested in. People have different opinions on what is interesting and novel. This can be solved by having the curator find the appropriate audience — as is done with newsgroups on Usenet, subreddits on reddit, groups on Facebook, communities on Google+, or topics on Quora2. These all allow anyone to approach an audience and get content in front of that audience.

But allowing anyone to share in these communities means you'll have to deal with spammy content. One way around that is to have moderators who approve posts (and sometimes even replies to those posts). This is a lot of work, but a good set of moderators can make a community a very enjoyable place to be. My favourite example of this is Metafilter. They consistently have higher quality discussion than most general discussion sites (see this recent post). They have some moderation of content, and the user community takes an active part in flagging posts that aren't up to a certain level of quality — a kind of community self-policing. That joining the site requires a one-time $5 fee seems to help make sure that people who do join are invested in the community.

Another approach is to attack the problem at the curator level — this is closest to what an editor would do. Some sites like Slashdot have user-submitted stories that go through editors who post them to the front page. All this depends on having good stories submitted by users and a dedicated group of editors to go over all the submissions. Trove (Rob Malda, founder of Slashdot, works there now) adds another layer to this and allows everyone to be an editor and curate content that is suggested by an algorithm.


But getting content in front of an audience only takes into account what the person submitting the content thinks is appropriate for the audience. To gauge how the audience feels about the content, many services use collaborative filtering. That's what voting on stories on reddit or hackernews is. Social networks also use this with likes/+1s and comments.

This works fine for things that are presented to large, diverse groups. The wisdom of the crowd does not do well when there is a bias due to groupthink. The groupthink only gets worse if most people are seeing the same types of items again and again. In my opinion, this brings into question how collaborative filtering is implemented. If you show the top ranked items to everyone, those will slowly gather a higher ranking, while newer, lower-ranked items will not get the same chance and may disappear completely, a process that is known as preferential attachment. Others have written about how to solve this by using randomness (1), (2), (3).
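
As a toy illustration of the randomness idea (my own sketch, not taken from the cited articles): keep most of the front page for the top-ranked items, but reserve a couple of slots for randomly chosen items from further down so they get a chance to collect votes at all.

use strict;
use warnings;
use List::Util qw(shuffle);

# Rank by score, but give a few random long-tail items a slot on the page.
sub front_page {
    my ($items, $page_size, $random_slots) = @_;    # $items is [ [ id, score ], ... ]
    my @ranked = sort { $b->[1] <=> $a->[1] } @$items;
    my @top    = splice @ranked, 0, $page_size - $random_slots;
    my @lucky  = ( shuffle @ranked )[ 0 .. $random_slots - 1 ];
    return ( @top, @lucky );
}

my @items = map { [ "item$_", int rand 100 ] } 1 .. 50;
print join( ", ", map { $_->[0] } front_page( \@items, 10, 2 ) ), "\n";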

What happens when there isn't a large enough group to apply collaborative filtering? This can happen when the topic is obscure or requires specialist knowledge. For example, I don't think a paper on the history of land use and agriculture will get many readers on a social media site. That's when you have to start using more information retrieval techniques. We need to start looking more at the content and how the content relates to the rest of the Internet. One of the approaches is to try to do topic modelling and assign each item a topic. Google+ currently approaches this very simply, as far as I can tell, by just looking for keywords in posts and applying a tag based on that. There was a site called Twine that used natural language processing and semantic web data representation to classify content and present that to users based on their interest. There was another site called Kiffets developed by researchers at Xerox PARC that combined many of the ideas of semantic processing, editorial curation, and collaborative filtering into a single system (4). Both Twine and Kiffets are no longer available. Perhaps no one has figured out how to scale their approach both financially and technologically?

Whenever a tool disappears, it is a huge loss for all those who invested time in using it. To avoid that, we need a way of sharing the information we put into a tool with other tools that can replace it. There are some formats, such as the APML and FOAF specifications, that try to encode interests in a way that can be shared, but these have not been widely adopted. That's not surprising: writing a specification before there is industry support rarely works well.

What I miss from the Internet

As I mentioned in the appendix to this post, I found that the early web had many more personal sites on it. This is important because, back then, each of those sites had an essence that is getting harder to find nowadays: passion. Each person had a thing that they wanted to share with the world — something that made them stand out; sometimes they were even an expert in that niche area. Coming across a page borne out of passion was like walking on a rocky beach. Each page was a rock that you would pick up and see all the unique patterns on it. You could recognize it instantly even after you put it down and looked at another rock. Now, with the never-ending clamor of headlines that want you to read one page or another, it feels like the constant crashing of the waves of time has ground those rocks into sand, where each grain is indistinguishable from the next.

There is no incentive to fix this. Sites like Metafilter that rely on advertising to run can't compete in a world where entertainment is what gets the most ad impressions.

I'm not an expert on media studies, but I'm very interested in how it drives society. I plan on reading Neil Postman's Amusing Ourselves to Death sometime. It appears very related to the idea of how entertainment has stifled public discourse. This isn't necessarily the most important problem that the world is facing, but since we live together on this planet, we must be able to understand one another as rationally as possible, with as many facts laid bare3. It seems that every new communication technology promises to connect humanity together, but we need to closely examine what kinds of relationships we are building.

To conclude, I want to talk about what I would like to do to address this problem (because solving problems is my passion). I had previously worked on a tool to let me read social media in a single format for ease of access. My work then was centred around creating a protocol to share the data, but now I think it is more important to work on filtering. I'm going to try that, and I will keep posting updates with my results.

  1. For more on attention, I recommend reading "Scrolling Forward" by David M. Levy. I reviewed it here. He specifically addresses the differences in attention between reading on paper versus reading on screen. ↩

  2. Tagging (and, in a way, Google+ circles) is a more free-form version of this. ↩

  3. I'm also interested in the related topic of diffusion of innovations and how the Internet can help the rate of adoption of new ideas. ↩


[1] Luu, Dan. Why HN Should Use Randomized Algorithms. 04 Oct 2013.

[3] Marlin, B., Zemel, R., Roweis, S., and Slaney, M.: Collaborative Filtering and the Missing at Random Assumption. 20 Jun 2012. (Note: see more on Benjamin M. Marlin's research page).

[4] Stefik, M., and Good, L.: Design and Deployment of a Personalized News Service. AI Magazine 33.2 (2012): 28.

Posted Sun Jun 8 05:17:20 2014 Tags:

This is an appendix to the post curation and filtering of the social media firehose.

I'm first going to go over a bit of history of the Internet as I experienced it (yawn) to get a grounding for where I'm coming from and some insight into where the Net is going.

Let's start at the beginning of my journey on the Internet. I got online not long after I was able to read on my own. I had a couple of books on how to use the Internet, and I read them so that I could learn not only about the World Wide Web, but also about other tools like FTP, Gopher, Usenet, MUDs, and e-mail. Interactivity was limited to the <map> tag, RealPlayer, Java applets, minimal Flash, and JavaScript embellishments (remember the term DHTML?).

Back then, the distribution of websites was different. I don't have any empirical evidence, but you were much more likely to hit a personal page than you are now. Corporate sites were more of a way to have a simple web presence than an attempt at creating a full-blown marketing experience. The most forward-thinking sites came from publishers and other media outlets that saw the web as a way of extending whatever they were doing in print or on (non-computer) screens; however, rich multimedia wasn't a mainstay of the web yet as the bandwidth wasn't available (see this page from National Geographic for an example). Even with all the hype around the web, most entities didn't attempt to create a web presence and were happy to let fans create communities for them both on the web and other parts of the Internet. Many of these sites are gone now or completely different from how they appeared years ago. I have a copy of the book "Net-mom's Internet Kids & Family Yellow Pages" written by Jean Armour Polly which is an excellent snapshot of both the state of the Internet and the mentality of users at the time. At the time, it made sense to publish a book full of URLs — though linkrot did occur, there was not a large enough volume for it to be a big deal. Now those pages are gone and all we have is printed paper1 describing what would have been there (if that's not a warning to archive anything you like, I don't know what is).

Aside from books, the means of finding sites on the Internet were still in their infancy. Prior to the web, information retrieval datasets were usually not this large or diverse. There were search engines for FTP (Archie), Gopher (Veronica), and the Web (AltaVista, Lycos, among many others), and many of these returned very different results. This is why some people used metasearch engines like Dogpile to combine multiple results.

Instead of using search engines, many times I would start my searches with directories such as the WWW Virtual Library, Yahoo!, and DMOZ as these had lists of sites that were vetted by editors and were generally of better quality than the usual search results.

  1. Speaking of printed paper, I want to take a moment to remember one of my heroes of the Internet, Michael S. Hart, the founder of Project Gutenberg, a project to convert public domain works into e-books, something he invented. It was the first virtual volunteer project. I remember going to that site and downloading many of their books in Plucker format so that I could fill up the megabytes of storage on my Palm. ↩

Posted Fri Jun 6 22:16:38 2014 Tags:

I usually don't write about short scripts that I've written, but this one might be useful to others. Link for the impatient.

I needed to download videos from Khan Academy so that I could watch them offline. That should be easy enough, right? The videos are hosted on YouTube, so it should just be a matter of finding a playlist and running get_flash_videos on all the URLs. Turns out this isn't the case: the playlists on YouTube do not match up with all the videos on the Khan Academy website. Argh.

I could try to go through each of the sections on the website and copy the URLs into a file, but doing that with 700 videos isn't my idea of a fun way to spend a couple hours. I looked around for a way to download videos, but all I found was this download page which had an old torrent. I looked for an API and found one that was a bit under-documented. After trying to figure out the easiest way of using the API, I decided that trying to unravel the 10 MB JSON file returned by http://api.khanacademy.org/api/v1/topictree wasn't worth it1. Time to scrape the site!

The final code as of this writing is here. The scraping code in download.pl isn't exactly great, but it does the job. It just recursively follows children URLs and records them in a data structure which is written out to ka-data.json. Then process.pl takes over and reads the data structure. The important thing here is that the files get written out with some way of maintaining the order of the playlist. I use the order of the children URLs on the page to assign a numeric prefix to every directory and file so that it will sort by name.
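
A toy version of that ordering trick (the directory names here are made up; the real process.pl works from ka-data.json):

use strict;
use warnings;

# Zero-padded numeric prefixes make a plain sort-by-name follow playlist order.
my @children = ( 'intro-to-algebra', 'linear-equations', 'functions-and-graphs' );
my $digits   = length scalar @children;
for my $i ( 0 .. $#children ) {
    printf "%0${digits}d_%s\n", $i + 1, $children[$i];
}
# prints: 1_intro-to-algebra, 2_linear-equations, 3_functions-and-graphs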

Finally, to download the videos, I took the laziest approach. I just print out the get_flash_videos shell commands needed to download each video and pipe the commands into a shell. That way I didn't have to deal with any error handling myself. Now all I need to do is finish watching these videos!
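
The download step then amounts to something like this (a sketch; the URL is a placeholder and the real script walks the scraped data structure rather than a hard-coded list). Piping the output into sh runs the downloads one by one:

use strict;
use warnings;

# Emit one get_flash_videos command per video; run as:  perl this-script.pl | sh
my @urls = (
    'https://www.youtube.com/watch?v=XXXXXXXXXXX',   # placeholder
);
print "get_flash_videos '$_'\n" for @urls;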


  1. In retrospect, it wouldn't have been that hard to use the API, but I didn't like the feel of it. A recursive data structure isn't exactly the easiest to run through at a midnight hour. Below, I sketched out how I could have gotten the same results as the scraping part of my script now that I know more about the problem. Oh well.

Posted Wed May 28 07:08:54 2014 Tags:

I might be suffering from social media overload. In the past, I've used social media networks as a way to keep up to date with what my friends and acquaintances are up to and to follow updates from certain news sources. This has mainly been through Facebook and reddit 1. Now I'm starting to find that the rate of return from visiting these sites has fallen below the threshold where I feel the visits produce value. Now you may say "you're doing it wrong!" and that social media is about entertainment, not productivity, and you're certainly right. However, I had set up my social networking accounts with enough variety in the sources they display that every time I visited their newsfeeds, I would find something new to learn.

But lately I've found that the content and layout of the newsfeeds no longer prioritise the things I want to see. I never really wanted to see pictures of food or check-ins or lists of the same as-old-as-the-hills reaction GIFs. For a long time, I felt that this aspect of the web puts undue emphasis on visuals and videos, which is very frustrating to a text person like me. I like, no..., love reading and writing. I take joy in playing with words, using idioms, and crafting unspoken, never-before-seen sentences. Places like Facebook are not the right venue for me to indulge in that. I have toyed with the idea of scoring posts based on the complexity of their grammatical constructions, but I don't know if it is worth the time.

The essential point that keeps me from enjoying social media use is that it is very disorganised. I could try to put in work by organising people into lists, but any such lists are going to lack granularity. People are complex and multi-faceted and it would be a disservice to group them so indiscriminately. There may, however, be a way to cluster posts in a way that is useful, but this will require playing with the data. I may do that someday.

But not now. Right now, I've blocked myself from social media on my computer. Since my computer is where I work, I need to isolate my work area from something that is decidedly non-work. And I feel better for it.

P.S. A year ago, I read a book called The Filter Bubble by Eli Pariser. It talks about how algorithms are automatically choosing what we want to see for us and thus deciding our world views: we only see the world that we already agree with. He talks about how this is a dangerous trend because it rapidly blurs the line between fact and opinion.

Eli Pariser started the Upworthy site, which tries to break such bubbles by making social issues go "viral". Their posts often fit a specific formula, which appears to work given the number of times their stories are shared. Upworthy has been so adept at applying this formula that imitators have sprung up all over, and now social media feeds are full of people trying too hard to get people to click on their posts. I have a feeling that they all get their stories from the same sources, because I see the same story posted multiple times throughout the day by different pages. The constant pleas of "Click me! I'm important" (termed click-bait) are yet another reason why social media starts to feel less social and more abusive. If every site is pushing the same content, I'm not sure the goal of bursting the filter bubble has been achieved. We just replaced algorithms with people, but the results are the same.

  1. I've also used Slashdot for a very long time, but that has always been more of a news site for me. ↩

Posted Tue May 27 02:38:30 2014 Tags:

I've recently been working on putting together modules for image processing in Perl. One thing that keeps coming up when I write code that I want others to run is that they don't have the same images on their system as I do. So I put together a module that wraps up access to some standard test images from the USC SIPI image database. So now, instead of telling people to go download the right set of images, all they have to do is install Data::TestImage from CPAN and it will in turn download and extract the image database to a shared location. Then you can just run

use Data::TestImage;

my $image = Data::TestImage->get_image('mandrill');

and $image will contain the path to a TIFF file with the mandrill test image. Simple!

Standard test image of mandrill

I put a couple of nice tweaks into the install process too: it doesn't install all 130 MB of the USC SIPI database by default, but you can set an environment variable and it will install only the portions you ask for.

I got some inspiration from the MATLAB Image Toolbox's built-in images and the TestImages.jl Julia package. But mine is more extensible!

Posted Mon May 26 09:19:26 2014 Tags:

  • This device (check out the GIFs!), called the Organ Care System, extends the viability of organs for transplant by keeping warm blood flowing through them while they are in transport. There are other systems that do the same, but this one looks more portable.

  • This Python with Braces project seems cool. Don't know if it's a joke or not, but they forked the CPython code (via Evan Lee).

  • Engineering Map of America from PBS' American Experience

  • I saw this article about how one of the people urging everyone to learn to code doesn't actually know how to code. This isn't actually that big a problem. The real problem is that too many people think that computer science is about programming computers. Ostensibly, yes, that is what you do when you program, but what you really want to get out of programming is algorithmic thinking. You want to be able to reason about logic, control flow, and state. Those skills can be learned without a computer. Many of the most prolific computer scientists did not need computers to do computer science. This is why I like the CS Unplugged materials. You really need to check out their videos on YouTube.

  • DeepField is a company that does big data analysis on cloud infrastructure (via this article about how Twitch.tv may have more traffic than Facebook).

  • Some more info on the Marvel Comics API from last time: apparently they use a graph DB, which makes sense for what they are doing. There is a video from the GraphConnect New York conference here. This other video is mostly the same, but there is a Q&A session at the end (@ 34:40).

  • Two links on DIY tooling. Pat Delany has been working on making open-source machine tools for cheap. His work on the MultiMachine is driven by a desire to make toolmaking tools available to anyone around the world:

    It’s strange, but at my advanced age I realize that machine tools are about all that I believe in. The lathe, shaper, and mill built the foundation of our current standard of living and there is no reason why a cheap and easy-to-build multipurpose tool could not help the 500 million people that need simple water pumps or the billion people who live on a dollar a day or less. Thanks for getting a crazy old man started.

    Here's another cheap tool: Mini circular bench saw from scrap.

  • I got quoted in this newspaper article about UH's startup accelerator, RED Labs. As I said in the article, I would really like to see more CS, engineering, and tech-related students join the program and get involved. The Computer Science Entrepreneurship Workshop+Startup Lab - RED Labs was a good start for reaching out to the CS students and there are more initiatives underway for the next semester, but we need to grow a passion for creating new things — I know it's there, but we need more expression and drive.

  • This article titled Girls and Software, while written about the gender problem faced in the software industry, had a different effect on me. It reminded me why I love the Internet and online communities. When you can "hide" your AFK identity behind a pseudonym, people don't treat you with the same AFK prejudices. I remember that I was able to converse with people much older than me and they didn't know they were talking to a 12-year-old. This was quite a freeing feeling as I could push myself to do things that you wouldn't expect from someone so young.

  • Music video: Sinine - All The Same (instrumental) (via mind.in.a.box). And it's award-winning. :-)


  • I read this article by Peter Seibel about code reading a few weeks ago. I love the idea of literate programming, but often you can't code that way because there is too much clutter. Short pieces of code like Backbone are easier to read from beginning to end. A comment by dmunoz on a Hacker News post about a 55-line Python task queue (thread) really sums up the sentiment nicely:

    Absolutely. I'm always pleased when documentation includes some pseudocode for what the system generally does, without the overhead of configuration, exceptional control flow, etc. It's not always possible with large systems, but makes it a lot easier to see the forest, not the trees, in even mid-sized code bases.

    (via Which code to read?)

  • This video of an autonomous boat demonstrates mapping and path planning on water. Now I'm wondering if the open-source vehicle I shared last time could be augmented with drive-by-wire to make a cheap driverless car testing platform! (via Evan Lee)

Posted Thu Feb 13 02:56:17 2014 Tags:
  • I was looking at parsing of HTML1 and came across a paper on parsing XML with regex titled REX: XML Shallow Parsing with Regular Expressions. But what's even more interesting is a project by the author called Parabix which implements parallel text processing.

  • Tethne looks like a neat Python tool for bibliographic network analysis.

  • I was reading a blog post about how horribly one-sided the Terms of Service for the Marvel API are and I came across the Swedish API License which attempts to create a license that doesn't just force developers to give up many of their rights to the API providers.

  • OpenCatalog is a list of open source projects that are funded in part by DARPA.

  • I saw this open-source car and was reminded of how I've wanted to build my own car for quite a while. Imagine the learning possibilities! There is actually a high school team in Philadelphia that works on designing and racing hybrid cars. Here's an article in IEEE Spectrum and a video from PBS Frontline. That is way cool.

  • Bioimaging consortium that connects academic and industrial partners: Cyttron.

  • I love backing up design with numbers and this user study on how people hold their mobile devices makes me happy.

  • I came across the book CMDAS: Knowledge-Based Programming for Music Research while in freenode's ##prolog. Algorithmic composition with Prolog!

  • GitHub Education is a very good idea. More students need to learn about version control and testing while in school.

  1. without and with regex (for certain values of regular :-P) ↩

Posted Tue Feb 11 17:18:30 2014 Tags:

  • I was looking for a way to perform OCR on nutritional facts and I came across this handheld spectroscopy tool that tells you what is in food by measuring its chemical composition. Basically a tricorder?

  • Moravec's paradox

    "[I]t is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility."

  • I was thinking about the nargout feature of MATLAB and wondering whether it was possible to do the same in Perl. Well, with the Want module, you can (see the short sketch after this list).

  • As part of my interest in mathematical expression recognition from the previous linkdump, I came across this scan of the Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables that can be used as data for scientific document analysis.

    It's always fun to see names you recognise from elsewhere on the Internet, as I did here:

    Thanks to Bruce Miller of the National Institute of Standards and Technology, who sent me a clean new copy of the book for scanning.

    I recognise Bruce Miller from the LaTeXML project.

  • This is a neat book about OpenGL that covers many different techniques for visualisation. Code from the book is available here.

  • Using a Möbius strip as a track for a superconductor is a brilliant idea.
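
To make the nargout comparison above concrete, here is a small sketch using Want's howmany(), which reports how many return values the caller is expecting. The stats() function and its mean/variance outputs are my own example, not something from the Want documentation.

use strict;
use warnings;
use Want qw(howmany);

# Return just the mean, or (mean, variance), depending on how many values
# the caller asked for; roughly what nargout lets you do in MATLAB.
sub stats {
    my @data = @_;
    my $n    = howmany() // 1;
    my $mean = 0;
    $mean += $_ / @data for @data;
    return $mean if $n < 2;
    my $var = 0;
    $var += ( $_ - $mean )**2 / ( @data - 1 ) for @data;
    return ( $mean, $var );
}

my ($mean)         = stats( 1, 2, 3, 4 );   # howmany() sees 1
my ($m, $variance) = stats( 1, 2, 3, 4 );   # howmany() sees 2
print "$mean / $m, $variance\n";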


Posted Sat Feb 8 03:45:02 2014 Tags:

This wiki is powered by ikiwiki.