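<p>hacktivity · ENETDOWN (<a href="http://enetdown.org//hacktivity/">http://enetdown.org//hacktivity/</a>) · generated by ikiwiki · updated 2022-11-23T05:35:08Z</p>
<h1><a href="http://enetdown.org//hacktivity/posts/2022/11/22/babble-finding-context/">Babble: finding context</a></h1>
<p>By zaki · published 2022-11-23T04:35:37Z · updated 2022-11-23T05:35:08Z</p>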
<p>In <a href="http://enetdown.org//hacktivity/posts/2022/09/04/earthworks-at-the-leaning-toothpick-of-babble/">a previous post on Babble's development</a>, I mentioned
that the <code>SubstituteAndReturn</code> plugin was slow. Part of the reason is that the
<a href="https://p3rl.org/PPR">PPR</a> grammar can find the substitution/transliteration operators
but cannot tell where in an expression they occur, which is the information needed
to determine whether the operation implicitly acts on
the <code>$_</code> variable.</p>
<p>However, before jumping into optimising that, it is good to profile the code to
find where it is slowest. I fired up
<a href="https://p3rl.org/Devel::NYTProf">Devel::NYTProf</a> and found that, under Perl v5.26, most of the time was spent in the <code>CORE:regcomp</code> subroutine
(Perl's internal regexp compilation). Usually, when a regexp contains no
interpolated variables, this is done once at compile time; however, since <code>Babble</code>
constructs regexps at runtime, <code>CORE:regcomp</code> is called many times. So I
<a href="https://github.com/zmughal/Babble/pull/1">added caching of the compiled regexps</a>
in the two places where the profiler output indicated the most time was being spent. This immediately led to a
30–70% reduction in test execution time, depending on how heavily each test generates grammar
regexps (with the greatest reduction for the <code>postfixderef</code> and
<code>substituteandreturn</code> tests). Processing <code>Dist::Zilla</code> and its dependencies
went from 15 minutes down to 9 minutes, but this result does not use the cache
to its full potential: I currently process each file in a separate
interpreter, so the cache lives per file instead of across the entire run.</p>
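<p>As a rough illustration of the idea (a minimal sketch, not the code from that pull request), such a cache can be a hash keyed by the generated pattern text that holds compiled <code>qr//</code> objects. The helper name <code>cached_qr</code> is hypothetical, and the pattern used here is the <code>SubstituteAndReturn</code> bail-out regex quoted later in this post:</p>
<div class="highlight-perl"><pre class="hl">
use strict;
use warnings;

# Hypothetical cache: pattern source text => compiled qr// object.
my %qr_cache;
my $compilations = 0;

sub cached_qr {
    my ($source) = @_;
    return $qr_cache{$source} ||= do {
        $compilations++;    # only reached on a cache miss (one regcomp)
        qr/$source/x;
    };
}

# Building the same pattern for every input only compiles it once.
for my $text ('s/a/b/r', 'y/a-z/A-Z/r', 'tr/x/y/') {
    my $re = cached_qr('\b (?: s|y|tr ) \b');
    print "$text: ", ($text =~ $re ? 'match' : 'no match'), "\n";
}
print "compilations: $compilations\n";   # compilations: 1
</pre></div>
<p>Keying on the full pattern text means that any change in the generated grammar produces a new cache entry rather than a stale hit.</p>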
<p>Returning to the <code>SubstituteAndReturn</code> plugin,
the <a href="https://metacpan.org/release/ZMUGHAL/Babble-0.090008/source/lib/Babble/Plugin/SubstituteAndReturn.pm#L96-107">approach used in Babble-0.090008</a>
looped over the document, inserting an explicit <code>$_ =~</code> binding
until there were no more places where a binding could be inserted.
Upon looking more closely at how the code ran,
I realized that the combination of <code>each_match_within( Expression => ... )</code>
with the pattern that I used could only find one position to place that
explicit binding per iteration; each iteration did a lot of matching work to find a single position, and this was slowing it
down. So in <a href="https://github.com/zmughal/Babble/pull/2">this PR</a>, I replaced that with <code>each_match_of( Expression => ... )</code> and extracted the
substitution/transliteration position myself.
This reduced the processing of <code>Dist::Zilla</code> from 9 minutes to 6 minutes.</p>
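<p>The difference between restarting the search after every insertion and collecting every insertion point in a single pass can be sketched with a toy heuristic. This is only an illustration under loud assumptions: the regex below (a bare <code>s</code>, <code>y</code>, or <code>tr</code> at the start of a statement) is a hypothetical stand-in for the PPR-based <code>Expression</code> search and would misfire on real code, and <code>insert_bindings</code> is a made-up name. The insertions are applied from right to left so that the offsets collected earlier stay valid:</p>
<div class="highlight-perl"><pre class="hl">
use strict;
use warnings;

# Toy heuristic (NOT PPR): a bare s, y, or tr operator at the start of a
# statement is assumed to act on $_ and to need an explicit binding.
my $CONTEXTUAL = qr/(?:^|;)\s*(?=(?:s|y|tr)\W)/m;

sub insert_bindings {
    my ($code) = @_;
    my @offsets;
    # One pass: collect every insertion point first...
    while ($code =~ /$CONTEXTUAL/g) {
        push @offsets, pos($code);
    }
    # ...then insert from right to left so earlier offsets remain valid.
    for my $offset (reverse @offsets) {
        substr($code, $offset, 0) = '$_ =~ ';
    }
    return $code;
}

print insert_bindings("s/a/b/r; tr/x/y/;\nmy \$x = 1;"), "\n";
</pre></div>
<p>Applying the insertions from left to right instead would require shifting every later offset by the length of each inserted string.</p>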
<p>I then <a href="https://github.com/zmughal/Babble/pull/3">created a continuous integration setup</a> to make sure that it worked
across Perl versions. This is when I discovered something <strong>shocking</strong>! The
tests were taking up to 10 times longer on Perl 5.22–5.36 than on 5.10–5.18 (40
versus 4 seconds overall). I knew that this slowdown is <a href="https://metacpan.org/release/DCONWAY/PPR-0.001006/view/lib/PPR.pm#LIMITATIONS">documented as a PPR
limitation</a>,
but until now I had not measured just how much slower it was. I ran the profiler
again and saw that the slowdown was entirely attributable to regex
compilation, not matching speed.</p>
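<p>Outside of a profiler, a rough way to see the two costs separately is to time compiling a large runtime-built pattern apart from matching with it. This sketch uses the core <code>Time::HiRes</code> module; the 2000-way alternation is a made-up stand-in for a generated grammar, and the printed timings will vary by machine and Perl version (they are not the measurements reported above):</p>
<div class="highlight-perl"><pre class="hl">
use strict;
use warnings;
use Time::HiRes qw(time);

# Stand-in for a large, runtime-generated grammar: 2000 alternatives.
my $source = join '|', map { "word$_" } 1 .. 2000;
my $text   = ('x' x 10_000) . ' word2000';

my $t0  = time;
my $re  = qr/(?:$source)/;          # runtime compilation (CORE:regcomp)
my $t1  = time;
my $hit = $text =~ $re ? 1 : 0;     # matching with the already-compiled regex
my $t2  = time;

printf "compile: %.6fs  match: %.6fs  hit: %d\n", $t1 - $t0, $t2 - $t1, $hit;
</pre></div>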
<p>Since the slowdown was in compilation, I should be able to use the cache I had added to make up the
difference, as long as all files are processed in a single process. I rewrote the
<code>bash</code> script, which used <code>GNU Parallel</code> to run one process per file,
as a Perl script using <a href="https://p3rl.org/MCE">MCE</a>, which uses
<code>fork(2)</code> by default on Unix-likes. While this does introduce IPC overhead, it
turned out to be faster than repeatedly spawning processes. Using <code>fork(2)</code>
opened up another option for optimisation: I could warm up the cache prior to
the <code>fork(2)</code> and use that cache in the child processes. Now we're cooking! I
added warm-up cases until the child processes no longer
reported cache misses.</p>
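<p>The warm-then-fork idea can be sketched with core <code>fork</code> instead of MCE (which handles forking and IPC itself). Assumptions: <code>cached_qr</code> is a hypothetical cache helper, the warm-up list is the two bail-out patterns quoted later in this post, and each child reports cache misses through its exit status. Because <code>fork(2)</code> gives each child a copy of the parent's memory, regexps compiled before the fork are already present in every child:</p>
<div class="highlight-perl"><pre class="hl">
use strict;
use warnings;

# Hypothetical compiled-regex cache, copied into each child by fork(2).
my %qr_cache;
my $misses = 0;

sub cached_qr {
    my ($source) = @_;
    return $qr_cache{$source} ||= do { $misses++; qr/$source/x };
}

# Warm-up set (an assumption): patterns every worker is known to need.
my @patterns = ('\b (?: s|y|tr ) \b', '\s* -> \s* [\@%\$]');
cached_qr($_) for @patterns;    # compiled once, in the parent

my @kids;
for my $worker (1 .. 2) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: look the patterns up again; these should all be cache hits.
        my $before = $misses;
        cached_qr($_) for @patterns;
        exit($misses == $before ? 0 : 1);    # exit status reports any misses
    }
    push @kids, $pid;
}

my $all_hits = 1;
for my $pid (@kids) {
    waitpid($pid, 0);
    $all_hits = 0 if $? != 0;
}
print $all_hits ? "workers reused the warm cache\n" : "workers had cache misses\n";
</pre></div>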
<p>When running <code>Babble</code> across many files, it quickly becomes apparent that most
files are left unchanged by the plugins, so you can gain a speed-up by adding a small
check for the syntax each plugin targets. I added these as "early" and
"late" bail-outs, which can be turned off using an environment variable. These
are meant to be fast checks that look for short matches without using the full <code>PPR</code> grammar.
Early bail-outs look over the entire document, while late bail-outs are used
inside the transformation.
For example, the <code>SubstituteAndReturn</code> early bail-out looks for <code>m/ \b (?: s|y|tr ) \b /xs</code>
in the entire document, and the <code>PostfixDeref</code> late bail-out looks
for <code>m/ \s* -> \s* [\@%\$] /xs</code> inside matches for the <code>PerlTerm</code> rule.</p>
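<p>An early bail-out can be as simple as one cheap regex guarding the expensive transform. In this sketch, the environment variable name <code>BABBLE_NO_BAIL_OUT</code> and the helper <code>maybe_transform</code> are hypothetical (Babble's actual names may differ), and the transform is a stub that counts how often it runs. A bail-out check may have false positives (here the variable <code>$y</code> also matches) but must never have false negatives, otherwise code that needs rewriting would be skipped:</p>
<div class="highlight-perl"><pre class="hl">
use strict;
use warnings;

# Cheap pre-check: the SubstituteAndReturn early bail-out pattern.
my $EARLY_BAIL_OUT = qr/ \b (?: s|y|tr ) \b /xs;

sub maybe_transform {
    my ($source, $transform) = @_;
    # Hypothetical switch for turning the heuristic off.
    my $bail_out_enabled = !$ENV{BABBLE_NO_BAIL_OUT};
    return $source if $bail_out_enabled && $source !~ $EARLY_BAIL_OUT;
    return $transform->($source);
}

my $calls = 0;
my $expensive = sub { $calls++; return $_[0] };   # stub for the PPR-based transform

maybe_transform('my $x = 1;', $expensive);         # no s/y/tr: skipped
maybe_transform('my $y = s/a/b/r;', $expensive);   # matches: transform runs
print "transform ran $calls time(s)\n";
</pre></div>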
<p>Any purported optimisation needs to be benchmarked. So how do all these
optimisations interact? I could run the various combinations of caching, Perl
version, bailing out, etc. multiple times myself and compute their means, but it is
nicer to script that. So I used <a href="https://p3rl.org/Permute::Named::Iter">Permute::Named::Iter</a>
to iterate over combinations of options and record the elapsed time for each. The speed-ups from enabling caching
and bailing out are so large, and preliminary tests already showed them to be significant, so what
remained to test was (Perl version) × (warming cache) × (number of workers).
While I know that the older Perl version will still run faster serially, I want to
check that I did not inadvertently slow it down, and whether my optimisations
bring the elapsed time on newer Perl versions close to that of older Perl
versions. Sorted by mean elapsed time, the data are:</p>
<div class="highlight-txt"><pre class="hl">> model <- elapsed ~ version * workers * ( cache * warm_cache + bail_out_early + bail_out_late )
> agg <- aggregate(model, mean, data = data )
> agg.sort.elapsed <- agg[order(agg$elapsed),c('version','warm_cache','workers','elapsed')]
> print(agg.sort.elapsed)
              version warm_cache workers elapsed
11 perl-5.18.4@babble       TRUE       8  36.446
5  perl-5.18.4@babble      FALSE       8  37.170
9  perl-5.18.4@babble       TRUE       4  41.960
3  perl-5.18.4@babble      FALSE       4  42.490
12 perl-5.34.0@babble       TRUE       8  47.266
10 perl-5.34.0@babble       TRUE       4  53.192
7  perl-5.18.4@babble       TRUE       2  55.320
1  perl-5.18.4@babble      FALSE       2  56.364
4  perl-5.34.0@babble      FALSE       4  58.852
6  perl-5.34.0@babble      FALSE       8  59.772
8  perl-5.34.0@babble       TRUE       2  68.158
2  perl-5.34.0@babble      FALSE       2  72.652
</pre></div>
<p>and the following boxplot:</p>
<table class="img"><caption>Box plot comparing elapsed time versus number of workers with presence/absence of cache warming.</caption><tr><td><a href="http://enetdown.org//hacktivity/posts/2022/11/22/gfx/elapsed-facet-version-boxplot.png"><img src="http://enetdown.org//hacktivity/posts/2022/11/22/babble-finding-context/600x-elapsed-facet-version-boxplot.png" width="600" height="381" class="img" /></a></td></tr></table>
<p>Since the data do not meet the assumptions of ANOVA (the group variances of elapsed
time are unequal), I used the
<a href="http://depts.washington.edu/acelab/proj/art/">Aligned Rank Transform (ART)</a>
instead. (I would have liked to do more runs and to control the
environment for each run more carefully, but I didn't like waiting: the runs have to be done
serially on the same computer in order to be independent samples!)</p>
<div class="highlight-txt"><pre class="hl">> model.reduced <- elapsed ~ version * workers * ( warm_cache )
> art.reduced <- art( model.reduced, data )
> fit.reduced <- anova( art.reduced )
> print(fit.reduced)
Analysis of Variance of Aligned Rank Transformed Data
Table Type: Anova Table (Type III tests)
Model: No Repeated Measures (lm)
Response: art(elapsed)
                             Df Df.res F value     Pr(>F)
1 version                     1     48 151.892 < 2.22e-16 ***
2 workers                     2     48 202.148 < 2.22e-16 ***
3 warm_cache                  1     48 148.590 2.6388e-16 ***
4 version:workers             2     48  56.041 2.7902e-13 ***
5 version:warm_cache          1     48 151.381 < 2.22e-16 ***
6 workers:warm_cache          2     48 101.621 < 2.22e-16 ***
7 version:workers:warm_cache  2     48 123.130 < 2.22e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
</pre></div>
<p>In terms of main effects, the number of workers has the largest effect on
elapsed time (the highest F value). So, with the current implementation, warming the cache
helps somewhat (in the table above, mainly on v5.34.0, where regex compilation is slower),
but less than adding more workers. This makes sense, as each worker fills its own cache
as soon as it reaches input that can be transformed. Warming the cache might help more
when many more files are processed (currently the code runs a plugin on one distribution at a time
instead of on all files across all distributions).</p>
<p>All these changes are available as:</p>
<ul>
<li><a href="https://metacpan.org/release/ZMUGHAL/Babble-0.090009">Babble-0.090009</a></li>
<li><a href="https://github.com/zmughal-experiment/perl-leaning-toothpick-of-babble/tree/v0.0.3">zmughal-experiment/perl-leaning-toothpick-of-babble at v0.0.3</a></li>
</ul>
<h1 id="appendix:listofissuesandpullrequestsparticipatedinalongtheway">Appendix: List of issues and pull requests participated in along the way</h1>
<h2 id="babble">Babble</h2>
<ul>
<li><a href="https://github.com/zmughal/Babble/pull/1">Cache generated grammar regexps to improve performance by zmughal · Pull Request #1 · zmughal/Babble · GitHub</a></li>
<li><a href="https://github.com/zmughal/Babble/pull/2">SubstituteAndReturn: Faster search for contextual s///r, y///r by zmughal · Pull Request #2 · zmughal/Babble · GitHub</a></li>
<li><a href="https://github.com/zmughal/Babble/pull/3">GitHub Actions CI workflow by zmughal · Pull Request #3 · zmughal/Babble · GitHub</a></li>
<li><a href="https://github.com/zmughal/Babble/pull/4">Add no re 'eval' in order to keep flag within scope by zmughal · Pull Request #4 · zmughal/Babble · GitHub</a></li>
<li><a href="https://github.com/zmughal/Babble/pull/5">SubstituteAndReturn: Use (*MARK:NAME) control verb for lookup by zmughal · Pull Request #5 · zmughal/Babble · GitHub</a></li>
<li><a href="https://github.com/zmughal/Babble/pull/6">SubstituteAndReturn: extract chained regex outside loop by zmughal · Pull Request #6 · zmughal/Babble · GitHub</a></li>
<li><a href="https://github.com/zmughal/Babble/pull/7">Add heuristic optimisations and associated toggle flags by zmughal · Pull Request #7 · zmughal/Babble · GitHub</a></li>
</ul>
<h2 id="app::modulebuildtiny">App::ModuleBuildTiny</h2>
<ul>
<li><a href="https://github.com/Leont/app-modulebuildtiny/issues/44">Typo in <code>mbtiny dist --verbose</code> output · Issue #44 · Leont/app-modulebuildtiny · GitHub</a></li>
</ul>
<h1><a href="http://enetdown.org//hacktivity/posts/2022/09/14/zmq-ffi-for-iperl/">ZMQ::FFI for IPerl</a></h1>
<p>By zaki · published 2022-09-14T20:06:25Z · updated 2022-09-14T20:06:25Z</p>
<p>Like many FOSS hacking stories, it all started on IRC. Somebody in <code>#perl</code>
mentioned that they were having trouble using ZeroMQ with an event
loop. I mentioned that I had a module,
<a href="https://p3rl.org/Net::Async::ZMQ">Net::Async::ZMQ</a>, that could help. It turned out that they
had been reading its code but were trying to make it work with
<a href="https://p3rl.org/ZMQ::FFI">ZMQ::FFI</a>
instead of
<a href="https://p3rl.org/ZMQ::LibZMQ3">ZMQ::LibZMQ3</a> and
<a href="https://p3rl.org/ZMQ::LibZMQ4">ZMQ::LibZMQ4</a>, which it was originally
designed for.</p>
<p>So I <a href="https://github.com/zmughal-CPAN/p5-Net-Async-ZMQ/pull/5">added support for <code>ZMQ::FFI</code></a>.
But for my CI build to pass, I needed Windows support, and <code>ZMQ::FFI</code> did not
support Windows yet.</p>
<p>I then did the next obvious thing: patch <code>ZMQ::FFI</code> to work on Windows.
In my first attempt, I found tests using <code>fork(2)</code>, <code>Sys::SigAction</code>,
and the ZeroMQ <code>ipc://</code> transport (<code>zmq_ipc(7)</code>), none of which work on Windows,
so I left those as they were and only made <code>ZMQ::FFI</code> look for <code>.dll</code> files in
addition to <code>.so</code> and <code>.dylib</code> files. That's not great; those tests should
really pass before a release can be made.</p>
<p>As time went by, I went ahead and released <a href="https://metacpan.org/release/ZMUGHAL/Net-Async-ZMQ-0.002">Net-Async-ZMQ-0.002</a> anyway.
There was no use in waiting for <code>ZMQ::FFI</code> to pass on Windows when it never had before.</p>
<p>Then one night I acquired a round tuit and <a href="https://github.com/zeromq/perlzmq/pull/44">fixed all the <code>ZMQ::FFI</code> Windows compatibility issues</a>.
I also changed the <code>ipc://</code> transport to <code>inproc://</code> where possible; however,
this also changed the tests on Unix-likes, so I made another attempt that <a href="https://github.com/zeromq/perlzmq/pull/47">refactored
the tests based on OS capability</a>.
This and several CI build fixes were rolled into <a href="https://metacpan.org/release/GHENRY/ZMQ-FFI-1.18">ZMQ-FFI-1.18</a>.</p>
<p>While I was working on all this, I decided that I wanted to port
<a href="https://p3rl.org/Devel::IPerl">Devel::IPerl</a> to use <code>ZMQ::FFI</code>, because it
could be easier for users on Windows to install than <code>ZMQ::LibZMQ3</code>, which
requires a compiler to link its XS code with the ZeroMQ DLL. This is currently a
pain point for <code>Devel::IPerl</code> that I work around by providing a script that sets up
<code>Alien::ZMQ::latest</code> to install the ZeroMQ DLL and then points the
<code>ZMQ::LibZMQ3</code> install script at that DLL for linking. Removing that script
will make things less complicated overall.</p>
<p>Which is what I did in <a href="https://github.com/EntropyOrg/p5-Devel-IPerl/pull/113">this PR</a>.
It did not take much: mostly replacing functions such as <code>zmq_init()</code> with <code>ZMQ::FFI->new</code> and
<code>zmq_socket($zmq_ctx, $type)</code> with <code>$zmq_ctx->socket($type)</code>.</p>
<p>This is now available on CPAN as <a href="https://metacpan.org/release/ZMUGHAL/Devel-IPerl-0.012">Devel-IPerl-0.012</a>.</p>
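<p>For comparison, here is a minimal sketch of the object-oriented <code>ZMQ::FFI</code> style, with the corresponding <code>ZMQ::LibZMQ3</code> calls shown as comments. It assumes <code>ZMQ::FFI</code> and a working libzmq are installed, and the endpoint name is made up. It uses the <code>inproc://</code> transport, which only connects sockets that share one context but, unlike <code>ipc://</code>, also works on Windows:</p>
<div class="highlight-perl"><pre class="hl">
use strict;
use warnings;
use ZMQ::FFI qw(ZMQ_REQ ZMQ_REP);

# ZMQ::LibZMQ3 (functional):  my $ctx = zmq_init();
# ZMQ::FFI (object-oriented):
my $ctx = ZMQ::FFI->new;

# Hypothetical endpoint name; inproc:// sockets must share one context.
my $endpoint = 'inproc://iperl-example';

# ZMQ::LibZMQ3:  my $server = zmq_socket($ctx, ZMQ_REP);
my $server = $ctx->socket(ZMQ_REP);
$server->bind($endpoint);    # bind before connect (older libzmq requires this for inproc://)

my $client = $ctx->socket(ZMQ_REQ);
$client->connect($endpoint);

# REQ/REP strictly alternates: request, then reply.
$client->send('ping');
my $request = $server->recv;
$server->send("pong: $request");
my $reply = $client->recv;
print "$reply\n";   # pong: ping
</pre></div>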
<h1 id="appendix:listofissuesandpullrequestsparticipatedinalongtheway">Appendix: List of issues and pull requests participated in along the way</h1>
<h2 id="devel::iperl">Devel::IPerl</h2>
<ul>
<li><a href="https://github.com/EntropyOrg/p5-Devel-IPerl/pull/113">Port from ZMQ::LibZMQ3 to ZMQ::FFI by zmughal · Pull Request #113 · EntropyOrg/p5-Devel-IPerl · GitHub</a></li>
</ul>
<h2 id="zmq::ffi">ZMQ::FFI</h2>
<ul>
<li><a href="https://github.com/zeromq/perlzmq/pull/44">Add MSWin32 support and cross-platform CI testing by zmughal · Pull Request #44 · zeromq/perlzmq · GitHub</a></li>
<li><a href="https://github.com/zeromq/perlzmq/pull/45">Add Docker test using GitHub Actions by zmughal · Pull Request #45 · zeromq/perlzmq · GitHub</a></li>
<li><a href="https://github.com/zeromq/perlzmq/pull/47">MSWin32 compatibility and test portability by zmughal · Pull Request #47 · zeromq/perlzmq · GitHub</a></li>
<li><a href="https://github.com/zeromq/perlzmq/pull/48">Optional support for Alien::ZMQ::latest by zmughal · Pull Request #48 · zeromq/perlzmq · GitHub</a></li>
<li><a href="https://github.com/zeromq/perlzmq/pull/49">Fix build needed for release by zmughal · Pull Request #49 · zeromq/perlzmq · GitHub</a></li>
<li><a href="https://github.com/zeromq/perlzmq/pull/50">GHA: disable fail-fast by zmughal · Pull Request #50 · zeromq/perlzmq · GitHub</a></li>
</ul>
<h2 id="alien::zmq::latest">Alien::ZMQ::latest</h2>
<ul>
<li><a href="https://github.com/zmughal-CPAN/p5-Alien-ZMQ-latest/pull/7">Only use Alien::gmake for shared install by zmughal · Pull Request #7 · zmughal-CPAN/p5-Alien-ZMQ-latest · GitHub</a></li>
<li><a href="https://github.com/zmughal-CPAN/p5-Alien-ZMQ-latest/pull/8">Update CI to use GHA by zmughal · Pull Request #8 · zmughal-CPAN/p5-Alien-ZMQ-latest · GitHub</a></li>
<li><a href="https://github.com/zmughal-CPAN/p5-Alien-ZMQ-latest/pull/9">Add workflow for Strawberry Perl by zmughal · Pull Request #9 · zmughal-CPAN/p5-Alien-ZMQ-latest · GitHub</a></li>
<li><a href="https://github.com/zmughal-CPAN/p5-Alien-ZMQ-latest/issues/10">Try to get CMake working on Windows · Issue #10 · zmughal-CPAN/p5-Alien-ZMQ-latest · GitHub</a></li>
<li><a href="https://github.com/zmughal-CPAN/p5-Alien-ZMQ-latest/pull/11">Add option for choosing build configuration tool by zmughal · Pull Request #11 · zmughal-CPAN/p5-Alien-ZMQ-latest · GitHub</a></li>
<li><a href="https://github.com/zmughal-CPAN/p5-Alien-ZMQ-latest/pull/12">Set make for autoconf build by zmughal · Pull Request #12 · zmughal-CPAN/p5-Alien-ZMQ-latest · GitHub</a></li>
<li><a href="https://github.com/zmughal-CPAN/p5-Alien-ZMQ-latest/pull/13">Use Download::GitHub by zmughal · Pull Request #13 · zmughal-CPAN/p5-Alien-ZMQ-latest · GitHub</a></li>
<li><a href="https://github.com/zmughal-CPAN/p5-Alien-ZMQ-latest/issues/14">CMake build not working on new Windows Strawberry Perl install · Issue #14 · zmughal-CPAN/p5-Alien-ZMQ-latest · GitHub</a></li>
<li><a href="https://github.com/zmughal-CPAN/p5-Alien-ZMQ-latest/pull/15">Set CMAKE_MAKE_PROGRAM by zmughal · Pull Request #15 · zmughal-CPAN/p5-Alien-ZMQ-latest · GitHub</a></li>
</ul>
<h2 id="net::async::zmq">Net::Async::ZMQ</h2>
<ul>
<li><a href="https://github.com/zmughal-CPAN/p5-Net-Async-ZMQ/issues/3">Look into supporting ZMQ::FFI backend · Issue #3 · zmughal-CPAN/p5-Net-Async-ZMQ · GitHub</a></li>
<li><a href="https://github.com/zmughal-CPAN/p5-Net-Async-ZMQ/pull/4">Switch from Test::Requires to Test::Needs by zmughal · Pull Request #4 · zmughal-CPAN/p5-Net-Async-ZMQ · GitHub</a></li>
<li><a href="https://github.com/zmughal-CPAN/p5-Net-Async-ZMQ/pull/5">Add support for ZMQ::FFI by zmughal · Pull Request #5 · zmughal-CPAN/p5-Net-Async-ZMQ · GitHub</a></li>
<li><a href="https://github.com/zmughal-CPAN/p5-Net-Async-ZMQ/pull/6">Update CI, use GHA by zmughal · Pull Request #6 · zmughal-CPAN/p5-Net-Async-ZMQ · GitHub</a></li>
<li><a href="https://github.com/zmughal-CPAN/p5-Net-Async-ZMQ/pull/7">Use ZMQ_POLLIN event by zmughal · Pull Request #7 · zmughal-CPAN/p5-Net-Async-ZMQ · GitHub</a></li>
</ul>
<h2 id="microsoftvcpkg">microsoft/vcpkg</h2>
<ul>
<li><a href="https://github.com/microsoft/vcpkg/pull/22681">(zeromq) Cherry pick patches to fix build issues by LilyWangLL · Pull Request #22681 · microsoft/vcpkg · GitHub</a></li>
<li><a href="https://github.com/microsoft/vcpkg/pull/23435">(zeromq) Download patch files for GitHub PRs by zmughal · Pull Request #23435 · microsoft/vcpkg · GitHub</a></li>
<li><a href="https://github.com/microsoft/vcpkg/issues/23461">zeromq build failure because of port file with invalid hash (inconsistent) · Issue #23461 · microsoft/vcpkg · GitHub</a></li>
</ul>
<h2 id="run-vcpkggithubaction">run-vcpkg GitHub Action</h2>
<ul>
<li><a href="https://github.com/lukka/run-vcpkg/issues/135">vcpkgGitURL seems to have no effect · Issue #135 · lukka/run-vcpkg · GitHub</a></li>
</ul>
<h2 id="githubthewebsiteitself">GitHub (the website itself)</h2>
<ul>
<li><a href="https://github.com/orgs/community/discussions/12531">Patch URL from pull requests should use full index to maintain stable contents · Discussion #12531 · community · GitHub</a></li>
</ul>
<h1><a href="http://enetdown.org//hacktivity/posts/2022/09/08/occupational-safety-at-the-leaning-toothpick-of-babble/">Occupational safety at the Leaning Toothpick of Babble</a></h1>
<p>By zaki · published 2022-09-08T14:26:34Z · updated 2022-09-08T14:29:56Z</p>
<h1 id="qqalwaysinterpolates"><code>qq</code> always interpolates</h1>
<p>I was looking over some of the PPR tests for changes and noticed that the
non-interpolating quote-likes test listed <code>qq''</code> (<code>qq</code> with single-quote
delimiters) as non-interpolating. I double-checked the documentation and
implementation, confirmed that <code>qq</code> always interpolates, and then <a href="https://rt.cpan.org/Ticket/Display.html?id=143876">took this
information to Damian Conway</a>.
This was fixed in <a href="https://metacpan.org/release/DCONWAY/PPR-0.001005">PPR 0.001005</a>.</p>
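<p>A quick demonstration of the difference, following the quote-like operator table in perlop: <code>q</code> never interpolates, and single-quote delimiters turn off interpolation for <code>m''</code>, <code>qr''</code>, and <code>s'''</code>, but not for <code>qq</code>:</p>
<div class="highlight-perl"><pre class="hl">
use strict;
use warnings;

my $who = 'world';
my $q   = q'hello $who';    # q never interpolates
my $qq  = qq'hello $who';   # qq interpolates even with '' delimiters
print "$q\n";    # hello $who
print "$qq\n";   # hello world
</pre></div>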
<h1 id="cutlogntimes">Cut log₂(N) times</h1>
<p>CPAN Testers reports alerted me that the last development release was <a href="http://matrix.cpantesters.org/?dist=Babble%200.090007_01">failing for Perl ≥ v5.30.0</a>.
Looking at the reports and the failing line, I saw that the failure happened during regex compilation.
There was <a href="https://rt.cpan.org/Ticket/Display.html?id=126285">previously a bug</a>
at that line that also had to do with regex compilation when
interpolating, so I thought it might be related. Nope. It
was a new bug!</p>
<p>So I created a minimal example and ran it through my various <code>perlbrew</code> Perls
to check whether I was still getting the bug. I was. I hopped on IRC and asked <code>#p5p</code>
how I could bisect Perl to find out when the behaviour changed,
and <a href="https://metacpan.org/author/TONYC">TonyC</a> pointed me to the
<code>Porting/bisect.pl</code> and <code>Porting/bisect-runner.pl</code> scripts. Once I had those
pieces, all I needed to do was run</p>
<div class="highlight-sh"><pre class="hl"> perl .<span class="hl opt">/</span>Porting<span class="hl opt">/</span>bisect.pl <span class="hl kwb">--start</span> v5.28<span class="hl num">.3</span> <span class="hl kwb">--end</span> v5.30<span class="hl num">.3</span> \
<span class="hl kwb">--with-module</span><span class="hl opt">=</span>PPR <span class="hl kwb">--no-module-tests</span> \
<span class="hl kwb">--</span> \
.<span class="hl opt">/</span>perl <span class="hl kwb">-I</span> lib ppr-warn-min.pl
</pre></div>
<p>and wait 23 minutes for it to report that the first failing commit was
<a href="https://github.com/Perl/perl5/commit/7c932d07cab18751bfc7515b4320436273a459e2">7c932d07ca</a>.
That command installs <code>PPR</code> (at 0.001005) into each perl it builds and then runs my minimal
example. I have <code>ccache</code> installed, which saved some time with
compilation.</p>
<p>I then reported my findings in a <a href="https://rt.cpan.org/Ticket/Display.html?id=144248">bug report to PPR</a>.</p>
<p>While waiting, I wanted to make sure that wasn't the only issue, so I went ahead and released
<a href="https://metacpan.org/release/ZMUGHAL/Babble-0.090007_02">Babble 0.090007_02</a>, which
<a href="https://metacpan.org/release/ZMUGHAL/Babble-0.090007_02/diff/ZMUGHAL%2FBabble-0.090007_01/lib/Babble/Grammar.pm">squashes the warning</a>.
This worked fine when run through my <code>perlbrew</code> Perls, and CPAN Testers also reported no issues.</p>
<p>Meanwhile, Damian Conway <a href="https://metacpan.org/release/DCONWAY/PPR-0.001006">fixed PPR</a> to address the bug, so
I bumped the minimum version of PPR and reverted the warning squashing for
<a href="https://metacpan.org/release/ZMUGHAL/Babble-0.090007_03">Babble-0.090007_03</a>.</p>
<p>If that works out, I'll make a stable Babble release soon.</p>
<h1 id="appendix:listofissuesandpullrequestsparticipatedinalongtheway">Appendix: List of issues and pull requests participated in along the way</h1>
<h2 id="babble">Babble</h2>
<ul>
<li><a href="https://rt.cpan.org/Ticket/Display.html?id=126285">Bug #126285 for Babble: Regexp related test failures</a></li>
<li><a href="https://rt.cpan.org/Ticket/Display.html?id=132725">Bug #132725 for Babble: several warnings and broken translation of non-postfixderef array expression ($exe(0)) by PostfixDeref</a></li>
<li><a href="https://rt.cpan.org/Ticket/Display.html?id=132727">Bug #132727 for Babble: Order dependent translation can create illegal syntax</a></li>
<li><a href="https://rt.cpan.org/Ticket/Display.html?id=132728">Bug #132728 for Babble: empty signatures '()' result in illegal syntax:</a></li>
</ul>
<h2 id="ppr">PPR</h2>
<ul>
<li><a href="https://rt.cpan.org/Ticket/Display.html?id=143876">Bug #143876 for PPR: Interpolation within quote-like with single-quote delimiters (rt.cpan.org #143876)</a></li>
<li><a href="https://rt.cpan.org/Ticket/Display.html?id=144248">Bug #144248 for PPR: Regex compilation warning for (??{})*+ on Perl v5.30+ (rt.cpan.org #144248)</a></li>
</ul>
<h1><a href="http://enetdown.org//hacktivity/posts/2022/09/04/earthworks-at-the-leaning-toothpick-of-babble/">Earthworks at the Leaning Toothpick of Babble</a></h1>
<p>By zaki · published 2022-09-05T04:00:26Z · updated 2022-09-05T06:28:55Z</p>
<p>Perl's parser is very flexible: its behaviour can be changed at runtime, which
makes Perl difficult to <a href="https://www.oilshell.org/blog/2016/10/22.html">statically parse</a>.
Several tools attempt to statically parse, or at least recognise, a useful
subset of real-world Perl code, such as
<a href="https://p3rl.org/PPI">PPI</a> (handwritten recursive descent parser in Pure Perl),
<a href="https://p3rl.org/Compiler::Parser">Compiler::Parser</a> (handwritten recursive descent parser in C++),
<a href="https://p3rl.org/Guacamole">Guacamole</a> (parser based on a BNF grammar, built on the wonderful <a href="https://p3rl.org/Marpa::R2">Marpa::R2</a>),
and
<a href="https://p3rl.org/PPR">PPR</a> (grammar using regex recursive subpatterns).
This post will talk about some improvements to
<a href="https://p3rl.org/Babble">Babble</a>, which extends <code>PPR</code> with a plugin
framework for source code transformation.
I will also cover a little bit about other approaches along the way.</p>
<p>What makes <code>PPR</code><a href="http://enetdown.org//hacktivity/#fn:three-little-words-talk" id="fnref:three-little-words-talk" class="footnote">1</a> interesting is that it is
implemented as a regex<a href="http://enetdown.org//hacktivity/#fn:regex-regular-language" id="fnref:regex-regular-language" class="footnote">2</a> that matches subpatterns of Perl syntax
rather than an actual parser (i.e., <code>PPR</code> does not create a data structure at this time).
This means instead of having to write code that traverses a tree, you can write
out the subpatterns that you are looking for and insert code actions to handle what
you want to do as the regex engine encounters various pieces of the
syntax<a href="http://enetdown.org//hacktivity/#fn:ppr-data-structure" id="fnref:ppr-data-structure" class="footnote">3</a>. The <code>PPR</code> distribution offers two regexen:</p>
<ul>
<li><code>$PPR::GRAMMAR</code>: a standard Perl grammar with all rules defined
in the form <code>PerlRuleName</code> and</li>
<li><code>$PPR::X::GRAMMAR</code>: roughly the same grammar as <code>$PPR::GRAMMAR</code>, but each
rule is defined as both <code>PerlRuleName</code> and <code>PerlStdRuleName</code> which allows for
overriding and extending the rules.</li>
</ul>
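<p>To make the idea of a grammar built from regex recursive subpatterns concrete, here is a toy grammar in the same style. It is a sketch, not an excerpt from PPR: the rule names <code>Ident</code>, <code>Scalar</code>, and <code>Parens</code> are made up, and the <code>(?'Name' ...)</code> syntax is Perl's quote form of a named group, equivalent to the angle-bracket form PPR uses. The first match uses recursion to recognise nested parentheses; the second uses a code action <code>(?{ ... })</code> to collect matches as the regex engine encounters them:</p>
<div class="highlight-perl"><pre class="hl">
use strict;
use warnings;
use re 'eval';   # perls before v5.18 need this to mix (?{ }) with interpolation

# Toy grammar: named rules live in a (?(DEFINE) ...) block and are
# invoked as recursive subpatterns with (?&Name).
my $GRAMMAR = qr{
    (?(DEFINE)
        (?'Ident'  [A-Za-z_] \w*+ )
        (?'Scalar' \$ (?&Ident) )
        (?'Parens' \( (?: [^()]++ | (?&Parens) )*+ \) )
    )
}x;

my $text = 'foo($x, (bar($y)))';

# Recursion lets a regex recognise nested structure such as balanced parens.
my $is_call = $text =~ m{ \A (?&Ident) (?&Parens) \z $GRAMMAR }x ? 1 : 0;

# A code action runs when the engine reaches it; $^N holds the text of the
# most recently closed capture group.
my @scalars;
while ($text =~ m{ ((?&Scalar)) (?{ push @scalars, $^N }) $GRAMMAR }xg) { }

print "is_call=$is_call scalars=@scalars\n";   # is_call=1 scalars=$x $y
</pre></div>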
<p><code>Babble</code> uses the <code>$PPR::X::GRAMMAR</code> regex to
find and extract the positions
of submatches. To give an example:</p>
<div class="highlight-perl"><pre class="hl">
<span class="hl kwa">use</span> strict<span class="hl opt">;</span>
<span class="hl kwa">use</span> warnings<span class="hl opt">;</span>
<span class="hl kwa">use</span> feature <span class="hl str">qw(say)</span><span class="hl opt">;</span>
<span class="hl kwa">use</span> Babble<span class="hl opt">::</span>Match<span class="hl opt">;</span>
<span class="hl slc"># Define what we want to match against.</span>
<span class="hl kwc">my</span> <span class="hl kwb">$match</span> <span class="hl opt">=</span> Babble<span class="hl opt">::</span>Match<span class="hl opt">-></span><span class="hl kwd">new</span><span class="hl opt">(</span>
  top_rule <span class="hl opt">=></span> <span class="hl str">'Document'</span><span class="hl opt">,</span>
  text     <span class="hl opt">=></span> <span class="hl str">q{</span>
<span class="hl str">    for my $i (0..7) {</span>
<span class="hl str">      say $i;</span>
<span class="hl str">    }</span>

<span class="hl str">    for my $j (@list) {</span>
<span class="hl str">      say $j;</span>
<span class="hl str">    }</span>
<span class="hl str">  }</span><span class="hl opt">,</span>
<span class="hl opt">);</span>

<span class="hl kwc">my</span> <span class="hl kwb">@vars</span><span class="hl opt">;</span>
<span class="hl slc"># Look up the definition of the PerlControlBlock rule in PPR::X:</span>
<span class="hl kwb">$match</span><span class="hl opt">-></span><span class="hl kwd">each_match_within</span><span class="hl opt">(</span> ControlBlock <span class="hl opt">=> [</span>
  <span class="hl slc"># Match either for or foreach blocks</span>
  <span class="hl str">q{</span>
<span class="hl str">    for(?:each)?+ \b</span>
<span class="hl str">    (?>(?&PerlOWS))</span>
<span class="hl str">  }</span><span class="hl opt">,</span>
  <span class="hl slc"># With a simple `my` declaration (full grammar supports</span>
  <span class="hl slc"># `my`/`our`/`state` here and some other syntax)</span>
  <span class="hl str">q{</span>
<span class="hl str">    (?> (?: my ) (?>(?&PerlOWS)) )?+</span>
<span class="hl str">  }</span><span class="hl opt">,</span>
  <span class="hl slc"># Capture a scalar variable name into the `variable` submatch</span>
  <span class="hl opt">[</span> variable <span class="hl opt">=></span> <span class="hl str">'(?&PerlVariableScalar)'</span> <span class="hl opt">],</span>
  <span class="hl slc"># Match the rest of the `for` block</span>
  <span class="hl str">q{</span>
<span class="hl str">    (?>(?&PerlOWS))</span>
<span class="hl str">    (?> (?&PerlParenthesesList) | (?&PerlQuotelikeQW) )</span>

<span class="hl str">    (?>(?&PerlOWS))</span>
<span class="hl str">    (?>(?&PerlBlock))</span>
<span class="hl str">  }</span><span class="hl opt">,</span>
<span class="hl opt">] =></span> <span class="hl kwa">sub</span> <span class="hl opt">{</span>
  <span class="hl kwc">my</span> <span class="hl opt">(</span><span class="hl kwb">$m</span><span class="hl opt">) =</span> <span class="hl kwb">@_</span><span class="hl opt">;</span>
  <span class="hl kwc">push</span> <span class="hl kwb">@vars</span><span class="hl opt">,</span> <span class="hl kwb">$m</span><span class="hl opt">-></span><span class="hl kwd">submatches</span><span class="hl opt">->{</span>variable<span class="hl opt">}-></span><span class="hl kwd">text</span><span class="hl opt">;</span>
<span class="hl opt">});</span>
<span class="hl kwc">say</span> <span class="hl str">"variables found:</span> <span class="hl ipl">@vars</span><span class="hl str">"</span><span class="hl opt">;</span>
__END__
variables found<span class="hl opt">:</span> <span class="hl kwb">$i $j</span>
</pre></div>
<p>This snippet matches any <code>ControlBlock</code> that also matches the
submatch specification passed in as an <code>ArrayRef</code>. This specification
captures the iterator variable, which is the only part we are interested in, but to locate
that variable within the entire <code>for</code>-loop syntax, we need to use
other rules from the <code>PPR::X</code> grammar.</p>
<h1 id="motivation">Motivation</h1>
<p>What brought me to <code>Babble</code> was that I had written a set of modules which
use some features from newer versions of Perl as well as modules that implement
new keywords via the pluggable keyword API introduced in Perl v5.12. I use <code>perlver
--blame</code> from <a href="https://p3rl.org/Perl::MinimumVersion">Perl::MinimumVersion</a> to keep track
of which features I am using, so I had some idea of what I needed to
backport. So I ran <code>Babble</code> on my code and wrote a plugin to handle the
new keywords. My tests passed, the code ran on Perl v5.8.9, and, with the dependency
on an XS-only module removed, it was now Pure Perl. I was happy, but that's not
the end of the story.</p>
<h1 id="breakingground">Breaking ground</h1>
<p>Typically when releasing libraries to CPAN, authors target the oldest version
of the Perl interpreter they can support without having to backport features,
syntax, and bug fixes, while application code on CPAN has fewer constraints on
the minimum Perl version. Since the majority of code on CPAN is library code,
the following plot of minimum Perl interpreter versions shows that releases are
skewed towards supporting older versions:</p>
<table class="img"><caption>Count of latest CPAN releases that support inclusive minimum (≥) version of the Perl interpreter. Generated by <a href="https://github.com/zmughal-experiment/perl-leaning-toothpick-of-babble/blob/v0.0.1/bin/metacpan-find-perl-versions-agg.pl">metacpan-find-perl-versions-agg.pl</a>.</caption><tr><td><a href="http://enetdown.org//hacktivity/posts/2022/09/04/gfx/cpan-min-perl-version-buckets.png"><img src="http://enetdown.org//hacktivity/posts/2022/09/04/earthworks-at-the-leaning-toothpick-of-babble/600x-cpan-min-perl-version-buckets.png" width="600" height="382" class="img" /></a></td></tr></table>
<p>Some features are easy enough to backport, such as the
<a href="https://metacpan.org/pod/Syntax::Construct#//"><code>//</code> (defined-or)</a>
operator, which can be turned into a <code>?:</code> (ternary operator). Others are less
so, such as the <a href="https://metacpan.org/pod/Syntax::Construct#\b{}"><code>/\b{wb}/</code> word boundary assertion</a>
which is tied to properties in specific versions of the Unicode Standard.</p>
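<p>To make the defined-or rewrite concrete, here is a minimal hand-written sketch (the subroutine names are hypothetical, and this is not necessarily the exact code <code>Babble</code> emits). The pre-v5.10 version stores the left operand in a temporary so that an operand with side effects is evaluated only once:</p>
<div class="highlight-perl"><pre class="hl">use strict;
use warnings;

# Perl v5.10+ spelling: the defined-or operator.
sub with_defined_or {
    my ($value, $default) = @_;
    return $value // $default;
}

# Pre-v5.10 spelling: a ternary on defined(). The temporary keeps the
# left operand from being evaluated twice.
sub with_ternary {
    my ($value, $default) = @_;
    return do { my $tmp = $value; defined $tmp ? $tmp : $default };
}

# Unlike ||, both forms keep defined-but-false values such as 0 and ''.
for my $input (undef, 0, '', 'x') {
    my $shown = defined $input ? "'$input'" : 'undef';
    print "$shown => '", with_ternary($input, 'default'), "'\n";
}
</pre></div>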
<p>Testing that source code transformation is correct by only running it on code
that I wrote is an “it works on my machine” approach. That's not very
satisfying. Let's find something a bit more challenging: grab some code off of
CPAN. Finding candidates for this is pretty easy with the MetaCPAN
API, but I already had something in mind: <code>Dist::Zilla</code><a href="http://enetdown.org//hacktivity/#fn:metacpan-candidates" id="fnref:metacpan-candidates" class="footnote">4</a>.
Hold that thought for now; I'll get back to it later.</p>
<p>I started by implementing the <a href="https://metacpan.org/pod/Syntax::Construct#...">ellipsis statement (<code>...</code>)</a>.
The <a href="https://github.com/shadow-dot-cat/Babble/pull/4">changes for this are fairly simple</a>:
match any statement whose entire text matches <code>/\.\.\./</code> and replace it with a <code>die</code>.</p>
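<p>A small sketch of why that replacement preserves behaviour: both code references below die with a message starting with <code>Unimplemented</code> when called. Reusing Perl's own message text in the <code>die</code> is an assumption made for this comparison; the exact replacement is in the pull request linked above.</p>
<div class="highlight-perl"><pre class="hl">use strict;
use warnings;

# Perl v5.12+: the ellipsis statement dies with "Unimplemented" when reached.
my $new_style = sub { ... };

# Pre-v5.12 stand-in (assumed message text): an explicit die.
my $old_style = sub { die "Unimplemented" };

# Run a code reference and return its error message ('' if it lived).
sub error_from {
    my ($code) = @_;
    return eval { $code->(); 1 } ? '' : $@;
}

print "new: ", error_from($new_style);
print "old: ", error_from($old_style);
</pre></div>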
<h1 id="structuralsupport">Structural support</h1>
<p>I then wanted to do something a little more daring: implement a keyword on top
of <code>Babble</code>. Well, not just a keyword, multiple keywords… that have their
semantics defined in another module. As before, this cannot be statically
parsed, but keywords and custom syntax are still going to be used and people
want a solution. There are open issues about the API design to support this use
case on the <a href="https://github.com/Perl-Critic/PPI/issues/273">PPI (The Keyword Question)</a>
and the <a href="https://github.com/xsawyerx/guacamole/issues/23">Guacamole (Pluggable grammars)</a>
repositories.</p>
<p>The particular module that I needed this for is
<a href="https://p3rl.org/Function::Parameters">Function::Parameters</a>, which
lets you create new keywords that occur in the same place in the grammar
as subroutine declarations (both named and anonymous)<a href="http://enetdown.org//hacktivity/#fn:install-sub" id="fnref:install-sub" class="footnote">5</a>. These keywords are
defined when imported and have per-keyword configuration options to
perform type validation, check argument count, and
automatically place the <a href="https://perldoc.perl.org/perlglossary#invocant">invocant</a>
in a variable (e.g., <code>method</code> uses <code>$self</code>,
<a href="https://perldoc.perl.org/perlglossary#class-method"><code>classmethod</code></a> uses <code>$class</code>).
In order for the code generator to convert those definitions into Pure Perl
code, I need to access the configuration options used for the keywords.</p>
<p>A pattern that some Perl distributions use is to gather shared imports in a
common setup module, which then exports those imports to every module that
uses it. An easy way to do this is to use
<a href="https://p3rl.org/Import::Into">Import::Into</a>. If we break the static parsing rule
of “no evaluation of code” and load just the setup module, we can study what it
imports and use that information to direct the parser and code
generator.</p>
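<p>As a rough illustration of loading a module just to see what it imports, the sketch below compiles a <code>use</code> statement inside a throwaway package and lists the subroutines that appear in that package's symbol table. The probe package names are hypothetical, and this only detects imported subroutines: keyword modules such as <code>Function::Parameters</code> register keywords with the parser rather than exporting ordinary subs, so recovering their per-keyword configuration would need a different mechanism.</p>
<div class="highlight-perl"><pre class="hl">use strict;
use warnings;

our @PROBE_ARGS;       # import list passed through to the probed module
my $probe_count = 0;

# Load $module into a fresh, throwaway package (hypothetical naming
# scheme) and return the names of the subroutines its import() put there.
sub imported_subs {
    my ($module, @args) = @_;
    my $probe = 'Probe::Imports' . ++$probe_count;
    local @PROBE_ARGS = @args;
    eval "package $probe; use $module \@main::PROBE_ARGS; 1"
        or die "could not load $module: $@";
    no strict 'refs';
    my @names = keys %{"${probe}::"};    # everything in the probe's stash
    return sort grep { $probe->can($_) } @names;
}

print join(' ', imported_subs('List::Util', qw(sum max))), "\n";
</pre></div>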
<p>Once we have the keywords that <code>Function::Parameters</code> introduces (in my case, I
have the keywords <code>fun</code>, <code>method</code>, and <code>classmethod</code>), we need to extend the
PPR grammar to match the parameter list just like <code>Function::Parameters</code> does.
We can extend the <code>PPR::X</code> grammar by adding more rules to match
what a single parameter looks like (optional type, sigil variable, optional
default value) and how these make up a comma-separated parameter list.
This is the approach I take <a href="https://github.com/zmughal-CPAN/p5-Dist-Zilla-PluginBundle-Author-ZMUGHAL/blob/86c8ac100a3679258bbe6f45a0ae4724a7fbf79c/lib/Dist/Zilla/PluginBundle/Author/ZMUGHAL/Babble/FunctionParameters.pm#L8-L28">here</a>; the code
currently resides in my personal repository as I do not yet
want to release it for broader use.
I also wanted to be able to use the same rules I created for matching a
parameter list to help extract each part of a parameter, but to do this
I had to patch <code>Babble</code> so that <a href="https://github.com/shadow-dot-cat/Babble/pull/5"><code>::SubMatch</code> objects use the same grammar as
<code>::Match</code> objects</a>.
To test this, I used <code>Babble</code> to create a small grammar introducing <code>class</code>, <code>extends</code>, and
<code>with</code> keywords, which are transformed into code that uses
<a href="https://p3rl.org/Moo">Moo</a> classes, inheritance, and roles, respectively.</p>
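<p>The real rules live in the plugin linked above and build on <code>PPR::X</code>. As a much simpler stand-alone sketch of the same idea, the regex below defines named rules for a single parameter (optional type, sigil variable, optional default value) and for a comma-separated list of them, then uses the same pieces to pull each parameter apart. The rule names and the deliberately crude <code>Type</code> and <code>Default</code> patterns are assumptions; for example, a default value containing a comma is not handled.</p>
<div class="highlight-perl"><pre class="hl">use strict;
use warnings;

# Stand-alone grammar for a Function::Parameters-style parameter list.
# (?'NAME'...) defines a named rule and (?P>NAME) recurses into it; these
# are Perl's alternative spellings of named groups and named recursion.
my $PARAM_LIST = qr{
    \A \s* (?P>ParamList)? \s* \z
    (?(DEFINE)
        (?'ParamList' (?P>Param) (?: \s* , \s* (?P>Param) )* \s* ,? )
        (?'Param'     (?: (?P>Type) \s+ )? (?P>Variable)
                      (?: \s* = \s* (?P>Default) )? )
        (?'Type'      [A-Z]\w* (?: :: \w+ )* )
        (?'Variable'  [\$\@%] [A-Za-z_]\w* )
        (?'Default'   [^,]+? )
    )
}x;

# A single parameter, with a named capture for each part.
my $PARAM = qr{
    (?: (?'type' [A-Z]\w* (?: :: \w+ )* ) \s+ )?
    (?'var' [\$\@%] [A-Za-z_]\w* )
    (?: \s* = \s* (?'default' [^,]+? ) )?
}x;

# Validate the whole list, then walk it one parameter at a time.
sub parse_params {
    my ($list) = @_;
    return unless $list =~ $PARAM_LIST;
    my @params;
    while ($list =~ /\G \s* $PARAM \s* (?: , | \z )/gcx) {
        push @params, { %+ };    # %+ holds only the parts that matched
    }
    return @params;
}

for my $p (parse_params('Str $name, Int $count = 3, @rest')) {
    print join(', ', map { $_ . ': ' . $p->{$_} } sort keys %$p), "\n";
}
</pre></div>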
<h1 id="keepdigging">Keep digging</h1>
<p>I then looked at the existing PRs: one by <a href="https://github.com/2shortplanks">Mark Fowler</a>
presented a <a href="https://github.com/shadow-dot-cat/Babble/pull/1">failing test case</a>,
which I <a href="https://github.com/shadow-dot-cat/Babble/pull/6">fixed</a>. I also reviewed
two PRs by <a href="https://github.com/djerius">Diab Jerius</a>
(<a href="https://github.com/shadow-dot-cat/Babble/pull/2">1</a>,
<a href="https://github.com/shadow-dot-cat/Babble/pull/3">2</a>).</p>
<p>I then read the code for some of the other plugins and noticed that the <code>SubstituteAndReturn</code>
plugin for <a href="https://metacpan.org/pod/Syntax::Construct#/r">non-destructive substitution</a>
could not handle cases where the substitution is implicit (applied to <code>$_</code>) or chained. Inside a <code>map</code> block,
the code is often both:</p>
<div class="highlight-perl"><pre class="hl"><span class="hl slc"># chained implicit substitution</span>
<span class="hl kwc">map</span> <span class="hl opt">{</span> <span class="hl kwd">s/abc/1/gr</span> <span class="hl opt">=~</span> <span class="hl kwd">s/xyz/2/gr</span> <span class="hl opt">}</span> <span class="hl kwb">@strings</span><span class="hl opt">;</span>
<span class="hl slc"># is equivalent to making the implicit $_ variable explicit</span>
<span class="hl kwc">map</span> <span class="hl opt">{</span> <span class="hl kwb">$_</span> <span class="hl opt">=~</span> <span class="hl kwd">s/abc/1/gr</span> <span class="hl opt">=~</span> <span class="hl kwd">s/xyz/2/gr</span> <span class="hl opt">}</span> <span class="hl kwb">@strings</span><span class="hl opt">;</span>
<span class="hl slc"># is equivalent to creating a copy and then applying in-place substitutions</span>
<span class="hl kwc">map</span> <span class="hl opt">{ (</span><span class="hl kwc">my</span> <span class="hl kwb">$TEMP</span> <span class="hl opt">=</span> <span class="hl kwb">$_</span><span class="hl opt">) =~</span> <span class="hl kwd">s/abc/1/g</span><span class="hl opt">;</span> <span class="hl kwb">$TEMP</span> <span class="hl opt">=~</span> <span class="hl kwd">s/xyz/2/g</span><span class="hl opt">;</span> <span class="hl kwb">$TEMP</span> <span class="hl opt">}</span> <span class="hl kwb">@strings</span><span class="hl opt">;</span>
</pre></div>
<p>The above transformation is essentially what <a href="https://github.com/shadow-dot-cat/Babble/pull/8">this pull
request</a> does. First, it looks
for statements that start with a substitution (meaning they have no binding
operator <code>=~</code>) and inserts a <code>$_ =~</code> in front. Then it goes through and
replaces any chain of <code>$lvalue =~ s/../../r =~ s/../../r ...</code> with a <code>map</code> and <code>for</code>, as in
this test case:</p>
<div class="highlight-perl"><pre class="hl"> <span class="hl opt">[</span> <span class="hl str">'</span><span class="hl ipl">$foo</span> <span class="hl str">=~ s/foo/bar/gr =~ s/(bar)+/baz/gr'</span><span class="hl opt">,</span>
<span class="hl str">'(map { (my</span> <span class="hl ipl">$__B_001</span> <span class="hl str">=</span> <span class="hl ipl">$_</span><span class="hl str">) =~ s/foo/bar/g; for (</span><span class="hl ipl">$__B_001</span><span class="hl str">) { s/(bar)+/baz/g }</span> <span class="hl ipl">$__B_001</span> <span class="hl str">}</span> <span class="hl ipl">$foo</span><span class="hl str">)[0]'</span><span class="hl opt">, ],</span>
</pre></div>
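<p>To check the equivalences above, here is a small runnable comparison (the sample strings are made up, and Perl v5.14 or later is assumed for <code>/r</code>). It evaluates the <code>/r</code> chains next to the copy-then-substitute forms, including the exact shape of the rewritten test case:</p>
<div class="highlight-perl"><pre class="hl">use strict;
use warnings;

my @strings = ('abcxyz', 'xyz', 'none');

# Perl v5.14+: chained, implicit non-destructive substitution.
my @modern = map { s/abc/1/gr =~ s/xyz/2/gr } @strings;

# Pre-v5.14 shape: copy $_ into a temporary, then substitute in place.
my @legacy = map { (my $TEMP = $_) =~ s/abc/1/g; $TEMP =~ s/xyz/2/g; $TEMP } @strings;

# The rewritten form from the test case: map over a one-element list so
# the result is still an expression, and alias $_ to the copy with for.
my $foo       = 'foofoo';
my $original  = $foo =~ s/foo/bar/gr =~ s/(bar)+/baz/gr;
my $rewritten = (map { (my $__B_001 = $_) =~ s/foo/bar/g; for ($__B_001) { s/(bar)+/baz/g } $__B_001 } $foo)[0];

print "@modern\n@legacy\n$original $rewritten\n";
</pre></div>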
<p>I also found syntax that <code>Perl::MinimumVersion</code> does not yet detect: since Perl v5.14, the
transliteration operator also takes an <code>/r</code> flag:</p>
<div class="highlight-perl"><pre class="hl"><span class="hl kwc">my</span> <span class="hl kwb">@STRINGS</span> <span class="hl opt">=</span> <span class="hl kwc">map</span> <span class="hl opt">{</span> <span class="hl kwd">tr/a-z/A-Z/r</span> <span class="hl opt">}</span> <span class="hl kwb">@strings</span><span class="hl opt">;</span>
<span class="hl slc"># or y///r</span>
</pre></div>
<p>I added support for transforming this syntax as well in the same pull request
and added support for <a href="https://github.com/neilb/Perl-MinimumVersion/pull/25">detecting this to <code>Perl::MinimumVersion</code></a>.</p>
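<p>For reference, a sketch of the equivalent pre-v5.14 spelling: copy into a temporary and transliterate the copy in place. The copy is needed because, without <code>/r</code>, the operator returns the number of characters changed rather than the new string.</p>
<div class="highlight-perl"><pre class="hl">use strict;
use warnings;

my @strings = ('abc', 'Perl v5.8');

# Perl v5.14+: tr///r (or y///r) returns a transliterated copy.
my @modern = map { tr/a-z/A-Z/r } @strings;

# Pre-v5.14: without /r, tr/// returns a count, so transliterate a copy
# in place and return the copy.
my @legacy = map { (my $tmp = $_) =~ tr/a-z/A-Z/; $tmp } @strings;

print "@modern\n@legacy\n";
</pre></div>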
<h1 id="puttingtheblocksinplace">Putting the blocks in place</h1>
<p>Perl v5.12 introduced
<a href="https://metacpan.org/pod/Syntax::Construct#package-version">package version syntax</a>
and Perl v5.14 introduced <a href="https://metacpan.org/pod/Syntax::Construct#package-block">package block syntax</a>.
Both slightly change how packages are declared.
The <a href="https://github.com/shadow-dot-cat/Babble/pull/10">implementation for transforming to pre-v5.12 syntax</a>
is quite straightforward.</p>
<p>For package versions, just find package declarations (both statements and blocks)
that have a version and move the version into an <code>our $VERSION</code> string assignment, as in this test:</p>
<div class="highlight-perl"><pre class="hl"> <span class="hl opt">[</span> <span class="hl str">'package Foo::Bar v1.2.3;</span>
<span class="hl str">42'</span><span class="hl opt">,</span>
<span class="hl str">q{package Foo::Bar;</span>
<span class="hl str">our</span> <span class="hl ipl">$VERSION</span> <span class="hl str">= 'v1.2.3';</span>
<span class="hl str"></span>
<span class="hl str">42}, ],</span>
</pre></div>
<p>Converting package blocks is similar: match any package declaration
that is a package block, move the opening <code>{</code> of the block to just before
the <code>package</code> keyword, and insert a semicolon (<code>;</code>) in its place, as in this test:</p>
<div class="highlight-perl"><pre class="hl"> <span class="hl opt">[</span> <span class="hl str">'package Foo::Bar v1.2.3 { 42 }'</span><span class="hl opt">,</span>
<span class="hl str">'{ package Foo::Bar v1.2.3; 42 }'</span><span class="hl opt">, ],</span>
</pre></div>
<p>Notice that the two plugins do not depend on each other: you can apply them
in either order.</p>
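<p>To illustrate why the order does not matter, here is a regex-only sketch of the two rewrites. It is not the <code>Babble</code> plugin code (which matches with the <code>PPR</code> grammar), it only handles simple shapes like those in the tests above, and the helper names are hypothetical. Applying the rewrites in either order gives the same code up to whitespace:</p>
<div class="highlight-perl"><pre class="hl">use strict;
use warnings;

my $NAME_RE    = qr/[A-Za-z_]\w*(?:::\w+)*/;
my $VERSION_RE = qr/v?\d+(?:\.\d+)*/;

# package NAME VERSION;   -> package NAME;\nour $VERSION = 'VERSION';\n
# package NAME VERSION {  -> package NAME { our $VERSION = 'VERSION';
sub package_version {
    my ($code) = @_;
    $code =~ s/\bpackage\s+($NAME_RE)\s+($VERSION_RE)\s*;/package $1;\nour \$VERSION = '$2';\n/g;
    $code =~ s/\bpackage\s+($NAME_RE)\s+($VERSION_RE)\s*\{/package $1 { our \$VERSION = '$2';/g;
    return $code;
}

# package NAME [VERSION] {  -> { package NAME [VERSION];
# (the block's closing brace stays where it is)
sub package_block {
    my ($code) = @_;
    $code =~ s/\bpackage\s+($NAME_RE(?:\s+$VERSION_RE)?)\s*\{/{ package $1;/g;
    return $code;
}

# Collapse runs of whitespace so the two orders can be compared.
sub squash { my ($s) = @_; $s =~ s/\s+/ /g; return $s }

my $input         = 'package Foo::Bar v1.2.3 { 42 }';
my $version_first = package_block(package_version($input));
my $block_first   = package_version(package_block($input));
print "$version_first\n$block_first\n";
</pre></div>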
<h1 id="archimedes">μή μου τοὺς κύκλους τάραττε!<a href="http://enetdown.org//hacktivity/#fn:archimedes" id="fnref:archimedes" class="footnote">6</a></h1>
<p>Perl v5.20 introduced the <a href="https://perldoc.perl.org/5.36.0/feature#The-'postderef'-and-'postderef_qq'-features"><code>postderef</code> feature</a>,
which lets one write code that dereferences using a
postfix syntax:</p>
<div class="highlight-perl"><pre class="hl"><span class="hl slc"># ->@* is the postfix equivalent of the circumfix @{ }</span>
<span class="hl slc"># so</span>
<span class="hl kwb">$diagram</span><span class="hl opt">->{</span>watch<span class="hl opt">}{</span>out<span class="hl opt">}{</span>archimedes<span class="hl opt">}-></span>@<span class="hl opt">*</span>
<span class="hl slc"># is the same as</span>
@<span class="hl opt">{</span> <span class="hl kwb">$diagram</span><span class="hl opt">->{</span>watch<span class="hl opt">}{</span>out<span class="hl opt">}{</span>archimedes<span class="hl opt">} }</span>
</pre></div>
<p>Some of these can also be used inside of interpolated quote-likes (e.g., double
quotes) when the <code>postderef_qq</code> feature is enabled.</p>
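<p>A short runnable sketch of these forms, assuming Perl v5.24 or later (where postfix dereferencing is no longer experimental and <code>use v5.24</code> enables <code>postderef_qq</code>). It compares postfix dereferences, including <code>->$#*</code> (last index) and a value slice, against their circumfix equivalents, both in code and inside double quotes; the data is made up for illustration:</p>
<div class="highlight-perl"><pre class="hl">use v5.24;      # postfix deref is stable; the :5.24 bundle enables postderef_qq
use warnings;

my $diagram = { watch => { out => { archimedes => [qw(circles lines arcs)] } } };
my $circles = $diagram->{watch}{out}{archimedes};

# Postfix and circumfix dereferences of the same array reference.
my @postfix   = $circles->@*;
my @circumfix = @{ $circles };

# Interpolated postfix forms (postderef_qq): whole array, last index, slice.
my $interp_postfix = "all: $circles->@*; last: $circles->$#*; slice: $circles->@[0,1]";

# The same interpolations spelled without postfix dereference.
my $interp_circumfix = "all: @{$circles}; last: ${\ $#{$circles} }; slice: @{$circles}[0,1]";

say $interp_postfix;
say $interp_circumfix;
</pre></div>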
<p>It turns out that <a href="https://metacpan.org/release/DCONWAY/PPR-0.000028">PPR v0.000028</a>
did not support all of the possible postfix dereferencing syntax, so I
submitted a <a href="https://rt.cpan.org/Public/Bug/Display.html?id=143877">bug to PPR</a>,
which Damian Conway fixed within a week. The fixed version is not backwards
compatible because it changed the definition of the <code>PerlTerm</code> rule, which I
<a href="https://github.com/shadow-dot-cat/Babble/pull/11">accommodated in <code>Babble</code></a>.
This also happened to fix the bug addressed by one of Diab Jerius' pull
requests! It also caused a regression in matching the <code>state</code> keyword
with attributes, which the <code>Babble</code> test suite caught; this was
fixed after I
<a href="https://rt.cpan.org/Public/Bug/Display.html?id=143966">submitted a PPR bug report</a>.</p>
<p>I then added support for converting <code>postderef</code> syntax within quote-likes.
While running this on the <code>Dist::Zilla</code> code, I found code which uses the match
operator with single quotes as delimiters:</p>
<div class="highlight-perl"><pre class="hl"> <span class="hl kwc">my</span> <span class="hl opt">(</span><span class="hl kwb">$sigil</span><span class="hl opt">,</span> <span class="hl kwb">$varname</span><span class="hl opt">) = (</span><span class="hl kwb">$variable</span> <span class="hl opt">=~</span> m<span class="hl str">'^([$@%*])(.+)$'</span><span class="hl opt">);</span>
</pre></div>
<p>which, because of the single-quote delimiters, does not interpolate the <code>$@</code> inside. I also read the Perl documentation
closely and found that some special variables do not interpolate at all in
regexes:</p>
<div class="highlight-perl"><pre class="hl"> <span class="hl slc"># From `perlop`:</span>
<span class="hl slc"># > (Note that $(, $), and $| are not interpolated because they look like end-of-string tests.)</span>
<span class="hl kwd">qr/ $( $) $| /x</span>
</pre></div>
<p>I also <a href="https://rt.cpan.org/Public/Bug/Display.html?id=143876">submitted these as bugs to PPR</a>.</p>
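<p>A quick way to see the effect of the single-quote delimiters is the sketch below (the helper names are hypothetical). Because <code>m'...'</code> does not interpolate, <code>[$@%*]</code> is a literal character class of four sigils; with ordinary delimiters, escaping <code>\$</code> and <code>\@</code> gives the same pattern without any interpolation:</p>
<div class="highlight-perl"><pre class="hl">use strict;
use warnings;

# Single-quote delimiters: no interpolation, so [$@%*] means four sigils.
sub split_sigil {
    my ($variable) = @_;
    return $variable =~ m'^([$@%*])(.+)$';
}

# Equivalent pattern with ordinary delimiters, escaping $ and @ instead.
sub split_sigil_escaped {
    my ($variable) = @_;
    return $variable =~ m/^([\$\@%*])(.+)$/;
}

for my $v ('$scalar', '@array', '%hash', '*glob') {
    print join(' ', split_sigil($v)), "\n";
}
</pre></div>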
<p>I wanted to really stress-test the <code>postderef_qq</code> support by trying
different combinations. I started by reading the Perl interpreter source code
for interpolation in <code>toke.c</code> and found one undocumented case of
interpolation: using <code>->$#*</code> to access the last index of an array.
I sent a patch to Perl itself to <a href="https://github.com/Perl/perl5/pull/20049">document it</a>
(even though it isn't necessarily useful in the context of interpolation). I
also found that PPR was matching key-value slices (<code>->%[</code> and <code>->%{</code>) inside
quote-likes when only value slices are allowed (<code>->@[</code> and <code>->@{</code>), so I
also <a href="https://rt.cpan.org/Public/Bug/Display.html?id=143955">submitted that as a bug report</a>.
Once all those bugs related to matching <code>postderef</code> were fixed in <code>PPR</code>, my
<a href="https://github.com/shadow-dot-cat/Babble/pull/12">syntax tests passed and I opened another PR</a>.</p>
<h1 id="carefulwiththatsledgehammer">Careful with that sledgehammer</h1>
<p>At this point, I felt confident enough to start automating the conversion
of code, so I wrote a set of scripts that live in the
<a href="https://github.com/zmughal-experiment/perl-leaning-toothpick-of-babble/tree/v0.0.1">perl-leaning-toothpick-of-babble</a>
repository
(alluding to
<a href="https://en.wikipedia.org/wiki/Leaning_toothpick_syndrome">1</a>,
<a href="https://en.wikipedia.org/wiki/Leaning_Tower_of_Pisa">2</a>,
<a href="https://en.wikipedia.org/wiki/Tower_of_Babel">3</a>).</p>
<p>Debugging a source code transformation like this is a repetitive process:
make a modification, then check the result. To keep the checking manageable, I borrowed the
<a href="https://blog.moertel.com/posts/2013-02-18-git-second-order-diff.html">second-order-diff technique</a>
from Tom Moertel, which essentially lets you look at the diff between runs of your
Big-Automated-SledgeHammer (BASH) so that you only have to review what has changed.
I have used this technique before for automating and verifying code
refactoring.</p>
<p>My BASH is a <code>bash</code> script called <code>setup.sh</code>.
I start by using
<a href="https://p3rl.org/Git::CPAN::Patch">Git::CPAN::Patch</a>
to pull down the latest tarball of a given release
and set this up as a <code>git</code> repository. I do this for
the releases of <code>Dist::Zilla</code> and its dependencies
that use syntax newer than Perl v5.8.9. I determine which releases
those are by trying to install and test each one using
an install of Perl v5.8.9; checking compatibility by
running the test suite is more reliable
than using <code>Perl::MinimumVersion</code>.
I then</p>
<ul>
<li>remove comments and POD from the code so that
there is less text to process;</li>
<li>remove explicit Perl versions as in
<code>use v5.36;</code> in modules and <code>Makefile.PL</code> and
remove <code>MIN_PERL_VERSION</code> in <code>Makefile.PL</code> (though it would be better to only
remove versions greater than the v5.8.9 target); and</li>
<li>remove <code>no feature 'switch'</code>, which several
modules have as boilerplate but which will not work under
Perl v5.8.9 (as <code>Module::CoreList</code> reports, <code>feature was first released with perl
v5.9.3</code>).</li>
</ul>
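<p>The version-related cleanup steps can be sketched in a few lines of Perl. This is a hypothetical helper, not the code in <code>setup.sh</code>: it only handles single-line statements, ignores <code>MIN_PERL_VERSION</code>, and uses the core <code>version</code> module so that it drops only those <code>use VERSION</code> statements that require something newer than the v5.8.9 target, as suggested in the list above:</p>
<div class="highlight-perl"><pre class="hl">use strict;
use warnings;
use version;

my $TARGET = version->parse('v5.8.9');

# Hypothetical helper: drop `use VERSION;` lines that need a newer Perl than
# $TARGET, plus `no feature 'switch';` boilerplate. Line-based matching is a
# simplification and misses statements split across lines.
sub strip_version_pragmas {
    my ($source) = @_;
    my @kept;
    for my $line (split /^/, $source) {
        if ($line =~ /^\s*use\s+(v?5[\d._]*)\s*;/) {
            next if version->parse($1) > $TARGET;
        }
        next if $line =~ /^\s*no\s+feature\s+(['"])switch\1\s*;/;
        push @kept, $line;
    }
    return join '', @kept;
}

print strip_version_pragmas("use v5.36;\nuse 5.008001;\nno feature 'switch';\nprint 42;\n");
</pre></div>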
<p>After that preparation work, I can run <code>Babble</code> plugins on the code for each
release. It is important to note that I only run <code>Babble</code> automatically on code
in modules and not on test scripts, because I want to leave the testing code
untouched as a control. This is possible because, as is best practice, most
testing code is written to be as simple as possible, which means it is often
already compatible with Perl v5.8.9 syntax.</p>
<p>However, there are a couple of special cases worth noting, which I run inside
<code>setup.sh</code> as individual commands. First, there are two testing
scripts in <code>Dist::Zilla</code> that need to be processed by the <code>::DefinedOr</code> and
<code>::PostfixDeref</code> plugins. I looked over these to make sure that the test behaviours
do not change. Second, there are two modules which use
<a href="https://p3rl.org/Try::Tiny">Try::Tiny</a>,
which conflicts with the <code>PPR::X</code> grammar's support for the
<a href="https://metacpan.org/pod/perl5340delta#Experimental-Try/Catch-Syntax">try/catch syntax introduced in Perl v5.34.0</a>.
If the <code>try/catch</code> syntax is not disabled in the grammar, <code>PPR</code> will consider the use of
<code>Try::Tiny</code> to be invalid syntax and <code>Babble</code> cannot process that code (not to
mention that the semantics are very different). So I use a role to disable the
<code>try/catch</code> syntax as described in the <code>PPR</code> tests and process
those two modules individually. Note that this is essentially <em>removing</em> a keyword from the
grammar!</p>
<p>I reported and fixed a couple more bugs as I repeatedly ran the
BASH followed by <code>check-perlver.sh</code> and
<code>run-tests.sh</code>. They are listed in the Appendix.</p>
<p>After all that, I ran <code>dzil new</code> and <code>dzil build</code> under Perl v5.8.9 and
got a releasable tarball. The whole process of applying the
BASH to all of <code>Dist::Zilla</code> and its dependencies takes
15 minutes on my computer. I have not yet tried to optimise it; in
particular, the <code>::SubstituteAndReturn</code> plugin uses a slow but correct
algorithm.</p>
<p>The version of Babble with all these changes is on CPAN as
<a href="https://metacpan.org/release/ZMUGHAL/Babble-0.090007_01">Babble v0.090007_01</a>.</p>
<h1 id="takeaways">Takeaways</h1>
<p>There are a few paths for further work that I want to highlight.</p>
<h2 id="wherethiscanbeused">Where this can be used</h2>
<p>Processing code to use older syntax is not necessarily useful for every type of
code. Furthermore, some features are not at the syntax level (e.g., Y2038
compliance), and other changes between versions of the interpreter fix bugs or
security issues (e.g., hash randomization). So where can this be used? Mostly
in toolchain code or code that works with toolchain code (e.g., for DevOps), as
the Perl toolchain <a href="https://github.com/Perl-Toolchain-Gang/toolchain-site/blob/master/lancaster-consensus.md#minimum-supported-perl">currently targets Perl v5.8.1 as a minimum</a>.
It is of course possible to write constrained code that supports that version
by hand, but sometimes it is more natural to write code with more flexibility
(and that is more comfortable to read later). This is also the goal of libraries
such as <a href="https://metacpan.org/release/TOBYINK/Mite-0.010008/view/lib/Mite.pm#WHY-IS-THIS">Mite</a>,
which are designed for code that is used alongside the
toolchain.</p>
<p>The ability to easily add keywords to (and in some cases remove keywords from) <code>PPR</code>
distinguishes it from the current implementations of other Perl static
parsing/recognition tools. This could make it usable as part of more
general refactoring tools. While refactoring tools do exist, they are mostly
written on top of <code>PPI</code>, and in some cases where I wanted to use those
tools to refactor code that uses a non-core keyword, I have patched <code>PPI</code> myself.</p>
<h2 id="regexcopyandpaste">Regex copy and paste</h2>
<p>Using <code>PPR</code> to extract smaller parts of syntax often requires copying and pasting
the contents of a set of regex alternative branches from inside the large <code>PPR</code>
regex. This could be fixed by adding more rule names for parts of the syntax
that are less meaningful on their own. For example, in <code>::PostfixDeref</code>, I need
all the alternatives inside the <code>PerlTerm</code> rule that can take a postfix
dereference. However, I understand why one would not want to do this, as
it would mean expanding the API with names that are less likely to stay
stable across versions.</p>
<p>On the other hand, having the regex inlined into the plugin has the benefit
of letting you quickly see exactly what is being matched. Perhaps the fix is a
way to parse the <code>PPR</code> regex itself and include the needed parts, so that
plugins do not have to be updated by hand for small changes (grammars generally
do not change drastically).</p>
<h2 id="mixedparsing">Mixed parsing</h2>
<p>Tools for static parsing of Perl do what they say on the tin: they will not run
the code. This is understandable for security reasons. Perhaps there can be a
middle ground between static parsing and dynamic parsing that allows the parser to
run only the <code>use</code> statements. Another approach is to record
information on the imports and parser state whenever the code is run, for
example, when the test suite is run.</p>
<p>The implementation I used above for <code>Function::Parameters</code> currently assumes
that the imports are at the top-level lexical scope of the file. Many keyword
modules are lexically scoped so they can be turned on and off throughout a
single file. To handle this properly, we would need to walk the scopes (like in
<code>Code::ART</code>) and build a list of modules that are allowed to be imported, so that
the parser can study what they export. This could be done in a container for
security, with the results cached for performance.</p>
<h2 id="operatorprecedence">Operator precedence</h2>
<p>To safely rewrite complex expressions that are not fully
parenthesised, it will be necessary to implement operator precedence parsing.
This is one of the difficult parts of parsing Perl statically, because features such
as <a href="https://www.perlmonks.org/?node_id=861966">prototypes</a> can influence
precedence, and they can be defined far from where they are used in an
expression.</p>
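<p>A minimal sketch of how a prototype changes the parse (the subroutine names are hypothetical): the two calls below use the same tokens, but the <code>($)</code> prototype makes <code>unary_op</code> parse like a named unary operator, so it takes only one argument. A static parser only sees the difference if it knows the prototype, which could just as well be declared in another file and imported:</p>
<div class="highlight-perl"><pre class="hl">use strict;
use warnings;

# Identical bodies; only the prototype differs. A prototype only affects
# calls compiled after the declaration has been seen.
sub list_op     { return "list_op(@_)" }
sub unary_op($) { return "unary_op(@_)" }

# Same token sequence, different parse trees:
my @as_list  = (list_op 1, 2);     # parsed as (list_op(1, 2))
my @as_unary = (unary_op 1, 2);    # parsed as (unary_op(1), 2)

print scalar(@as_list), " element(s): @as_list\n";
print scalar(@as_unary), " element(s): @as_unary\n";
</pre></div>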
<p>Neither <code>PPI</code><a href="http://enetdown.org//hacktivity/#fn:ppi-op-prec" id="fnref:ppi-op-prec" class="footnote">7</a> nor <code>PPR</code><a href="http://enetdown.org//hacktivity/#fn:ppr-op-prec" id="fnref:ppr-op-prec" class="footnote">8</a> currently implements full
<a href="https://perldoc.perl.org/5.36.0/perlop#Operator-Precedence-and-Associativity">operator precedence</a>.
On the other hand, <code>Compiler::Parser</code> and <code>Guacamole</code> both do implement operator precedence.
In <code>Compiler::Parser</code>, this is currently implemented by the
<a href="https://metacpan.org/release/GOCCY/Compiler-Parser-0.10/source/src/compiler/parser/Compiler_completer.cpp#L7">order of calls to a recursive descent
parser</a>.
<code>Guacamole</code> currently implements operator precedence by using <a href="https://metacpan.org/release/XSAWYERX/Guacamole-0.008/source/lib/Guacamole.pm#L152">a hierarchy of
grammar rules</a>.</p>
<h2 id="metacpanapi">MetaCPAN API</h2>
<p>To find code that I could process using <code>Babble</code>, I used the MetaCPAN
API. It is very useful to be able to look up all the distributions that use a
particular module or <code>grep</code> over all the code on CPAN to find places where a
symbol or syntax is used.</p>
<h1 id="appendix:listofissuesandpullrequestsparticipatedinalongtheway">Appendix: List of issues and pull requests participated in along the way</h1>
<h2 id="metacpan-examples">metacpan-examples</h2>
<ul>
<li><a href="https://github.com/metacpan/metacpan-examples/pull/23">Use Elasticsearch 2.x client and update some params by zmughal · Pull Request #23 · metacpan/metacpan-examples · GitHub</a></li>
<li><a href="https://github.com/metacpan/metacpan-examples/pull/24">Add debug flag for less verbose output by zmughal · Pull Request #24 · metacpan/metacpan-examples · GitHub</a></li>
<li><a href="https://github.com/metacpan/metacpan-examples/pull/25">Update reverse dependencies examples by zmughal · Pull Request #25 · metacpan/metacpan-examples · GitHub</a></li>
</ul>
<h2 id="babble">Babble</h2>
<ul>
<li><a href="https://github.com/shadow-dot-cat/Babble/pull/1">add broken test for a return that uses // by 2shortplanks · Pull Request #1 · shadow-dot-cat/Babble · GitHub</a></li>
<li><a href="https://github.com/shadow-dot-cat/Babble/pull/2">optimize away empty signature; prevents syntax error on 5.10.1 by djerius · Pull Request #2 · shadow-dot-cat/Babble · GitHub</a></li>
<li><a href="https://github.com/shadow-dot-cat/Babble/pull/3">don't treat array- or hash- lookups as postfix dereferencers by djerius · Pull Request #3 · shadow-dot-cat/Babble · GitHub</a></li>
<li><a href="https://github.com/shadow-dot-cat/Babble/pull/4">Add support for the ellipsis statement by zmughal · Pull Request #4 · shadow-dot-cat/Babble · GitHub</a></li>
<li><a href="https://github.com/shadow-dot-cat/Babble/pull/5">Use parent ::Match grammar in ::SubMatch by zmughal · Pull Request #5 · shadow-dot-cat/Babble · GitHub</a></li>
<li><a href="https://github.com/shadow-dot-cat/Babble/pull/6">Unique match positions by zmughal · Pull Request #6 · shadow-dot-cat/Babble · GitHub</a></li>
<li><a href="https://github.com/shadow-dot-cat/Babble/issues/7">SubstituteAndReturn: contextual and chained substitute · Issue #7 · shadow-dot-cat/Babble · GitHub</a></li>
<li><a href="https://github.com/shadow-dot-cat/Babble/pull/8">SubstituteAndReturn: chained and contextual by zmughal · Pull Request #8 · shadow-dot-cat/Babble · GitHub</a></li>
<li><a href="https://github.com/shadow-dot-cat/Babble/issues/9">Package syntax: versions and blocks · Issue #9 · shadow-dot-cat/Babble · GitHub</a></li>
<li><a href="https://github.com/shadow-dot-cat/Babble/pull/10">Add plugins for package block and package version by zmughal · Pull Request #10 · shadow-dot-cat/Babble · GitHub</a></li>
<li><a href="https://github.com/shadow-dot-cat/Babble/pull/11">Update use of PerlTerm for PPR ≥ v0.001000 by zmughal · Pull Request #11 · shadow-dot-cat/Babble · GitHub</a></li>
<li><a href="https://github.com/shadow-dot-cat/Babble/pull/12">PostfixDeref: Add support for postderef_qq, postfix ->$#* by zmughal · Pull Request #12 · shadow-dot-cat/Babble · GitHub</a></li>
<li><a href="https://github.com/shadow-dot-cat/Babble/issues/13">PostfixDeref: translation using map construct does not work with push, etc · Issue #13 · shadow-dot-cat/Babble · GitHub</a></li>
<li><a href="https://github.com/shadow-dot-cat/Babble/pull/14">DefinedOr: operator precedence for assignment + ConditionalExpression by zmughal · Pull Request #14 · shadow-dot-cat/Babble · GitHub</a></li>
<li><a href="https://github.com/shadow-dot-cat/Babble/pull/15">PostfixDeref: direct dereference by zmughal · Pull Request #15 · shadow-dot-cat/Babble · GitHub</a></li>
</ul>
<h2 id="perl::minimumversion">Perl::MinimumVersion</h2>
<ul>
<li><a href="https://github.com/neilb/Perl-MinimumVersion/pull/25">Add check for y///r feature by zmughal · Pull Request #25 · neilb/Perl-MinimumVersion · GitHub</a></li>
</ul>
<h2 id="perl5">perl5</h2>
<ul>
<li><a href="https://github.com/Perl/perl5/pull/20049">Document postderef_qq support for ->$#* interpolation by zmughal · Pull Request #20049 · Perl/perl5 · GitHub</a></li>
</ul>
<h2 id="ppi">PPI</h2>
<ul>
<li><a href="https://github.com/Perl-Critic/PPI/issues/270">Extending PPI · Issue #270 · Perl-Critic/PPI · GitHub</a></li>
<li><a href="https://github.com/Perl-Critic/PPI/issues/273">The Keyword Question · Issue #273 · Perl-Critic/PPI · GitHub</a></li>
</ul>
<h2 id="ppr">PPR</h2>
<ul>
<li><a href="https://rt.cpan.org/Public/Bug/Display.html?id=122794">Bug #122794 for PPR: s///e RHS is not validated (rt.cpan.org #122794)</a></li>
<li><a href="https://rt.cpan.org/Public/Bug/Display.html?id=143876">Bug #143876 for PPR: Interpolation within quote-like with single-quote delimiters (rt.cpan.org #143876)</a></li>
<li><a href="https://rt.cpan.org/Public/Bug/Display.html?id=143877">Bug #143877 for PPR: More cases for postfix deref (rt.cpan.org #143877)</a></li>
<li><a href="https://rt.cpan.org/Public/Bug/Display.html?id=143955">Bug #143955 for PPR: Parsing of Perl*AccessNoSpace + postderef_qq (rt.cpan.org #143955)</a></li>
<li><a href="https://rt.cpan.org/Public/Bug/Display.html?id=143966">Bug #143966 for PPR: state declaration + attribute regression (rt.cpan.org #143966)</a></li>
</ul>
<div class="footnotes">
<hr />
<ol>
<li id="fn:three-little-words-talk"><p>For more information on <code>PPR</code>, see the highly entertaining talk
<a href="https://www.youtube.com/watch?v=ob6YHpcXmTg">Keynote by Damian Conway - "Three Little Words"</a>
at The Perl Conference 2017 (Washington, DC).<a href="http://enetdown.org//hacktivity/#fnref:three-little-words-talk" class="reversefootnote"> ↩</a></p></li>
<li id="fn:regex-regular-language"><p>This regex, like regexes written for many modern regex engines, uses
<a href="https://en.wikipedia.org/wiki/Regular_expression#Patterns_for_non-regular_languages">features</a>
that let it match more than a regular language as defined in formal language
theory.<a href="http://enetdown.org//hacktivity/#fnref:regex-regular-language" class="reversefootnote"> ↩</a></p></li>
<li id="fn:ppr-data-structure"><p>You can still use this to build a data structure and
recover some of the information that a parser would provide, as <a href="http://enetdown.org//hacktivity/posts/2022/09/04/gfx/ppr-data-structure.pl">this script</a> does.
This is the approach that Damian Conway describes <a href="https://youtu.be/fVnmYzJfy5s?start=4196&end=4271">here</a> in the Q & A of his
"A Simple Matter Of Programming" keynote describing <a href="https://p3rl.org/Code::ART">Code::ART</a>.<a href="http://enetdown.org//hacktivity/#fnref:ppr-data-structure" class="reversefootnote"> ↩</a></p></li>
<li id="fn:metacpan-candidates"><p>I wrote a <a href="https://github.com/zmughal-experiment/perl-leaning-toothpick-of-babble/blob/v0.0.1/bin/metacpan-candidate-dists-by-author.pl">script to find candidates for source code
transformation</a>.
It uses MetaCPAN to find distributions by a given CPAN author that have a
minimum Perl version ≥ v5.20.0 sorted by the number of reverse dependencies.
Running the script on <a href="https://metacpan.org/author/RJBS"><code>RJBS</code>'s distributions</a>
gives <code>Dist::Zilla</code> as a top candidate for this experiment.<a href="http://enetdown.org//hacktivity/#fnref:metacpan-candidates" class="reversefootnote"> ↩</a></p></li>
<li id="fn:install-sub"><p>It also supports use with subroutine wrappers such as those
provided by <a href="https://p3rl.org/Class::Method::Modifiers">Class::Method::Modifiers</a>.
These get turned into statements that pass in the subroutine block as an
anonymous subroutine.<a href="http://enetdown.org//hacktivity/#fnref:install-sub" class="reversefootnote"> ↩</a></p></li>
<li id="fn:archimedes"><p>See <a href="https://en.wikipedia.org/wiki/Noli_turbare_circulos_meos!">“Noli turbare circulos meos!”</a>.<a href="http://enetdown.org//hacktivity/#fnref:archimedes" class="reversefootnote"> ↩</a></p></li>
<li id="fn:ppi-op-prec"><p>As of <a href="https://metacpan.org/release/OALDERS/PPI-1.276">PPI v1.276</a>, the expression <code>2 + 4 * 8</code> is parsed as:</p>
<div class="highlight-txt"><pre class="hl"># cpanm App::PPI::Dumper
$ echo "2 + 4 * 8" | ppi_dumper /dev/stdin
PPI::Document
PPI::Statement
PPI::Token::Number '2'
PPI::Token::Whitespace ' '
PPI::Token::Operator '+'
PPI::Token::Whitespace ' '
PPI::Token::Number '4'
PPI::Token::Whitespace ' '
PPI::Token::Operator '*'
PPI::Token::Whitespace ' '
PPI::Token::Number '8'
PPI::Token::Whitespace '\n'</pre></div>
<p><a href="http://enetdown.org//hacktivity/#fnref:ppi-op-prec" class="reversefootnote"> ↩</a></p></li>
<li id="fn:ppr-op-prec"><p>As of PPR v0.001004, the subpattern
<a href="https://metacpan.org/release/DCONWAY/PPR-0.001004/view/lib/PPR.pm#(?&PerlInfixBinaryOperator)"><code>PerlInfixBinaryOperator</code></a>
is defined as several operators across precedence levels.<a href="http://enetdown.org//hacktivity/#fnref:ppr-op-prec" class="reversefootnote"> ↩</a></p></li>
</ol>
</div>
<h2><a href="http://enetdown.org//hacktivity/posts/2015/01/25/cpan-pr-challenge-january-clone/">January PR Challenge: Clone: adding continuous integration and badges</a></h2>
<p>By zaki. Published 2015-01-25T04:36:09Z; updated 2015-01-25T04:45:41Z.</p>
<p>When I saw the <a href="http://neilb.org/2014/11/29/pr-challenge-2015.html">CPAN PR Challenge</a>
come up in my feed, I signed up immediately.
I love giving back to FOSS and this challenge would push me to make
contributions that are outside of the usual software that I contribute to.</p>
<p>For January, I was assigned <a href="http://p3rl.org/Clone">Clone</a>. I looked at the
<a href="https://metacpan.org/requires/distribution/Clone">reverse dependencies</a>
and saw 181 packages. I immediately thought to myself, "I'd better be careful.
I don't want to break things." With that many dependents, testing is very important for
this package.</p>
<p>I e-mailed the maintainer of Clone, Breno G. de Oliveira
(<a href="https://metacpan.org/author/GARU">garu</a>), about my assignment and he shot me
back a lengthy e-mail with all the things I could do. Some of them were easy,
such as:</p>
<ul>
<li>fixing typos,</li>
<li>adding continuous integration with <a href="https://travis-ci.org/">Travis-CI</a>,
code coverage with <a href="https://coveralls.io/">Coveralls</a>, and
badges for each of those.</li>
</ul>
<p>Others were a bit more involved:</p>
<ul>
<li>benchmarking against other packages such as Clone::PP and Storable,</li>
<li>adding more tests for different types of Perl variables,</li>
<li>going through the bug queue and fixing the open tickets.</li>
</ul>
<p>I went for the easy ones first. I knew that adding the Travis-CI integration
was just a matter of creating a <code>.travis.yml</code> file, but what actually goes in
that file can vary a great deal. I had noticed that <a href="https://metacpan.org/author/HAARG">haarg</a>
had created a set of <a href="https://github.com/travis-perl/helpers">helper scripts</a>
that can grab various <a href="https://github.com/travis-perl/builds">pre-built Perl versions</a>
and run tests against them all.</p>
<p>I cloned the Clone repository and copied over the example <code>.travis.yml</code>:
<!-- format yaml --></p>
<div class="highlight-txt"><pre class="hl">language: perl
perl:
- "5.8" # normal preinstalled perl
- "5.8.4" # installs perl 5.8.4
- "5.8.4-thr" # installs perl 5.8.4 with threading
- "5.20" # installs latest perl 5.20 (if not already available)
- "blead" # install perl from git
matrix:
include:
- perl: 5.18
env: COVERAGE=1 # enables coverage+coveralls reporting
allow_failures:
- perl: "blead" # ignore failures for blead perl
before_install:
- git clone git://github.com/travis-perl/helpers ~/travis-perl-helpers
- source ~/travis-perl-helpers/init
- build-perl
- perl -V
- build-dist
- cd $BUILD_DIR # $BUILD_DIR is set by the build-dist command
install:
- cpan-install --deps # installs prereqs, including recommends
- cpan-install --coverage # installs coverage prereqs, if enabled
before_script:
- coverage-setup
script:
- prove -l -j$((SYSTEM_CORES + 1)) $(test-dirs) # parallel testing
after_success:
- coverage-report
</pre></div>
<p>and enabled my fork of Clone in the Travis-CI and Coveralls settings.</p>
<p>After pushing this, the tests ran, but I kept seeing <code>n/a</code> code coverage on
Coveralls. I was very confused because the code coverage was working just fine
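locally. I jumped on IRC and chatted with haarg. He pointed out that I was
using <code>prove -l</code> as in the example, but since Clone is a compiled (XS) module, I
needed to use <code>prove -b</code>, which adds <code>blib/lib</code> and <code>blib/arch</code> to
<code>@INC</code> so that the tests load the freshly built module.</p>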
<p>Oh. Silly me! I had been using <code>prove -b</code> locally, but never changed the
<code>.travis.yml</code> file. That serves me right for copying-and-pasting without
looking! Something good came out of it though: <a href="https://rt.cpan.org/Public/Bug/Display.html?id=101601">this ticket</a>
for <code>Test::Harness</code> has suggestions that will help catch this error if anyone else
makes the same mistake.</p>
<p>haarg also pointed me to an even <a href="https://github.com/haarg/Devel-Confess/blob/4a2f970b060b9bf23eed61afc22a7dc5459db688/.travis.yml">simpler <code>.travis.yml</code>
file</a>
that he was working on that just had the lines
<!-- format yaml --></p>
<div class="highlight-txt"><pre class="hl">before_install:
- eval $(curl https://travis-perl.github.io/init) --auto
</pre></div>
<p>and a list of the Perl versions to test. I used that and ran it through
Travis-CI and everything just worked!</p>
<p>Now all I had to do was grab the HTML for the badges and put them in the POD
and Markdown. I went to the Travis-CI and Coveralls pages and copied the
Markdown for those badges and then went to <a href="http://badge.fury.io/for/pl">http://badge.fury.io/for/pl</a> and
entered <code>Clone</code> to get a version badge for Clone on CPAN.</p>
<p>I then made a few grammar fixes and converted the POD into Markdown for the README
and I was done!</p>
<p>The pull request with my changes is at <a href="https://github.com/garu/Clone/pull/4">https://github.com/garu/Clone/pull/4</a>
and my changes are in <a href="https://metacpan.org/release/GARU/Clone-0.38">Clone v0.38</a>.</p>
<p><a href="http://enetdown.org//hacktivity/posts/2015/01/25/gfx/clone-badges-github.png"><img src="http://enetdown.org//hacktivity/posts/2015/01/25/gfx/clone-badges-github.png" width="835" height="244" alt="Badges for Clone on GitHub" class="img" /></a></p>
<p><a href="http://enetdown.org//hacktivity/posts/2015/01/25/gfx/clone-badges-cpan.png"><img src="http://enetdown.org//hacktivity/posts/2015/01/25/gfx/clone-badges-cpan.png" width="412" height="206" alt="Badges for Clone on CPAN" class="img" /></a></p>
<h2><a href="http://enetdown.org//hacktivity/posts/2015/01/25/converting-alien-gmp-to-use-alien-base/">converting Alien::GMP to use Alien::Base</a></h2>
<p>By zaki. Published 2015-01-25T03:47:56Z; updated 2016-06-13T07:17:08Z.</p>
<p>A while back, I wrote <a href="http://p3rl.org/Unicode::Number">Unicode::Number</a> which
was based on <a href="http://billposer.org/Software/libuninum.html">libuninum</a>. This is
a library that can convert numbers written in various languages to integers and
vice versa. I also wrote a library to install libuninum automatically,
<a href="http://p3rl.org/Alien::Uninum">Alien::Uninum</a>, with the help of
<a href="http://p3rl.org/Alien::Base">Alien::Base</a>.</p>
<p>This all worked quite well, but I wanted to go a step further. libuninum can
represent numbers using the <a href="https://gmplib.org/">GNU Multiple Precision Arithmetic
Library</a> (libgmp). This allows converting to and from
arbitrarily long numbers. To support this, the computer must have libgmp installed.</p>
<p>So I thought to myself, why not write Alien::GMP and install it myself?</p>
<p>Well, <a href="http://p3rl.org/Alien::GMP">Alien::GMP</a> already exists and is authored
by Richard Simões, but it bundles an old version of libgmp (v5.0.4).
Alien::GMP should be able to download the latest version and install that.</p>
<p>So I created <a href="https://github.com/rsimoes/Alien-GMP/issues/1">an issue</a> to point
out that it needed updates. That led to me getting co-maintainership on the
package.</p>
<p>I went ahead and pointed Alien::GMP to the download page for the source code,
but that page requires HTTPS: <a href="https://gmplib.org/download/gmp/">https://gmplib.org/download/gmp/</a>. Alien::Base didn't have
support for HTTPS, so I <a href="https://github.com/Perl5-Alien/Alien-Base/pull/98">added support</a>.</p>
<p>I could finally get back to Alien::GMP.</p>
<p>I cleared out the original code and made Alien::GMP inherit from Alien::Base.</p>
<p>I also added support for using the module with <a href="http://p3rl.org/Inline">Inline</a>
so that it is easy to compile other code against libgmp. I just needed to change the tests
that look for <code>gmp.h</code> and <code>libgmp.so</code>, and the module was good to go.
The overall changes can be seen at <a href="https://github.com/zmughal/p5-Alien-GMP/compare/zmughal:v0.0.6...v0.0.6_01">https://github.com/zmughal/p5-Alien-GMP/compare/zmughal:v0.0.6...v0.0.6_01</a>
and a new dev release of Alien::GMP is at <a href="https://metacpan.org/release/ZMUGHAL/Alien-GMP-v0.0.6_01">https://metacpan.org/release/ZMUGHAL/Alien-GMP-v0.0.6_01</a>.</p>
<h2><a href="http://enetdown.org//hacktivity/posts/2013/12/24/using-alien-base-dist-zilla-and-eu-mm/">Using Alien::Base, Dist::Zilla, and EU::MM</a></h2>
<p>Published 2013-12-24T21:28:11Z; updated 2014-09-28T18:55:29Z.</p>
<p>Since I use <a href="http://p3rl.org/Dist::Zilla">Dist::Zilla</a> to help manage my
Perl distributions, I wanted to use it with the XS package that I am working
on. This post is just a short note on how to do that if you are using
<a href="http://p3rl.org/Alien::Base">Alien::Base</a> to build your native library.</p>
<p>Dist::Zilla usually writes its own <code>Makefile.PL</code> so that
<a href="http://p3rl.org/ExtUtils::MakeMaker">ExtUtils::MakeMaker</a> will know how to build, test, and
install the code. However, since I'm using Alien::Base, I need to pass the
compiler and linker flags to ExtUtils::MakeMaker as well. To do that, I grabbed
the <a href="http://p3rl.org/Dist::Zilla::Plugin::MakeMaker::Awesome">Dist::Zilla::Plugin::MakeMaker::Awesome</a>
plugin. Setting that up in your <code>dist.ini</code> is relatively straightforward:</p>
<div class="inlinepage">
<div class="inlineheader">
<span class="header">
<a href="http://enetdown.org//hacktivity/posts/2013/12/24/gfx/dist.ini">dist.ini</a>
</span>
</div>
<div class="inlinecontent">
</div>
<div class="inlinefooter">
<span class="pagedate">
Posted <span class="date">Sun Sep 28 18:55:29 2014</span>
</span>
</div>
</div>
<p>The line
<code>
[=inc::MyLibMakeMaker]
</code>
specifies that the code that will be used to generate the <code>Makefile.PL</code> will be
in a module called <code>inc/MyLibMakeMaker.pm</code>. Now, in that file, I'll need to specify
the compilation flags by calling the <code>cflags</code> and <code>libs</code> methods on my Alien::Base
subclass (Alien::MyLib). But this needs to happen when <code>Makefile.PL</code> is run by
the user, not when Dist::Zilla writes out the file. The following code does that
by appending our own options to the string that Dist::Zilla writes out as <code>Makefile.PL</code>.</p>
<div class="inlinepage">
<div class="inlineheader">
<span class="header">
<a href="http://enetdown.org//hacktivity/posts/2013/12/24/gfx/MyLibMakeMaker.pm">MyLibMakeMaker.pm</a>
</span>
</div>
<div class="inlinecontent">
</div>
<div class="inlinefooter">
<span class="pagedate">
Posted <span class="date">Sun Sep 28 18:55:29 2014</span>
</span>
</div>
</div>
<p>We use the <code>CONFIGURE</code> option, which takes a code reference that returns extra
<code>WriteMakefile</code> arguments, to set <code>CCFLAGS</code> and <code>LIBS</code> instead of setting them
directly, because their values can only be computed after the <code>Alien::MyLib</code> prerequisite has been met.</p>
<h2><a href="http://enetdown.org//hacktivity/posts/2013/12/23/unicode-encodings-and-endianness-writing-libuninum-bindings/">Unicode encodings and endianness — writing libuninum bindings</a></h2>
<p>By zaki. Published 2013-12-23T19:32:33Z; updated 2014-09-28T19:17:11Z.</p>
<p>The past few days I've been learning how to write bindings for Perl using
<a href="http://perldoc.perl.org/perlxs.html">XS</a> so that I can use the many great
libraries out there that I normally use in C or C++. Native bindings are very
magical things because they glue together different languages that often don't
have a direct mapping of semantics with respect to each other. XS is a bit
quirky: while most language binding APIs require writing calls directly
in C or C++, XS is actually its own DSL for making bindings. There is a
preprocessor called <a href="http://perldoc.perl.org/xsubpp.html">xsubpp</a> that
generates the actual API calls to glue the Perl interpreter with the native
code.</p>
<p>I actually wanted to start learning XS a few months back. In the past,
I would put together rudimentary bindings using <a href="http://www.swig.org/">SWIG</a>,
but the results weren't very pleasant to use. It ends up creating bindings that
look very much like calling C code and force you to deal with pointers and context
directly. That pretty much defeats the purpose of creating a binding! So now that
I have a few more <a href="http://en.wiktionary.org/wiki/round_tuit">tuits</a>, I started looking
around for documentation on using XS. Coincidentally, I found a
<a href="https://github.com/Perl-XS/notes">project</a> that gathered many of the same
notes I was using. It seems I timed my learning process just
<a href="http://www.nntp.perl.org/group/perl.xs/2013/12/msg2749.html">right</a> and I've
been learning a great deal about Perl internals from the newly relaunched <code>#xs</code>
channel on <a href="http://www.irc.perl.org/">irc.perl.org</a>.</p>
<p>As I usually do when I'm learning something new, I jumped right into making
something while picking things up. I chose to work on something that was both
simple and non-trivial. Years ago on Freshmeat, I came across a project called
<a href="http://billposer.org/Software/libuninum.html">libuninum</a> that converts
different number system strings into integers. Once you have these integers,
you can use them in operations for arithmetic and sorting. Pretty useful if you
have to deal with data in different languages.</p>
<p>Before I actually hack on the bindings, I need to think about how I'm going to
distribute this code. Most people's systems aren't going to have the libuninum
library needed to build these bindings, so I'll need to somehow get the source
code and build it on those systems. That's where <a href="https://metacpan.org/release/Alien-Base">Alien::Base</a>
comes in. It's a neat module that will download a tarball, extract it, build
it, and place the dynamic library and headers in a place that can be accessed
by other modules. I made a subclass of Alien::Base called
<a href="https://github.com/zmughal/p5-Alien-Uninum">Alien::Uninum</a> that will do just
that for libuninum. I even got a small <a href="https://github.com/jberger/Alien-Base/pull/31">patch</a> into
Alien::Base to fix some issues I had. All I needed now to start hacking on the XS code was a
way to tell the compiler where all the libuninum files are. With Alien::Base,
I just pass the output of the <code>cflags</code>
and <code>libs</code> methods to the package build process, which is pretty much like using <code>pkg-config</code>
(<a href="https://github.com/zmughal/p5-Unicode-Number/blob/dfe5abea501a830e159f8271be188cfc129baa0e/inc/UninumMakeMaker.pm">code</a>).</p>
<p>I got to hacking and started on the simplest task: getting the list of all the
number systems. I first approached this by just making a list of hashes that
contained the name and ID of each number system
(<a href="https://github.com/zmughal/p5-Unicode-Number/blob/86b5951d0e2a4b3956e6806331ea0a7f2a3a8734/Number.xs#L26">code</a>).
Not too bad. I then added caching of that list by storing that as a private
attribute of my <code>Unicode::Number</code> class
(<a href="https://github.com/zmughal/p5-Unicode-Number/commit/e0625c2ecf2c7a448c174fe78ed409456b93b2da">code</a>).
Then I built on that and created a <code>Unicode::Number::System</code> class to
store the number system name and ID so that I could return instances of that
instead (<a href="https://github.com/zmughal/p5-Unicode-Number/blob/71d77361ad780574a8ae235061089befe23d5e9f/Number.xs#L88">code</a>).</p>
<p>I then moved on to the main function of the library: converting a
Unicode number to an integer. This was a bit tricky because Unicode comes in
many different encodings (e.g. UTF-8, UTF-16, UTF-32) and these encodings can
also have different endianness. Since the libuninum library expects all strings
to be in UTF-32, I converted Perl strings from UTF-8 to UTF-32 and sent them to
the XS code, but the library was giving me an "illegal character" error. To
debug this, I grabbed some of the data from an example file that came with
libuninum and put it in my XS. Still not working. This didn't make sense
because I could get it working in plain C, but not in the XS. So I put together
a small script using <a href="https://metacpan.org/pod/Inline::C">Inline::C</a> that let
me call the libuninum function directly.</p>
<div class="inlinepage">
<div class="inlineheader">
<span class="header">
<a href="http://enetdown.org//hacktivity/posts/2013/12/23/gfx/inline-test.pl">inline-test.pl</a>
</span>
</div>
<div class="inlinecontent">
</div>
<div class="inlinefooter">
<span class="pagedate">
Posted <span class="date">Sun Sep 28 18:55:29 2014</span>
</span>
</div>
</div>
<p>It still wasn't working. So, as you can see above, I grabbed a function from
<code>uninum.c</code> and renamed it to <code>MyLaoToInt</code> and called it directly. Still wasn't
working. Only when I started to print out the contents of each character did I
realise what was happening. In libuninum's <code>unicode.h</code>, the <code>UTF32</code> typedef is
defined as an <code>unsigned long</code>; however, <code>sizeof(unsigned long)</code> is 8 bytes (64 bits)
on my system, not 4 bytes (32 bits).</p>
<div class="inlinepage">
<div class="inlineheader">
<span class="header">
<a href="http://enetdown.org//hacktivity/posts/2013/12/23/gfx/libuninum-2-7_unicode_snip.h">libuninum-2-7 unicode snip.h</a>
</span>
</div>
<div class="inlinecontent">
</div>
<div class="inlinefooter">
<span class="pagedate">
Posted <span class="date">Sun Sep 28 19:05:51 2014</span>
</span>
</div>
</div>
<p>That means that as the library iterates over each character, it is actually
looking at two characters instead of one, and of course none of the comparisons
were working. What it actually needed to use was a <code>uint32_t</code> from <code>stdint.h</code>.
However, even though this typedef is in the C99 standard, there are some portability
issues with using it: the exact-width integer types are optional in the standard, and some compilers did not ship <code>stdint.h</code> at all.
Instead, I used the integer type that Perl detected to be
32 bits wide and patched the code when I built it using Alien::Uninum
(<a href="https://github.com/zmughal/p5-Alien-Uninum/blob/6d28c2fab8e22d1164309de23a92a724982fb1d6/inc/Alien/Uninum/ModuleBuild.pm#L75">code</a>). Now the file looked like this:</p>
<div class="inlinepage">
<div class="inlineheader">
<span class="header">
<a href="http://enetdown.org//hacktivity/posts/2013/12/23/gfx/libuninum-2-7_unicode-patched_snip.h">libuninum-2-7 unicode-patched snip.h</a>
</span>
</div>
<div class="inlinecontent">
</div>
<div class="inlinefooter">
<span class="pagedate">
Posted <span class="date">Sun Sep 28 19:05:51 2014</span>
</span>
</div>
</div>
<p>Yay! Now the XS code was working on the test data. All I had to do now was get
my string to libuninum and pass the result back. I tried that, and libuninum gave me errors again.
Now what?! I decided I needed to look at the bytes the C code was reading, so I grabbed a
hex dump routine from
<a href="http://c2.com/cgi/wiki?HexDumpInManyProgrammingLanguages">here</a> and looked at it:</p>
<p><code>
00 00 fe ff ...
</code></p>
<p>As soon as I saw the first character, I knew what was going on. What I was
looking at was the <a href="http://en.wikipedia.org/wiki/Byte_order_mark">byte-order mark</a> or BOM. Remember, I had
converted the UTF-8 string to UTF-32 in Perl before sending it to C, but I
never specified the endianness, so Perl used big-endian, the <a href="http://perldoc.perl.org/Encode/Unicode.html#by-endianness">default endianness</a>, and prepended a BOM.
Since the C code was reading the string in the machine's native byte order, I needed to
find the machine's endianness and encode either a little-endian or
big-endian version of UTF-32. I just had to ask Perl the byte order it
detected at compile time and use that (<a href="https://github.com/zmughal/p5-Unicode-Number/blob/89cfb5471235dc2d23d9a490417d9b7e558266cf/lib/Unicode/Number.pm#L122">code</a>).</p>
<p>Once I did that, my code was working and all my tests passed! There are still a
couple of things I need to do in order to clean it up, but it's mostly
done for now.</p>