Since I use Dist::Zilla to help manage my Perl distributions, I wanted to use it with the XS package that I am working on. This post is just a small note on how to do that if you are using Alien::Base to build your native library.
Dist::Zilla usually writes its own
Makefile.PL so that
ExtUtils::MakeMaker will know how to build, test, and
install the code. However, since I'm using Alien::Base, I need to pass the
compiler and linker flags to ExtUtils::MakeMaker as well. To do that, I grabbed
a plugin that lets me supply my own Makefile.PL-generating code. Setting that up in your
dist.ini is relatively straightforward:
This specifies that the code that will be used to generate the
Makefile.PL will be
in a module called
inc/MyLibMakeMaker.pm. Now, in that file, I'll need to specify
the compilation and linker flags by calling the
cflags and libs methods on my Alien::Base
subclass (Alien::MyLib). But this needs to happen when
Makefile.PL is run by
the user, not when Dist::Zilla writes out the file. The following code does that
by appending our own options to the string we write out to Makefile.PL.
We use the
CONFIGURE option to set
LIBS instead of setting
LIBS directly because those flags only become available after the
Alien::MyLib prerequisite has been met.
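The end result is along these lines (a hand-written sketch, not the generated file; Alien::MyLib stands in for your own Alien subclass):

```perl
use strict;
use warnings;
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME               => 'MyLib',
    # Make sure the Alien module is installed before CONFIGURE runs.
    CONFIGURE_REQUIRES => { 'Alien::MyLib' => 0 },
    # CONFIGURE runs when the *user* runs Makefile.PL, by which point
    # Alien::MyLib exists and can tell us where the library landed.
    CONFIGURE          => sub {
        require Alien::MyLib;
        return {
            LIBS => Alien::MyLib->libs,    # linker flags
            INC  => Alien::MyLib->cflags,  # compiler include flags
        };
    },
);
```

The hashref returned by the CONFIGURE sub is merged into the other WriteMakefile arguments, which is why setting LIBS there works while setting it at Dist::Zilla build time would not.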
The past few days I've been learning how to write bindings for Perl using XS so that I can use the many great libraries out there that I normally use in C or C++. Native bindings are very magical things because they glue together different languages that often don't have a direct mapping of semantics with respect to each other. XS is a bit quirky in that, while most language binding APIs require writing calls directly in C or C++, it is actually its own DSL for making bindings. There is a preprocessor called xsubpp that generates the actual API calls to glue the Perl interpreter with the native code.
I actually wanted to start learning XS a few months back. In the past,
I would put together rudimentary bindings using SWIG,
but the results weren't very pleasant to use. It ends up creating bindings that
look very much like calling C code and force you to deal with pointers and context
directly. That pretty much defeats the purpose of creating a binding! So now that
I have a few more tuits, I started looking
around for documentation on using XS. Coincidentally, I found a
project that gathered many of the same
notes I was using. Seems that I timed my learning process just
right and I've
been learning a great deal about Perl internals from the newly relaunched
channel on irc.perl.org.
As I usually do when I'm learning something new, I jump right into making something as I'm picking things up. I chose to work on something that was simple but non-trivial. Years ago on Freshmeat, I came across a project called libuninum that converts different number system strings into integers. Once you have these integers, you can use them in operations for arithmetic and sorting. Pretty useful if you have to deal with data in different languages.
Before I actually hack on the bindings, I need to think about how I'm going to
distribute this code. Most people's systems aren't going to have access to the libuninum
source code to build these bindings, so I'll need to somehow get the source
code and build it on those systems. That's where Alien::Base
comes in. It's a neat module that will download a tarball, extract it, build
it, and place the dynamic library and headers in a place that can be accessed
by other modules. I made a subclass of Alien::Base called
Alien::Uninum that will do just
that for libuninum. I even got a small patch in to
Alien::Base to fix some issues I had. All I needed now to start hacking on the XS code was a
way to tell the compiler where all the libuninum files are. With Alien::Base,
I just send those to the package build process using the
cflags and libs methods, which is pretty much like using pkg-config.
I got to hacking and started on the simplest task: getting the list of all the
number systems. I first approached this by just making a list of hashes that
contained the name and ID of each number system:
Not too bad. I then added caching of that list by storing it as a private
attribute of my object. Then I built on that and created a
Unicode::Number::System class to
store the number system name and ID so that I could return instances of that
class.
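In Perl-space, the shape of it is roughly this (a sketch with made-up names; the real list comes out of the XS layer):

```perl
package Unicode::Number::System;
use strict;
use warnings;

sub new  { my ($class, %args) = @_; bless { %args }, $class }
sub name { $_[0]{name} }
sub id   { $_[0]{id} }

package Unicode::Number;
use strict;
use warnings;

sub new { bless { _systems => undef }, shift }

sub number_systems {
    my ($self) = @_;
    # Build the wrapper objects once, then hand back the cached list.
    $self->{_systems} //= [
        map { Unicode::Number::System->new(%$_) } $self->_raw_systems
    ];
    return @{ $self->{_systems} };
}

# Stand-in for the XS call that asks libuninum for its number systems.
sub _raw_systems { ( { name => 'Lao', id => 1 } ) }
```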
I then moved on to the actual main function of the library: converting a Unicode number to an integer. This was a bit tricky because Unicode comes in many different encodings (e.g. UTF-8, UTF-16, UTF-32) and these encodings can also have different endianness. Since the libuninum library expects all strings to be in UTF-32, I converted Perl strings from UTF-8 to UTF-32 and sent them to the XS code, but the library was giving me an "illegal character" error. To debug this, I grabbed some of the data from an example file that came with libuninum and put it in my XS. Still not working. This didn't make sense because I could get it working in plain C, but not in the XS. So I put together a small script using Inline::C that let me call the libuninum function directly.
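A minimal Inline::C harness for this kind of poking looks something like this (a sketch, not my exact script; first_unit is a stand-in I wrote, not a libuninum function):

```perl
use strict;
use warnings;
use Inline C => <<'END_C';
#include <string.h>

/* Return the first 4 bytes of the Perl string, interpreted as a single
   UTF-32 code unit in the machine's native byte order. */
unsigned int first_unit(SV *sv) {
    STRLEN len;
    unsigned char *buf = (unsigned char *)SvPV(sv, len);
    unsigned int unit = 0;
    if (len >= 4)
        memcpy(&unit, buf, 4);
    return unit;
}
END_C

# The same kind of bytes I was sending to libuninum: a UTF-32BE string
# (BOM, then U+0ED1, LAO DIGIT ONE).
my $bytes = "\x00\x00\xFE\xFF\x00\x00\x0E\xD1";
printf "first unit: 0x%08X\n", first_unit($bytes);
```

Printing what C actually receives, byte by byte, is what eventually cracked both of the bugs below.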
It still wasn't working. So, as you can see above, I grabbed a function from
uninum.c and renamed it to
MyLaoToInt and called it directly. Still wasn't
working. Only when I started to print out the contents of each character did I
realise what was happening. In libuninum's source, the
UTF32 typedef is
defined as an
unsigned long; however,
sizeof(unsigned long) is 8 (64 bits)
on my system, not 4 (32 bits).
That means that as the library iterates over each character, it is actually
looking at two characters instead of one and, of course, none of the comparisons
were working. What it actually needed to use was a uint32_t.
However, even though this typedef is in the C99 standard, there are some portability
issues with using it. Instead, I used the integer type that Perl detected to be
32 bits wide and patched the code when I built it using Alien::Uninum
(code). Now the file looked like this:
Yay! Now the XS code was working on the test data. All I had to do now was get my string to libuninum and pass the result back. I tried that and libuninum was giving me errors again. Now what?! I decided I needed to look at what the C code was accessing, so I grabbed a hex dump routine from here and looked at it:
00 00 fe ff ...
As soon as I saw the first character, I knew what was going on. What I was looking at was the byte-order mark or BOM. Remember, I had converted the UTF-8 string to UTF-32 in Perl before sending it to C, but I never specified the endianness, so Perl used big-endian as the default endianness. Well, since the C code was using the native endianness of the machine, I needed to find the machine's endianness and encode either a little-endian or big-endian version of UTF-32. I just had to ask Perl the byte order it detected at compile time and use that (code).
Once I did that, my code was working and all my tests passed! There are still a couple of things I need to do in order to clean it up, but it's mostly done for now.