|

Source: ONLamp.com Working with complex, nested data structures takes practice and patience. It helps to be able to visualize your data. Data::Dumper is one of the oldest and most widely used modules because it does what it says – it serializes a Perl data structure to its equivalent Perl code.
It’s not a perfect module, though. Its default output is a little verbose (if customizable), it can use a lot of memory, and it can be slow. It also doesn’t handle complex references well.
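The "verbose (if customizable)" point can be illustrated with Data::Dumper's own configuration variables. This is only a minimal sketch: the `$config` structure is a made-up example, and the three settings shown (`Indent`, `Sortkeys`, `Terse`) are a small selection rather than a recommended set.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample data, not taken from the article.
my $config = { foo => 1, bar => [ 2, 3 ], baz => { qux => 4 } };

# Default settings: "$VAR1 = " prefix, deep indentation, unsorted keys.
print Dumper($config);

# Localize the package variables so the change stays inside this block.
my $compact = do {
    local $Data::Dumper::Indent   = 1;    # fixed, shallow indentation
    local $Data::Dumper::Sortkeys = 1;    # stable, sorted key order
    local $Data::Dumper::Terse    = 1;    # drop the "$VAR1 = " prefix
    Dumper($config);
};
print $compact;
```

Using `local` keeps the tweaks from leaking into other code that calls Dumper() later in the same program.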
Data::Dump::Streamer is a newer alternative that works better in some cases. Here’s what I learned from playing with it one afternoon.
Inside DDS
The API is similar, but not exactly equivalent, to the Data::Dumper interface. In particular, there’s a separate analysis and output phase. The Dumper() equivalent seems to be:
print Dump( $some_var )->Out();
You must always call Dump() or some alternative before Out().
Don’t Lose Lexical Names
For the simple case, the two modules are roughly equivalent. There are some other nice features, too. DumpLex() is very handy if you have PadWalker installed:
use Data::Dump::Streamer;
my $some_hash = { foo => 1, bar => 2, baz => 3 };
print DumpLex( $some_hash )->Out();
… produces:
$some_hash = { bar => 2, baz => 3, foo => 1 };
This doesn’t always work though; you have to have access to the pad:
use Data::Dump::Streamer;
{
    my %closed_over;
    sub foo { $closed_over{foo}++; }
    sub bar { $closed_over{bar}++; }
    sub get_co { return \%closed_over; }
}
foo();
bar();
print DumpLex( get_co() )->Out();
… prints:
Use of uninitialized value in substitution (s///)
Use of uninitialized value in substitution (s///)
$HASH1 = { bar => 1, foo => 1 };
That is, if you’re outside of the scope of the variable, DDS can’t (easily or reliably) get the lexical’s name.
If you want to serialize the code with something slightly better than the global variable case (do somefile.pl), use the Declare() method to declare lexicals:
use Data::Dump::Streamer;
my ($x, $y);
($x, $y) = \($y, $x);
print Dump( $x, $y )->Declare( 1 )->Out();
… produces:
my $REF1 = 'R: $REF2';
my $REF2 = \$REF1;
$REF1 = \$REF2;
(This is probably more useful than this example makes it seem.)
Peek Inside Subroutine References
One of the nicest features is that dumping objects containing closures works:
use Data::Dump::Streamer;
my %closed_over;
my %held_subs = (
    foo => sub { $closed_over{foo}++ },
    bar => sub { $closed_over{bar}++ },
);
my $object = bless \%held_subs, 'Some::Class';
print DumpLex( $object )->Out();
… produces:
my (%closed_over);
%closed_over = ();
$object = bless( {
    bar => sub {
        use warnings;
        use strict 'refs';
        $closed_over{'bar'}++;
    },
    foo => sub {
        use warnings;
        use strict 'refs';
        $closed_over{'foo'}++;
    }
}, 'Some::Class' );
Yes, that does imply that dumping subroutine references works too. (The extra use lines in the dumped subroutines come from B::Deparse, not Data::Dump::Streamer.)
Trying to Break Things
Arguably, DDS handles a few pathological cases better than Data::Dumper:
use Data::Dumper;
use Data::Dump::Streamer;
my ($x, $y);
($x, $y) = \($y, $x);
print Dumper( $x, $y );
print "\n";
print Dump( $x, $y )->Out();
… produces:
$VAR1 = \\$VAR1;
$VAR2 = ${$VAR1};

$REF1 = \$REF2;
$REF2 = \$REF1;
When and Why Streaming Matters
One of my biggest frustrations with Data::Dumper is that it builds the entire serialized string in memory first before writing it. That can take a while. I don’t have a good example of this, but here’s a test program that builds a deep data structure and serializes it.
use Data::Dumper;
use Data::Dump::Streamer;
my $data = {};
my $top = $data;
for ( 1 .. 5000 ) { $data = $data->{foo} = {}; }
print Dumper( $top );
# print DumpLex( $top )->Out;
I ran this a couple of times with 1000 iterations and a couple of times with 5000 iterations. (I also redirected STDOUT to /dev/null to remove some of the IO timing.) This is not scientific and barely a benchmark, but the results are interesting.
With the 1000-level hash reference, Data::Dumper finished in under a second, while Data::Dump::Streamer took just over two seconds. For the 5000-level hash reference, DDS took around 14 seconds, while Data::Dumper took between 48 and 60 seconds. Data::Dumper also used around twice as much memory, at least according to top (both virtual and resident).
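For anyone who wants to repeat a rough measurement like this, here is one way the timing could be scripted. It is a sketch, not the harness used for the numbers above: `build_nested` and `time_dump` are hypothetical helper names, only Data::Dumper is timed so that the script runs with core modules alone, and the depths are kept small because recent Data::Dumper releases, as far as I know, enforce a recursion limit (`$Data::Dumper::Maxrecurse`, 1000 by default).

```perl
use strict;
use warnings;
use Data::Dumper;
use Time::HiRes qw(gettimeofday tv_interval);

# Build a chain of nested hash references: { foo => { foo => { ... } } }.
sub build_nested {
    my ($depth) = @_;
    my $top  = {};
    my $data = $top;
    $data = $data->{foo} = {} for 1 .. $depth;
    return $top;
}

# Run a serializer once; return elapsed wall-clock seconds and output length.
sub time_dump {
    my ( $serializer, $structure ) = @_;
    my $start  = [gettimeofday];
    my $output = $serializer->($structure);
    return ( tv_interval($start), length $output );
}

for my $depth ( 100, 500 ) {
    my $top = build_nested($depth);
    my ( $seconds, $chars ) = time_dump( sub { Dumper( $_[0] ) }, $top );
    printf "depth %4d: %.4f s, %d characters\n", $depth, $seconds, $chars;
}
```

To compare against Data::Dump::Streamer, a second code reference wrapping its Dump(...)->Out() call could be passed to `time_dump` in the same way, assuming the module is installed.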
I don’t deal with data structures that complex very often (and the overhead of that much IO probably matches the overhead of visiting such data structures), but the other convenience features of DDS make it compelling.
A better benchmark might also test the latency of requests — that is, I expect DDS to start producing output sooner, which can be important in some contexts — say, web programming. (I ran the test again without redirecting the output. I interrupted the Data::Dumper version after almost 14 seconds and there was no output. I interrupted the Data::Dump::Streamer version at almost the same time and it had finished its output.)
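The latency difference comes down to when bytes reach the filehandle. The toy serializer below is not how Data::Dump::Streamer is implemented; it is a hypothetical `stream_dump` function that handles only hashes, arrays and plain scalars (with no escaping), written to show the general idea of printing each fragment as soon as it is known instead of building one large string first.

```perl
use strict;
use warnings;

# Toy streaming serializer (hypothetical, for illustration only).
# Every fragment is printed to the filehandle immediately, so nothing
# larger than a single fragment is held in memory by this function.
sub stream_dump {
    my ( $fh, $value, $depth ) = @_;
    $depth ||= 0;
    my $pad = '  ' x $depth;

    if ( ref($value) eq 'HASH' ) {
        print {$fh} "{\n";
        for my $key ( sort keys %$value ) {
            print {$fh} "$pad  '$key' => ";
            stream_dump( $fh, $value->{$key}, $depth + 1 );
            print {$fh} ",\n";
        }
        print {$fh} "$pad}";
    }
    elsif ( ref($value) eq 'ARRAY' ) {
        print {$fh} "[\n";
        for my $item (@$value) {
            print {$fh} "$pad  ";
            stream_dump( $fh, $item, $depth + 1 );
            print {$fh} ",\n";
        }
        print {$fh} "$pad]";
    }
    else {
        # Plain scalar; quotes inside the value are not escaped in this toy.
        print {$fh} defined $value ? "'$value'" : 'undef';
    }
    return;
}

$| = 1;    # unbuffered STDOUT, so fragments appear as they are produced
stream_dump( \*STDOUT, { foo => 1, bar => [ 2, 3 ] } );
print "\n";
```

Because the function takes any filehandle, the same call works with a socket or a file, which is where early output matters most.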
Concluding Thoughts
I usually use YAML for peering at complex data structures, but DDS works really well as a code-aware serialization module. If you use Data::Dumper often, try Data::Dump::Streamer for a few days instead. Its documentation explains a few other convenient features you might not have realized that you missed.

Source: ONLamp.com One of the most powerful features of PostgreSQL is its support for user-defined functions. The language to learn is PL/pgSQL, an unpronounceable but powerful way to write UDFs. David Wheeler introduces the language and demonstrates why UDFs are useful.

Source: ONLamp.com A few user-friendly distributions of FreeBSD have appeared lately. PC-BSD is one suitable for the corporate and home desktops, even those of users unfamiliar with Unix. Dru Lavigne walks through the installation and configuration of PC-BSD to provide a modern, powerful workstation.

Source: ONLamp.com Ubuntu Dapper Flight 7 came out earlier this week and I decided to do a fresh install on my laptop. Part of the reason for a fresh install is that I’m ready for a new Ubuntu to install and automagically configure my system as a few things have gotten unconfigured and I can’t seem to re-configure them properly. Another reason is that I’d like to get Windows on this laptop as well as Ubuntu and the easiest way to do that is to repartition the hard drive and do a from-scratch install, anyway.
When I booted up after installing Dapper, my laptop was set at a proper 1920×1200 and it looked great. The video was configured to do direct rendering, but was using the xorg ati driver rather than the ATI proprietary fglrx driver. The 3d performance was pitiful and the quality of playing videos was poor, so I decided to try the fglrx driver. It only took a one-line change to my xorg.conf file (changing “ati” to “fglrx”) and it just worked. Another thing X related that just worked without any configuration on my part was the Synaptics touch pad scroll area. I’ve gotten this working in the past, but only with some hacking on the xorg.conf file.
Of course, none of the video files I tried to play worked since their respective codecs don’t come installed with Ubuntu. I installed win32codecs and VLC was able to play everything I tried. xine is having some trouble seeing any codecs I give it, so I’ll keep plugging away at xine and use VLC in the meantime. Or maybe I’ll just stick with VLC.
One of the things that had come “un-configured” was the usability of the multimedia keys on the front of the laptop, particularly the volume buttons. I could raise and lower the volume some, but couldn’t get the volume all the way down or all the way up and mute wouldn’t work, either. I had to use the volume control in order to adjust the volume. The multimedia keys now work.
For the first time ever, both the installed suspend and hibernate routines work with my laptop. Previously, I have had to use a hacked suspend script to get my system to suspend, but now I don’t have to. I’ve never had a real interest in hibernate and this is the first time I’ve seen it working. My system hibernates and awakes much faster than I was expecting. I’ll probably start using this option more, especially as I boot between Windows and Ubuntu.
Overall performance is really good. Not only does the system feel more responsive than when I installed Breezy, but memory usage appears much lower now, too. Dapper installed with the typical set of applications: Firefox for browsing, Evolution for email, Totem for video, Ekiga and Gaim for chat, the OpenOffice suite, Gimp and Gthumb for graphics and a few other odds and ends. It really is amazing what you can fit on one CD. The quality of these open source applications compared to their closed source, non-free alternatives is also quite amazing.
As I mentioned earlier, this is now a dual-booting laptop. I installed Windows XP Home edition first, then Ubuntu. (Just as a random note at this point, I am using Grub to manage booting. As I was installing Dapper, it recognized that Windows XP Home was already installed and asked if I wanted it to manage booting that OS as well. I told it “yes” and everything works perfectly with regard to booting.) It is interesting that Ubuntu recognized and properly configured more of my hardware than Windows did. Granted, I was able to download drivers from Dell’s website and installing them isn’t that big of a deal, but it is really nice to have my system “just work”.
This release of Dapper is the easiest, most usable OS install I think I have ever experienced. Nothing is nagging me that I feel I need to put on a to-do list of “must get working”. Sure, xine isn’t working right, but I have VLC as a good alternative. Not bad if that is my worst complaint about the install.
To sum things up, this was an excellent install of Ubuntu, probably the best OS install I’ve ever experienced, both in the installation itself and the usability afterward. It is honestly difficult for me to imagine things getting any better, but I’m sure they will.

Source: ONLamp.com The latest issue of Dr. Dobb’s Journal arrived today. Frankly, I’m not a fan of this magazine but they keep offering me a free subscription and I keep hoping I’ll get something useful out of it. The problem with Dr. Dobb’s is that it’s just not relevant to me as a programmer. With its heavy focus on Java, C++, .Net and similar languages, I find it useful for little more than taking up room in my recycling bin. With the latest issue, though, there’s hope. The cover story is the luridly titled “Ruby On Rails - Java’s Successor?”
First, let me make clear that I’m biased. Like many in my field, I latch onto an idea and I tend to filter new ideas through the old one. Admittedly, this means I don’t always see things accurately and I freely admit that my bias could be wrong, but it’s there and I seem to be stuck with it. You see, I’m biased in favor of economics. I went to college to be an economist and came out a programmer. Go figure.
How does economics influence my decisions? Well, in the case of programming languages, I am convinced that “dynamic” and type inference languages are going to crush “static” languages (yes, I’m quite aware that there’s a huge amount of ambiguity in those terms. Hopefully those familiar with the debates will cut me some slack.) Why do I believe this? Let’s have a history lesson.
Even though I’m in my thirties, I used to be a mainframe programmer. I had lots of fun (cough) working on programs written before I was born. I was in a pretty standard environment where our COBOL programs would be called by JCL, a primitive scripting language which forced the programmer to worry about such trivial details as how many tracks, cylinders or blocks on a disk to allocate to a program, how much extra space could be allowed if the allocated space is exceeded, the record length of the file (newlines did not typically delimit records) and so on. Further, some of this information gets duplicated in your COBOL program so it has to be kept in synch with the JCL. The idea of just opening a file and using it is pretty foreign to COBOL.
The difficulty of working with files (ahem, “datasets”) in COBOL is not just some quirk of the language. There are many things in COBOL which are tough to do. While I’m not a fan of the “For Dummies” series, I just happen to have a copy of the book COBOL for Dummies. The last chapter is entitled “Ten Tasks That Are Really Hard To Do in COBOL” and then proceeds, amusingly enough, to list nine tasks:
Determining the Actual Size of a Record
Arranging Data into Columns
Extracting Part of a Text String
Combining Text Strings
Writing Comma-Delimited Text
Reading Comma-Delimited Text
Converting Between Upper- and Lowercase
Finding a Square Root
Generating Random Numbers
Can you believe that? Those are hard to do in COBOL! However, there’s a logical reason for that. Back when COBOL was first introduced, computers were very expensive relative to programmers. Programmers would carefully desk-check their programs to avoid bugs. They would go through them line by line looking for problems. None of this “run the damned thing and see if it breaks” tomfoolery. Computers were so expensive that it was important that as much of the work be shifted to the programmer as possible. As a result, COBOL didn’t do a lot of work to “just open the file and use it”. It didn’t offer a lot of built-in string processing. Difficult math wasn’t available. You usually read records directly into the variables you needed, you did some very simple processing and wrote the new data back out. That would be an entire, simple program but it would take a long time to write compared to today’s languages.
With the languages most of us were familiar with in the 80s and early 90s, programmer productivity gained enormously. For example, C, C++ and Java were all much faster languages to write in than COBOL. Because computers are so much cheaper and the languages made the programmers more productive, more interesting software could be built. However, you remember the brouhaha over Java’s automatic memory management? People claimed that it wouldn’t work. If you didn’t manage memory manually, your software wouldn’t be as efficient. The supporters argued that if the programmer didn’t have to worry about memory management, they’d be less likely to have memory leaks and the programmer would be more productive. As it turns out, the supporters were right and most newer programming languages offer some form of automatic memory management.
Even though there were a lot of folks dubious about the merits of Java’s memory management and JVM architecture and how “simple” the language seemed to be compared to C and C++, Java took off. Now, perversely, Java programmers often join the ranks of C and C++ programmers to sneer at “dynamic” programming languages such as Python, PHP, Ruby and Perl. Although these are often dismissed as mere “scripting” languages, more and more programmers are starting to appreciate their power. This power can be summed up in a reply noted Perl guru Randal Schwartz made to a Java enthusiast (a student, I believe) who asked him how he dealt with Perl’s lack of “strong” typing. He replied “I just smile and move my program into production before the Java programmer has his first compile.”
These languages are not quirks. They’re a natural continuation of the economic forces which have shifted the productivity burden from the programmer to the computer. Programming languages 40 years from now will likely have less in common with today’s languages than today’s languages have in common with COBOL. The dynamic languages make programmers so much more productive that even conservative business types are forced to sit up and notice. That’s why I love Ruby on Rails, despite having not used it.
David Hansson, love him or hate him, has created a killer app which is turning even diehard Java enthusiasts to dynamic languages. There’s a reason why Amazon, LiveJournal and Slashdot rely so heavily on Perl. There’s a reason why Yahoo! decided to start using PHP. There’s a reason why Rails is written in Ruby and not Java. I think we’ve finally hit the turning point where the economic forces at work are too great to ignore. Of course, Java will be around for a long time to come (COBOL is still widely used, for example), but it’s simply math. The faster your programmers can turn out good applications, the more money you save (and can therefore earn).
|