Source: ONLamp.com
Many programmers feel they have bragging rights if they’ve written large systems. This isn’t always fair as many times a quick twenty-something line program might save the day and programmers who can crank them out shouldn’t be undervalued. Be that as it may, sometimes we need to write large systems and we need to know how to do it. But what if you’re just writing a small system? What’s small? And as many of us know, small systems stick around and often grow. While rules which affect larger systems don’t always seem as important on small systems, it’s fair to say that if you want your small systems to be able to grow to large systems, it doesn’t hurt to start with sane rules.
As many readers of The Daily WTF know, many large systems are written terribly and are hard to maintain and extend. In fact, the problem is pervasive enough, even with well-known software projects, that ultimately the conclusion is reached that the software must be rewritten from scratch. Usually this is a mistake. But if you stay with the existing system, programmers often wind up trapped in fear-based programming.
“We can’t alter that table because too many things rely on it!”
“We can’t get rid of that global, we don’t know what it might break!”
“We don’t know what this does, but we think it might be important!” (I’ve heard this far too many times.)
While those lamentations often drive programmers crazy, they’re often completely rational. If you don’t know what’s going to happen, you have to figure out if the risk is worth the reward. If the risk is difficult to quantify, code paralysis sets in. Servicing technical debt is painful and often embarrassing. However, here’s the hard bit: while fear-based programming may be rational, refusing to figure out a way out of the corner you’ve coded yourself into is a bad idea for an actively maintained system. But I’m not going to talk about paying off technical debt. I’m going to talk about not incurring it in the first place. That’s the trick to building large systems that are a joy to work on.
Eat Your Own Dog Food
Here’s a little secret that many “test-infected” developers know: testing makes you a better programmer. It’s not just that your code works. It’s that if you find something is hard to test, that’s a code smell. Maybe your superWunderFunction() which takes 13 arguments isn’t designed terribly well. That’s not saying that all hard-to-test code has a design flaw (GUIs, for example), but as you test more, you start writing code that’s easier to test.
Your functions will take fewer arguments. Your functions won’t try to do too many things. Your functions are more likely to be loosely coupled. You’ll have less reliance on global variables. The list goes on and on.
When you start writing code that is easier to test, do you know what you’re doing? You’re eating your own dog food. You’re using your code and you start writing code which is easier to use. It starts becoming better-designed code. As an added benefit, if programmers are unsure how to use your code, they can always read the tests. Tests are not a substitute for documentation, but they are an excellent supplement to it.
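As a minimal sketch of what that looks like in practice (in Python, with a made-up `parse_price` function that is not from any real code base), here is a small function with one argument and one job, plus a test that doubles as a usage example:

```python
def parse_price(text):
    """Convert a price string such as '$1,234.50' to an integer number of cents."""
    cleaned = text.strip().lstrip("$").replace(",", "")
    dollars, _, cents = cleaned.partition(".")
    # Pad or truncate the fractional part to exactly two digits.
    return int(dollars) * 100 + int((cents + "00")[:2])


def test_parse_price():
    # Anyone unsure how to call parse_price() can read these lines.
    assert parse_price("$1,234.50") == 123450
    assert parse_price("7") == 700
    assert parse_price(" $0.5 ") == 50


test_parse_price()
```

Because the function takes one argument and touches no globals, the test needs no setup at all, which is usually a sign the interface is easy to use.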
Other End Up
There’s a long-standing joke about how [insert favorite group you like to pick on] has to drink from beer bottles with “other end up” stamped on the bottom. Sometimes it’s funny, often it’s offensive, but it hides an interesting truth: bottles are simple. It’s awfully tough to not figure out which way you’re supposed to drink from an open beer bottle. Good user interface. Ever put a condom on the wrong way? Bad user interface. User interface is important. Unfortunately, many programmers just throw together the first interface they think of. It becomes intuitive to them, but like a remote control you can’t figure out how to use, it’s a source of frustration for everyone else. This is a design flaw.
Don’t just sit down and start writing code. Sit down and think “how would my dream code work?” How can I write something so simple and easy to use that I can’t get it wrong? Then write the code for it. As an example from Perl (I’d write pseudo-code, but this is language specific), there’s a great module called Data::Dumper which allows you to print out variable contents as valid Perl code, even if they’re complex data structures. However, you get output like this:
$VAR1 = 'bob';
$VAR2 = [ 'one', 'two' ];
What’s $VAR1? If you dump out a lot of variables, they rapidly get tough to follow. I wanted the variables to have their correct variable names. To do that with Data::Dumper you have to do something like this:
print Data::Dumper->Dump( [ $name, \@numbers ], [ qw/$name *numbers/ ] );
That’s just ugly, but here’s what I wanted to do:
print Dumper( $name, @numbers );
Turns out that’s not that easy, but I figured out how to do it and now, with Data::Dumper::Simple, you get this:
$name = 'bob';
@numbers = ( 'one', 'two' );
This is incredibly useful for debugging and I’ve gotten a lot of thanks for it. I did this by imagining my “dream” code and figuring out how to make it happen (I’m not usually this smart. It was a good day). This is a principle you want to always follow. Heck, one thing which helps is just scribbling down some ideas and asking an unsuspecting developer “does this make sense?” If they have to ask any questions, maybe you can make it simpler still.
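The same "write the call you wish you had first" exercise works in any language. The following is only a loose Python analogue, not how Data::Dumper::Simple is implemented; the `dump` helper is a hypothetical name. The dream call site is `dump(name=name, numbers=numbers)`, and the function is written to make that call work:

```python
def dump(**variables):
    """Return one 'name = value' line per keyword argument, in call order."""
    return "\n".join(f"{name} = {value!r}" for name, value in variables.items())


name = "bob"
numbers = ["one", "two"]
print(dump(name=name, numbers=numbers))
```

The caller still has to repeat each variable name once, so it is less magical than the Perl module, but nobody has to guess what `$VAR1` means.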
What? That’s not enough? Here’s how to print something to a file in Perl:
open FH, ">", "somefile.txt" or die "Can't open file: $!";
print FH "This is written to a file\n" or die "Can't print to file: $!";
Now let’s look at one way to do this in Java:
import java.io.*;
class WriteFile {
    public static void main(String args[]) {
        FileOutputStream foStream;
        PrintStream pStream;
        try {
            foStream = new FileOutputStream("somefile.txt");
            pStream = new PrintStream( foStream );
            pStream.println ("This is written to a file");
            pStream.close();
        } catch (Exception e) {
            System.err.println ("Error writing to file " + e);
        }
    }
}
Which do you think needs “other end up” instructions? (To be fair, the Perl API isn’t perfect, but damn, it’s one hell of a lot easier to use).
One Click to Rule Them All
I generally work for companies that do a lot of Web-based development. To deploy a new version of the Web site, the process is almost always a variation of:
Find the text file you saved the deploy instructions in.
Start following the steps, one by one.
Note when any steps are optional.
Note which steps have special instructions for them.
Curse vehemently when you’ve missed an instruction.
Undo the last three instructions.
Follow the missed instruction.
Continue with the rest of the instructions.
Go home and have a few drinks over a successful launch.
Get paged at 3:30 in the morning when the Web site crashes.
Work for an hour to fix the bug.
Find out you had an old copy of the deployment instructions.
Admit defeat.
Work for two hours reverting the Web site and database.
This is wrong. You need one-click install, one-click rollback. If you have to do something repeatedly, find a way to automate it, particularly if getting it wrong will cost more money than fixing it.
Need to deploy the next version of code, including database changes? Automate it. Need to roll back those changes? Automate it. Need to check out a new code base and build a test website and database for it? Automate it. Your boss wants weekly status reports? Automate it.
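One way to think about one-click install and one-click rollback is to pair every deploy step with the step that undoes it. What follows is a minimal Python sketch under that assumption; the `deploy` function and the step names are hypothetical, and the steps here only record what a real script would do:

```python
def deploy(steps):
    """Run each step's `do` in order; if one fails, undo the finished steps in reverse.

    `steps` is a list of (name, do, undo) tuples. Returns (succeeded, log).
    """
    log = []
    finished = []
    for name, do, undo in steps:
        try:
            do()
        except Exception as error:
            log.append(f"failed: {name} ({error})")
            for finished_name, finished_undo in reversed(finished):
                finished_undo()
                log.append(f"rolled back: {finished_name}")
            return False, log
        finished.append((name, undo))
        log.append(f"done: {name}")
    return True, log


# Hypothetical steps that only record what they would do.
state = []


def fail_migration():
    raise RuntimeError("migration did not apply")


steps = [
    ("copy code", lambda: state.append("code"), lambda: state.remove("code")),
    ("update config", lambda: state.append("config"), lambda: state.remove("config")),
    ("migrate database", fail_migration, lambda: None),
]

succeeded, log = deploy(steps)
print("\n".join(log))
```

The point of the design is that nobody has to remember which three instructions to undo at 3:30 in the morning; the script already knows what it finished.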
I’m not kidding about automating status reports (well, maybe a little). It can’t always be done, but if you can figure out a way to automate it, you’ll be much happier. One strategy is to make your source control commit messages meaningful and then write code which reads them and emails them. Make your email subjects meaningful and you can include in your status reports “Emailed Nancy about the ‘Smell in the Bathroom’”. If something needs to be done repeatedly and you can figure out how to automate it, you’ll save yourself much pain and headache later.
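Here is a rough Python sketch of the commit-message strategy. It assumes the source control system is git (the article doesn't say which one), and the function names and email addresses are placeholders. It only builds the message; actually sending it is left out:

```python
import subprocess
from email.message import EmailMessage


def read_commit_subjects(since="1 week ago"):
    """Return commit subject lines from the current git repository (assumes git)."""
    result = subprocess.run(
        ["git", "log", f"--since={since}", "--pretty=format:%s"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def build_status_report(subjects, sender, recipient):
    """Turn commit subjects into an email message; sending is left to the caller."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"Weekly status: {len(subjects)} changes"
    message.set_content("\n".join(f"- {subject}" for subject in subjects))
    return message


# Placeholder data; in real use this would be read_commit_subjects().
report = build_status_report(
    ["Fix rollback of schema changes", "Add one-click deploy script"],
    sender="me@example.com",
    recipient="boss@example.com",
)
print(report)
```

The report is only as good as the commit messages, which is one more reason to write meaningful ones.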
The Price You Pay
OK, your code is well-tested. Your code is better designed. You have intuitive APIs that anyone can use. Most processes are automated to remove bug-prone grunt work. You’re well on your way to making a system that’s easy to use, refactor, and extend. But you’ve paid a price. You’ve front-loaded your costs.
Though some testing advocates deny it, writing tests can mean you’ve spent longer developing features. Sometimes it’s because you’re figuring out how to test something. Sometimes it’s because you’ve exposed a design flaw which requires a bunch of refactoring. Testing can simply take longer. Spending time up front creating a “dream” API can take longer too, and such an API is sometimes more difficult to implement than the quick hack. And trying to figure out how to automate something can take longer than just doing the actual task. For small systems, these costs can add up rapidly.
But I meant “front-loading” your costs. You’ve incurred less technical debt which means you have less to pay later on. For most large systems I’ve worked on, the maintenance phase lasts much longer than the development phase so everything you can do to reduce costs in the maintenance phase can pay off wonderfully, but you frequently have deadlines you have to meet. Writing tests means that if you change something, you’ll probably find out quicker if you break something, so changes to the system are easier to implement. Because you have good design and “dream” APIs, other developers can understand your code better. Because you automate everything, repetitive tasks don’t waste labor hours and are less fragile. But you need to save money now.
Don’t Sweat the Small Stuff
OH. MY. GOD! I can’t believe you wrote that dreck!
Ever heard a variation of that? For many programmers, you might be tempted to say that when you see something like this:
for i in array1
    for j in array2
        if i == j
            duplicates.add(i)
That’s just awful. If you have ten thousand elements in each array, this could be an awfully expensive routine.
So what? I don’t care. There’s an old saying that a sufficiently encapsulated hack is no longer a hack. When you see something like that, ask yourself three questions.
Does it do what it’s supposed to do?
Is it sufficiently encapsulated so that it’s easy to change if needed?
Am I able to read the code easily?
If you answer “yes” to those three questions, ignore the “problem” and move on. You have work to do and squabbling about little issues and fixing problems which might not be problems is a waste of time (note that you probably shouldn’t answer “yes” to the first question if you don’t have tests for it.)
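To make "sufficiently encapsulated" concrete, here is a small Python sketch of the nested loop wrapped in a function with tests around it. The names `find_duplicates` and `check_find_duplicates` are made up for illustration:

```python
def find_duplicates(first, second):
    """Return the set of items that appear in both sequences."""
    duplicates = set()
    for i in first:
        for j in second:
            if i == j:
                duplicates.add(i)
    return duplicates


def check_find_duplicates(implementation):
    """The tests pin down the behavior, so the body can be replaced later."""
    assert implementation([1, 2, 3], [2, 3, 4]) == {2, 3}
    assert implementation([], [1]) == set()
    assert implementation(["a", "a"], ["a"]) == {"a"}


check_find_duplicates(find_duplicates)

# If it ever proves to be a problem, only the body has to change:
check_find_duplicates(lambda first, second: set(first) & set(second))
```

All three questions get a "yes" here: the tests say it works, callers only see the function name, and the loop is four readable lines.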
Now the above code might seem like a newbie mistake and I confess that I have an almost pathological aversion to it, but I deliberately chose an example of code I despise to demonstrate an example of code which I’ll ignore, despite my feelings.
Here’s the problem: it’s not a performance issue until you’ve proven it’s a performance issue. What if it turns out each array can only have three elements? It’s probably not a performance issue, but that’s not obvious by just looking at it. Until you’ve proven there’s a problem, don’t fix it. I know this is a terribly controversial point for many programmers, but we shouldn’t forget that we have jobs to do. Constantly rewriting working code means we’re not getting new features written (refactoring is an obvious exception).
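Proving it means measuring it with realistic input sizes. A minimal sketch using Python's standard `timeit` module follows; the `seconds_per_call` helper and the sizes 3 and 300 are arbitrary choices for illustration, and the numbers printed will differ from machine to machine:

```python
import timeit


def find_duplicates(first, second):
    """The nested-loop version under suspicion."""
    duplicates = set()
    for i in first:
        for j in second:
            if i == j:
                duplicates.add(i)
    return duplicates


def seconds_per_call(function, *args, number=1000, repeat=5):
    """Return the best observed time for one call, in seconds."""
    timer = timeit.Timer(lambda: function(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


small = seconds_per_call(find_duplicates, [1, 2, 3], [3, 4, 5])
large = seconds_per_call(find_duplicates, list(range(300)), list(range(300)), number=10)
print(f"3 elements:   {small:.2e} seconds per call")
print(f"300 elements: {large:.2e} seconds per call")
```

Whether either number matters depends on how often the routine runs and how big the arrays really get, which is exactly the information you don't have by just looking at the code.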
Reduce Features
You’ve heard the joke. “Fast, good, or cheap. Pick two.” That’s three things: deadline, quality, and cost. Most people admit that you can’t get the best of all three of those. Rarely do bosses say “don’t worry if it’s any good”, so we take quality off the list. Sometimes there are legal or market reasons to beat a deadline, but often it’s a simple matter of “I want it done in three weeks”, so we take the deadline off the list. It’s common to have a boss say “I won’t pay for a Mac”, so you can’t easily test if your code compiles on OS X. Now we’ve taken cost off the list. We need it fast and cheap and good. We all know which of those three we can hide from the boss.
There’s another way, though. Does your spreadsheet really need the 3-D VRML graphs when you first launch? Does your screenplay authoring software really need to support remote collaboration at first? Does your budget management software really need to support MySQL, PostgreSQL, SQLite, Oracle, CSV files and cuneiform tablets? You’re not saying you won’t add these features, you’re just trying to focus on the features you need when you first launch. And guess what? You might find out that your customers really aren’t crying out for cuneiform support after all!
Plenty of times I’ve met deadlines by delaying less critical features until after the launch. If you’ve followed the above rules, you’ll often find out that those features are easy to add later.
Conclusion
What I’ve outlined above is mostly the coding side. Building large systems might involve making appropriate hardware choices. It might involve carefully designing a network, understanding load balancing, choosing appropriate database software and a host of other things I’ve not covered, so the above list isn’t enough, but it’s a great start for coders. Many of us have seen horror stories about how small projects have grown to large projects, but we didn’t plan them to grow. I’ve written some of those horror stories. Sometimes we’ve wanted to start out large but we don’t know enough about building those systems to get there. By spending a little time up front testing, making things easy to use, and automating everything, you too can build large systems.
Source: ONLamp.com
I’ve been part of the Perl QA group for around five years. In that time, we’ve built dozens of wonderful test modules around a common backend library and a common protocol, evangelized testing and quality to the Perl 5 and Perl 6 developers, spread the expectation and understanding of good testing to CPAN contributors and more, and even built automated systems to check various quality measures of public code.
Tests aren’t the only measure of quality; maintainability for code you expect to maintain is also highly important.
Damian Conway’s Perl Best Practices is a good start for evaluating your coding style and practices. Any maintainability guidelines should start there.
One of the most interesting developments in Perl 5 in recent years is Adam Kennedy’s PPI, a Perl document parser. It can parse most Perl code without actually executing it. This has powerful implications for static code analysis.
Shortly after the advent of PBP and PPI, Jeffrey Ryan Thalhammer created Perl::Critic, an extensible PPI-based static analysis tool. It builds on the recommendations of PBP and adds other metrics to identify source code that varies from local standards.
If you really want to make something a habit, find a way to do it without thinking about it. I like to automate the things I value so I never do them incorrectly, incompletely, or infrequently. Thus Test::Perl::Critic allows you to add customizable Perl::Critic tests to your test suites, so you can ensure that you’ve followed local style.
Installation
I used the CPAN shell to install the module. Everything went well. Although I already had Perl::Critic and PPI installed, both distributions move frequently, so I needed to install new versions. There were a few other dependencies, including Perl::Critic::Utils, Test::Object, and Test::Simple (0.64). The installation worked flawlessly.
Usage
The default usage is simple. I added the example from the documentation as t/critic.t to my Test::MockObject directory. That’s all it took to get started.
#! perl
use strict;
use warnings;
use Test::Perl::Critic;
all_critic_ok();
That found some violations:
$ perl -Ilib t/critic.t
1..2
not ok 1 - Test::Perl::Critic for "blib/lib/Test/MockObject.pm"
# Failed test 'Test::Perl::Critic for "blib/lib/Test/MockObject.pm"'
# in /usr/lib/perl5/site_perl/5.8.8/Test/Perl/Critic.pm at line 95.
#
# Perl::Critic found these violations in "blib/lib/Test/MockObject.pm":
# Stricture disabled at line 126, column 3. See page 429 of PBP. (Severity: 5)
# Stricture disabled at line 301, column 3. See page 429 of PBP. (Severity: 5)
# Stricture disabled at line 314, column 3. See page 429 of PBP. (Severity: 5)
not ok 2 - Test::Perl::Critic for "blib/lib/Test/MockObject/Extends.pm"
# Failed test 'Test::Perl::Critic for "blib/lib/Test/MockObject/Extends.pm"'
# in /usr/lib/perl5/site_perl/5.8.8/Test/Perl/Critic.pm at line 95.
#
# Perl::Critic found these violations in "blib/lib/Test/MockObject/Extends.pm":
# Stricture disabled at line 54, column 2. See page 429 of PBP. (Severity: 5)
# Stricture disabled at line 71, column 3. See page 429 of PBP. (Severity: 5)
# Stricture disabled at line 127, column 5. See page 429 of PBP. (Severity: 5)
# Stricture disabled at line 150, column 3. See page 429 of PBP. (Severity: 5)
# Stricture disabled at line 163, column 2. See page 429 of PBP. (Severity: 5)
# Looks like you failed 2 tests of 2.
There are six places in the distribution where I disabled strictures. This is okay; it’s doing complex things. Of course, I wanted to browse the code to see if there were any errors. This exercise showed me one spot where I disabled symbolic reference checking in far too wide a scope. I fixed that potential bug. How nice!
Customizing the Behavior
Unfortunately, all of these symbolic references are necessary for the modules to work properly. If I want to keep these tests, I would need some mechanism to allow their use but still pass the check. Test::Perl::Critic uses the .perlcriticrc file, as documented in Perl::Critic. I added an empty file in t/perlcriticrc and added a line to the test file:
use Test::Perl::Critic -profile => 't/perlcriticrc';
The configuration file needs something in it, and to prevent these particular policies from running, I need to know the name of the policy being violated. Test::Perl::Critic supports a -verbose flag to change its output message. I chose to pass a format string to show only the name of the failed policy:
use Test::Perl::Critic -verbose => '%p';
I recommend this only for debugging; it changes the entire debugging message. Another option was verbosity level 9, but I wanted only the name of the policy to disable. I could have come up with my own custom format string (and that’s probably the right answer), but this worked for me.
My resulting t/perlcriticrc file is:
[-TestingAndDebugging::ProhibitNoStrict]
Unfortunately, my tests continued to fail until I noticed that I had used policy in the test file instead of profile. Oops. Beware.
There are other customizations too. For example, the default seems to report only failed policies at severity 5 (the most severe). I was curious about how the code looked at a lower severity, so I added to the use line:
-severity => 3;
That gave many more warnings. Unfortunately, I found no easy way to set the default severity in my t/perlcriticrc file, but that would have been useful. (I think this is a shortcoming — or at least a deliberate design decision — of Perl::Critic instead of the test module.)
Conclusions
I recommend Test::Perl::Critic, especially if you’ve already decided which policies you do and do not support. If you haven’t discussed that yet with your team, the default severity of 5 (again, the most severe level, so only the most serious violations are reported) is a decent choice.
This module made the right choices. Adding a separate test and policy file to each distribution may seem like busy work, but it allows you to change the severity and rules per project. Using a global .perlcriticrc file would be a mistake. Not only can you improve the compliance of separate projects individually as necessary, but I have some distributions that use some constructs very deliberately but not others. Distributing a policy file (with the appropriate severity) with each distribution helps prevent nasty failures elsewhere.
One question might arise. If you distribute your code, should you distribute these test files? I agree with the authors; it’s inappropriate. These are developer-side tests and represent little value for users. They represent tremendous potential value for authors, however. It’s worth experimenting with what this reveals about your code.