CSV Manager 1.1.11 Released

We're hard at work on CSV Manager 1.2 – coming out Real Soon Now! But in the meantime we're bringing out a bug fix release of 1.1. This release (1.1.11) fixes bug #0013: comment characters at the start of a data field were causing parse errors.

This was a real nasty bug. We added comment support recently as it was a common feature request from customers. And this bug is a classic example of the law of unintended consequences: a great new feature, but it introduced a hard-to-find bug. We had no test cases for the standard comment character '#' appearing as ordinary data. In fact we had no test cases to check that all valid characters can be parsed. We should have had them. So this is a classic lesson: think hard about regression testing, and try really, really hard to check against all input conditions.

One easy way to perform this type of test is something called fuzz testing. Basically you chuck a load of random data at your program and see if it breaks. We actually do have a test like this: we use massive randomly generated CSV files to check performance and parsing. But here's the thing. At no point did any of those files ever have a '#' character in just the right place to trigger this bug. The laws of probability are against you on this one.
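In retrospect, a simple directed test would have caught it. Here's a sketch of the idea: place every printable character at the start of a field and check it parses cleanly. (The `parseLine` here is a trivial stand-in for the real parser, which this sketch doesn't have access to.)

```java
// Sketch of a character-coverage test: every printable character goes
// at the start of a field. parseLine is a stand-in for the real parser.
public class CharCoverage {

  static String[] parseLine(String line) {
    return line.split(",", -1); // stand-in; the real parser goes here
  }

  public static void main(String[] args) {
    for (char c = ' '; c <= '~'; c++) {
      if (c == ',' || c == '"') continue; // these need quoting; test separately
      String line = c + "data,second";
      String[] fields = parseLine(line);
      if (fields.length != 2 || !fields[0].equals(c + "data")) {
        System.out.println("FAIL at char: " + c);
      }
    }
    System.out.println("done");
  }
}
```

A loop like this would have hit the '#'-at-field-start case on the very first run.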

I think this shows the importance of a combined testing approach. You must apply all the techniques. None of them are silver bullets. You must create tests based on really hard thinking about test conditions, and you should also have randomised testing. And don't forget your test coverage either. Or multi-platform testing. And so on. Eric Sink had some good points on this stuff recently. Anyway, the long and the short of it is, it's not easy to cover all the bases and you have to put in the leg work. No wonder software products take ten times more work than software projects.

Back to the bug fix release. This is an important one to download and install. If your input data ever contains a '#' you could be in trouble with older versions of CSV Manager. So log in to your user account and grab the latest release. We strongly encourage you to do this. And let me apologise for introducing such a nasty bug. Data integrity and loss prevention is a design mantra for CSV Manager, so we're not happy campers at all about messing that up.

Finally, thanks to Dan for discovering the bug. He also got our first bug-bounty – a $15 ThinkGeek gift cert.





Posted in General

How To Make Your Globs Specific

Let’s say you’ve got a list of globs: '*', 'a*',
'*a', '*a*'. Now sort them. Sort them based on how
specific they are. More specific globs match fewer things. Less specific
globs match more things.

If Dr. Seuss had had anything to say about globs, what would he have said?


Glob on String
Blob on Thing
Glob on String for Blog on Thing
Blog on Glob for Thing on String

When Globs on Strings match Blogs on Things
Then… Blogs on Globs match Strings on Things

And When Blogs match Globs and Strings match Things
Then… Blogs on Globs match Blogs on Things on Globs on Strings!

So how do you sort globs? And why would you want to sort them? I’m
working on a project where you use globs to pick out the error
conditions that you are interested in. These error conditions have
names. So you can match entire classes of error by using globs. For
example 'foo.*' matches the errors 'foo.bar' and
'foo.baz'.

But what if I also add '*.*' to the mix? This also matches
'foo.bar' and 'foo.baz'. But I want 'foo.*' to match
first. It's more specific. For some definition of specific.

Again, 'a*' is more specific than '*a*'. 'a*' can
match fewer things. Let’s look at this more closely. Say we have the
set of words: [aa,ab,ba,bb]. Then


'*' matches aa, ab, ba, bb
'a*' matches aa, ab
'*a' matches aa, ba
'*a*' matches aa, ab, ba

So we can order the globs like so: 'a*', '*a',
'*a*', '*', in order of increasing matches. The ones at
the front are more specific. They match fewer things.

What about 'a*' and '*a'? How do we decide which comes
first? Rather arbitrarily, let's say that the more specific prefix of
'a*' makes it the winner. Prefixes are more specific than
suffixes. Hmm. That’s getting way too philosophical. Take Java
packages: 'com.ricebridge.csvman.*' is more specific than
'*.csvman.test'. The latter can occur in any top level
package, but the former in only one.
Yeah it’s pretty weak, but hey!

Now here’s the thing: what is the rule for sorting globs? I want a rule which:

1. is human computable 'just by looking' — that is, the rule is obvious
2. creates a full ordering of any set of globs — no arbitrary ordering of non-identical globs
3. has reasonable efficiency

The first idea: Use the number of stars. More stars means more
specific. When the number of stars is the same, sort
alphabetically. Neat and easy. And wrong. Sure it puts 'a*b*c'
before 'a*c', but it also puts '*a*' before 'a*',
which is incorrect.

Let’s look at globs. There are stars, and then there are normal
characters. They always form an alternating pattern: 'a*b*c'. Or
'*a*b*'. So this looks like an essential (and not accidental)
feature of globs. The number of stars and the number of
'normals'. Yes, in general, the more stars you have, the more
constrained you are, but there’s more going on.

After thinking about this for a bit, I realised that the outside stars
are special. Very special. A glob with a star at the start or end, or
both, can match a lot of things. Way more things than a glob with
stars inside other characters. What if we just define an ordering for
the outside stars? We have one already, from the preceding
discussion: 'a', 'a*', '*a', '*a*', '*'.

Note that 'a' can stand for anything inside the outside
stars. Other globs, in fact. But those globs can't have outside stars!
The glob '**' is the same as the glob '*'. Adjacent stars
merge into one. So '*a*', with a = '*b*' becomes
'**b**' becomes '*b*'. On the other hand with a = 'b*c', '*a*'
becomes '*b*c*'. A different beast entirely.
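The merging rule is trivial to implement; collapsing adjacent stars is a one-line normalisation (a sketch, assuming the hypothetical helper name `normalize`):

```java
public class GlobNormalize {
  // collapse any run of adjacent stars into a single star
  static String normalize(String glob) {
    return glob.replaceAll("\\*+", "*");
  }

  public static void main(String[] args) {
    System.out.println(normalize("**b**")); // prints *b*
  }
}
```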

Now at this point you might be saying, “hold on a sec, you’re
trying to order regular expressions! That's insane! You need to
evaluate them to do that!” Because globs are just regular
expressions: 'a*' is really /a.*/ in regex land. True. But I
think that for the special case of globs built from stars only (let’s
drop the use of '?'), we can in fact define an ordering without
requiring evaluation. The problem is sufficiently complex to require
some thought, but it's not a 'hard' problem.
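For what it's worth, the glob-to-regex translation itself is simple. One way to do it in Java is to quote the whole glob as a literal and then splice `.*` in place of each star (a sketch; `matches` is an assumed helper name, not anything from the project):

```java
import java.util.regex.Pattern;

public class GlobMatch {
  // quote the glob as a regex literal, then replace each '*'
  // with an un-quoted '.*'
  static boolean matches(String glob, String s) {
    String re = Pattern.quote(glob).replace("*", "\\E.*\\Q");
    return s.matches(re);
  }

  public static void main(String[] args) {
    System.out.println(matches("foo.*", "foo.bar")); // true
    System.out.println(matches("a*", "ba"));         // false
  }
}
```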

I would think that due to back-tracking, solving the sorting problem
for standard regular expressions is probably going to involve solving
the halting problem, since you have to evaluate each regex. But I
could be wrong. I often am.

Back to the globs. We can sort based on the outside stars. Fine. We
have five subsets to sort now: 'a', 'a*',
'*a', '*a*', '*'. '*' is a special case —
it’s always last. It's the least specific glob of all, matching
anything. The glob with no stars, 'a', is also easy. Just sort
alphabetically.

For the other three we note that 'a' will always take the forms
'b', 'b*b', 'b*b*b', and so on. That is, the essential thing is
the number of inside stars. In this case, the more stars you have, the
less you can match. For example 'a*a' will match
'azbza', 'azcza'. But 'a*b*a' will only match
'azbza'. More stars are more specific. Problem solved!

Nope. Not quite. What about globs having the same number of inside
stars? What about 'a*a' versus 'a*ab'? Which is more
specific? I think 'a*ab' is. It has more information, more
constraints on what it can match. For any given finite set of words,
there are more ways to end in 'a' alone than 'ab'. So the
next criterion is: more normal characters, more specific.

You can see the next catch, can't you? What if we have the same number
of normal characters? Alphabetic sort? No, doesn’t work – the
characters may be spread among the stars in different ways:
'a*ab' versus 'ab*a'. Which is more specific? Hmm. Let’s
invoke our prefix rule again. This makes 'ab*a' more specific
than 'a*ab', because it has a longer prefix. The criterion is:
whoever’s got the mostest on the leftest.

And finally, if the character spread is identical, then we go
alphabetic. So 'aa*a' precedes 'ab*a'. Let's see what we have…

The human rules are:

1. Prefixes are more specific than suffixes
2. Outside stars first: 'a', 'a*', '*a', '*a*'
3. Inside stars: more stars means more specific
4. Normal characters: more characters, more to the left, is more specific
5. Otherwise go alphabetic

Here’s a sample list of globs, starting with
the most specific. See if this list matches your idea of how they
should be ordered.


com.ricebridge.csvman.CsvManager
com.*.*Manager
com.ricebridge.*.CsvManager
com.ricebridge.csvman.*
*.ricebridge.csvman
*.ricebridge.*
*

This is almost right. But really I think com.ricebridge.*.CsvManager
is more specific than com.*.*Manager. How can we arrange this? Do more
stars really mean more specific? Or do normal characters provide far
more specificity?

The more normal characters you use, the more specific you are
being. Stars provide very little information. But strings of normal
characters really narrow things down. But in which set of words? What is the
total set of words that we are matching in the com.ricebridge
scenario? Infinite? Is there some structure? If these are the names of
Java classes (my target domain), then com.ricebridge.*.CsvManager is
definitely more specific (matches fewer items) than
com.*.*Manager. Seems like prefixes really do rule. I wonder if this
is somehow connected to Zipf's Law.

Anyway, let’s drop rules 3 and 4, and replace them with:

Inside characters: longest prefix wins

So even if you have loads of stars, if you have a short prefix, you lose. This gives:


com.ricebridge.csvman.CsvManager
com.ricebridge.*.CsvManager
com.*.*Manager
com.ricebridge.csvman.*
*.ricebridge.csvman
*.ricebridge.*
*

That’s a lot better. Let’s restate the human rules:

1. Outside stars first: 'a', 'a*', '*a', '*a*'
2. Inside characters: longest prefix wins
3. Prefixes same length? more stars wins
4. Otherwise go alphabetic

That new rule number 3 means that 'a*b*c' is more specific than 'a*b'. So you compare only as many prefix segments as the shorter glob has. If they are equal, take the longer glob as more specific.

A much better ruleset! And it pretty much conforms to the criteria for the
sorting algorithm: human computable, full ordering, and reasonable
performance.
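As a rough cut, the rules might translate into a Comparator along these lines (an illustrative sketch, with assumed names, not the eventual implementation; ties among star-free globs here fall to the prefix-length rule before going alphabetic):

```java
import java.util.Comparator;

// Orders globs from most to least specific, per the rules above.
public class GlobComparator implements Comparator<String> {

  // Rule 1: classify by outside stars. a=0, a*=1, *a=2, *a*=3, *=4
  private static int outsideClass(String g) {
    if (g.equals("*")) return 4;
    boolean starStart = g.startsWith("*");
    boolean starEnd = g.endsWith("*");
    if (starStart && starEnd) return 3;
    if (starStart) return 2;
    if (starEnd) return 1;
    return 0;
  }

  // Rule 2: length of the leading run of normal characters
  private static int prefixLength(String g) {
    int i = 0;
    while (i < g.length() && g.charAt(i) != '*') i++;
    return i;
  }

  // Rule 3: star count
  private static int starCount(String g) {
    int n = 0;
    for (char c : g.toCharArray()) if (c == '*') n++;
    return n;
  }

  public int compare(String a, String b) {
    int c = Integer.compare(outsideClass(a), outsideClass(b));
    if (c != 0) return c;
    // longer prefix is more specific, so it sorts first
    c = Integer.compare(prefixLength(b), prefixLength(a));
    if (c != 0) return c;
    // more stars is more specific
    c = Integer.compare(starCount(b), starCount(a));
    if (c != 0) return c;
    return a.compareTo(b); // rule 4: alphabetic
  }
}
```

Sorting the sample list with this comparator reproduces the ordering given above, with the full literal first and '*' last.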

Time to go code it up. Post a comment if you want to save me from myself!




Posted in Java

Berlin Drain Cover

I was in Berlin recently and while wandering around the Brandenburger Tor I came across this drain cover. Another example of European design, I suppose…




Posted in General

Back To The Typing

I'm trying out all the free typing lessons on the web in the vain hope that something will stick and I'll eventually be able to touch-type. Here are all the typing posts, including site reviews, if you're interested.

I've moved on from goodtyping.com. Now I'm with learn2type.com. Whereas goodtyping eventually moves you to a pay-per-use model, learn2type is purely ad-supported. And that's fine. The ads don't really get in the way.

Where the sites do differ is their approach to learning. goodtyping is much easier when you're getting started. It follows the traditional model. You start with the home row and learn progressively more keys over time and the typing drills use real words and sentences (mostly). learn2type on the other hand dumps you right in at the deep end with all the keys. This is great if you are already comfortable with the basics, but I don't think an absolute beginner would find it much fun. I'm glad I started with goodtyping.

That said, learn2type is a much better site once you are beyond the beginner stage. The drills are more challenging, you get nice graphs showing your progress (love those!), and you can learn the numeric keypad as well. The site also claims that it uses a learning algorithm to figure out which keys are giving you trouble so that you can concentrate on them in the drills. Not sure how well that works (then again, I find most of the keys tricky!).

On the whole this “learning to touch-type” project is taking a while. I originally started so that I would be able to write more for this blog. My “natural” typing is pretty fast but has a very high error rate. Writing blog entries just takes too long and the constant backspacing kills my flow. I don't seem to have this problem when I'm coding, but then I use auto-complete a lot, so I never have to type that much. Also, when coding, you're mostly thinking, not bashing on the keyboard, so reduced speed is not a big deal.

The biggest challenge is finding enough time to do the drills. I don't know about you but I'm not exactly blessed with much free time between work, clients, kids and required downtime. I get about 5–10 minutes drilling in on a weekday. Probably not enough.

The other thing that is frustrating is the transition to full-time touch-typing. That is really hard. My touch-typing is nowhere near fast enough to use on a daily basis. So I stick with old bad habits with most of my typing. I hope that is not “unwiring” my touch-typing. I'm kind of hoping that my brain will regard touch-typing as a different “language” and store it in a new brain module (apparently this is the way natural language learning works).

If you know of any good online typing resources, let me know! I intend to try them all out…





Posted in General

Be the Best Available

There are only two (profitable) markets: the cheapest available and the best available. Play the middle ground and you play to lose. Seth hits the mark again. Imagine how much it would have cost to hire this guy before blogs. Now you can get his head for free.

OK, I know this is not exactly rocket science. High volume versus high margin. Clean and simple. Lots of people have written about it. But it's nice to get a reminder. It gives you motivation and inspiration.

When I started Ricebridge I had this in mind, if not in words. Give something a name and you give it power. Give an idea a name and you manifest it. “Be the best available”. What a brilliant way to give yourself focus.

In my own market (software components) I have really tried to live this idea. Yes, there are loads of Open Source alternatives. That's the high–volume competition. And I go for the high–end business. When you really need to get the best there is, I'm there. People need this option.

Of course, it's a lot more work to build full-service components. All that other stuff outside the code. All the additional support that comes with the product. That's really where my market is, not the code itself. I've been thinking about this truth for quite a while now. Fully embracing it means providing high-level support for my components, as standard.

Pretty soon I'm going to announce an entirely new licensing model. Prices will change, but what you get will be far more than just a piece of code. Instead of providing support as separate (expensive) products, I'm going to make support a part of the core offering. When you buy a Ricebridge component, you will get high-level support as standard. Your questions will be answered directly and quickly. No more searching through documentation or FAQs. What you buy is another team member who just happens to be a complete specialist in one part of your project.

I really think that the lack of support is what puts people off buying software components. Commercial components are usually so badly supported that people are driven to Open Source. We've all had our nightmares. At least Open Source has some sort of support structure. It may be random, but at least you have some hope. If you look at the component vendors that are well-respected, you find that they are the exception that proves the rule. Everybody raves about them.

The thing is, supported components are what is needed, not software components. There is no such thing as plug–and–play when it comes to building software. You always need a helping hand.

It's time to start playing to my strengths.





Posted in Business

Further Versioning Contortions

Yesterday I had a plan to use reflection to solve the compatibility versioning problem with my software components.

Basically I need to change an interface that existing customers have implemented. And I want to maintain compatibility so that no-one has to recompile anything.

I did try to anticipate this problem by providing for changes in an abstract support class. Users were advised to extend this class rather than implement the interface directly so that future changes could be hidden from them.

Well it looks like I might be able to get away with it after all. And without any reflection. Here is a concrete example of the problem, and a proposed solution.

The solution satisfies the requirement that existing code must not be changed. Existing binaries must still work, and existing code must still compile without errors. But the interface will change. Here goes…

First, here's the current situation, demonstrated in code. I've used a simple example to show the essence of the problem.

We have a ColorManager class, to which Colors can be added. Colors have names and numbers. To implement a new Color you extend the ColorSupport abstract support class, which in turn implements the Color interface. Instead of implementing the interface methods directly, you implement protected *Impl methods. These are called by ColorSupport. This insulates you from changes to Color.

The change is that the Color.getName method is to be renamed Color.getCode, and the ColorSupport.getNumberImpl method is to be made protected (it was accidentally released as public).

Anyway, here's the initial code:

public class ColorManager {

  public void addColor( Color pColor ) {
    System.out.println( "color:"+pColor.getName()
                        +","+pColor.getNumber() );
  }

  public static final void main( String[] args ) {
    ColorManager cm = new ColorManager();

    Red red = new Red();
    cm.addColor(red);
  }
}


public interface Color {
  public int getNumber();

  // this will be changed to getCode
  public String getName();
}


// this is the abstract support class
public abstract class ColorSupport implements Color {
 
  public int getNumber() {
    return getNumberImpl();
  }
  
  public String getName() {
    return getNameImpl();
  }
  
  // this needs to be made protected
  public abstract int getNumberImpl();
  
  protected abstract String getNameImpl();
}


public class Red extends ColorSupport {

  public int getNumberImpl() {
    return 0;
  }

  protected String getNameImpl() {
    return "red";
  }
}

These classes mirror the current design of the LineListener and LineProvider callback interfaces in CSV Manager.

So the class Red represents a user created class. It cannot change. And neither can any existing Red.class bytecode.

The solution has to take the following into account. The old ColorSupport will be deprecated, but still supported. The next major version will use a changed ColorSupport and remove all compatibility code; this is allowed, as compatibility can be broken on a major release. ColorSupport is a name we want to keep (for consistency across product lines). So if we use a new support class in the meantime, we have to insulate new users who implement the new, correct, Color methods. We must make sure that their code requires no changes when we move to the next major version!

So here's the basic idea: apply the changes to Color, which breaks the old ColorSupport and Red classes. Detach ColorSupport from Color and make it a standalone class. Add a method to ColorManager that can accept ColorSupport. This ensures that old Color implementations that extend ColorSupport still work with ColorManager.

Next, create a ColorSupportImpl class. This is the new ColorSupport. It will replace the old ColorSupport with the next major version. ColorSupportImpl implements the new Color interface directly. It works just the same as the old design. But we know that the name ColorSupportImpl is temporary and will be dropped. So we need to place an insulation class in between the concrete color classes and ColorSupportImpl. To do this we change the recommended way to implement colors. For every color, there is a specific color support class. For example, Green will extend GreenSupport which then extends ColorSupportImpl.

That still leaves one little problem. What about colors that we have not defined? What about user-defined colors? We need to specify that custom colors extend an insulation class rather than ColorSupport, as is currently the case. We'll use CustomColor. So this suggests a change to the standard policy across product lines. Custom concrete user classes extend an abstract custom class which extends an abstract support class that implements the interface in question.

Wow, that seems like a really complicated way to do something simple. In a normal environment you would never do this. You would refactor and modify client code. And for released software components you can't do this. Releasing commercial software creates an entirely different set of issues. In this case it is far far more important to support existing customers than it is to refactor to a clean design. The vendor has to accept the responsibility for maintaining compatibility for reasonable periods and between clear boundaries. You only need to take a look at the situation with plugins to Eclipse or Firefox to see how difficult this problem is. And they get it mostly right!

Here's the code for the new version. Watch out, we've got lots more classes!

public class ColorManager {

  public void addColor( Color pColor ) {
    System.out.println( "color:"+pColor.getCode()
    +","+pColor.getNumber() );
  }

  // this keeps the old colors working
  public void addColor( ColorSupport pColorSupport ) {
    addColor( new ColorSupportFixer(pColorSupport) );
  }

  
  public static final void main( String[] args ) {
    ColorManager cm = new ColorManager();

    // this is an old color
    // old custom colors will work this way as well
    Red red = new Red();
    cm.addColor(red);

    // this is a new standard color
    Green green = new Green();
    cm.addColor(green);

    // this is a new custom color
    Blue blue = new Blue();
    cm.addColor(blue);
  }
}


public interface Color {

  public int getNumber();

  // this is the new version
  public String getCode();
}


// this is the same as before, but no longer
// implements Color
public abstract class ColorSupport {
 
  public int getNumber() {
    return getNumberImpl();
  }
  
  public String getName() {
    return getNameImpl();
  }
  
  public abstract int getNumberImpl();
  
  protected abstract String getNameImpl();
}


// this is unchanged - just what we want!
public class Red extends ColorSupport {

  public int getNumberImpl() {
    return 0;
  }

  protected String getNameImpl() {
    return "red";
  }
}


// this is the new version of ColorSupport
public abstract class ColorSupportImpl 
  implements Color {

  public int getNumber() {
    return getNumberImpl();
  }

  public String getCode() {
    return getCodeImpl();
  }

  protected abstract int getNumberImpl();
  
  protected abstract String getCodeImpl();

}


// an insulation class, currently does nothing
public abstract class GreenSupport 
  extends ColorSupportImpl {}


// a new standard color
public class Green extends GreenSupport {

  protected int getNumberImpl() {
    return 1;
  }

  protected String getCodeImpl() {
    return "green";
  }

}


// an insulation class for custom colors
public abstract class CustomColor 
  extends ColorSupportImpl {}


// a custom color
public class Blue extends CustomColor {

  protected int getNumberImpl() {
    return 2;
  }

  protected String getCodeImpl() {
    return "blue";
  }
}


// this hooks up the old and new interfaces
public class ColorSupportFixer 
  extends ColorSupportImpl {

  private ColorSupport iColorSupport;
  
  public ColorSupportFixer( ColorSupport pColorSupport ) {
    iColorSupport = pColorSupport;
  }
  
  protected int getNumberImpl() {
    return iColorSupport.getNumber();
  }

  // convert getCode to getName
  protected String getCodeImpl() {
    return iColorSupport.getName();
  }

}

Like I said, it's not pretty. But it does allow the API to move forward with full backwards compatibility.

The insulation classes (GreenSupport and CustomColor) are empty in the example above and will probably also be empty in the next CSV Manager release (1.2). Their purpose is to allow ColorSupportImpl to change its name in release 2.0.

And they serve another very important purpose. If in the future further changes arise that require more compatibility workarounds, they allow for the use of a reflection-based solution in ColorSupport and/or the insulation classes. Thus one layer of changes can be applied on the interface side, and one on the implementation side. This “feels” like the right solution.

Of course, some types of changes (for example, changing method access from public to protected) may not be amenable to a reflection-based solution. They may require a third layer of insulation. We'll cross that bridge when we come to it, if we cross it at all. I rely on the belief that as the API converges on an acceptable design, these types of changes will become less of a problem. Once the API has been in use for a longer period, changes become so expensive that it is better to put up with design mistakes. This is what happened with the standard Java API.

I reckon I am still able to pull this off at this stage in the life-cycle of CSV Manager. The cost will be more complex documentation until 2.0, when the compatibility code can be ditched. And the cost will be increased code complexity inside CSV Manager, which means more work for me to bug fix it all. Of course, I have a large set of unit tests so this should not be a big problem.

Well, it looks like we're set. Any final thoughts before I dive into the code?




Posted in Java

More On Versioning

Brian Smith made some good points on my last post about software component versioning. They're too detailed to reply to in a comment so I'm posting a reply as a full entry.

You'll probably need to read that last post for this one to make sense. Also, I kind of have a strategy now. More on that at the end of this post.

I would just change the version number to 2.0, and then tell your customers that they can have a free upgrade even though this isn't the normal policy.

There's not enough new stuff to go to 2.0. It's not really fair to customers either. I think it has to be 1.2. I'd like 2.0 to be a bigger bang.

Unless you are taking away functionality I have a hard time understanding why you cannot maintain backward-compatibility. Compatibility is something that everybody expects, and doubly so if they are paying for the product. It seems like the problem could be solved by adding a new interface instead of replacing an existing one.

When adding methods, this is the case. In fact I will be adding methods and maintaining the old ones. But if you want to change method access permissions it's basically impossible. I want to move a public method to protected.

Also, adding a new interface is not an option as I need to keep the name of the interface consistent across product lines. Bit of a catch-22 really. It's a bit of a pain when you have to manage more than one product. I'm starting to have some sympathy for Sun and the mess that is java.*.

The nice thing about backward-compatibility is that it helps encourage people to upgrade to the latest version. If people refuse to upgrade then you end up supporting old versions longer. If a customer finds a bug in version 1.1 then they will ask you for a 1.1.1 to fix it, instead of upgrading to 2.x where it was already fixed.

Ah. Yes. But I do this in any case. All major versions will be supported for as long as possible. This is important. Part of the reason developers hate external components is because they can't be trusted and a lot of vendors just don't care if there are bugs. Well I care, and I won't let bugs stand if I can help it.

I don't really like the idea that the first component of the version number only changes when there is an incompatible change. If you maintain compatibility for five years then the version number would be something like 1.9. But, 1.9 probably would have much, much more functionality than 1.0. Yet, the numbers 1.9 and 1.0 don't seem that much different from each other (it is easy to misread it as 1 & 23/100). The result is that this scheme is counterproductive to marketing the product: the main version number increases when something bad happens.

The worst case would be when you publish an interface in, say, version 1.6.1 that for some reason has to change immediately (e.g. the way the interface was structured facilitates some kind of security problem). To fix this problem you want to release a fixed version of the API. But, now you have to call this version 2.x even though it might be a very, very small change. Maybe the jump from 1.0.1 to 1.6.1 was a huge improvement in functionality. Yet, the increase from 1.6.1 to 2.0.0 was a single bugfix, perhaps just a few characters changed in the source code. It is counter-intuitive.

This is a very strict interpretation. I would say that one can bump to 2.0 at anytime, for marketing reasons say. It would only be forced by a compatibility issue.

In any case, I think I am going to drop the minor version restriction and say that minor numbers can accept compatibility changes. This means that you can only be 100% sure of compatibility on the same major.minor revision. But that's pretty much OK.

It works okay for open source products because marketing for them is totally different (often ignored).

Very true. And there is something to learn from Open Source. Version numbers are an important way to give a quick overview of the state of the project. I like that and it's very useful, even necessary, for software components. It's much less important for applications, where the version number is just a marketing gimmick. But for my stuff the version number is, in a way, part of the customer service. It's very important that it has a crystal clear meaning.

So this all seems like a very difficult problem to solve. However I was forgetting one thing about Java: Just-In-Time compilation!

How does that help? Well, it means that reflection is not as bad as you think it is. The “common knowledge” is that reflection is slow. Well sure, the lookup is slow, but once you have the method reference the JIT compiler will optimize the calls for you. In the case of data processing where you perform the same operation many times, this effect comes into play very quickly.

So reflection is OK! I can use it! I'm going to look at trying to rewrite the support classes to handle this situation. They will introspect themselves to see if the old methods are declared. They can thus recognize the old interface and work with it. Old code should just keep working.
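In code, the introspection idea might look something like this. It reuses the Color example from the previous post; this is a sketch of the approach, not CSV Manager's actual classes. The support class checks whether the concrete subclass still declares the old-style method and, if so, routes the new method to it:

```java
import java.lang.reflect.Method;

public class ReflectDemo {

  interface Color { String getCode(); }

  static abstract class ColorSupport implements Color {
    private final Method legacy; // cached once; JIT can then optimize calls

    ColorSupport() {
      Method m = null;
      try {
        // look for the old-style method declared on the concrete subclass
        m = getClass().getDeclaredMethod("getNameImpl");
        m.setAccessible(true);
      } catch (NoSuchMethodException e) {
        // a new-style subclass: nothing to adapt
      }
      legacy = m;
    }

    public String getCode() {
      if (legacy != null) {
        try { return (String) legacy.invoke(this); }
        catch (Exception e) { throw new RuntimeException(e); }
      }
      return getCodeImpl();
    }

    // new-style subclasses override this
    protected String getCodeImpl() {
      throw new UnsupportedOperationException("implement getCodeImpl()");
    }
  }

  // an old-style subclass, never recompiled
  static class Red extends ColorSupport {
    protected String getNameImpl() { return "red"; }
  }

  public static void main(String[] args) {
    System.out.println(new Red().getCode()); // prints red
  }
}
```

The lookup cost is paid once in the constructor; after that, old binaries keep working through the new interface.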

Yes this will make my support classes messy and more complex. Tough for me. That's the price to be paid. If you want to maintain compatibility for paying customers, as Brian noted above, you must do this. Anything else is just lazy.

Of course, this may not work completely. We'll see. But even if I can reduce the changes required, that's a good thing.

Thank You Blogosphere, and Thank You Brian!




Posted in General | Leave a comment

Versioning Cistern

My versioning system is broken. It's painful and I'm not entirely sure what to do about it. Time for a blog entry so!

Previously I wrote about my vision for the Ricebridge Versioning System. All Ricebridge components have three version numbers. For example, 1.2.3 means that you have major version 1, minor version 2, and build number 3.

Major versions allow for incompatible changes. Minor versions add new functionality but keep compatibility. Build numbers track bug fixes and minor changes. I think it works very nicely. It's pretty clear and easy to follow. And you know where you stand and what you can upgrade to. You know that a major version change will probably break your code and you'll have some extra work to do. But you also know that you can upgrade to a bug fix release or take advantage of a new method in a minor release by just dropping the jar in and continuing on your way.
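The drop-in rule above reduces to a simple comparison (a hypothetical helper, not part of any Ricebridge API): an upgrade is safe when the major number matches and the candidate's minor/build is equal or newer.

```java
// Hypothetical sketch of the upgrade rule described above:
// same major version => safe drop-in upgrade (minor releases add
// functionality, build numbers are bug fixes).
public class VersionCheck {

    // True if you can drop the candidate jar in over the current one.
    static boolean isDropInUpgrade(String current, String candidate) {
        int[] a = parse(current), b = parse(candidate);
        return a[0] == b[0]                                   // same major
            && (b[1] > a[1]                                   // newer minor, or
                || (b[1] == a[1] && b[2] >= a[2]));           // same minor, newer build
    }

    // Parse "major.minor.build" into three ints.
    static int[] parse(String v) {
        String[] parts = v.split("\\.");
        return new int[] {
            Integer.parseInt(parts[0]),
            Integer.parseInt(parts[1]),
            Integer.parseInt(parts[2])
        };
    }
}
```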

I'm very attached to the three-number versioning system. And so are my customers, who've got very used to it. Changing the versioning system is itself a major version change really (given that you are breaking semantic compatibility), and I don't really feel like one at the moment!

So what's broken? Well, I have a new version of CSV Manager coming out soon. It will add a few new things, including Java Beans support. Nothing that will affect existing code, so no problems there.

However, based on the experience gained building XML Manager, and my future plans for the product set, it looks like some of the interfaces in CSV Manager will have to change.

I want to create a coherent set of components that all work the same way. That's very important. There has to be interop, both at the code level and at the user level. That means that the APIs should be the same. Once you learn the API of one Ricebridge component, you can then apply it to all Ricebridge components (pretty much).

But the original CSV Manager API is not right for this. It needs to be modified. That breaks compatibility. So fine. We go to CSV Manager 2.0.

Except, not so fine. Not at all. You see, Ricebridge customers get an upgrade path. When you buy from us, you get the right to upgrade, for free, to any release with the same major version. Right now CSV Manager is at major version 1. If I bump it up to major version 2, then all those customers will lose out. Ouch. Not very nice at all. Definitely not the right way to go. I want existing customers to get the full benefit of the new API as well.

It looks like we can’t bump the major version. So let's drop the compatibility restriction on minor versions. That could work. Except now, when you look at a version number, you can't tell right away whether it will work with your current setup. Not so good either.

How about using four version numbers? We can add an extra one for compatibility: major.release.minor.build. So we bump the release number every time there's an incompatible change. Except this is not very user-friendly. Four version numbers is really pushing it. Three is just about as much as anyone can take. In any case, it's a change I don’t want, as noted above.

Another option is to stick with the old API until the real version 2.0. We can add the new stuff, but keep the old stuff in and deprecate it. This is the standard way of doing things and it is the approach that I use normally. It works especially well for adding and removing methods from API classes. But it doesn't work for interfaces that the user implements.

If you want to change an interface that a user has implemented, then you have to force the user to change their implementation class. There's no easy way round that (sadly we're in Javaland, so dynamic solutions are awkward). This happens because the interaction model between the interface and client code changes. The implementation class must provide the new services required by the client code. In some cases it might be possible to work around this, but mostly it's very messy and there is no way you can ask people to do this.
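A tiny illustration of why this bites, using the LineProvider name from later in this post but an invented shape: adding a required method to an interface that customers implement stops their classes compiling until they write the new method themselves.

```java
// Hypothetical illustration: an interface that customers implement.
// The shape is invented; only the LineProvider name is from the post.
interface LineProvider {
    String nextLine();
    // Adding a new required method here, e.g.
    //   boolean hasNext();
    // breaks every existing implementation: CustomerProvider below
    // would no longer compile until the customer adds hasNext().
}

// A customer's existing implementation, written against the old API.
class CustomerProvider implements LineProvider {
    private final String[] lines = { "a,b", "c,d" };
    private int i = 0;

    public String nextLine() {
        return i < lines.length ? lines[i++] : null;
    }
}
```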

In the end, we are forced to choose between a new implementation interface and an old one. So if we want to wait until 2.0 for the big change, then we must stick with the old API for now. This works, but it increases the number of people using the old API and makes the changeover much more painful for many more people. Microsoft has been badly stung by this many times: The long and sad story of the Shell Folders key. It seems better on the whole to take the hit now and impact the smallest number of people.

So where does this leave us? The next release of CSV Manager will include the new API. It will break old code. I'm really sorry to have to do this to my customers. It's very nasty. But the benefit will be the many ways in which you will be able to use Ricebridge components together to solve all sorts of tricky problems. It seems like a good tradeoff in the long term.

Actually it's not even as bad as all that. The main CSV Manager methods will not be changing much and 90% of existing code should still work fine. The change will be occurring in a part of the code that is not even used by most customers (LineProviders). So maybe all this agonising isn't even necessary!

That still leaves the problem that version 1.1 and version 1.2 will be incompatible. We still need a way to communicate this and to allow users to control versioning in their own projects. It looks like an additional stream of version information is needed. We need to track API changes separately.

One idea is to have API revisions. This means publishing a detailed description of the API and assigning it a revision number. All CSV Manager releases with the same API revision number are API compatible. That means you can copy in the new jar and things will still just work.

When incompatible changes have to be made to the API, then the revision number changes. A new API description is published, showing the changes, so that users can track what happened. So each version of CSV Manager has an API revision. You can show this in a table and it should then be easy for people to follow.

API revisions would not change very frequently. In fact, as we move further along with building Ricebridge components, things should start to settle down very solidly. I don't think that there will be many API revisions.

In this system, the minor version number then means: new functionality and possibly a new API revision. This creates extra work in that API revision changes have to be made clear to users, and handled carefully.

What really gets me about all this is that I had actually put in place a system to deal with it. After reading David Bau's posts about a theory of compatibility, I created a design for the user-implemented interfaces that could accommodate certain types of changes.

When you implement a Ricebridge interface, you are actually advised to extend a designated abstract support class. This class has the job of insulating you from future API changes by translating changes into the older version of what you expect. Except that this only works in some cases. To solve my existing problem it looks like I would have to write a lot of reflection code and that would probably have a bad effect on the performance of the system. It also means that new customers would find the support class to be a confusing mess. I have in fact used this technique already on one minor change to CSV Manager.
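Here is roughly what that insulation looks like in the simple (non-reflective) case, with invented names: the support class implements the new-style method and translates it into the old-style callback that existing subclasses already override.

```java
// Hypothetical sketch of an insulating support class. The component
// calls the new-style method; the support class translates it into
// the old-style callback that existing subclasses already implement.
interface RowHandler {
    // New API: fields arrive pre-split.
    void row(String[] fields);
}

abstract class RowHandlerSupport implements RowHandler {
    // Translate the new contract into the old one.
    public void row(String[] fields) {
        handleRow(String.join(",", fields)); // old single-string contract
    }

    // Old API method: existing subclasses override this, unchanged.
    protected abstract void handleRow(String line);
}
```

The limits the post describes are visible here too: this only works when the new call can be mechanically translated into the old one, and the extra layer is exactly the "confusing mess" new customers would see.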

One problem that it does not solve however is that I made a mistake with the method signatures when I first released CSV Manager. Some *Impl methods which should be protected are actually public. Bugger!

So the way forward is still not clear. I am inclined to take the hit now and offer free support to all customers who need it to make the change.

What do you guys and gals think I should do?

In proving foresight may be vain;
The best-laid schemes o' mice an' men
Gang aft agley,
An' lea'e us nought but grief an' pain,
For promis'd joy!

Robert Burns





Posted in Java | Leave a comment

Make $97 With One Blog Post

If Jonathan can give away free servers, then I can give away free licenses! I've been inspired by Sun's T2000 promotion to try something like it myself.

Here's the deal. You blog about one of our products, after trying the trial version out, and you get a free single-developer license.

You can write whatever you like. Tear us to shreds or sing our praises. It's all good. We just want links :)

Well, you should make sure that your audience is OK with doing something like this. Full disclosure is a good idea. So don't do anything you're not comfortable with.

Before you start writing, here's the full details of the blog promotion.

Our products are data-munging widgets for Java: CSV Manager (for CSV files, surprisingly), and XML Manager (for XML files, again, a surprise there). The single dev licenses are worth $47 and $97 respectively.

Oh, and the $15 gift cert you get for every bug found, that's still valid. So you might even end up with something nice from Think Geek.

No idea when I'll end this promotion, so don't hang around: if you want one, get writing!

And no, I won't apologise for the blatantly linkbaiting title! :)





Posted in Business | Leave a comment

Rant: Maven Muppetry

Hani was right. Maven really is a pain in the arse.

I have just managed to install the cobertura plugin. Not by following the instructions, mind. Oh no.

The install instructions are to run the following command:

maven plugin:download
  -Dmaven.repo.remote=http://maven-plugins.sourceforge.net/repository
  -DgroupId=maven-plugins
  -DartifactId=maven-cobertura-plugin
  -Dversion=1.2

Well shucks ain't that nice and easy. And it used to work, because I used that very same command a few months ago and it just worked. Tip for the documentation writer: put this command all on one line and then it's much easier to cut-and-paste into a command window.

But now it doesn't work anymore. Wanna know why? Because it references a load of dependencies inside http://maven-plugins.sourceforge.net/repository/ that do not exist. Did someone delete them? Huh?

In order to install it, you have to drop the cobertura plugin jar into your repository manually, and then run Maven. Then it picks up the dependencies from BerliOS.

Am I missing something here?

If you're going to release open source stuff, make sure your install just works. No arsing around. People lose interest pretty quickly when you do that sort of thing.




Posted in General | 1 Comment