Much of my thinking about the future of Perl 5 stems from the following principles:
- New versions of Perl 5 should not break your existing software
- Backward compatibility must not stop Perl 5 from evolving
The message linked here offers many insights into Perl 5.16 and beyond, based on the talk Jesse Vincent has been giving at various conferences this year. It's a great read if you're interested in the future of Perl the language.
When Perl and Ruby get compared, it is often said that Perl takes great care to stay as backward compatible as possible, while Ruby (the language and its ecosystem, such as rubygems, Rails, etc.) cares less and prioritizes evolving faster by making drastic changes more frequently.
I think this holds true in some sense: scripts written for perl 5.8 (released more than 8 years ago) will most likely just work on perl 5.14 without any changes. That's probably not the case between Ruby 1.6 and 1.9.
This makes people much less worried about upgrading perl, which is a great thing, but I can imagine that the restriction ("upgrading perl should not break existing code, even in a major upgrade") makes development and evolution in perl considerably harder than it would otherwise be.
Something similar is happening outside core language development as well: in CPAN modules.
CPAN module policies
Most (but not all, I know) CPAN modules, once uploaded to CPAN, try to keep their API stable and as backward compatible as possible. That's the de facto virtue of CPAN modules. If a module changes its API with every release, it gets rated horribly, and people eventually stop using the module/framework, thinking it's unstable and fragile.
While keeping the interface stable is generally a good thing, and I appreciate all the heroic efforts perl module developers put into preserving backward compatibility, it's unfortunate when that prevents their software from evolving as fast as it could.
And I personally suspect this is simply because there's no good, easy way to manage the situation.
Example
Let's take an example. Although it is not a real one (notice the made-up version numbers :D), this is pretty much what's happening today.
In 201X, Catalyst 8 introduced some feature X, and developer Joe wrote an extension called CatalystX::Foo based on the new feature. It declares the dependency on Catalyst by saying:
requires 'Catalyst', 8.0;
in Makefile.PL, which will later be written down to MYMETA.yml. He shipped the extension to CPAN. When an end user installs CatalystX::Foo, it pulls down the latest version of Catalyst and everything works fine. So far, so good.
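For reference, that declaration gets recorded in the generated metadata roughly like this. This is a sketch in META spec 1.4 style; the exact layout depends on which meta spec version your build tool writes:

```yaml
# Sketch of the relevant MYMETA.yml section (META spec 1.4 style)
requires:
  Catalyst: 8.0
```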
A few months later, the Catalyst dev team decided that feature X was a good idea but needed some modifications, made API-incompatible changes that break CatalystX::Foo, and shipped Catalyst version 9.0.
So now CatalystX::Foo stops working when an end user upgrades Catalyst to version 9. As long as the user stays on version 8 it's fine, but it's a time bomb now.
How do we prevent this kind of thing from happening? Here's what they do today.
- Catalyst developers keep a list of affected downstream modules and notify the authors to upgrade to the newer API.
- Downstream authors (e.g. of CatalystX::Foo) upload new versions to CPAN, declaring the dependency on Catalyst version 9.
- Catalyst ships with a giant conflicts table (search for %conflicts) which warns users that they have to upgrade these downstream modules after installation. It can't simply declare dependencies on the new versions.
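The %conflicts approach boils down to a hash of minimum safe versions that gets checked at install time. A simplified sketch, with made-up module names and versions:

```perl
use strict;
use warnings;

# A simplified sketch of the kind of conflicts table Catalyst ships:
# downstream modules below these versions are known to break with the
# new release, so users are warned to upgrade them after installing.
my %conflicts = (
    'CatalystX::Foo' => '1.00',   # needs 1.00+ to work with Catalyst 9
    'CatalystX::Bar' => '0.25',
);

# At build time, warn about every downstream module that is too old
for my $module (sort keys %conflicts) {
    print "$module must be upgraded to version $conflicts{$module} or newer\n";
}
```

Note that this warning is emitted during the build, which is exactly why installers that suppress Makefile.PL output never show it to the user.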
While I respect the effort by the upstream developers (i.e. Catalyst developers) in maintaining the list, this looks like a less-than-ideal situation to me, because:
- it is odd that upstream modules have to take care of downstream, rather than the other way around
- it doesn't scale, and doesn't work for modules that are not on CPAN (i.e. DarkPAN, GitHub)
- the warnings generated by the upstream (Catalyst) are useful, but some CPAN installers ignore Makefile.PL output (like cpanminus :)), and that output gets no attention in automated, scripted installs anyway
Personally, I don't want to maintain a list of Plack::Middleware modules once (say) I decide to break API compatibility in Plack 2.0.
Solutions
I don't have a magic bullet that solves this situation right away, but I have some ideas, and actual code that implements them. Here's the gist:
- Downstream modules are advised to use version ranges in their dependency declarations
- Upstream should use semantic versioning or something similar, so that downstream can predict API incompatibilities from version numbers
- CPAN installers should install MYMETA.yml files into the site_perl library path so they can be used for later introspection
- Every project should use a separate local::lib to get an isolated library path
- Write a tool that rescans the whole (per-project) library path to ensure there are no conflicts, and urges end users to upgrade/downgrade if there are
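The first point could look like this in a downstream Makefile.PL. This is only a sketch: the range syntax here follows CPAN::Meta::Spec version 2, and whether it is honored end-to-end depends on your build tool and installer:

```perl
# Hypothetical Makefile.PL fragment for CatalystX::Foo.
# A bare minimum version accepts any future (possibly incompatible) release:
requires 'Catalyst', '8.0';

# A version range declares the releases the extension is actually
# known to work with, so Catalyst 9.0 would not silently break it:
requires 'Catalyst', '>= 8.0, < 9.0';
```

With a range in place, an installer that understands it can refuse (or warn about) the Catalyst 9 upgrade instead of leaving a time bomb behind.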
Overall, this is what bundler does for Ruby gems, and I'm trying to make carton do the same for CPAN and local::lib.
It will make end users happy by allowing them to lock their dependencies, and will make module authors happy by allowing them to evolve fast without worrying too much about breaking backward compatibility downstream (although they'll still care about that to some extent anyway).
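The rescan tool mentioned above could be sketched roughly like this. Everything here is hypothetical: real code would gather %installed and %requires from the MYMETA.yml files in the local library path rather than hardcoding them:

```perl
use strict;
use warnings;
use version;

# Hypothetical snapshot of the local library path, as if gathered
# from installed MYMETA.yml files: distribution => installed version.
my %installed = (
    'Catalyst'       => '9.00',
    'CatalystX::Foo' => '1.00',
);

# Hypothetical declared requirements per distribution, as version ranges.
my %requires = (
    'CatalystX::Foo' => { 'Catalyst' => '>= 8.0, < 9.0' },
);

# Return true if an installed version satisfies a comma-separated range.
sub satisfies {
    my ($have, $range) = @_;
    for my $clause (split /\s*,\s*/, $range) {
        my ($op, $want) = $clause =~ /^(>=|<=|==|>|<)?\s*(\S+)$/;
        $op = '>=' unless defined $op;   # bare version means "at least"
        my ($h, $w) = (version->parse($have), version->parse($want));
        my $ok = $op eq '>=' ? $h >= $w
               : $op eq '<=' ? $h <= $w
               : $op eq '>'  ? $h >  $w
               : $op eq '<'  ? $h <  $w
               :               $h == $w;
        return 0 unless $ok;
    }
    return 1;
}

# Walk every declared requirement and flag anything outside its range,
# so the user can be urged to upgrade or downgrade.
for my $dist (sort keys %requires) {
    for my $dep (sort keys %{ $requires{$dist} }) {
        my $range = $requires{$dist}{$dep};
        next if satisfies($installed{$dep}, $range);
        print "$dist: installed $dep $installed{$dep} is outside '$range'\n";
    }
}
```

Run against the example data, this flags CatalystX::Foo, because the installed Catalyst 9.00 falls outside its declared range.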
I plan to incorporate all of this into Carton 1.0 release, which should happen before or during YAPC::Asia 2011 in Tokyo, which is in 2 weeks.
In other words, this is how conference driven development works :)
I think that the way to do this generally is to have a clear versioning and deprecation policy. i.e. stuff marked deprecated will be removed in the next major version or for 3 versions or whatever.
There is no way to get around backwards incompatibility, and I think the Ruby way of moving forward without worrying about backwards compatibility is terribly naive. It forces you to think about compatible versions at every step of the development process. It also keeps you from upgrading because some library you depend on doesn't work on newer versions of Ruby. Just getting something like redmine running is a nightmare of incompatibilities.
I think Django does the right thing here. The deprecation policy is clear and communicated widely. It's also followed consistently. Making a policy and breaking it every other release breaks trust.
https://docs.djangoproject.com/en/1.2/internals/release-process/
Languages could probably do something similar, if not go further. The farther down the development stack, the less often things should change. I think that's the nature of things in general.
Posted by: Ian Lewis | 2011.09.28 at 21:35
Ian: your points are valid, but I'm not sure they apply to what I discuss in my post. I'm talking about the specific tools (for Perl) that we have to have, in addition to the best practices for libraries and downstream dependencies like you suggest.
It is more important and practical for us to have the right tools than to expect all libraries to behave as we hope, since CPAN (like PyPI) is an open code repository with a flat namespace, and we can't force developers to follow one rule. This is Perl, which allows and embraces TIMTOWTDI.
Yes, most frameworks/modules with a certain amount of downstream dependencies have a deprecation policy and versioning scheme, although semantic versioning is not as widely adopted as in Ruby.
Well, I'm not sure I agree. Most Ruby libraries do use semantic versioning, and thanks to the excellent bundler tool, most developers don't need to catch up with upstream changes unless they're required to. Developer trust is a separate issue, and yes, libraries/frameworks with lots of downstream users should think twice about deprecating stuff and should provide a migration path, but that's out of scope for this post.
Posted by: miyagawa | 2011.09.28 at 22:06
Conference Driven Development! I love it! I think you buried the lead.
Posted by: oylenshpeegul | 2011.09.29 at 04:23
I would be very interested to see how this turns out. I have created a somewhat similar module that builds a dependency graph to track all module dependencies based on META.yml and Makefile.PL. As with carton, the dependency graph should be checked into the repository and can be used to install all modules in the graph in the correct order.
When adding a module to the graph, all necessary dependencies are added as well and any conflicts are worked out. If any modules in the graph need to be updated, this is reported too, and they then have to be updated explicitly.
My experience from using the program for almost a year is that the approach is OK, but my module has some bugs that need to be fixed and some missing features (deleting a dependency, for instance).
I can upload the code to Github if anyone is interested.
Posted by: Øystein Torget | 2011.09.29 at 07:01
I hope that carton will collaborate and integrate with Rex (http://rexify.org/).
Rex is a very promising Perl equivalent to Ruby's Capistrano/Puppet/Chef or Python's Fabric/Kokki.
Posted by: Account Deleted | 2011.09.29 at 07:09
Øystein Torget: Yes, carton creates the graph, and the source data (carton.lock) is checked into the repository so that it can be used elsewhere to reproduce the installation. The code needs some refactoring, but it's all committed to the GitHub repository.
carton also handles deletion of dependencies, as well as comparing the build file (Makefile.PL or Build.PL) against the actual library path and warning users to uninstall unnecessary dependencies, etc.
Posted by: miyagawa | 2011.09.29 at 09:30
Aer0: carton is a command line tool with an object-oriented backend API, so hopefully it will be easy to integrate with whatever sysadmin tools you use, like chef, puppet, etc.
Posted by: miyagawa | 2011.09.29 at 09:32
extlib should be EMPTY
we have like 20 modules in there (what a mess)
Posted by: bill george | 2011.09.29 at 14:53
bill george: I don't know what you're talking about, but if you mean the extlib created by the cpanminus -L setup, that's there to make sure you have a valid Module::Build and all of its dependencies. It's a bootstrapping problem: just ensuring you have the right build toolchain.
Posted by: miyagawa | 2011.09.29 at 15:53
How does it handle module updates driven by new dependencies? For instance, if I already have a lock file which says I depend on one version of module A, and then I add a new dependency on module B which requires a newer version of module A, will module A be auto-upgraded?
Posted by: Øystein Torget | 2011.09.29 at 23:27
Øystein Torget: That's the only part I haven't implemented correctly yet, but yes, module A will be auto-upgraded, and after that all the metadata is rescanned to ensure there are no conflicts.
Posted by: miyagawa | 2011.09.29 at 23:29
OK. Auto-upgrades are potentially problematic, but I don't know how much of a problem it actually is. Not doing auto-upgrades, which has been my approach, also has its problems: tests have a tendency to fail somewhere in the dependency chain if you add the latest version of module B while keeping an older module A, even if module A and B are still compatible.
Is it possible to do something like '$ carton install Moose-1.21.tar.gz' to get a specific version of a module?
Posted by: Øystein Torget | 2011.09.29 at 23:58
The potential problem with auto-upgrades is not only that newer versions of the code might break compatibility (which is actually rare), but also that some of the modules (packages) can be deprecated or dropped when a distribution is updated.
This is why we have to rescan the whole tree to make sure everything is sane. It's not obviously easy, but it's doable.
Posted by: miyagawa | 2011.09.30 at 00:02
And yes, you should be able to say carton install AUTHOR/Module-ver.tar.gz to install a specific version of the distribution from CPAN.
Posted by: miyagawa | 2011.09.30 at 00:20
There are sane ways of handling backwards-incompatible changes within the current Perl ecosystem. Fear of breaking backwards compatibility has come up on the PDL list because reproducibility in science is a BIG deal. I would go so far as to say that the PDL developers hope to NEVER break backwards compatibility. My (so far unneeded) solution to this problem is that if you plan on introducing backward-incompatible changes, you should completely rename your module. In our case, that would likely be something like PDL2. This has the virtue that both PDL and PDL2 could be installed side by side with stock CPAN and could even be used in the same perl script.
Thoughts?
Posted by: David Mertens | 2011.10.05 at 06:58
David: yeah, renaming the module (like PDL, PDL2) is a working solution, and there are distributions that do exactly that when they introduce major incompatibilities, such as Dancer2, currently in the works. Similarly, some modules use ::1, ::2 sub-namespaces, like perl5i::2.
It works because downstream has to explicitly change its dependency to request the new major version, but it becomes a pain in the ass when the distribution has lots of downstream plugins etc., since the plugin authors have to duplicate their modules to support the new version, or make them work with both versions, etc.
Posted by: miyagawa | 2011.10.05 at 09:23