In a typical web application, the most frequent task is getting parameters from a request. The Perl community and popular frameworks have had two interfaces for this: param() and parameters(). And there are a few issues.
param()
Good old CGI.pm has a convenient param() method, which behaves differently depending on context:
my $q = CGI->new;
my @keys = $q->param(); # get the list of param names
my $name = $q->param('name'); # scalar context: always get single
my @names = $q->param('name'); # list context: get multiple (if any)
This is quite nice, since your code says how you want the values by explicitly stating the context (scalar or list). The only place it bites is when you accidentally force a list context, such as when assigning it to a hash or passing it to a method call:
my $vars = {
name => $q->param('name'), # Oops, it's a list context!
email => scalar $q->param('email'), # this is correct
};
This code doesn't quite work if there are multiple (and an even number of) name parameters, since the hash then gets an odd number of elements. Even worse, with an odd number of them it injects unintended keys into $vars, which could be seriously dangerous if you pass that on to internal utilities or databases.
So, param() is quite nice, but only if you are really careful about this list context gotcha.
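The injection case can be reproduced without CGI.pm at all. The following is a minimal sketch: the param() sub here is a hypothetical stand-in that only mimics the context-sensitive return value, and the query data is made up for illustration.

```perl
use strict;
use warnings;

# Made-up request data: ?name=alice&name=admin&name=1&email=a@example.com
my %query = (
    name  => ['alice', 'admin', '1'],
    email => ['a@example.com'],
);

# Hypothetical stand-in for CGI.pm's param(): first value in scalar
# context, every value in list context.
sub param {
    my ($key) = @_;
    my @values = @{ $query{$key} || [] };
    return wantarray ? @values : $values[0];
}

my $unsafe = {
    name  => param('name'),           # list context: expands to three values
    email => scalar param('email'),
};

my $safe = {
    name  => scalar param('name'),    # forced scalar context
    email => scalar param('email'),
};

# The extra values became a key/value pair of their own
print join(', ', map { "$_=$unsafe->{$_}" } sort keys %$unsafe), "\n";
print join(', ', map { "$_=$safe->{$_}" } sort keys %$safe), "\n";
```

With three name values the extra two are read as a new key and its value, so $unsafe ends up with an admin key that the code never asked for.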
parameters()
Catalyst added parameters() to its Catalyst::Request object. It gives you the values as an array ref if there are multiple.
my $form = $c->request->parameters;
# ?a=b&b=c
# $form = { a => 'b', b => 'c' }
# ?a=b&a=c&b=c
# $form = { a => [ 'b', 'c' ], b => 'c' };
This might look intuitive, but wait a minute. The shape of the data structure depends on user input rather than on how you code it, and that sucks. This means you always have to check whether the value is an array ref or not, since:
my $v = $c->request->parameters;
my $query = $v->{query};
my @names = @{$v->{name}};
$query might become ARRAY(0xabcdef) if there are multiple query= parameters in the query. The @names line might die with a Can't use string as an ARRAY ref error if there's only one (or zero) name parameter. This causes horrible issues when using standard HTML elements like option or checkbox forms, or tools like jQuery's serialize().
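Both failure modes are easy to trigger with a plain hash reference. This is a small sketch using made-up data in the shape parameters() would return for ?query=a&query=b&name=bob; it does not use Catalyst itself.

```perl
use strict;
use warnings;

# Made-up data: two query= values, a single name= value
my $v = { query => ['a', 'b'], name => 'bob' };

# Expected a string, got an array ref: it stringifies as ARRAY(0x...)
my $query = $v->{query};
my $looks_like_ref = "$query" =~ /^ARRAY\(0x[0-9a-f]+\)$/ ? 1 : 0;

# Expected an array ref, got a string: dereferencing dies under strict
my @names = eval { @{ $v->{name} } };
my $error = $@;

print "query stringifies as: $query\n";
print "error: $error";
```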
The correct way to write that would be:
my $v = $c->request->parameters;
my $query = ref $v->{query} eq 'ARRAY' ? $v->{query}->[0] : $v->{query};
my @names = ref $v->{name} eq 'ARRAY' ? @{$v->{name}} : ($v->{name});
and it is tedious and gross.
Rack::Request
Let's see how other languages try to solve this problem. First, Rack::Request.
Rack::Request has a params method which always returns a Hash object. They have their own rule for multiple values. If there are multiple values for the same key (like foo), the value is always the last value. By naming the key in a special way, like foo[], you can state that "This key might have multiple values", and req.params['foo'] would return an Array instead of a String value.
It kind of hurts that you have to force this behavior in a low level library like Rack, but I think this is a good middle ground, since you can name your parameters in your templates and the request handler code to specify whether you want an Array or a String. This technique has actually been ported to Perl as modules like Catalyst::Plugin::Params::Nested.
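To make the naming rule concrete, here is a minimal sketch of it in Perl. The fold_params helper is hypothetical and only covers the flat foo / foo[] cases described above; it is not how Rack or Catalyst::Plugin::Params::Nested is implemented.

```perl
use strict;
use warnings;

# Hypothetical helper: fold a flat (key, value, key, value, ...) list
# into a hash ref using the "foo[]" naming rule.
sub fold_params {
    my @pairs = @_;
    my %params;
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        if ($key =~ s/\[\]\z//) {
            push @{ $params{$key} }, $value;   # "foo[]": always a list
        } else {
            $params{$key} = $value;            # plain "foo": last value wins
        }
    }
    return \%params;
}

# ?foo=a&foo=b&bar[]=x&bar[]=y
my $params = fold_params(
    'foo'   => 'a', 'foo'   => 'b',
    'bar[]' => 'x', 'bar[]' => 'y',
);

print "foo: $params->{foo}\n";
print "bar: @{ $params->{bar} }\n";
```

The shape of each value is now decided by the parameter name chosen in the template, not by how many values the user happened to send.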
WebOb.py
WebOb is a Python Paste library to handle WSGI request parameters and such, and is used in Python frameworks such as Pylons. The WebOb documentation talks about this may-or-may-not-be-multiple params problem very clearly:
Several parts of WebOb use a “multidict”; this is a dictionary where a key can have multiple values. The quintessential example is a query string like ?pref=red&pref=blue; the pref variable has two values: red and blue.
In a multidict, when you do request.GET['pref'] you’ll get back only 'blue' (the last value of pref). Sometimes returning a string, and sometimes returning a list, is the cause of frequent exceptions. If you want all the values back, use request.GET.getall('pref'). If you want to be sure there is one and only one value, use request.GET.getone('pref'), which will raise an exception if there is zero or more than one value for pref.
and I like it. It does the right thing if you treat it as a normal hash, but provides a method like getall to explicitly demand a list instead of a string.
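The three access rules in that quote can be sketched in a few lines of Perl. The helpers below are hypothetical: the names getall and getone mirror WebOb's methods, getlast stands in for the plain request.GET['pref'] lookup, and none of this is WebOb's code.

```perl
use strict;
use warnings;

# Every value for a key, in the order received
sub getall {
    my ($pairs, $key) = @_;
    return map { $_->[1] } grep { $_->[0] eq $key } @$pairs;
}

# The last value for a key (undef here if the key is absent)
sub getlast {
    my ($pairs, $key) = @_;
    my @values = getall($pairs, $key);
    return $values[-1];
}

# Exactly one value, or an exception
sub getone {
    my ($pairs, $key) = @_;
    my @values = getall($pairs, $key);
    die "expected exactly one value for '$key', got " . scalar(@values) . "\n"
        unless @values == 1;
    return $values[0];
}

# ?pref=red&pref=blue&lang=en
my $GET = [ [pref => 'red'], [pref => 'blue'], [lang => 'en'] ];

my $pref  = getlast($GET, 'pref');              # last value only
my @prefs = getall($GET, 'pref');               # both values
my $lang  = getone($GET, 'lang');               # one value, succeeds
my $ok    = eval { getone($GET, 'pref'); 1 };   # two values, dies
```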
Hash::MultiValue
So, I was thinking of stealing this idea for our Plack::Request which currently inherits this sucky parameters() from HTTP::Engine and then Catalyst::Request, which most of the Plack gang agree is a bad idea.
Last night I was sketching the initial implementation of WebOb's MultiDict in Perl: Hash::MultiValue. It uses tie to behave like a normal hash with a single entry, but with an API to get multiple values if you want:
use Hash::MultiValue;
my $hash = Hash::MultiValue->new(
foo => 'a', foo => 'b', bar => 'baz',
);
# $hash is an object, but can be used as a hashref and DWIMs!
my $a = $hash->{foo}; # 'b' (the last entry)
my @k = keys %$hash; # ('foo', 'bar') not guaranteed to be ordered
You can use the object just like a normal hash reference, and a lookup always returns the last element (if there are multiple). You can also call the OO API on the object to get multiple values, just like WebOb's MultiDict:
my $foo = $hash->get('foo'); # always single (regardless of context)
my @bar = $hash->get_all('bar'); # always multi
my @keys = $hash->keys; # Ordered keys
You should always use get_all if you want multiple values. Being explicit is a good thing, right? There is also no list context gotcha like you see with CGI.pm style param().
Performance concern
There is a benchmark script attached because it used to do some tie/overload stuff which should definitely affect the performance.
UPDATE: this module does not use tie or overload anymore, but uses an inside-out object approach, thanks to Michael Peters and Aristotle for the suggestion! The post content is updated appropriately.
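For readers unfamiliar with the term, here is a rough sketch of what an inside-out layout can look like for this problem. My::MultiValue is a hypothetical class written for illustration, not the code of Hash::MultiValue: the object itself is a plain hash holding the last value per key, while the full list of pairs lives in a side table keyed by the object's address.

```perl
use strict;
use warnings;

package My::MultiValue;   # hypothetical name, not the real module
use Scalar::Util qw(refaddr);

my %all_pairs;   # side table: object address => [key, value, key, value, ...]

sub new {
    my ($class, @pairs) = @_;
    my %last = @pairs;                    # later duplicates overwrite earlier
    my $self = bless \%last, $class;
    $all_pairs{ refaddr $self } = [@pairs];
    return $self;
}

sub get_all {
    my ($self, $key) = @_;
    my @pairs = @{ $all_pairs{ refaddr $self } };
    my @values;
    while (my ($k, $v) = splice(@pairs, 0, 2)) {
        push @values, $v if $k eq $key;
    }
    return @values;
}

sub DESTROY {
    my $self = shift;
    delete $all_pairs{ refaddr $self };   # do not leak the side table entry
}

package main;

my $hash = My::MultiValue->new(foo => 'a', foo => 'b', bar => 'baz');
my $last = $hash->{foo};             # plain hash lookup, no tie involved
my @all  = $hash->get_all('foo');    # read from the side table

$hash->{foo} = 'c';                  # direct write touches only the plain hash
my @after = $hash->get_all('foo');   # side table still holds the old values
```

Because $hash->{foo} is an ordinary hash access, reads are as cheap as a normal hash. The price in this sketch is that direct hash writes are invisible to get_all.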
In my quick test of the inside-out object based approach, for a typical web request with only a few (~10) keys, the performance is about 21,000 QPS (Hash::MultiValue) vs 32,000 QPS (normal hash). So it runs at roughly two-thirds the speed of a plain hash, or about 50% more time per operation.
Whether this becomes a critical overhead depends on how fast your web application is: the Plack standalone server runs at about 1500 QPS, and most frameworks add enough overhead to bring that down to 500 QPS or less. So I think the overhead would end up being < 1% of your web application, and maybe it doesn't really matter.
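The < 1% estimate can be checked with a little arithmetic on the figures quoted above. This sketch assumes one benchmark iteration corresponds to the parameter handling of one request, and takes 500 QPS as the framework speed.

```perl
use strict;
use warnings;

# Figures quoted in the text
my $multivalue_qps = 21_000;   # Hash::MultiValue
my $plain_qps      = 32_000;   # normal hash
my $framework_qps  = 500;      # typical framework request rate

# Convert rates to time per operation in microseconds
my $extra_us   = (1 / $multivalue_qps - 1 / $plain_qps) * 1e6;
my $request_us = 1 / $framework_qps * 1e6;
my $share_pct  = $extra_us / $request_us * 100;

printf "extra cost: %.1f us per request, %.2f%% of a %.0f us request\n",
    $extra_us, $share_pct, $request_us;
```

Under those assumptions the extra cost is about 16 microseconds on a 2000 microsecond request, a little over 0.8%.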
I'll probably spend some time soon on the Plack-Request repository, creating a branch for this type of thing. Any input would be highly welcome ;)
You could potentially get around the tie() bit if you did an inside-out like object. Have the object's internal data stored in this global/inside-out hash but the request's values stored in the object's own hash. Not saying this would necessarily be better, but it's an alternative approach to avoid tie().
Posted by: Michael Peters | 2009.12.15 at 17:08
Agh, that sounds like a cool idea. Let me see how it performs...
Posted by: miyagawa | 2009.12.15 at 17:18
Implemented the inside-out object state:
http://github.com/miyagawa/Hash-MultiValue/commit/4a4621760d4037368ac8c5edd7e8cb2fc7742197
The benchmark is not really bad: tie -> 6000 QPS, inside-out -> 14000 QPS and normal hash -> 32000 QPS
The only downside for this approach is that hash based write doesn't write through:
$hash = Hash::MultiValue->new(foo => 1, foo => 2);
$hash->{foo} = 3;
$hash->getall('foo'); # still (1, 2)
Posted by: miyagawa | 2009.12.15 at 17:50
Also, even with the tie() implementation: you can do ->as_hashref; to "finalize" the data structure to bypass tie() overhead onward. That gives 15000 QPS instead of 6000, which is a pretty good deal.
Posted by: miyagawa | 2009.12.15 at 18:08
Committed a new one on insideout:
http://github.com/miyagawa/Hash-MultiValue/commit/3d57fac7e293ca0a9d0371204ed806e64a054ba7
Simplified most of the code and the benchmark says something like 21000 QPS vs 32000 QPS, which is acceptable. The hash based write would not go through, but I'll document that properly and merge.
Posted by: miyagawa | 2009.12.16 at 08:20