For the impatient: can you make the echo.psgi streaming server (run it as plackup -i AnyEvent -a eg/dot-psgi/echo.psgi port 9090) work without $start_response?
Python's WSGI and Ruby's Rack (as well as JavaScript's JSGI) have a significant interface difference: start_response versus the three param response. Namely:
A Python app gets two parameters, env and start_response (in Perl-ish code):
    sub app {
        my ($env, $start_response) = @_;
        # do something
        my $w = $start_response->($status, $headers);
        return $body;    # or $w->write($body)
    }

This is a little ugly, but it is useful for server-side push in an environment without threads (I think).
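To make the calling convention concrete, here is a small runnable sketch (not code from any real server): a start_response-style app and a toy server that hands it a $start_response callback. The ToyWriter class, the run_app helper and the '200 OK' status string are assumptions for illustration, and the toy server collects output in a string instead of writing to a socket. The app pushes one chunk through the writer and returns the remainder as $body.

```perl
use strict;
use warnings;

# Toy writer object: a blessed code ref with a write() method.
package ToyWriter;
sub write { my ($self, $chunk) = @_; $self->($chunk) }

package main;

# start_response-style app (WSGI-like calling convention).
sub app {
    my ($env, $start_response) = @_;
    my $w = $start_response->('200 OK', [ 'Content-Type' => 'text/plain' ]);
    $w->write("Hello, ");    # pushed to the client right away...
    return [ "world\n" ];    # ...and the rest is returned as the body
}

# Toy server: collects everything in a string instead of a socket.
sub run_app {
    my ($app, $env) = @_;
    my $out = '';
    my $start_response = sub {
        my ($status, $headers) = @_;
        $out .= "Status: $status\n";
        my @h = @$headers;
        while (my ($k, $v) = splice @h, 0, 2) {
            $out .= "$k: $v\n";
        }
        $out .= "\n";
        # The writer appends straight to the output buffer.
        return bless sub { $out .= $_[0] }, 'ToyWriter';
    };
    my $body = $app->($env, $start_response);
    $out .= $_ for @$body;
    return $out;
}

my $response = run_app(\&app, {});
```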
Ruby has threads (whether they are native or user-level threads doesn't really matter here), and a Rack app takes only one argument and returns three values as the response (in Perl-ish code):
    sub app {
        my $env = shift;
        # do something
        return [ $status, $headers, $body ];
    }

We liked Rack's simplicity and made it the default interface.
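For comparison, a sketch of the same exchange in the three param style, with a hypothetical toy serve() helper standing in for a server: the app returns everything at once, and the server just walks the header pairs and the body array.

```perl
use strict;
use warnings;

# Rack-style app: one argument in, [ $status, $headers, $body ] out.
my $app = sub {
    my $env  = shift;
    my $name = $env->{QUERY_STRING} || 'world';
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ "Hello, $name\n" ] ];
};

# Toy server: the app has already finished by the time we see the
# response, so we only need to serialize it.
sub serve {
    my ($app, $env) = @_;
    my ($status, $headers, $body) = @{ $app->($env) };
    my $head = "Status: $status\n";
    # Headers are a flat list of pairs (duplicates are allowed).
    for (my $i = 0; $i < @$headers; $i += 2) {
        $head .= "$headers->[$i]: $headers->[$i + 1]\n";
    }
    return $head . "\n" . join('', @$body);
}

my $out = serve($app, { QUERY_STRING => 'PSGI' });
```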
But well, this is Perl, and we can't assume threads are available everywhere. Of course we do have ithreads, and the wonderful Coro for continuations, which is actually much better than ithreads (IMHO!). But asynchronous environments like AnyEvent, Perlbal, POE or other event loops would suffer if we required the three param response, because they need to pause a request and resume it once it's ready, or do server-side push.
So we (half-)decided that a 'psgi.async' $env variable should indicate the option of passing $start_response when needed. This 'optional' thing would make the whole thing a lot more complicated. (I haven't written this async spec down for exactly this reason.)
Is being optional good or bad?
Shall we make start_response optional, or make it the default and ditch the three param response?
Being optional means more flexibility: server implementations don't need to pass $start_response if it doesn't make sense there, and application frameworks may ignore $start_response if they don't want to do streaming. This all sounds good.
I observed an interesting thing when writing and testing Catalyst::Engine::PSGI, the PSGI adapter for Catalyst. Most Catalyst engines have a write() method that outputs the HTTP headers (if they haven't been sent yet) and prints the content immediately to the client. This is useful when you want to output a huge data file (like CSV) from the database without eating much memory. Of course you can rewrite that to save to a temp file first, or to return a PerlIO or IO-like object that does the database lookup incrementally through the getline() interface until it's done.
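A minimal sketch of the getline() approach, assuming a hypothetical CSVBody class and a code ref standing in for the database cursor: each getline() call produces one CSV line and undef marks the end, so only one row needs to be in memory at a time.

```perl
use strict;
use warnings;

# Hypothetical IO-like body: getline() returns one line per call,
# undef at EOF; close() is a no-op here.
package CSVBody;

sub new {
    my ($class, $fetch_row) = @_;
    return bless { fetch => $fetch_row, header_sent => 0 }, $class;
}

sub getline {
    my $self = shift;
    return "id,name\n" unless $self->{header_sent}++;
    # $fetch stands in for a database cursor (an assumption for illustration).
    my $row = $self->{fetch}->() or return undef;
    return join(',', @$row) . "\n";
}

sub close { }

package main;

# Fake cursor: three rows, then nothing.
my @rows = ([1, 'alice'], [2, 'bob'], [3, 'carol']);
my $body = CSVBody->new(sub { shift @rows });

# What a server does with such a body: call getline() until undef.
my $csv = '';
while (defined(my $line = $body->getline)) {
    $csv .= $line;
}
$body->close;
```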
Actually, most Catalyst engine implementations today support immediate printing.
The problem is that the current Engine::PSGI doesn't (yet) support $start_response in $c->res->write. Output passed to the write() method is currently buffered, and then returned to the PSGI server backend once everything is done, as a three param response [ $status, $headers, $buffered_body ].
This might be okay, but it's not ideal. To support immediate output, I should update Catalyst::Engine::PSGI to support $start_response, so that if the response handler is there, we immediately start the response and then write the body to the writer object.
But most server implementations other than AnyEvent and Perlbal do not support $start_response, so they would have to add it, if they can, to allow streaming writes. This forces both sides (app side and server side) to implement both the start_response mode and the direct response mode.
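Roughly, the current buffering amounts to something like this sketch (a hypothetical BufferedResponse class, not the actual Catalyst::Engine::PSGI code): write() only appends to a string, and the client sees nothing until finalize() hands back the three param response.

```perl
use strict;
use warnings;

package BufferedResponse;

sub new { bless { status => 200, headers => [], buffer => '' }, shift }

sub write {
    my ($self, $chunk) = @_;
    $self->{buffer} .= $chunk;    # held in memory, nothing is sent yet
    return length $chunk;
}

# Only here does the output leave the app, all at once.
sub finalize {
    my $self = shift;
    return [ $self->{status}, $self->{headers}, [ $self->{buffer} ] ];
}

package main;

my $res = BufferedResponse->new;
$res->{headers} = [ 'Content-Type' => 'text/csv' ];
$res->write("$_,row\n") for 1 .. 3;
my $psgi_response = $res->finalize;   # [ 200, [...], [ "1,row\n2,row\n3,row\n" ] ]
```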
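To illustrate the burden on the app side, here is a sketch of an app that supports both modes. It assumes the server sets psgi.async and passes $start_response as the second argument, and, for brevity, that the writer is a plain code ref rather than an object with write(); both are assumptions, not settled spec.

```perl
use strict;
use warnings;

my $app = sub {
    my ($env, $start_response) = @_;
    my @chunks = map { "chunk $_\n" } 1 .. 3;

    if ($env->{'psgi.async'} && $start_response) {
        # Streaming mode: each chunk goes out as soon as it is ready.
        my $w = $start_response->(200, [ 'Content-Type' => 'text/plain' ]);
        $w->($_) for @chunks;
        return;
    }
    # Direct mode: everything is collected and returned at the end.
    return [ 200, [ 'Content-Type' => 'text/plain' ], \@chunks ];
};

# Server A supports $start_response (status and headers ignored here).
my $streamed = '';
$app->({ 'psgi.async' => 1 }, sub { return sub { $streamed .= shift } });

# Server B only understands three param responses.
my $res      = $app->({});
my $buffered = join '', @{ $res->[2] };
```

Note that the same logic has to be written twice, once per mode.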
Middleware, which sits on both sides, makes things even more complicated. tokuhirom and I discussed what a Gzip compression middleware would look like in WSGI and in Rack, and how we'd write it in Perl with or without $start_response.
Overall, the start_response interface is more flexible but ugly. The three param response is cleaner and easier for middleware to adapt, but has limitations in event loops. Allowing both makes both sides happy initially, but eventually makes both sides unnecessarily complicated, I think.
To be honest, using start_response everywhere might make both sides harder (and uglier) to write initially, but it would eventually settle at a reasonable amount of code. On the other hand, if we can implement event loop pause/resume and server push with the three param style response (with a special $body type?), then we can leave start_response out.
So, today I'm okay with making it an option, but I'm really afraid this will bite us sooner rather than later. We might be better off deciding now to either leave it out (no start_response at all) or put it in (start_response everywhere).
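On the three param side, a Gzip middleware can be sketched in a few lines, because the wrapped app hands back the complete response. This is an illustration using the core IO::Compress::Gzip module, not an actual Plack middleware, and it assumes an array-ref body (a filehandle body would have to be read first).

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

sub gzip_middleware {
    my $app = shift;
    return sub {
        my $env = shift;
        my $res = $app->($env);
        return $res unless ($env->{HTTP_ACCEPT_ENCODING} || '') =~ /\bgzip\b/;

        my $plain = join '', @{ $res->[2] };    # assumes an array-ref body
        gzip(\$plain => \my $zipped) or die "gzip failed: $GzipError";
        # A real middleware would also drop any stale Content-Length header.
        return [ $res->[0],
                 [ @{ $res->[1] }, 'Content-Encoding' => 'gzip' ],
                 [ $zipped ] ];
    };
}

my $app     = sub { [ 200, [ 'Content-Type' => 'text/plain' ], [ "hello " x 100 ] ] };
my $wrapped = gzip_middleware($app);
my $res     = $wrapped->({ HTTP_ACCEPT_ENCODING => 'gzip, deflate' });
```

With $start_response in the picture, the same middleware would also have to wrap the writer and compress chunk by chunk.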
What do you think?
Several thoughts:
Creating a simple convenience wrapper that returns an IO handle, which invokes the user callback when the server is ready to write, would also work (but would of course be more trouble to write). This would look something like:
so then the handler would look like:
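A minimal sketch of such a wrapper and handler, assuming a hypothetical writer function (with a & prototype so it accepts a bare block) and a hypothetical Deferred::Writer class: writer { } only records the callback, and the handler keeps the plain three param shape.

```perl
use strict;
use warnings;

# Records the callback; nothing runs until a server decides how to run it.
package Deferred::Writer;
sub new      { my ($class, $cb) = @_; bless { cb => $cb }, $class }
sub callback { $_[0]{cb} }

package main;

sub writer (&) { Deferred::Writer->new(shift) }

# The handler stays in the three param style; only the body is special.
my $app = sub {
    my $env = shift;
    return [ 200, [ 'Content-Type' => 'text/plain' ], writer {
        my $fh = shift;    # output handle chosen by the server
        print {$fh} "line $_\n" for 1 .. 3;
    } ];
};
```

A server that is allowed to block could simply call $res->[2]->callback->($fh) with its output handle.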
Taking this a bit further:
which is used as:
When reified, this IO object would basically do:
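Here is a sketch of that reification via pipe and fork (Unix only; reify is a hypothetical name): the child runs the callback against the write end and exits, and the parent returns the read end, which can then be treated like any IO-like body.

```perl
use strict;
use warnings;

sub reify {
    my $cb = shift;
    pipe(my $read, my $write) or die "pipe: $!";
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {        # child: run the user's callback
        close $read;
        $cb->($write);
        close $write;
        exit 0;             # never fall back into the server's code
    }
    close $write;           # parent keeps only the read end
    return ($read, $pid);
}

my ($fh, $pid) = reify(sub { my $out = shift; print {$out} "chunk $_\n" for 1 .. 3 });
my $body = do { local $/; <$fh> };    # slurp until the child closes its end
close $fh;
waitpid $pid, 0;
```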
On concurrent but single-threaded backends, this could then install event handlers to copy from $read to the real output handle in a nonblocking way:
On multithreaded or multiprocess concurrent backends, and on nonconcurrent backends, the $body could simply be extracted from the handle and invoked by the backend directly on the real output handle.
Therefore, I think start_response is an unnecessary abstraction; we can get away with what we have right now without falling into a tricky maze of blocking vs. nonblocking handlers.
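A sketch of the nonblocking copy, with one substitution: the core IO::Select module stands in for a real event loop such as AnyEvent or POE, which would register an I/O watcher on $read instead of running its own loop. A forked child plays the role of the user's callback, and a string stands in for the client socket.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

pipe(my $read, my $write) or die "pipe: $!";
my $pid = fork;
die "fork: $!" unless defined $pid;
if ($pid == 0) {                  # child: the user's writer callback
    close $read;
    $write->autoflush(1);
    print {$write} "chunk $_\n" for 1 .. 3;
    close $write;
    exit 0;
}
close $write;
$read->blocking(0);               # never block the (imaginary) event loop

my $client = '';                  # pretend this is the client socket
my $sel    = IO::Select->new($read);
while ($sel->count) {
    for my $fh ($sel->can_read(1)) {
        my $n = sysread $fh, my $buf, 4096;
        if (!defined $n) {
            next if $!{EAGAIN};   # woke up but nothing to read yet
            die "read: $!";
        }
        if ($n == 0) {            # EOF: the callback has finished
            $sel->remove($fh);
            close $fh;
            next;
        }
        $client .= $buf;          # a real server would write to the client here
    }
}
waitpid $pid, 0;
```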
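And a sketch of the blocking case: a hypothetical serve_blocking() that, when $body is a callback, just invokes it on the client handle, with no pipe or fork. Using a plain code ref as the deferred body (instead of a wrapper object) is an assumption made for brevity, and an in-memory handle stands in for the client connection.

```perl
use strict;
use warnings;

sub serve_blocking {
    my ($res, $out) = @_;    # $out: the client connection
    my ($status, $headers, $body) = @$res;
    print {$out} "Status: $status\r\n";
    for (my $i = 0; $i < @$headers; $i += 2) {
        print {$out} "$headers->[$i]: $headers->[$i + 1]\r\n";
    }
    print {$out} "\r\n";
    if (ref $body eq 'CODE') {
        $body->($out);       # the callback writes straight to the client
    } else {
        print {$out} @$body; # ordinary array-ref body
    }
}

open my $client, '>', \my $sent or die "open: $!";
serve_blocking([ 200, [ 'Content-Type' => 'text/plain' ],
                 sub { my $fh = shift; print {$fh} "streamed\n" } ], $client);
close $client;
```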
Posted by: Yuval | 2009.09.23 at 22:24
oh duh... exit() after $write->$body
Posted by: Yuval | 2009.09.23 at 22:25
Yeah... we've been toying with some special defer/generator type of $body object so that backends can deal with it to do server push (mostly using Plack::Util::foreach, but some backends might be able to handle it separately).
One thing I really want for the PSGI/Plack launch is that "all Catalyst apps should work fine", which is not true right now because of the buffered output of $c->res->write. I'm not sure how serious a problem this is in practice, but we can definitely solve it once we have an answer to this question, i.e. either use start_response, or return a direct response to start the response and then use some callback to pipeline the output to the backend.
Posted by: miyagawa | 2009.09.23 at 22:34
BTW, it's not very obvious, but it's also possible to write an event-savvy callback:

    writer {
        my $fh = AnyEvent::Handle->new(fh => shift);
        $something_else->cb(sub {
            $fh->push_write(...);
            $fh->push_shutdown;
        });
    }

which can of course still be run on a concurrent/multiprocess or nonconcurrent backend.
In this case the declaration could be written as event_writer { } or something, to signal to an AnyEvent:: backend that it's OK to invoke the callback directly without pipe & fork, while elsewhere the same callback can still be run as something blocking (using AnyEvent::Subprocess or something similar).
Posted by: Yuval | 2009.09.23 at 22:39
I think that abomination ($c->res->write) would be better deprecated in Catalyst.
It really gets the order wrong: it uses state to mark the response as "finalized" and short-circuits out of those methods again after the response has been returned. It really makes no sense.
This could be facilitated by running every Catalyst handler inside a writer { } block, but as a longer-term fix Catalyst should really provide a better way to do streaming (e.g. by putting a handle inside the response).
Posted by: Yuval | 2009.09.23 at 22:45