Http Chunked Responses – streaming challenges

Streaming http responses is not used by many developers. Most web framework provide some support, but they all seem to give you the message: keep your responses small. If you want to find out what pitfalls to look out for when streaming large responses, read on.

Streaming the http response means you don’t know the length of the content when you start. This is mostly applicable to large responses, in our case a large result set from the database. Exploring a few options in Ruby and Java, there seems to be a few glaring omissions in the chunked response support. Be aware of them before you start and pick the framework that supports your use case best. Some frameworks (like Tappestry) don’t support it at all!

Blocking
Firstly, the response will be blocked. The framework should not mind that the calling thread is used (blocks) to serve a few minutes worth of data. Many Ruby web framework don’t handle long lived responses very well.

Documentation
The second omission is the documentation. In the Play framework, the Async responses always seems to be ‘run running calculations’. Streaming is also long running, but the first result is available almost instantly.

Multi threading
The third omission is multi threading. Most frameworks give you an object to write your write to (out or output stream) but if your response is large and requires transformation before output you might want to use multiple threads. Guess what happens when you hit the output stream with multiple threads? Breackage.

Buffering
The fourth omission is buffering. When writing a response you don’t want to send every character the minute it is available. One framework I found get this one right: undertow.

Pulling
The last omission I’ll discuss is pulling. When the client is downloading the response slowly, it’s easy to overwhelm the output stream (if it doesn’t block when it’s full). The most elegant solution would be one where the framework tells you when you can write more, when the client is (almost) done reading the previous response. The Grizzly framework seems to support this, I’ll check it out and report.

Http Chunked Responses – streaming challenges

Java 8 – Streams and Lambdas

Pipeline programming

Reactive programming is the new rage. Declare your logic and let it go! If you’ve worked with Scala or Ruby, you’re used combining different operations on your collection like mapping, sorting, filtering. When you can create functions (e.g. blocks) on the fly this becomes simple and readable. For instance mutation, filtering and sorting an array in Ruby is one line of:

['b', 'c', 'a', 'ab'].map(&:capitalize).select{|i|i.length == 1}.sort # => ["A", "B", "C"]

Lambdas

Luckily Java 8 also has the ability to create functions (lambdas). They are great for transforming your collections. Where creating a filtered version of a collection involved creating a new collection and a loop over the old one, with lambdas you can often replace this with a short and expressive statement

In it’s true statically half object oriented nature, Java has special functional classes to represent lambdas where primitives play a special part. Defining a function could like something like this:

ToIntFunction<? super String> alwaysOne = s -> 1;

Probably you will mostly use anonymous lambdas on collections. Collections (e.g. List, Map, Set) have new functions specifically designed for this. We’ll get to the Streams in a moment, first a short summary of the collection goodness you get with Java 8:

Java 8 adds these methods to all collections (iterables):

  • forEach – perform some action on each element
  • removeIf –  remove elements for witch the lambda return true

java.util.List also adds:

  • replaceAll – the mutable version of ‘map’ which doesn’t return the new collection (gotta love Java)
  • sort – mutates collection, lambda should return compareTo compatible results (e.g. 0, 1, -1)

java.util.Map adds:

  • computeIfPresent – replace or remove one entry with result of lambda but only if entry already existed
  • computeIfAbsent – sets a map value if it’s not already there (great for lazy initialising)
  • compute – combination of the above functions
  • merge – add or replace entry using the old value as input
  • replaceAll – same as List

Function pointers

Perhaps the weirdest syntactical change in Java 8 is the method reference operator :: In stead of using a plain lambda expression:

stream.map(s -> s.toLowerCase)

you can refer to the method by it’s class and name directly:

stream.map(String::toLowerCase)

A bit less verbose but it might not be obvious when to use the reference over the lambda.

Enter Streams

Apart from the useful functions per collection, the java.util.stream.Stream is the new functional kid on the Java block. They are designed specifically for lambda operators, parallel processing and chaining multiple transformations together. A simple example will make Java 8 get very close to the Scala/Ruby version:

Arrays.asList("a", "b").stream().map(String::toUpperCase).filter(s -> s.length() == 1).sorted()

A few things to note: generics are here to save the day, ‘s -> s.length’ is only possible because the type is already known. There is some type inference going on, so the input type (String in this case) can differ from the output type, which is very cool. Most notably Stream offers you two functions: map and reduce, renowned functions from the functional world. And since you can use concurrent computation by using parallel streams, you can easily write a fast single machine map/reduce using the stream API. If you’ve worked with Scala or Ruby you’ll realise how profound it is having these two functions available.

Not all lambdas are equal

Streams have two kinds of operations: intermediate (transforming, lazy) and terminal (value producing).  In its try OO style, in Java 8 the responsibility for collecting all those (parallel) transformations into some output is delegated to Collectors. In a not so great OO style but conveniently terminal operations like sum() are available directly on some of the stream types.

Types of Streams

This:

Arrays.asList(“1”, “2”, “3”).stream().map(s -> new Integer(s)).filter(i -> i > 1).collect(Collectors.summingInt(Integer::intValue)); //  5

is equivalent to:

Arrays.asList(“1”, “2”, “3”).stream().mapToInt(Integer::valueOf).filter(i -> i > 1).sum(); //  5

If we weren’t producing Integers but more complex objects, the first variation would make more sense, but the second version creates an IntStream, which has convenient methods that  only make sense on a numeric stream such as sum(). Creating an IntStream directly from a Collection containing Integers is not possible but using the static functions of the IntStream interace (good Lord) you can do:

IntStream.of(1, 2, 3).filter(i -> i > 1).sum();

In short

The new lambdas in Java 8 have already been to good use in the default libraries. Streams are going to make your Java programming life very different, adding fast and expressive ways of filtering and transforming your data. When Java 8 is adopted expect your functions (or your colleagues functions) to start accepting and returning Streams in stead of collections. It will make your programs faster, shorter and more fun to write.

Further reading

 

Java 8 – Streams and Lambdas

Front End Architecture

HTML/Java Workflow & Architecture

Currently I’m thinking about different ways to set up a good set up for creating/changing websites. When your team consists of HTML/CSS developers on one side that know about user interaction, graphics and layouts and Java developers on the other side that excel at making data available and dealing with the complexities of 3rd party communications. The challenge is to set up the team and architecture for an effective workflow, letting everyone do what he/she does best.

The context is a company that has a lot of data and wants to make this data accessible through multiple channels and sites. The user experience is a very important, so tuning the websites on the HTML level is an ongoing activity. Expanding to new devices and websites is one of the (technical) objectives.

The way to set up the components and what abstractions to use is not obvious. After speaking to some peers in the industry, I’ve distilled three main options to choose from. Combinations are possible, but I’ll describe the simple case. I haven’t decided which is best, though and I might be overlooking some crucial things. My main goals for any solutions are:

  • Zero round-trip while developing HTML/CSS/JS
  • Independence of HTML developer when creating new ways to display the same data
  • Smooth work-flow from concept to implementation.

JavaScript All the Way

With this approach JavaScript takes care of all interactions of the website. A good example is the Google Start page, which contains very little HTML, but once you enter a search term, the content completely changes, using JavaScript

The pre-rendered HTML is very minimal, providing the structure that you fill with JavaScript. The biggest advantage is, provided you can create a clean remote API for JavaScript to access, total independence of your HTML development. As long as you want to display the same data, new websites can be created with few changes required on the back-end. Since the HTML doesn’t have to be server generated, it can be plain HTML all the way. Although that simplifies things there are some downsides as well. Google being the biggest issue; the content generated by JavaScript is not parsed by Google, so your site will not do well in the search engines. Getting the data to your JavaScript means you’ll have to provide an easy http accessible API (using JSON or REST etc.), it might not be trivial to come up with a clean API, basically you’re adding a physical layer to your architecture.

Pros

  • Largely independent from back-end
  • Zero roundtrip when changing HTML
  • There is only one version of the HTML (no conversion to templates)

Cons

  • Google cannot index content generated by JavaScript
  • Complex workflows that keep state are harder to maintain/debug in JavaScript
  • An extra layer of
  • Performance and security
  • Re-use is limited

Server Side Scripting

Here you don’t let JavaScript fill your combo-boxes, but it’s a dynamic language like PHP/Python/Ruby/ASP that renders them (let’s assume PHP). The HTML is generated by a language that intrinsically supports a fast (save&refresh) development cycle. Although these scripting languages are powerfull enough to build your entire application, you risk not separating the data from the presentation or trying to build things that are too complex (PHP-hell). So you will probably have your PHP access some (remote) API that handles the complex things. There is a bit more flexibility in terms of connecting your PHP to the back-end API and because the PHP is run on the server security is less of an issue.

Pros

  • HTML can be indexed by Google.
  • Short roundtrip when changing PHP
  • PHP code runs on a trusted server
  • Simple enough for HTML developers

Cons

  • Risk of PHP layer to grow too big
  • Extra runtime environment needed (Apache)
  • HTML only works after PHP parsing

The Java World

This is (for me) the most common model. Some Java Web-framework handles the user request, invokes a controller and passes the result to some HTML-template engine which renders the Java objects into HTML. The real difference here is that you have direct access to Java objects from the HTML template, so when you can model and access your data in Java, you can use these objects directly in your HTML. For Java developers this is very natural. Your web application generates HTML, whilst the developer can work and reason about Java objects almost all the way, the HTML is considered the VIEW part of the framework. One issue with this, is when the HTML layer is doing more than just render the data. Integrating interactive behavior in your HTML is handled by some frameworks quite extensively but at the cost of proprietary abstractions of both JavaScript and the interactions between front and back end. Although this works, it seems to be mostly geared towards Java developers that don’t want to deal with real HTML and JavaScript. Good for Java developers, unnatural for HTML developers.

Pros

  • Data access and HTML generation in one machine/JVM, easy and fast
  • Short development time for Java-heavy applications

Cons

  • Extra channels and websites will need to be modeled in the server framework
  • Templates are not HTML, only work when run inside the Java framework
  • Longer development round-trip, but needs full Java set-up with war overlay

Wrapping up

So what’s the best? Of course it depends. Did I miss any pros or cons? What do you use to get maximum productivity from both your back-end and front-end developers.

References

Front End Architecture