Technology Radar Operator

The growing number of frameworks and programming languages warrants a new role to keep on top of recent developments while keeping your team sane: the “Technology Radar Operator.”

The onslaught of new frameworks is staggering. In the last month alone, Elixir’s first book was released, AngularJS 2.x caused a stir and MeteorJS reached 1.0. Will you be using any of these frameworks in the next year? Should you? What if you start a new project, write a new micro-service or need to fix a performance or usability problem in your product? Will you use your trusted MySQL, CakePHP and jQuery? Or will one of those components need an upgrade?

Instead of expecting every developer to know every new tool, I suggest that teams larger than 10 developers institute a (part-time) technology watcher role, analogous to the ThoughtWorks Technology Radar. You don’t just need a radar, you need someone to operate it! The rest of the team can focus on getting better at the tools they already use.

The Radar Operator will investigate new tools, new releases and new languages. When a problem or project arises, he/she will be consulted to advise on which tools, frameworks and languages to use.

The main benefits of having this role in your company/team:

  1. Keep up to date on new technologies and how they benefit your projects/company
  2. Prevent overload of information for your developers
  3. Have a repeatable process for selecting and adopting new technologies
  4. Attract/promote developers to fill this role

So what tasks will the Radar Operator have?

  • Know the technological landscape your company or team(s) are in. Track usage of and experience with the tools, e.g. who knows which framework.
  • Keep tabs on the wider language landscape. Is it a Java shop? Then Clojure and Scala are on the radar. Is it a Ruby shop? Then Elixir might need investigation.
  • Try out new versions of frameworks already in use and find out which new features or bug fixes might impact the teams.
  • Rate the tools according to company priorities. If raw performance is key, then always rate new languages/frameworks on performance.
  • Educate and promote. New tools are not adopted easily: new languages and tools need to be presented, and developers need hands-on sessions to learn what they can do.
  • Map the value of new technology: what are its main strengths and weaknesses?
  • Guide adoption of new technologies: work with the teams adopting the new tools and spread what they learn.

What do you think? Is it time for this new role and would you want to take it up?


NoSQL Fast? Not always. A benchmark

NoSQL databases should offer superior performance and scalability. You’re giving up your trusted SQL for something, right?

This benchmark follows a simple use case:
Fetch 100,000 records from the database and stream them into an HTTP response. The test was done with 20 concurrent users.

We wanted to test the performance of different databases. Our options:

  • Cassandra – easily scalable
  • MongoDB – fast and flexible
  • PostgreSQL – trusted technology
  • Datastore – App Engine Scalable Database

Reading 100,000 records from the database requires the result to be paged. With concurrent users, you don’t want to fetch the full result set into memory before serving it.
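As a rough illustration of that idea, here is a minimal Java sketch. It assumes a hypothetical `PageSource` interface standing in for whatever paging API your database driver offers; it is not the code used in the benchmark. Each page is written to the response and then dropped before the next one is fetched, so memory use is bounded by the page size rather than the result size.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

class PagedStreaming {

    /** Hypothetical paging interface; a real database driver will look different. */
    interface PageSource {
        /** Returns at most pageSize records starting at offset; empty when exhausted. */
        List<String> fetchPage(int offset, int pageSize);
    }

    /** Writes all records page by page and returns the number of records written. */
    static int streamAll(PageSource source, int pageSize, PrintWriter out) {
        int offset = 0;
        while (true) {
            List<String> page = source.fetchPage(offset, pageSize);
            if (page.isEmpty()) {
                break; // no more data
            }
            for (String record : page) {
                out.print(record);
                out.print('\n');
            }
            out.flush(); // push this page to the client before fetching the next
            offset += page.size();
        }
        return offset;
    }

    /** In-memory stand-in for a table with `total` rows (illustration only). */
    static PageSource fakeTable(int total) {
        return (offset, pageSize) -> {
            List<String> page = new ArrayList<>();
            for (int i = offset; i < Math.min(total, offset + pageSize); i++) {
                page.add("row-" + i);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        StringWriter response = new StringWriter();
        int written = streamAll(fakeTable(100_000), 1_000, new PrintWriter(response));
        System.out.println("records written: " + written);
    }
}
```

The page size is the knob to tune here: larger pages mean fewer round-trips to the database, smaller pages mean less memory per concurrent user.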

Multiple clients accessing the same data can be good or bad for the database. Caching will work well since data is effectively read multiple times. Concurrency might also introduce competition for this resource depending on the architecture of the database.

The results were quite surprising: reading 100k rows from Cassandra with 20 threads took 61 seconds on average. MongoDB was roughly twice as fast and PostgreSQL almost 30 to 60 times faster than Cassandra! Our experience with importing and exporting data from Cassandra matches these numbers: mutating small amounts of data is fast, getting large amounts of data out is slow.

Times are in seconds. The average response time was 0.9 seconds for PostgreSQL, 17 seconds for MongoDB and 61 seconds for Cassandra.

Databases

Lower is better. The 90% bar means that 90 percent of the requests were delivered within the given time.


HTTP Chunked Responses – streaming challenges

Streaming HTTP responses is something few developers do. Most web frameworks provide some support, but they all seem to give you the same message: keep your responses small. If you want to find out what pitfalls to look out for when streaming large responses, read on.

Streaming the HTTP response means you don’t know the length of the content when you start. This is mostly applicable to large responses, in our case a large result set from the database. Exploring a few options in Ruby and Java, there seem to be a few glaring omissions in the chunked response support. Be aware of them before you start and pick the framework that supports your use case best. Some frameworks (like Tapestry) don’t support it at all!

Blocking
Firstly, the response will block. The framework should not mind that the calling thread is tied up (blocked) serving a few minutes’ worth of data. Many Ruby web frameworks don’t handle long-lived responses very well.

Documentation
The second omission is the documentation. In the Play framework, the Async responses always seem to be about ‘long running calculations’. Streaming is also long running, but the first result is available almost instantly.

Multi threading
The third omission is multi threading. Most frameworks give you an object to write your response to (out or output stream), but if your response is large and requires transformation before output you might want to use multiple threads. Guess what happens when you hit the output stream with multiple threads? Breakage.
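One way around this, sketched below under my own assumptions rather than taken from any particular framework, is to let a thread pool do the transformation while only a single thread ever touches the output stream. The class and method names are made up for illustration, and `toUpperCase` stands in for whatever expensive transformation you need.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SingleWriter {

    /** Transforms rows on a thread pool, but writes them from the calling thread only. */
    static void transformAndWrite(List<String> rows, OutputStream out, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String row : rows) {
                Callable<String> task = () -> row.toUpperCase(); // stand-in for real work
                pending.add(pool.submit(task));
            }
            for (Future<String> result : pending) {
                // get() blocks until that row is ready, so submission order is preserved
                out.write((result.get() + "\n").getBytes(StandardCharsets.UTF_8));
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("transformation failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        transformAndWrite(Arrays.asList("a", "b", "c"), response, 4);
        System.out.print(new String(response.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Note that this sketch submits every row up front, which is fine for a demonstration but would need a bound on the number of pending tasks for a genuinely large response.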

Buffering
The fourth omission is buffering. When writing a response you don’t want to send every character the moment it is available. I found one framework that gets this right: Undertow.
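When the framework does not buffer for you, plain Java can. The sketch below is only an illustration: `CountingStream` is a made-up stand-in for the socket that counts how often it is actually written to, and the response is wrapped in a standard `java.io.BufferedOutputStream`.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class BufferingDemo {

    /** Stand-in for the socket: counts how often it is actually written to. */
    static class CountingStream extends ByteArrayOutputStream {
        int writeCalls = 0;

        @Override
        public synchronized void write(byte[] b, int off, int len) {
            writeCalls++;
            super.write(b, off, len);
        }

        @Override
        public synchronized void write(int b) {
            writeCalls++;
            super.write(b);
        }
    }

    /** Writes `count` single bytes through a buffer and reports the writes that reach the socket. */
    static int writesReachingSocket(int count, int bufferSize) {
        CountingStream socket = new CountingStream();
        try (OutputStream out = new BufferedOutputStream(socket, bufferSize)) {
            for (int i = 0; i < count; i++) {
                out.write('x'); // one character at a time, as the application produces it
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return socket.writeCalls;
    }

    public static void main(String[] args) {
        System.out.println("writes reaching the socket: " + writesReachingSocket(10_000, 1_000));
    }
}
```

The application still writes character by character, but the socket only sees a write when the buffer fills up or the stream is closed.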

Pulling
The last omission I’ll discuss is pulling. When the client downloads the response slowly, it’s easy to overwhelm the output stream (if it doesn’t block when it’s full). The most elegant solution would be one where the framework tells you when you can write more, that is, when the client is (almost) done reading the previous chunk. The Grizzly framework seems to support this; I’ll check it out and report.


Java 8 – Streams and Lambdas

Pipeline programming

Reactive programming is the new rage. Declare your logic and let it go! If you’ve worked with Scala or Ruby, you’re used to combining different operations on your collections, like mapping, sorting and filtering. When you can create functions (e.g. blocks) on the fly this becomes simple and readable. For instance, mutating, filtering and sorting an array in Ruby is a one-liner:

['b', 'c', 'a', 'ab'].map(&:capitalize).select{|i|i.length == 1}.sort # => ["A", "B", "C"]

Lambdas

Luckily Java 8 also has the ability to create functions (lambdas). They are great for transforming your collections. Where creating a filtered version of a collection used to involve creating a new collection and looping over the old one, with lambdas you can often replace this with a short and expressive statement.

True to its statically typed, half object-oriented nature, Java has special functional interfaces to represent lambdas, with separate variants for primitives. Defining a function could look something like this:

ToIntFunction<? super String> alwaysOne = s -> 1;

Probably you will mostly use anonymous lambdas on collections. Collections (e.g. List, Map, Set) have new functions specifically designed for this. We’ll get to the Streams in a moment, first a short summary of the collection goodness you get with Java 8:

Java 8 adds these methods to all collections (iterables):

  • forEach – perform some action on each element (defined on Iterable)
  • removeIf – remove elements for which the lambda returns true (defined on Collection)

java.util.List also adds:

  • replaceAll – the mutating version of ‘map’: it replaces each element in place and returns nothing (gotta love Java)
  • sort – mutates the list; the lambda acts as a Comparator and should return compareTo-compatible results (negative, zero or positive)

java.util.Map adds:

  • computeIfPresent – replace or remove one entry with result of lambda but only if entry already existed
  • computeIfAbsent – sets a map value if it’s not already there (great for lazy initialising)
  • compute – combination of the above functions
  • merge – add or replace entry using the old value as input
  • replaceAll – same as on List, but the lambda receives both the key and the value
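To make the lists above a bit more concrete, here is a small sketch that exercises a handful of these methods. The class and method names are my own; the collection methods themselves are the standard Java 8 ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CollectionLambdas {

    /** removeIf, replaceAll and sort all mutate the list in place. */
    static List<String> listDemo() {
        List<String> words = new ArrayList<>(Arrays.asList("b", "c", "a", "ab"));
        words.removeIf(w -> w.length() > 1);     // drops "ab"
        words.replaceAll(String::toUpperCase);   // in-place 'map'
        words.sort((x, y) -> x.compareTo(y));    // comparator lambda
        return words;                            // [A, B, C]
    }

    /** merge: insert 1 for a new key, otherwise combine the old value with 1. */
    static Map<String, Integer> wordCount(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    /** computeIfAbsent: lazily create the bucket the first time a key is seen. */
    static Map<String, List<String>> groupByFirstLetter(List<String> words) {
        Map<String, List<String>> groups = new HashMap<>();
        words.forEach(w -> groups.computeIfAbsent(w.substring(0, 1), k -> new ArrayList<>()).add(w));
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(listDemo());
        System.out.println(wordCount(Arrays.asList("a", "b", "a")));
        System.out.println(groupByFirstLetter(Arrays.asList("ab", "ac", "b")));
    }
}
```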

Function pointers

Perhaps the weirdest syntactical change in Java 8 is the method reference operator (::). Instead of using a plain lambda expression:

stream.map(s -> s.toLowerCase())

you can refer to the method by its class and name directly:

stream.map(String::toLowerCase)

A bit less verbose but it might not be obvious when to use the reference over the lambda.
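As a rule of thumb (my own, not an official guideline), a method reference reads well when the lambda would do nothing but forward its arguments to a single method. The sketch below lines up a lambda with the four kinds of method reference; the class and constant names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

class MethodRefs {
    // Plain lambda that only forwards to one method
    static final Function<String, String> LAMBDA = s -> s.toLowerCase();

    // Same thing as an unbound instance method reference: the argument becomes the receiver
    static final Function<String, String> UNBOUND = String::toLowerCase;

    // Static method reference
    static final Function<String, Integer> STATIC = Integer::parseInt;

    // Bound instance method reference: the receiver is fixed up front
    static final Function<String, String> BOUND = "Hello, "::concat;

    // Constructor reference
    static final Supplier<List<String>> CONSTRUCTOR = ArrayList::new;

    public static void main(String[] args) {
        System.out.println(UNBOUND.apply("ABC"));
        System.out.println(STATIC.apply("42") + 1);
        System.out.println(BOUND.apply("world"));
        System.out.println(CONSTRUCTOR.get().isEmpty());
    }
}
```

As soon as the lambda does anything extra (an additional argument, a calculation, a null check), the lambda form is usually the clearer choice.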

Enter Streams

Apart from the useful functions per collection, java.util.stream.Stream is the new functional kid on the Java block. Streams are designed specifically for lambda expressions, parallel processing and chaining multiple transformations together. A simple example shows Java 8 getting very close to the Scala/Ruby version:

Arrays.asList("a", "b").stream().map(String::toUpperCase).filter(s -> s.length() == 1).sorted()

A few things to note: generics are here to save the day, as ‘s -> s.length()’ is only possible because the element type is already known. There is some type inference going on, so the input type (String in this case) can differ from the output type, which is very cool. Most notably, Stream offers you two functions, map and reduce, renowned functions from the functional world. And since you can compute concurrently by using parallel streams, you can easily write a fast single-machine map/reduce using the stream API. If you’ve worked with Scala or Ruby you’ll realise how profound it is to have these two functions available.
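For completeness, here is a runnable sketch of the same ideas: the Ruby one-liner ported to a stream pipeline that ends in a terminal operation, and a tiny parallel map/reduce. The class and method names are my own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamPipeline {

    /** The Ruby example as a Java 8 pipeline: map, filter, sort, then collect into a List. */
    static List<String> singleLetters(List<String> input) {
        return input.stream()
                .map(String::toUpperCase)
                .filter(s -> s.length() == 1)
                .sorted()
                .collect(Collectors.toList());
    }

    /** A single-machine map/reduce: map each word to its length, reduce by summing. */
    static int totalLength(List<String> input) {
        return input.parallelStream()
                .map(String::length)
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("b", "c", "a", "ab");
        System.out.println(singleLetters(words)); // [A, B, C]
        System.out.println(totalLength(words));   // 5
    }
}
```

The reduce step is safe to run in parallel here because addition is associative and 0 is its identity; an operation without those properties would give unpredictable results on a parallel stream.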

Not all lambdas are equal

Streams have two kinds of operations: intermediate (transforming, lazy) and terminal (value producing). In its true OO style, Java 8 delegates the responsibility for collecting all those (parallel) transformations into some output to Collectors. In a not so great OO style, but conveniently, terminal operations like sum() are available directly on some of the stream types.
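The laziness of intermediate operations is easy to observe. In this small sketch (names are my own) a counter records how often the map lambda runs: nothing happens when the pipeline is built, and the work is only done once the terminal collect is called.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyStreams {

    /** Returns the number of map calls before and after the terminal operation. */
    static int[] countCalls() {
        AtomicInteger calls = new AtomicInteger();

        // Building the pipeline: map is intermediate, so the lambda does not run yet
        Stream<String> pipeline = Arrays.asList("a", "b", "c").stream()
                .map(s -> {
                    calls.incrementAndGet();
                    return s.toUpperCase();
                });
        int beforeTerminal = calls.get();

        // collect is terminal: it pulls every element through the pipeline
        List<String> result = pipeline.collect(Collectors.toList());
        int afterTerminal = calls.get();

        return new int[] { beforeTerminal, afterTerminal };
    }

    public static void main(String[] args) {
        int[] counts = countCalls();
        System.out.println("before: " + counts[0] + ", after: " + counts[1]);
    }
}
```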

Types of Streams

This:

Arrays.asList("1", "2", "3").stream().map(s -> new Integer(s)).filter(i -> i > 1).collect(Collectors.summingInt(Integer::intValue)); //  5

is equivalent to:

Arrays.asList("1", "2", "3").stream().mapToInt(Integer::valueOf).filter(i -> i > 1).sum(); //  5

If we weren’t producing Integers but more complex objects, the first variation would make more sense, but the second version creates an IntStream, which has convenient methods that only make sense on a numeric stream, such as sum(). Creating an IntStream directly from a Collection containing Integers is not possible (you have to go through mapToInt), but using the static functions of the IntStream interface (good Lord) you can do:

IntStream.of(1, 2, 3).filter(i -> i > 1).sum();
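If you already have a collection of Integers, the detour via mapToInt is short. A small sketch (class and method names are my own) that also shows the way back from an IntStream to a List using boxed():

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class IntStreams {

    /** From a List<Integer> to an IntStream: unbox with mapToInt, then use sum(). */
    static int sumAboveOne(List<Integer> numbers) {
        return numbers.stream()
                .mapToInt(Integer::intValue)
                .filter(i -> i > 1)
                .sum();
    }

    /** And back again: boxed() turns an IntStream into a Stream<Integer>. */
    static List<Integer> boxedRange(int fromInclusive, int toExclusive) {
        return IntStream.range(fromInclusive, toExclusive)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumAboveOne(Arrays.asList(1, 2, 3))); // 5
        System.out.println(boxedRange(1, 4));                    // [1, 2, 3]
    }
}
```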

In short

The new lambdas in Java 8 have already been put to good use in the default libraries. Streams are going to make your Java programming life very different, adding fast and expressive ways of filtering and transforming your data. When Java 8 is adopted, expect your functions (or your colleagues’ functions) to start accepting and returning Streams instead of collections. It will make your programs faster, shorter and more fun to write.


The Clojure Ecosystem

The Clommunity

Having recently started programming in Clojure, I find the somewhat lackluster state of some of the Clojure community projects striking. Programming languages thrive when there is an active community, and Clojure seems a mixed bag. Clojure sees regular (non-breaking) releases and some of the Clojure books are amongst the best development books ever written.

Leiningen

Probably one of the tools that makes us sing songs of hope for Clojure is Leiningen. It’s an active project and it’s the de facto standard for building Clojure projects. It hasn’t crapped out on me once, I installed a few plugins with no issues and

IDE support

  • Sublime Text 2. The SublimeREPL project is pretty awesome, but it’s not Clojure specific. The REPL does start and being able to send snippets of code to the REPL is pretty neat. I’ve developed a few functions using this approach. The main issue is that the REPL is not using the Lein project but just starts a vanilla REPL. The REPL froze a few times: annoying. So I’d say it’s nice but not awesome.
  • Eclipse. Though both Eclipse and IntelliJ feel too heavy for a dynamic language, they do have plugins you can use. Counterclockwise is the Clojure plugin for Eclipse. It has support for Leiningen, but lacks support for running tests. Running a REPL inside Eclipse feels odd but sending code snippets seems to work. I couldn’t figure out how to load the full project into the REPL, which is strange because Eclipse is so project oriented.
  • IntelliJ. La Clojure is the plugin for IntelliJ, maintained by JetBrains (which ought to be a plus). Haven’t used it yet.
  • Emacs. There are several deceased Emacs Clojure plugins, but one has emerged (survived): nrepl.el. I didn’t know Emacs, so trying it out was a bit of a struggle. Emacs uses Lisp for some parts, so I imagine it has good support for Clojure’s syntax. The commands available from nrepl make it seem like a very powerful option.

SQL

For SQL databases there is Korma, which is still a bit minimal but allowed me to do a fairly complex join query using aliases. It’s under active development, although I would advise double-checking that it works before using it in a live environment. ClojureQL seems to be a more stable but also somewhat stale alternative.

Solr

Solr is a popular Lucene-based database. It has search features beyond regular SQL databases and it is generally well supported. I say generally because Solr seems to be a weak spot for Clojure. The main reason for this blog post was the state of the Clojure Solr libraries.

There are 5 Solr Clojure projects (that I could find).

  • clojure-solr seems to be the granddaddy, given its age and the fact that one of the other projects is based on it. It’s dormant; even the forks are not up to date.
  • solrclient is not really a project, more a one off trying to use Solr through its JSON interface.
  • solrclj is less dead and looks like a proper project. It hasn’t been updated to the latest Solr and Clojure versions though. If I were to fork one project to bring it up to date, it would be this one.
  • star is, well, the rising star amongst these libraries. It was created only hours ago, literally minutes after I started searching for Solr + Clojure on GitHub. The big ‘cool’ here is that the author also created the famous Solr Ruby client rsolr. Though no real lines of Clojure have been written yet, from the first commit it already looks promising and this guy obviously knows what is required in a Solr client library.
  • icarus is a little bit more recent, there is a version that supports Solr 3.5 (4.2 is the most recent). It’s only for querying and looks like it’s no longer maintained. It’s based on clojure-solr.

Unit Testing

Clojure.test

Clojure comes with a unit testing API built right into the core. It’s cool that you can start writing unit tests right out of the gate. Leiningen runs it out of the box and for a Java developer with JUnit experience (like me), it just works.

Midje

I’ve been reading Brian Marick’s book on Clojure; he authored his own testing framework, Midje. Haven’t looked at it yet, but it’s worth a mention.

Lazytest

Lazytest was written by the same guy who did the default Clojure.test library. Lazytest has one main issue: it’s dead. The current development branch is unstable but also two years old, a fatal combination. It’s one giant rough edge and doesn’t seem to support Leiningen properly.

Autotest

One nice feature of SBT (Scala’s Leiningen) is the automatic compilation and running of tests on source change. There are some attempts to add this to Lein, but the projects are either undocumented, too old or test framework specific. I do hope I’ll find a Lein plugin that just does ‘detect source change = rerun tests’.

Misc

One library worth mentioning is Cheshire. I’ve only used one function (parse-string) so far but it’s a feature rich and fun library to use. Of course its main strength comes from Clojure itself because any JSON structure can be parsed to and from a Clojure data structure (of maps and vectors). Keys turning into keywords makes it a pleasure to work with JSON. The author Lee Hinman seems a very productive member of the Clojure community, check out his other projects.

For further exploration of Clojure, here’s a list of libraries and Leiningen plugins


Version numbers are overrated – use version labels instead

Most developers use version numbers for their software. It’s considered a ‘best practice’, but version numbers are overused and give you a false sense of control and compatibility. Use the simplest versioning scheme possible, where versionless is the simplest.

Rationale

What is a version number? It’s a label (it doesn’t have to be numeric) that identifies one particular snapshot of your software. Usually there is one overall version number for the whole system, even when components of that system have their own versions. Using this label, anyone can reason about the state of the code for that label. Users might say: “In IE version 9.0.1.2 there is a bug when I try to print.” The developers will know exactly which state the code was in for that version number and should be able to find and run that version (and fix the bug). Summary: you need to label revisions of your code to refer to them later.

Numbers are silly

So far, so good. Version labels are useful. But instead of labeling the software with a date of release or the name of the latest feature, most developers use numeric labels. And not just a sequence number (1, 2, 3), but fancy numbers with dots! Giving your application a version number like 15.0.12.21.h makes it (and you) look really complicated and smart, don’t you think? Have you seen the Chrome version numbers? Anyway, let’s go into these version numbers more deeply. A typical application, let’s say a website for creating blogs, releases these versions:

  • 1.0
  • 1.1
  • 1.2
  • 2.0

What does that mean? Well, convention says:

  • first ever release
  • small change
  • small change
  • big change

That’s it, nothing more to it. But the numbers look more professional, don’t you think? Although lists 1 and 2 have the exact same meaning. And was the change from 1.2 to 2.0 really that big? We don’t know. What this developer deems big might not be so big and worthy of a ‘major’ version bump for others. So why use numbers at all? Why not use descriptive names like:

  • ‘first basic YourBlog release’
  • ‘fixed bugs in login’
  • ‘fixed bugs in page-saving’
  • ‘redesigned UI’

Now you might argue that you can’t see which version preceded which. Who cares? If you need to get back to the exact version of ‘fixed bugs in page-saving’ you can just check out that version (assuming you’ve labeled it in your source control). If you want, you can track the chronology somewhere else. Or you could add the release date to the label to give it even more meaning:

  • ‘2011-1-10 first basic CMS release’,
  • ‘2011-2-1 fixed bugs in login’,
  • ‘2011-2-9 fixed bugs in page-saving’,
  • ‘2011-3-2 redesigned UI (big change)’.

Now you have everything you need. Suppose you read these labels in your source control: much better than just 1.1, 1.2, don’t you think?

My Application is an API

There are cases where using the dot-numbers makes sense: when you’re developing an API or library for public use. Then the numbers signify something very important: compatibility. The convention basically says: if you increase the number behind the dot (1.1 -> 1.2), the library will be backwards compatible with all 1.x releases. When increasing the main (major) number (1.2 -> 2.0), compatibility with 1.x releases is not guaranteed. That is quite an exact definition of a big change. So if your API is always backwards compatible, you can stick to 1.x releases forever.

Now, this might be the root cause of the widespread use of numbers as version labels. Most developers think API development is the coolest thing in the business, and many practices (like misuse of interfaces in Java) stem from thinking your code will be used as an API. In reality, most code is not an API. GUI applications don’t have a programmatic interface, so they don’t need version numbers. Even if you’re releasing a new version of your library for in-house use, no one will trust your backwards compatibility claims anyway; they will just re-test their whole system using your new version. Only when the library is developed externally do you want to know whether you can upgrade safely or not.

To summarize: for API development version numbers signify something important, but most people don’t build public APIs. They build applications or websites that don’t require this strict convention.
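The compatibility convention described above is mechanical enough to write down. The sketch below is only an illustration of that convention for simple ‘major.minor’ strings; the class and method names are hypothetical, and real-world version schemes have more parts and edge cases than this handles.

```java
class VersionCompat {

    /** The number before the first dot. */
    static int major(String version) {
        return Integer.parseInt(version.split("\\.")[0]);
    }

    /** The number after the first dot, or 0 when there is none. */
    static int minor(String version) {
        String[] parts = version.split("\\.");
        return parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
    }

    /**
     * True when code built against `builtAgainst` may move to `candidate`
     * under the convention: same major number, equal or higher minor number.
     */
    static boolean safeUpgrade(String builtAgainst, String candidate) {
        return major(builtAgainst) == major(candidate)
                && minor(candidate) >= minor(builtAgainst);
    }

    public static void main(String[] args) {
        System.out.println(safeUpgrade("1.1", "1.2")); // true: minor bump within 1.x
        System.out.println(safeUpgrade("1.2", "2.0")); // false: major bump, no guarantee
    }
}
```

This only works because a library author has promised to follow the convention; for an application without a programmatic interface there is nothing for such a check to protect.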

Maven: WTF?

Maven doesn’t help people steer clear of the number fetish. When you create a new Maven project, the initial version for your module is 0.0.1. Yes, not one dot, but two! You’re going to be doing some serious API-building business. No, you’re not.

What does that mean? By convention, 0.0.1 means ultra-pre-beta-alpha release. It probably just prints ‘hello world’ in a console. So you start building and Maven increases your version with every release you make: 0.0.1, 0.0.2, 0.0.3. You’re not making much progress, so after a few weeks you make the bold move and change it to 0.1, your first pre-beta-alpha release! Or is it? Perhaps it’s already live, in production and making money. When do you switch over to 1.0? When all the bugs have been fixed? The first time you go live? 1.0 means done, right? Stable, right? Yes, it does. But your website is never done and your GUI will always have issues, so there is no 1.0!

I find when building whole systems, there is never a clean cut release. The first public release is not much different from the one before or after. So stop fiddling with the numbers.

Just to get back to the API argument: when you release an API to the world, there will be a 1.0. It’s that contract again, meaning: this API is stable and ready for production use, it has undergone testing and everything. This number means something because you’re communicating it explicitly; it’s in the product you ship (like commons-lang-1.0.jar). Your website just has a URL, your GUI an icon, no version.

Version-less coding

Some code doesn’t need version numbers at all, it’s just code. If it’s checked into source control (on a particular branch) it’s usable. No 0.1 release or 0.1-SNAPSHOT. Just the code. If I want to make a change that breaks stuff, I’ll create a separate branch. Maven doesn’t allow this; it’s basically duplicating what my source control system can do better (track revisions). For libraries I might want this, but for my main project I don’t. One of the reasons these version numbers are still so prevalent is that Maven requires a version number. Starting with 0.0.1 will start you off thinking you need a complex version numbering scheme.

The build number alternative

I propose a complete alternative to using version numbers. It’s simple: build numbers and labels. Every time you build your system, for instance in your CI tool (like Hudson), it gets a build identifier, let’s say a timestamp. That number might be set in your source control as a label, or the revision of the code might be stored alongside the build number. Either way, the build number is a reference to a specific state of your code, but also to a specific attempt to build that code. Sometimes you have to build the same code multiple times to get a good release (your build procedure might have issues). The build numbers themselves are just a sequence. You can label the builds that are actually going out for release, so you can refer to them later. Using the build number, you can even pick up the exact artifacts from that build.

  • 2011-01-01-1543 – ‘added new content type’
  • 2011-01-02-1218
  • 2011-01-02-1543
  • 2011-01-02-1743 – ‘improved render time’
  • 2011-01-03-1109

Most CI tools (like Bamboo, probably others too) even support labeling a build.
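Generating such identifiers takes very little code. Here is a minimal sketch that produces identifiers in the format of the list above; the class and method names are my own, and in practice your CI tool would supply the timestamp and store the label.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class BuildLabel {

    // Matches the identifiers used above, e.g. 2011-01-01-1543
    private static final DateTimeFormatter BUILD_ID =
            DateTimeFormatter.ofPattern("yyyy-MM-dd-HHmm");

    /** The build identifier is just the build time. */
    static String buildId(LocalDateTime builtAt) {
        return builtAt.format(BUILD_ID);
    }

    /** Builds that go out for release get a descriptive label; other builds stay bare. */
    static String describe(LocalDateTime builtAt, String label) {
        String id = buildId(builtAt);
        return (label == null || label.isEmpty()) ? id : id + " - '" + label + "'";
    }

    public static void main(String[] args) {
        System.out.println(describe(LocalDateTime.of(2011, 1, 1, 15, 43), "added new content type"));
        System.out.println(describe(LocalDateTime.of(2011, 1, 2, 12, 18), null));
    }
}
```

Because the identifier sorts chronologically as plain text, you get the ordering for free without anyone having to decide what counts as a ‘major’ change.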

One extra benefit of this is that your code doesn’t have to contain version numbers. I think it’s a smell that the version of the code is in the code. The version is something external to your code: your source control system has to deal with revisions, not the code itself. You’ll see that a lot of web projects don’t have a version at all, it’s just the code.

So, using labels instead of dot-numbers, everyone will know what the version entails but you don’t have to worry about the numbers anymore. So, let’s make it easier!


Front End Architecture

HTML/Java Workflow & Architecture

Currently I’m thinking about different ways to arrange a good set-up for creating/changing websites when your team consists of HTML/CSS developers on one side, who know about user interaction, graphics and layouts, and Java developers on the other side, who excel at making data available and dealing with the complexities of 3rd party communications. The challenge is to set up the team and architecture for an effective workflow, letting everyone do what he/she does best.

The context is a company that has a lot of data and wants to make this data accessible through multiple channels and sites. The user experience is very important, so tuning the websites on the HTML level is an ongoing activity. Expanding to new devices and websites is one of the (technical) objectives.

The way to set up the components and what abstractions to use is not obvious. After speaking to some peers in the industry, I’ve distilled three main options to choose from. Combinations are possible, but I’ll describe the simple case. I haven’t decided which is best, though, and I might be overlooking some crucial things. My main goals for any solution are:

  • Zero round-trip while developing HTML/CSS/JS
  • Independence of HTML developer when creating new ways to display the same data
  • Smooth work-flow from concept to implementation.

JavaScript All the Way

With this approach JavaScript takes care of all interactions of the website. A good example is the Google start page, which contains very little HTML, but once you enter a search term, the content completely changes, using JavaScript.

The pre-rendered HTML is very minimal, providing the structure that you fill with JavaScript. The biggest advantage, provided you can create a clean remote API for JavaScript to access, is total independence of your HTML development. As long as you want to display the same data, new websites can be created with few changes required on the back-end. Since the HTML doesn’t have to be server generated, it can be plain HTML all the way. Although that simplifies things, there are some downsides as well. Google is the biggest issue: the content generated by JavaScript is not parsed by Google, so your site will not do well in the search engines. Getting the data to your JavaScript means you’ll have to provide an easily accessible HTTP API (using JSON or REST etc.). It might not be trivial to come up with a clean API; basically you’re adding a physical layer to your architecture.

Pros

  • Largely independent from back-end
  • Zero roundtrip when changing HTML
  • There is only one version of the HTML (no conversion to templates)

Cons

  • Google cannot index content generated by JavaScript
  • Complex workflows that keep state are harder to maintain/debug in JavaScript
  • An extra layer of
  • Performance and security
  • Re-use is limited

Server Side Scripting

Here it’s not JavaScript that fills your combo-boxes but a dynamic language like PHP/Python/Ruby/ASP that renders them (let’s assume PHP). The HTML is generated by a language that intrinsically supports a fast (save & refresh) development cycle. Although these scripting languages are powerful enough to build your entire application, you risk not separating the data from the presentation or trying to build things that are too complex (PHP-hell). So you will probably have your PHP access some (remote) API that handles the complex things. There is a bit more flexibility in terms of connecting your PHP to the back-end API, and because the PHP is run on the server, security is less of an issue.

Pros

  • HTML can be indexed by Google.
  • Short roundtrip when changing PHP
  • PHP code runs on a trusted server
  • Simple enough for HTML developers

Cons

  • Risk of the PHP layer growing too big
  • Extra runtime environment needed (Apache)
  • HTML only works after PHP parsing

The Java World

This is (for me) the most common model. Some Java web framework handles the user request, invokes a controller and passes the result to some HTML template engine, which renders the Java objects into HTML. The real difference here is that you have direct access to Java objects from the HTML template, so when you can model and access your data in Java, you can use these objects directly in your HTML. For Java developers this is very natural. Your web application generates HTML, whilst the developer can work with and reason about Java objects almost all the way; the HTML is considered the VIEW part of the framework. One issue with this arises when the HTML layer is doing more than just rendering the data. Integrating interactive behavior in your HTML is handled by some frameworks quite extensively, but at the cost of proprietary abstractions of both JavaScript and the interactions between front and back end. Although this works, it seems to be mostly geared towards Java developers that don’t want to deal with real HTML and JavaScript. Good for Java developers, unnatural for HTML developers.

Pros

  • Data access and HTML generation in one machine/JVM, easy and fast
  • Short development time for Java-heavy applications

Cons

  • Extra channels and websites will need to be modeled in the server framework
  • Templates are not HTML, only work when run inside the Java framework
  • Longer development round-trip; needs a full Java set-up with WAR overlay

Wrapping up

So what’s the best? Of course it depends. Did I miss any pros or cons? What do you use to get maximum productivity from both your back-end and front-end developers?
