Creating objects in Java is easy with the new keyword. In fact, it’s one of those things that you don’t think about. Need to access a file? Just create a new File instance: new File("build.properties"). For most Java developers, that’s all they need to know. Life becomes more interesting, though, when you start working with multiple class loaders.
Class loaders? Argh! Run away, run away!
That was pretty much my reaction for many a year. I just didn’t want to know about them. They were some kind of black magic and always Somebody Else’s Problem. It’s strange, because class loaders are actually pretty straightforward. Most Java developers know that you compile Java files to these *.class files and that those compiled classes have to be loaded by the JVM somehow. That’s basically what the class loader does. But like threads, the problem is not understanding what they do, but getting them to work together.
How many times have you heard the phrase “it’s a class loader issue”? I’ve certainly heard (and said) it more times than I’d care to admit. As soon as you have more than one class loader in an application, you have to start worrying about which classes can “see” which others. It can easily become a nightmare. But class loader behaviour is perhaps a post for another time. Let’s get back to new.
The first time you create an object of a given class, the JVM has to load that class. This happens transparently when you use new. The question is, what class loader is used? And why does it matter?
Consider a scenario from Grails. We have a build system based on Gant that loads build scripts and executes them. In one of them, we instantiate a Jetty server and start it. The sequence of object creation goes like this:
In fact, the above is a simplification of what actually happens, but it suits the purpose of this post.
The JARs for the first three classes are all on the classpath of what we will call the build class loader. This loads all the classes used directly by the build. So what about Jetty’s Server class? The most important thing to understand is that the Server class must be loaded by the same class loader that loads the Grails web application. Although you can pass your own class loader to the embedded server, if it’s different to the one that loads Server you’ll run into those dreaded class loader issues.
Bearing that in mind, let’s look at what happens if the RunApp script uses new to create the server instance:
    def server = new org.mortbay.jetty.Server()
    ...
    server.start()
Right about now, you should be asking yourself “what class loader was used to load the Server class?” It’s a critical question because it determines what class loader is used to load the entire web application and hence what classpath the application’s runtime dependencies should be on. In this case, the class loader used is whichever one loaded the RunApp script. The new operator resolves the class through the loader that defined the calling code’s class, which is effectively this.getClass().getClassLoader() as long as no subclass from another loader is involved.
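If you want to see this for yourself, a minimal Java sketch along the following lines should do. The names NewUsesDefiningLoader and Widget are made up for illustration, and the sketch assumes both classes sit on the same classpath, so the caller’s loader ends up defining Widget as well.

```java
public class NewUsesDefiningLoader {

    /** Hypothetical stand-in for any class that gets instantiated with `new`. */
    static class Widget {}

    /** Creates a Widget with `new` and returns the loader that defined its class. */
    static ClassLoader loaderOfNewWidget() {
        Widget widget = new Widget();
        return widget.getClass().getClassLoader();
    }

    public static void main(String[] args) {
        // The loader that defined the class containing the `new` expression.
        ClassLoader callerLoader = NewUsesDefiningLoader.class.getClassLoader();
        ClassLoader widgetLoader = loaderOfNewWidget();

        System.out.println("caller loaded by: " + callerLoader);
        System.out.println("widget loaded by: " + widgetLoader);
        System.out.println("same loader: " + (callerLoader == widgetLoader));
    }
}
```

Nothing in the code chooses a loader for Widget. It simply inherits the one that loaded the code doing the instantiating, which is exactly what happens to Server in the RunApp script.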
What does that mean for our example? It means that the build class loader is used to load the Server class and therefore must also be used to load the web application classes. In other words, all the application’s runtime dependencies must be included in the build class loader! What’s the problem with that, you may ask. There is one potential problem and one actual one.
The potential problem is class conflicts. What if the web application depends on a different version of a library that’s already on the build system? It’s a particular problem if any of the Apache XML API libraries are on the classpath. These cause absolute havoc.
The other problem is that the more JARs you have on the classpath, the longer it takes for the JVM to find the class it’s after. That means longer start up times. It’s one of the problems OSGi was designed to solve (he was told by a man in the pub). Why put JARs on the build classpath that the build itself doesn’t need?
The solution is to work out where you want a class loader boundary and use reflection to instantiate your object:
    def runtimeClassLoader = new URLClassLoader(...)
    def server = runtimeClassLoader.loadClass("org.mortbay.jetty.Server").newInstance()
    ...
    server.start()
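Where the boundary actually sits depends on the parent you hand to the new class loader, because a URLClassLoader asks its parent first. The Java sketch below is only an illustration of that choice: LoaderBoundary and visibleThrough are hypothetical names, and the empty URL array stands in for the application’s JARs, which the original snippet elides.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;

public class LoaderBoundary {

    /**
     * Builds a child loader with the given parent and reports whether
     * the named class can be loaded through it.
     */
    static boolean visibleThrough(ClassLoader parent, String className) {
        URL[] runtimeJars = new URL[0]; // placeholder: the application's JAR URLs would go here
        try (URLClassLoader child = new URLClassLoader(runtimeJars, parent)) {
            child.loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String buildClass = LoaderBoundary.class.getName();
        ClassLoader buildLoader = LoaderBoundary.class.getClassLoader();

        // Parent = build loader: everything on the build classpath leaks through.
        System.out.println("build parent sees build class: "
                + visibleThrough(buildLoader, buildClass));

        // Parent = null (bootstrap only): build classes are hidden, JDK classes are not.
        System.out.println("null parent sees build class: "
                + visibleThrough(null, buildClass));
        System.out.println("null parent sees java.lang.String: "
                + visibleThrough(null, "java.lang.String"));
    }
}
```

With the build loader as parent, any library version already on the build classpath wins over the application’s copy, which is the conflict described earlier. A null parent keeps the two worlds apart at the cost of having to list every runtime JAR yourself.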
This is pretty easy in Groovy because the call to start() is resolved at runtime, but Java needs to know the type at compile-time. You can’t do this:
    ClassLoader runtimeClassLoader = new URLClassLoader(...)
    Server server = (Server) runtimeClassLoader.loadClass("org.mortbay.jetty.Server").newInstance()
    ...
    server.start()
because you’ll get a ClassCastException on line 2, assuming runtimeClassLoader defines Server itself rather than delegating to a parent that already has it. The declared type of server is loaded by this.getClass().getClassLoader(), whereas the class of the new instance is defined by a different class loader. Different class loader means different classes, even when the names are identical. So you have to use reflection to invoke the methods and access the fields you need. Fortunately, you only have to jump through these hoops at class loader boundaries.
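Here is a rough, self-contained Java sketch of both halves of that paragraph: the failing cast and the reflective workaround. Everything in it is a stand-in rather than the real Grails or Jetty code. FakeServer plays the role of Jetty’s Server, and IsolatingLoader is a hypothetical loader that defines FakeServer itself instead of delegating. It assumes Java 9 or later and that the compiled class file is readable as a resource from the parent loader.

```java
import java.io.IOException;
import java.io.InputStream;

public class ReflectionAcrossLoaders {

    /** Hypothetical stand-in for Jetty's Server, so the sketch runs without any JARs. */
    public static class FakeServer {
        private boolean started;
        public void start() { started = true; }
        public boolean isStarted() { return started; }
    }

    /** Defines FakeServer itself rather than delegating, which creates a loader boundary. */
    static class IsolatingLoader extends ClassLoader {
        private final String isolatedName = FakeServer.class.getName();

        IsolatingLoader(ClassLoader parent) { super(parent); }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve); // usual parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                return in.readAllBytes(); // Java 9+
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Creates a server behind the boundary and starts it without naming its type. */
    static Object startViaReflection() {
        try {
            ClassLoader runtimeLoader =
                    new IsolatingLoader(ReflectionAcrossLoaders.class.getClassLoader());
            Class<?> serverClass = runtimeLoader.loadClass(FakeServer.class.getName());
            Object server = serverClass.getDeclaredConstructor().newInstance();
            serverClass.getMethod("start").invoke(server);
            return server;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads isStarted() reflectively, since the caller cannot cast to its own FakeServer. */
    static boolean isStarted(Object server) {
        try {
            return (Boolean) server.getClass().getMethod("isStarted").invoke(server);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if casting to the caller's FakeServer throws ClassCastException. */
    static boolean castFails(Object server) {
        try {
            FakeServer ignored = (FakeServer) server;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object server = startViaReflection();
        System.out.println("same class name: "
                + server.getClass().getName().equals(FakeServer.class.getName()));
        System.out.println("same Class object: " + (server.getClass() == FakeServer.class));
        System.out.println("cast fails: " + castFails(server));
        System.out.println("started: " + isStarted(server));
    }
}
```

The sketch uses getDeclaredConstructor().newInstance() rather than Class.newInstance(), which has been deprecated since Java 9. In a real application a common alternative to raw reflection is to put a small interface in a loader that both sides share and cast to that instead.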
As you’ve seen, the new operator is normally something you don’t have to think about, but as soon as you start dealing with multiple class loaders, you have to be aware of and understand its behaviour. The trick is to work out suitable class loader boundaries and then use reflection to load and instantiate classes at those boundaries. It may sound like unnecessary extra work, but you can gain real improvements in application/framework reliability. If you’re lucky, things may even run a bit faster 🙂