Adding Spring Security users in bulk (in Grails)

Earlier this week I was at a client trying to help them diagnose some issues with a CSV import process. I was of course aware of Ted Naleid’s seminal blog post on bulk updates in Grails, and the issues mentioned there seemed the most likely culprits. Unfortunately, it didn’t turn out nearly as straightforward as I’d hoped.

We started with VisualVM, looking for any obvious problems with the memory usage. Nothing showed up, and in fact the import wasn’t creating a lot of records anyway. We progressed on to JProfiler and P6Spy, hoping to see some hotspots in the code or particularly slow queries. We did identify a couple of places that seemed to be taking the majority of the time, but it still wasn’t clear to me whether the issue was in code, Grails, Hibernate, or the database.

That day we implemented a workaround that shifted some work into a background thread using the Platform Core event bus. This was a reasonable thing to do anyway, considering the requirements of the business logic. Yet I was left still wondering why certain parts of the import process were fundamentally slow.

It bugged me enough that I decided to investigate one of the major culprits of the slow import: the Spring Security Core plugin’s UserRole.create() method. Perhaps I could reproduce the problem in a small project without the complexity of the client project. It seemed simple enough to be worth a try. And so I created a new Grails 2.3.7 project with Spring Security Core installed and the following controller action:

@Transactional
def createUsers() {
    def startTime = System.currentTimeMillis()
    
    def roles = (0..<60).collect {
        new Role(authority: RandomStringUtils.randomAlphabetic(12)).save()
    }
    
    def users = (0..<100).collect {
        new User(
                username: RandomStringUtils.randomAlphabetic(12),
                password: RandomStringUtils.randomAscii(20)).save()
    }
    
    for (r in roles) {
        for (u in users) {
            UserRole.create u, r, false
        }
    }

    println "Total: ${(System.currentTimeMillis() - startTime) / 1000} s"

    redirect uri: "/"
}

To my relief, this took more than 30 seconds to complete on the first run. That seemed a lot slower than it should be considering it’s only creating a total of 760 records. There was obviously some underlying issue here that I wasn’t seeing. I tried to clear and flush the session every 20 iterations, but that didn’t have a significant impact.
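
A periodic flush and clear of this kind can be sketched as follows. This is an illustration rather than the exact code from the test project: it assumes the roles and users lists from the action above, and the interval of 20 is simply the figure mentioned in the text.

```groovy
// Sketch only: replaces the nested loop in the action above.
def counter = 0
for (r in roles) {
    for (u in users) {
        UserRole.create u, r, false

        // Every 20 inserts, push pending SQL to the database and
        // empty the Hibernate session's first-level cache.
        if (++counter % 20 == 0) {
            UserRole.withSession { session ->
                session.flush()
                session.clear()
            }
        }
    }
}
```

Note that clearing the session detaches the User and Role instances, which is usually fine for inserts like these but is worth bearing in mind if you go on to modify them.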

My next step was to simply create 760 Role records and then, independently, 760 User records. Both of these only took a few seconds. So what was special about UserRole? Why did its creation seem to be so expensive? I wanted to eliminate the database as a problem, so I tried using Groovy SQL (basically native JDBC) for the UserRole persistence. The total time dropped to a few seconds. So not the database then.
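
For the curious, a Groovy SQL version of the UserRole persistence might look something like the sketch below. It is not the code from my test project. The dataSource property, the user_role table and its user_id and role_id columns are assumptions based on default Grails naming, so check them against your own schema.

```groovy
import groovy.sql.Sql

// Assumption: declared as a controller property so Grails injects the bean.
def dataSource

// ... inside the action, replacing the nested UserRole.create loop ...

// Push the pending Role and User inserts to the database first,
// so the rows referenced by the join table actually exist.
User.withSession { session -> session.flush() }

def sql = new Sql(dataSource)

// Assumption: default table and column names for the UserRole domain class.
sql.withBatch(20, 'insert into user_role (user_id, role_id) values (?, ?)') { ps ->
    for (r in roles) {
        for (u in users) {
            ps.addBatch(u.id, r.id)
        }
    }
}
```

This bypasses GORM entirely, so there is no validation, no events and no session caching involved.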

A Google search brought up another blog post on inserting data via Grails, by Marc Silverboard. In addition to using native JDBC, he suggests using a Hibernate stateless session. This sounded like an interesting possibility, so I shoe-horned it into my test action:

def createUsers() {
    ...
    def session = sessionFactory.openStatelessSession()
    def tx = session.beginTransaction()
    def counter = 0
    for (r in roles) {
        for (u in users) {
            session.insert(new UserRole(user: u, role: r))
            counter++
            if (counter % 20 == 0) {
                session.flush()
            }
        }
    }
    tx.commit()
    session.close()
    ...
}

It’s certainly uglier code, partly as I decided to do batch flushing every 20 rows (I also configured Hibernate’s JDBC batch size to 20). The results were worth it: the total import time came down to just 1 second! Obviously the issue was Hibernate’s caching in the session. Conundrum solved. I was still left wondering why the caching was such an issue only for UserRole, but that was a question for another time.
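
The JDBC batch size mentioned above is an ordinary Hibernate setting. In a Grails 2.x application it would typically go in the hibernate block of grails-app/conf/DataSource.groovy, roughly like this (a fragment, assuming the standard layout of that file):

```groovy
// grails-app/conf/DataSource.groovy (fragment)
hibernate {
    // Becomes the Hibernate property hibernate.jdbc.batch_size
    jdbc.batch_size = 20
}
```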

It would have been easy to stop at this point and bask in the glow of a job well done. Unfortunately, that’s not really me. With my engineering background, I did wonder whether the new code was bypassing more than just Hibernate’s caching. And then I remembered validation. Could validation be the real issue? In order to isolate that particular feature, I reverted all the code back to its original state and then modified the UserRole.create() method to use the validate: false option. I restarted the server and then clicked on the link that triggered the user creation. 10 seconds! I did it again. 6 seconds. After a few more times it settled down at just under 4 seconds. Wow.
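
The change itself is tiny. Assuming the create() method generated by the plugin looks roughly like the one below (the generated code can differ between plugin versions, so treat this as a sketch), it only needs the extra validate argument on save():

```groovy
// Inside the generated UserRole domain class (sketch; your version may differ).
static UserRole create(User user, Role role, boolean flush = false) {
    def instance = new UserRole(user: user, role: role)
    // validate: false tells GORM to skip constraint validation before the insert
    instance.save(flush: flush, insert: true, validate: false)
    instance
}
```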

Why is validation such an issue on UserRole? I have no idea. I did give deepValidate: false a go, but it didn’t show nearly as big an improvement as switching off validation completely. Maybe one of my readers understands what’s going on and can provide us with the answer. Or perhaps not knowing will bug me enough to get me looking deeper. But for now, I just want to summarise my findings:

  • Grails and Hibernate have a lot of moving parts – it can be hard to diagnose issues
  • You really do need to invest some quality time and rigour for any diagnosis phase
  • Hibernate stateless sessions are worth investigating for any bulk inserts
  • Validation could be a significant hidden problem – try disabling it

I think in this case, the drop from 4s to 1s may make it worthwhile using a stateless session. But in either case, be sure you can do without the validation! And I hope this helps you with your own GORM bulk insertions, either with the diagnosis or the solution.

Going solo

As the dust settles from the launch of the second edition of Grails in Action, it’s now time to shift focus to other things. I genuinely hope that the book proves a valuable companion to all you Grails developers out there, but it’s been a draining experience. I’ve provided a handy link to the book in the sidebar.

The first step has already been taken: I have my website up and running! That formally announces my availability for consulting and training in both Groovy and Grails, technologies I’ve been involved with for about 8 years now. I plan to extend my offerings to Gradle as well, a tool I believe will become dominant in the area of building software as it has the power, flexibility and accessibility to deal with all the different requirements of the many builds out there.

A longer term goal of mine is to produce online learning material, both free and paid, to help users of all technologies I favour. This work will appear throughout the year and I’ll announce it through Twitter, Google+ and of course this blog.

Don’t worry though, I’ll still be contributing to open source and keeping the Groovy Podcast trucking along!

Contributing to the Groovy documentation

I like contributing to open source projects. I also love using Groovy for programming. Unfortunately, contributing to programming languages scares me because of all the grammar and parser stuff. I’m sure I could get into the internals with time, but I feel that time is better spent elsewhere. Now, one of those places is the Groovy user guide.

Groovy has been without a proper user guide for a long time now. Yes, there are various pages on the wiki with useful information, but it’s mostly unstructured. So the announcement of a full-blown user guide with language specification filled me with anticipation. And recently, the penny finally dropped and I realised this is something that I can contribute to. I know how to write Groovy, so all that’s required is a little bit of writing.


Shared Grails JARs for Tomcat deployment

While I was at GR8Conf US, one of the attendees asked me how to deploy two Grails WAR files to Tomcat without running into the dreaded “out of permgen space” error. This problem stems from Grails apps loading a lot of classes, and each webapp gets its own copy of those classes. So that’s pretty much double the permgen usage when you deploy two Grails WARs to a single Tomcat instance.

The common solution to this problem is to put the library JARs common to all Grails applications into Tomcat’s shared lib directory. Then there will only be one copy of the corresponding classes loaded in the VM regardless of how many webapps are deployed. It’s a pretty neat solution considering how many common JARs there are between Grails apps, but Grails throws in an additional challenge in that some per-application state is actually per-VM state. So deploying more than one Grails WAR into a Tomcat with shared Grails JARs can cause issues.

A quick web search brings up this question on StackOverflow with a corresponding list of the JARs that can be shared and those that can’t. Certainly for Grails 2.0+, it seems that only the grails-* JARs are unsafe, so I came up with a short events script that splits the JARs, putting the Grails ones in the WAR file and the rest in a sharedLibs directory:

eventCreateWarStart = { warName, stagingDir ->
    if (grailsEnv == "production") {
        def sharedLibsDir = "${grailsSettings.projectWorkDir}/sharedLibs"

        ant.mkdir dir: sharedLibsDir
        ant.move todir: sharedLibsDir, {
            fileset dir: "${stagingDir}/WEB-INF/lib", {
                include name: "*.jar"
                exclude name: "grails-*"
            }
        }

        println "Shared JARs put into ${sharedLibsDir}"
    }
}

Note that this fragment goes into the scripts/_Events.groovy file in the Grails project. I hope it helps folks!

“It’s more of an art than a science”

I’ll be honest, this phrase bothers me. Perhaps it’s because I’m a scientist by training. Perhaps it’s because this seems to be a misuse of the word ‘art’ or a misinterpretation on my part. But whenever I hear it used with reference to software development, I hear: “we use heuristics and guesswork because we don’t have time to do research and there is no body of research from which to draw”. Does that really make the solution to an underlying question or problem an ‘art’ rather than a science?

I of course tried googling the phrase to determine what it’s supposed to mean, but didn’t get very far. The top result from my search was:

It means it is not something which is governed by clearly-defined rules, as a science would be. In science things are either right or wrong; in psychology (or any art) it’s not possible to say what is ‘right’ or ‘wrong’.

This particular answer seems to misunderstand what science is. In essence, it’s a way to understand the way the world works through experimentation and verification. It’s also typically methodical because reproducibility of research results is important.

Perhaps the reference to rules is based on experience with things like mechanics in Physics and Newton’s laws of motion. We can predict the trajectory of projectiles in the air for example. But this is a very limited view of science. I have been watching the programme Horizon on the BBC recently and learned about the science of taste and even creativity. Yes, we’re learning through science about how creativity works!

At the end of the 19th Century and beginning of the 20th Century, we thought that we would soon learn everything there was to learn about the world. Over a century later, there still seems to be no end to the growth of our scientific knowledge. And things that used to be firmly considered “arts” are much less so now.

Consider cooking: more and more chefs are learning the basic science of both taste and cooking. From that base of understanding they can be even more creative in what they do. It allows people like Heston Blumenthal to create bacon and egg ice cream or snail porridge. If you’re interested, McGee on food and cooking is an essential read on the underlying science.

This also highlights an important point: creativity and science are in no way mutually exclusive. In fact, each enables the other. As I mentioned, a scientific base allows for more creativity because of the deeper understanding of how things work, but creativity is also essential in providing insights into how things work.

Coming back to the original point of this post, my ire was recently raised by a discussion on Hacker News where someone wrote:

I’m not sure that there is a sure-fire way to quantify what tests are or are not necessary. In my opinion, this is something that comes with experience and is more of an art than a science. But I’m okay with that.

This seems innocuous enough and I wouldn’t be surprised if many people agree with it. But do we really think that it’s not possible to learn through research what a good level of tests is? Software is typically structured and full of patterns, so the pool of possible structures to investigate is limited. In addition, we already have tools to detect cyclomatic complexity and other metrics of software, so would it be so hard to determine which parts of the software are involved with the “critical” features?

I think what bothers me the most is that despite the huge revenue of the industry as a whole, and how much money depends on the successful completion of IT projects, so little research seems to be done to help improve the software development process. Perhaps the research is being done but it’s not widely disseminated. But I would at least have expected to come across research to back up the claims of agile practitioners (as one example). Not that I necessarily disagree with what they say, but it seems that going agile requires more faith than should be necessary.

Does the software development industry and community require a more scientific mindset? What do you think?