Monday, November 3, 2008

Spring Batch "Hello World" 3

In part 1 and part 2 of this tutorial we saw the basic concepts of Spring Batch and implemented a simple tasklet-based job. The focus of this last part is on item-oriented batch processing. This approach consists of reading bulk data, performing some calculation and outputting the result, one item at a time. As an example, think of processing a large flat file composed of records and writing the output to a database. Each line in the file corresponds to an item: it will be read, transformed and output separately.

Spring Batch provides a solid foundation to implement item-oriented jobs and includes a collection of ready-to-use objects for common scenarios. Let's try to sum up the main objects that are involved here. First we have ItemReader, which is responsible for reading input data. Its main method is Object read(), which returns the next item each time it is invoked, and null when the data is exhausted. The returned item is usually expected to map to a domain object.
The counterpart of ItemReader is ItemWriter. As you can expect, its main role is to process and output items. Quite logically, its main method is void write(Object item).
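To make the two contracts concrete, here is a minimal plain-Java sketch. The names SketchReader, SketchWriter, InMemoryReader and CollectingWriter are hypothetical stand-ins invented for this illustration; they only mirror the read() and write(Object) methods described above and are not the Spring Batch interfaces themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified stand-ins for the two contracts (not the Spring Batch types).
interface SketchReader {
    Object read(); // returns the next item, or null once the input is exhausted
}

interface SketchWriter {
    void write(Object item);
}

// Serves items from an in-memory list, one per call to read().
class InMemoryReader implements SketchReader {
    private final Iterator<?> items;

    InMemoryReader(List<?> source) {
        this.items = source.iterator();
    }

    public Object read() {
        return items.hasNext() ? items.next() : null;
    }
}

// Remembers every item it was asked to write.
class CollectingWriter implements SketchWriter {
    final List<Object> written = new ArrayList<>();

    public void write(Object item) {
        written.add(item);
    }
}

class ReaderWriterSketch {
    public static void main(String[] args) {
        SketchReader reader = new InMemoryReader(Arrays.asList("Hello", "World"));
        CollectingWriter writer = new CollectingWriter();

        // Drain the reader: stop as soon as read() signals exhaustion with null.
        Object item;
        while ((item = reader.read()) != null) {
            writer.write(item);
        }
        System.out.println(writer.written);
    }
}
```

The null return value is what lets the caller detect the end of the input without a separate "has more" method.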

Here is where ItemOrientedStep comes into play: it associates an ItemReader with an ItemWriter and orchestrates the data flow between them. It repeatedly calls its ItemReader and hands each returned item to its ItemWriter until no items are left. These read/write pairs are grouped into chunks, meaning that a transaction is started at the beginning of a chunk and committed at its end. The size of a chunk is configurable. Note also that ItemOrientedStep supports advanced behaviour such as restarting and fault tolerance.
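The chunking idea can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not how ItemOrientedStep is implemented: the class ChunkLoopSketch, its process method and the readerOf helper are hypothetical, the reader is modelled as a Supplier that returns null when exhausted, and a "transaction" is simulated by holding items in a pending list until the chunk boundary is reached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

class ChunkLoopSketch {

    // Hypothetical helper: a reader that returns null once its items run out.
    static Supplier<String> readerOf(String... items) {
        Iterator<String> it = Arrays.asList(items).iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Reads until exhaustion, "committing" every commitInterval items.
    // Returns the number of simulated commits.
    static int process(Supplier<String> reader, List<String> committed, int commitInterval) {
        int commits = 0;
        List<String> pending = new ArrayList<>(); // writes inside the open "transaction"
        String item;
        while ((item = reader.get()) != null) {
            pending.add(item);                    // one read/write pair
            if (pending.size() == commitInterval) {
                committed.addAll(pending);        // chunk boundary: commit
                pending.clear();
                commits++;
            }
        }
        if (!pending.isEmpty()) {                 // commit the final, partial chunk
            committed.addAll(pending);
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String> committed = new ArrayList<>();
        int commits = process(readerOf("a", "b", "c", "d", "e"), committed, 2);
        System.out.println(commits + " commits for " + committed.size() + " items");
    }
}
```

With five items and a chunk size of two, the loop commits three times: two full chunks and one partial chunk at the end. If a failure occurred in the middle of a chunk, only the pending items of that chunk would be lost, which is the point of grouping work this way.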
Let's put all of this into practice with yet another marvelous Hello World example.

Hello World, Again

We will define a job that reads from a file formed of comma-separated records. Our job will replace the commas with spaces and write the output to a new file.
Our file looks like this:
Hello,World,!
Hello,World,!
And the expected output is as follows:
Hello World !
Hello World !
Impressed? At least we are not going to write a single line of Java. We will be configuring objects provided by the framework instead!
We will use FlatFileItemReader and FlatFileItemWriter to implement our job, but first a word about FieldSets. FlatFileItemReader and FlatFileItemWriter read from and write to text files, but in between we will be handling Java domain objects with typed attributes. We need an object that abstracts this transformation, something similar to ResultSet in JDBC. This is exactly the role of FieldSet. In other words, FlatFileItemReader transforms (indirectly) a line into a FieldSet and then into a domain object, whereas FlatFileItemWriter transforms (indirectly again) a domain object back into a FieldSet and then into a line (String).
Let's get back to our FlatFileItemReader now. Here's the bean definition:
<bean id="itemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
    <property name="resource" value="file:./hello.txt" />
    <property name="lineTokenizer" ref="lineTokenizer"/>
    <property name="fieldSetMapper" ref="fieldSetMapper"/>
</bean>
The resource property is obviously the file to read. You will notice that we wired two collaborators. LineTokenizer is responsible for transforming a line into a FieldSet. As for FieldSetMapper, it maps the obtained FieldSet to a domain object. Their bean definitions are the following:
<bean id="lineTokenizer" class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer"/>
The DelimitedLineTokenizer delimiter defaults to comma, which is just what we need. Let's define our FieldSetMapper:
<bean id="fieldSetMapper" class="org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper"/>
PassThroughFieldSetMapper passes the FieldSet directly without mapping it to an object. In our case this is fine since we don't really have domain objects and the transformation we are implementing is simple.
Let's define the FlatFileItemWriter now:
<bean id="itemWriter" class="org.springframework.batch.item.file.FlatFileItemWriter">
    <property name="fieldSetCreator" ref="fieldSetMapper"/>
    <property name="lineAggregator" ref="lineAggregator"/>
    <property name="resource" value="file:./hello2.txt" />
</bean>
You might have noticed how similar it is to the FlatFileItemReader's definition. The resource property is the output file and again we have two collaborators. First, a FieldSetCreator is needed to transform the domain object to a FieldSet, but because we kept the FieldSet as our object in the reading phase, we need a FieldSetCreator that does nothing. It happens that PassThroughFieldSetMapper implements FieldSetCreator and does what we want, so we will just inject it. Finally, we will define the LineAggregator:
<bean id="lineAggregator" class="org.springframework.batch.item.file.transform.DelimitedLineAggregator">
    <property name="delimiter" value=" "/>
</bean>
LineAggregator is the counterpart of LineTokenizer; its role is to transform a FieldSet to a String. We are replacing commas with spaces hence the value of the delimiter property.
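For readers who want to see the transformation itself, here is a rough plain-Java equivalent of what the tokenizer/aggregator pair does to one line. It is a sketch only: TokenizeAggregateSketch and its methods are hypothetical names, a List of strings stands in for the FieldSet, and unlike a real delimited tokenizer it makes no attempt to handle quoted fields.

```java
import java.util.Arrays;
import java.util.List;

class TokenizeAggregateSketch {

    // Stand-in for the tokenizer: split a line on commas into its fields.
    static List<String> tokenize(String line) {
        return Arrays.asList(line.split(",", -1)); // -1 keeps trailing empty fields
    }

    // Stand-in for the aggregator: join the fields with a single space.
    static String aggregate(List<String> fields) {
        return String.join(" ", fields);
    }

    // The pass-through step in the middle is simply "do nothing to the fields".
    static String transform(String line) {
        return aggregate(tokenize(line));
    }

    public static void main(String[] args) {
        System.out.println(transform("Hello,World,!"));
    }
}
```

Applied to the sample input line Hello,World,! this yields Hello World !, which matches the expected output shown earlier.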
Now we need to define our ItemOrientedStep. The simplest way is to use SimpleStepFactoryBean, which instantiates an ItemOrientedStep with sensible defaults. Notice that a transaction manager is also needed (for chunk management and job repository operations). In this example we will use a ResourcelessTransactionManager, which does nothing. Needless to say, this is completely inappropriate in most real scenarios.

<bean id="transactionManager" class="org.springframework.batch.support.transaction.ResourcelessTransactionManager"/>

<bean id="step" class="org.springframework.batch.core.step.item.SimpleStepFactoryBean">
    <property name="transactionManager" ref="transactionManager" />
    <property name="jobRepository" ref="jobRepository" />
    <property name="itemReader" ref="itemReader" />
    <property name="itemWriter" ref="itemWriter" />
</bean>
We have our step ready. We just need to define the job:
<bean id="simpleJob" class="org.springframework.batch.core.job.SimpleJob">
    <property name="name" value="simpleJob" />
    <property name="steps">
        <list>
            <ref local="step"/>
        </list>
    </property>
    <property name="jobRepository" ref="jobRepository"/>
</bean>
We're done!

Running the Code

We will run the job as usual with Maven:
mvn exec:java -Dexec.mainClass=org.springframework.batch.core.launch.support.CommandLineJobRunner -Dexec.args="itemOrientedJob.xml simpleJob"
The source code can be downloaded here.

What's Next?

What's next is up to you! Don't hesitate to run the Spring Batch samples, to learn about the advanced features or simply to experiment. Good luck!