Android Dev & Splinters: May 2012

Friday, 18 May 2012

WebHarvest: Easy Java Web Scraping

The process of extracting data from Web pages is also referred to as Web Scraping or Web Data Mining.

The World Wide Web, as the largest database, often contains data that we would like to consume for our own needs.

The problem is that this data is in most cases mixed together with formatting code, which makes the content human-friendly but not machine-friendly.

Manual copy-paste is error-prone, tedious and sometimes even impossible.

Web software designers usually discuss how to cleanly separate content from style, using various frameworks and design patterns to achieve that.

Even so, some kind of merge usually occurs on the server side, so that a bunch of HTML is delivered to the web client.

Every Web site and every Web page is composed using some logic.

It is therefore necessary to describe the reverse process: how to fetch the desired data from the mixed content.

Every extraction procedure in Web-Harvest is user-defined through XML-based 
    configuration files.

Each configuration file describes a sequence of processors, each executing some common task, in order to accomplish the final goal.

Processors execute in the form of a pipeline: the output of one processor is the input to the next.

This is best explained with a simple configuration fragment:

<xpath expression="//a[@shape='rect']/@href">
    <html-to-xml>
        <http url="http://www.somesite.com/"/>
    </html-to-xml>
</xpath>
 
When Web-Harvest executes this part of the configuration, the following steps occur:

1. The http processor downloads content from the specified URL.
2. The html-to-xml processor cleans up that HTML, producing XHTML content.
3. The xpath processor searches the XHTML from the previous step for specific links, giving a sequence of URLs as the result.
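The last step of that pipeline can be tried in isolation with the XPath support that ships with the JDK. This is only a sketch and not Web-Harvest code: the class name and the sample markup are made up for illustration, and the download and clean-up steps are replaced by a hard-coded, already well-formed XHTML string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathStepSketch {

    // Stand-in for the output of the http + html-to-xml steps (assumed sample data).
    static final String SAMPLE_XHTML =
        "<html><body>"
        + "<a shape=\"rect\" href=\"/first\">one</a>"
        + "<a href=\"/skipped\">no shape attribute</a>"
        + "<a shape=\"rect\" href=\"/second\">two</a>"
        + "</body></html>";

    // Plays the role of the xpath processor: evaluates the same expression
    // as the configuration fragment and returns the matches as strings.
    static List<String> extractLinks(String xhtml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                "//a[@shape='rect']/@href",
                new InputSource(new StringReader(xhtml)),
                XPathConstants.NODESET);
            List<String> links = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                links.add(nodes.item(i).getNodeValue());
            }
            return links;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }
}
```

Calling XPathStepSketch.extractLinks(XPathStepSketch.SAMPLE_XHTML) returns the href values of the two anchors that carry shape="rect" and skips the third.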
        
Web-Harvest supports a set of useful processors for variable manipulation, conditional branching, looping, functions, file operations, HTML and XML processing, and exception handling. See the User manual for a technical description of the provided processors.

The following notes on web harvesting, and on driving it directly from Java, are inspired by this good article:

http://masochismtango.com/2010/02/15/webharvest-web-scraping-from-java/

I often have a need to quickly scrape some data out of a web page (or
 list of web pages), which can then be fed into Excel and on to 
specialist data visualisation tools.

To this end I have turned to WebHarvest, an excellent scriptable open source API for web scraping in Java.

I really really like it, but there are some quirks and setup issues that have cost me hours, so I thought I’d roll together a tutorial with the fixes.

WebHarvest Config for Maven

When it works, Maven is a lovely tool for hiding dependency management in Java projects, but WebHarvest is not configured quite right out of the box to work transparently with it.

(Describing Maven is beyond the scope of this post, but if you don’t know it, it’s easy to set up with the M2 plugin for Eclipse.)

This is the Maven POM I ended up with to use WebHarvest in a new JavaSE project:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
 

 <modelVersion>4.0.0</modelVersion>
 <groupId>WebScraping</groupId>
 <artifactId>WebScraping</artifactId>
 <packaging>jar</packaging>
 <version>0.00.01</version>
 <properties>
   <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
 </properties>

 <build>
   <plugins>
     <plugin>
       <artifactId>maven-compiler-plugin</artifactId>
       <configuration>
       <source>1.6</source>
       <target>1.6</target>
       </configuration>
     </plugin>
   </plugins>
 </build>

 <repositories>
   <repository>
     <id>wso2</id>
     <url>http://dist.wso2.org/maven2/</url>
   </repository>
   <repository>
     <id>maven-repository-1</id>
     <url>http://repo1.maven.org/maven2/</url>
   </repository>
 </repositories>
 

<dependencies>
  <dependency>
     <groupId>commons-logging</groupId>
     <artifactId>commons-logging</artifactId>
     <version>1.1</version>
     <type>jar</type>
     <scope>compile</scope>
  </dependency>
  <dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.12</version>
    <type>jar</type>
    <scope>compile</scope>
  </dependency>
  <dependency>
    <groupId>org.webharvest.wso2</groupId>
    <artifactId>webharvest-core</artifactId>
    <version>1.0.0.wso2v1</version>
    <type>jar</type>
    <scope>compile</scope>
  </dependency>
 

<!-- web harvest pom doesn't track dependencies well -->
  <dependency>
    <groupId>net.sf.saxon</groupId>
    <artifactId>saxon-xom</artifactId>
    <version>8.7</version>
  </dependency>
  <dependency>
    <groupId>org.htmlcleaner</groupId>
    <artifactId>htmlcleaner</artifactId>
    <version>1.55</version>
  </dependency>
  <dependency>
    <groupId>bsh</groupId>
    <artifactId>bsh</artifactId>
    <version>1.3.0</version>
  </dependency>
  <dependency>
    <groupId>commons-httpclient</groupId>
    <artifactId>commons-httpclient</artifactId>
    <version>3.1</version>
  </dependency>
 </dependencies>
</project> 

You’ll note that the WebHarvest dependencies had to be added 
explicitly, because the jar does not come with a working pom listing 
them.

Writing A Scraping Script

WebHarvest uses XML configuration files to describe how to scrape a 
site – and with a few lines of Java code you can run any XML 
configuration and have access to any properties that the script 
identified from the page. 

This is definitely the safest way to scrape 
data, as it decouples the code from the web page markup – so if the site
 you are scraping goes through a redesign, you can quickly adjust the 
config files without recompiling the code they pass data to.

The site has some good example scripts to show you how to get started, so I won’t repeat them here.

The easiest way to create your own is to run the WebHarvest GUI from the command line (i.e. java -jar webharvest_all_2.jar), start with a sample script, and then hack it around to get what you want – it’s an easy iterative process with good feedback in the UI.



For instance, let's look at the second sample, which harvests the Canon products listed at Yahoo Shopping.

We want to harvest the pages starting from the following URL:

http://shopping.yahoo.com/s:Digital%20Cameras:4168-Brand=Canon:browsename=Canon%20Digital%20Cameras:refspaceid=96303108;_ylt=AnHw0Qy0K6smBU.hHvYhlUO8cDMB;_ylu=X3oDMTBrcDE0a28wBF9zAzk2MzAzMTA4BHNlYwNibmF2



The simple script is:



<config charset="ISO-8859-1">
    
    <include path="functions.xml"/>
                
    <!-- collects all tables for individual products -->
    <var-def name="products">    
        <call name="download-multipage-list">
            <call-param name="pageUrl">http://shopping.yahoo.com/s:Digital%20Cameras:4168-Brand=Canon:browsename=Canon%20Digital%20Cameras:refspaceid=96303108;_ylt=AnHw0Qy0K6smBU.hHvYhlUO8cDMB;_ylu=X3oDMTBrcDE0a28wBF9zAzk2MzAzMTA4BHNlYwNibmF2</call-param>
            <call-param name="nextXPath">//a[starts-with(., 'Next')]/@href</call-param>
            <call-param name="itemXPath">//li[@class="hproduct" or @class="hproduct first" or @class="hproduct last"]</call-param>
            <call-param name="maxloops">10</call-param>
        </call>
    </var-def>
    
    <!-- iterates over all collected products and extract desired data -->
    <file action="write" path="D:/tmp/canon/catalog.xml" charset="UTF-8">
        <![CDATA[ <catalog> ]]>
        <loop item="item" index="i">
            <list><var name="products"/></list>
            <body>
                <xquery>
                    <xq-param name="item" type="node()"><var name="item"/></xq-param>
                    <xq-expression><![CDATA[
                            declare variable $item as node() external;

                            let $name := data($item//*[@class='title'])
                            let $desc := data($item//*[@class='desc'])
                            let $price := data($item//*[@class='price'])
                                return
                                    <product>
                                        <name>{normalize-space($name)}</name>
                                        <desc>{normalize-space($desc)}</desc>
                                        <price>{normalize-space($price)}</price>
                                    </product>
                    ]]></xq-expression>
                </xquery>
            </body>
        </loop>
        <![CDATA[ </catalog> ]]>
    </file>

</config> 







As a simple example, this is a script to go to the Sony-Ericsson developer site’s handset gallery at http://developer.sonyericsson.com/device/searchDevice.do?restart=true, and rip each handset’s individual spec page URI:

<?xml version="1.0" encoding="UTF-8"?>
<config>
 

    <!-- indicates we want a loop through the list defined in <list>, doing <body> for each item; the variables uid and i hold the value and the index of the current item -->

    <loop item="uid" index="i">
        <!-- the list section defines what we will loop over - here, it pulls out the value attribute of all option tags -->
        <list>
            <xpath expression="//option/@value">
                <html-to-xml>
                    <http url="http://developer.sonyericsson.com/device/searchDevice.do?restart=true"/>
                </html-to-xml>
            </xpath>
        </list>
        <!-- the body section lists instructions which are run for every iteration of the loop -->
        <body>
            <!-- we define a new variable for every iteration, using the iteration count as a suffix  -->
            <var-def name="uri.${i}">
                <!-- template tag is important, else the $ var syntax will be ignored and won't do any value substitutions -->
                <template>device/loadDevice.do?id=${uid}</template>
            </var-def>
        </body>
    </loop>
</config> 

Wednesday, 16 May 2012

Core Spring and IoC Concepts in a Nutshell, Part 1

Core Spring API Components

Spring is a container and component model. Everything else, including AOP, transactions, database access, web applications, and the like, is built on top of this container and component model. Objects managed in the container do not have to know about Spring or the container because of Inversion of Control (IoC).

This pattern specifies the involvement of the Spring container (which manages lifecycle), your object, and any other dependent objects – known as beans in Spring parlance.

The container is able to inject any number or type of dependent beans together while specifying the relationship through configuration.

Dependency injection is enabled by creating properties and matching setter methods on your target object for the types of objects that you expect to inject. Alternatively, objects may be injected during instantiation by providing a constructor with a signature that matches the types you expect to inject.
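The two injection styles can be imitated by hand in plain Java, with no Spring involved. In this minimal sketch the MessageSource, SetterGreeter and ConstructorGreeter types are hypothetical names invented for illustration, and the wireAndGreet method plays the part that the container would normally play.

```java
class InjectionSketch {

    // Hypothetical dependency; not part of the original text.
    interface MessageSource {
        String message();
    }

    // Prepared for setter injection: a property plus a matching setter.
    static class SetterGreeter {
        private MessageSource source;

        public void setSource(MessageSource source) {
            this.source = source;
        }

        public String greet() {
            return "Hello, " + source.message();
        }
    }

    // Prepared for constructor injection: the signature states what is needed.
    static class ConstructorGreeter {
        private final MessageSource source;

        public ConstructorGreeter(MessageSource source) {
            this.source = source;
        }

        public String greet() {
            return "Hello, " + source.message();
        }
    }

    // Stands in for the container: creates the collaborator and wires it in.
    static String[] wireAndGreet() {
        MessageSource world = new MessageSource() {
            public String message() {
                return "world";
            }
        };
        SetterGreeter bySetter = new SetterGreeter();
        bySetter.setSource(world); // setter injection
        ConstructorGreeter byConstructor = new ConstructorGreeter(world); // constructor injection
        return new String[] { bySetter.greet(), byConstructor.greet() };
    }
}
```

Neither greeter creates or looks up its own dependency. That is the inversion: the wiring code decides which MessageSource each object receives.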

The core of the Spring Framework’s functionality lies within this IoC container, which is discussed next.

The Inversion of Control Container

The Inversion of Control (IoC) container provides the dependency injection support that enables you to configure and integrate application and infrastructure components together. Through IoC, your applications may achieve a low level of coupling, because all of the bean configuration can be specified in terms of IoC idioms (such as property collaborators and constructors). Meanwhile, most if not all of your application’s bean lifecycle (construction to destruction) may be managed from within the container.

This enables you to declare scope – how and when a new object instance gets created, and when it gets destroyed.

For example, the container may be instructed that a specific bean instance be created only once per thread, or that, upon destruction, a database bean should disconnect from any active connections.

Through requests to the Spring IoC container, a new bean may either get constructed or a singleton bean may get passed back to the requesting bean. Either way, this is a transparent event that is configured along with the bean declaration.

Spring Container Metadata Configuration

Spring provides several implementations of the ApplicationContext interface out of the box. In standalone applications using XML metadata (still commonplace today), it is common to create an instance of org.springframework.context.support.ClassPathXmlApplicationContext or org.springframework.context.support.FileSystemXmlApplicationContext.

Configuration metadata consisting of bean definitions represented with XML or Java configuration is preferred for third-party APIs for which you do not have access to source code. 

In most other cases, configuration metadata – in addition to, or in place of XML – may be applied through Java annotations. 

ioc_basics.xml – A Basic Spring Application Configuration File

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-3.0.xsd">
 

<bean class="com.sample.iocbasic.BasicPOJO"/>
 

</beans>

This is a simple XML document that makes use of the Spring beans namespace – the basis for resolving your POJO within the Spring container. 

Instantiating the IoC Container

To instantiate the IoC container and resolve a functional ApplicationContext instance, the relevant ApplicationContext implementation must be invoked with a specified configuration.

With a standard Java application with XML metadata configuration, the ClassPathXmlApplicationContext will resolve any number of Spring XML configuration resources given a path relative to the base Java classpath. This constructor is overloaded with a variety of argument arrangements that provide ways to specify the resource locations.

Spring also provides a number of other ApplicationContext implementations. For example, FileSystemXmlApplicationContext is used to load XML configuration from the file system (outside of the classpath), and AnnotationConfigApplicationContext supports loading annotated Java classes.

In the next piece of code, the context constructor is given a single resolvable path to the Spring configuration XML file. If the XML resource were located outside of the classpath, the absolute file system path would be passed to the FileSystemXmlApplicationContext class instead to obtain the ApplicationContext.

BasicIocMain.java – Single Line of Code to Start Up Your Application Context

ApplicationContext ctx = new ClassPathXmlApplicationContext("ioc_basics.xml");

The next step is to obtain your configured beans.

Simply calling the ApplicationContext.getBean method (passing it the class of the instance you want returned) will provide, by default, the singleton instance of the bean.

That is, every call returns the same instance.

BasicIocMain.java – Obtaining a Bean Reference by Class

BasicPOJO basicPojo = ctx.getBean(BasicPOJO.class);

Alternatively, Spring can work out which bean definition you're looking for if you pass it the ID (or qualifier) of the bean instance instead of the bean class. This helps when dealing with multiple bean definitions of the same type.

For instance, given a bean definition of the same BasicPOJO class with the ID basic-pojo, the next sample obtains that bean by name.

BasicIocMain.java – Obtaining a Bean Reference by Name

BasicPOJO basicPojo = (BasicPOJO)ctx.getBean("basic-pojo");

Bean Instantiation from the IoC Container
 

A basic operation in a Spring context is instantiating beans. This is done in a variety of ways; however, we will focus on the most common use cases. 

In this example, the BasicPOJO bean provides both a no-arg and an arguments-based constructor. In addition, it has a property named color with type ColorEnum (see later).

We will use both BasicPOJO and ColorEnum objects to illustrate how you can define and populate your beans within Spring XML configuration.

package com.sample.iocbasic;

public class BasicPOJO {

    public String name;
    public ColorEnum color;
    public ColorRandomizer colorRandomizer;

    // empty constructor
    public BasicPOJO() {
    }

    public BasicPOJO(String name, ColorEnum color) {
        this.name = name;
        this.color = color;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public ColorEnum getColor() {
        return color;
    }

    public void setColor(ColorEnum color) {
        this.color = color;
    }

    public ColorRandomizer getColorRandomizer() {
        return colorRandomizer;
    }

    public void setColorRandomizer(ColorRandomizer colorRandomizer) {
        this.colorRandomizer = colorRandomizer;
    }

}

package com.sample.iocbasic;

public enum ColorEnum {
violet, blue, red, green, purple, orange, yellow
}

Constructor Injection

Constructor arguments may be set through XML configuration. This enables you to inject dependencies through the constructor arguments.

To do this, use the constructor-arg element within the bean definition, as shown in the next sample.

ioc_basics.xml – XML Bean Construction using Parameterized Arguments
…
<bean id="constructor-setup"
class="com.sample.iocbasic.BasicPOJO">
<constructor-arg name="name" value="red"/>
<constructor-arg name="color" value="violet"/>
</bean>
…

Bean References and Setter Injection

Properties of your target beans may be injected through references to other beans within the scope of your application context.

This is known as bean collaboration.

This requires defining the additional bean that you wish to refer to, also known as the collaborator. Using the ref attribute in the property tag tells Spring which bean we want to collaborate with, and thus have injected.

ioc_basics.xml – XML Configuration for Bean Collaboration by Setter Injection

<bean id="no-args"
class="com.sample.iocbasic.BasicPOJO">
<property name="color" ref="defaultColor"/>
<property name="name" value="Mario"/>
</bean>

<bean id="defaultColor"
class="com.sample.iocbasic.ColorEnum"
factory-method="valueOf">
<constructor-arg value="blue"/>
</bean>

Static and Instance Factory Injection

Note the factory-method attribute on the defaultColor bean.
In this case the static factory method instantiation mechanism was used.

Note also that the class attribute does not specify the type of object returned by the factory method; it specifies only the type containing that factory method.

For this simple example, a string was fed to the enum static factory method valueOf, which is the common approach to resolving enum constants from strings.
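In plain Java the defaultColor definition boils down to a single call to valueOf. The sketch below is a standalone illustration (the StaticFactorySketch wrapper and the isKnown helper are assumptions, not part of the original sample) and also shows that the lookup is case-sensitive.

```java
class StaticFactorySketch {

    // Same constants as the ColorEnum in the text, repeated to keep the sketch standalone.
    enum ColorEnum { violet, blue, red, green, purple, orange, yellow }

    // Equivalent of class="...ColorEnum" factory-method="valueOf" with one constructor-arg.
    static ColorEnum resolve(String name) {
        return ColorEnum.valueOf(name);
    }

    // valueOf throws IllegalArgumentException when no constant has exactly that name.
    static boolean isKnown(String name) {
        try {
            ColorEnum.valueOf(name);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Here resolve("blue") yields ColorEnum.blue, while isKnown("Blue") is false because the constant is declared in lower case.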

When static factory methods are not practical, use instance factory methods instead.

These are methods that get invoked from existing beans on the container to provide new bean instances.
The class in the next sample demonstrates instance factory methods to provide random ColorEnum instances.

ColorRandomizer.java – Class Definition for the ColorRandomizer Factory Bean

package com.sample.iocbasic;

import java.util.Random;

public class ColorRandomizer {

    // Color that randomColor() must never return; null means no exclusion.
    ColorEnum colorException;

    public ColorEnum randomColor() {
        ColorEnum[] allColors = ColorEnum.values();
        ColorEnum ret = null;
        do {
            // nextInt's bound is exclusive, so pass the full length to include the last constant
            ret = allColors[new Random().nextInt(allColors.length)];
        } while (colorException != null && colorException == ret);
        return ret;
    }

    // Returns a random color different from ex (any color if ex is null).
    public ColorEnum exceptColor(ColorEnum ex) {
        ColorEnum ret = null;
        do {
            ret = randomColor();
        } while (ex != null && ex == ret);
        return ret;
    }

    public void setColorException(ColorEnum colorExceptions) {
        this.colorException = colorExceptions;
    }

}

To invoke the factory within our Spring context, ColorRandomizer will be defined as a bean, then one of its methods will be invoked in another bean definition as a way to vend an instance of ColorEnum.

We obtain two separate instances of ColorEnum using both ColorRandomizer factory methods to illustrate variances in factory method invocations.

Obtaining Bean Instances from Factory Methods in Spring


<bean id="colorRandomizer" class="com.sample.iocbasic.ColorRandomizer" />


<bean id="randomColor" factory-bean="colorRandomizer" factory-method="randomColor"/>


<bean id="exclusiveColor" factory-bean="colorRandomizer" factory-method="exceptColor">
<constructor-arg ref="randomColor"/>
</bean>
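Read top to bottom, these three definitions correspond roughly to three lines of hand-written Java. The sketch below is an approximation of what the container ends up doing, not Spring code; it repeats a trimmed copy of the enum and the factory so that it stands alone.

```java
import java.util.Random;

class InstanceFactorySketch {

    enum ColorEnum { violet, blue, red, green, purple, orange, yellow }

    // Trimmed copy of the factory class from the text.
    static class ColorRandomizer {
        private final Random random = new Random();

        ColorEnum randomColor() {
            ColorEnum[] all = ColorEnum.values();
            return all[random.nextInt(all.length)];
        }

        ColorEnum exceptColor(ColorEnum ex) {
            ColorEnum ret;
            do {
                ret = randomColor();
            } while (ex != null && ex == ret);
            return ret;
        }
    }

    // Mirrors the three bean definitions in dependency order.
    static ColorEnum[] wire() {
        ColorRandomizer colorRandomizer = new ColorRandomizer();              // bean "colorRandomizer"
        ColorEnum randomColor = colorRandomizer.randomColor();                // bean "randomColor"
        ColorEnum exclusiveColor = colorRandomizer.exceptColor(randomColor);  // bean "exclusiveColor"
        return new ColorEnum[] { randomColor, exclusiveColor };
    }
}
```

The constructor-arg of the exclusiveColor bean becomes the argument of the factory method, so the two colors returned by wire() always differ.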

Bean Scopes

Spring uses the notion of bean scopes to determine how beans defined in the IoC container get issued to the application upon request with getBean methods, or through bean references.

A bean’s scope is set with the scope attribute in the bean element, or with the @Scope annotation when beans are declared in Java (for example on a @Bean method).
Spring defaults to the singleton scope, where a single instance of the bean gets shared throughout the entire container. Spring provides a total of six bean scopes out of the box for use in specific context implementations, although only singleton, prototype, and thread are available in all context implementations.

The other scopes – request, session, and globalSession – are available only to application contexts that are web-friendly, such as WebApplicationContext.

Bean scopes available in Spring:

singleton: Single bean instance per container, shared throughout the IoC container.

prototype: New bean instance created each time the bean is requested from the container.

request: Web application contexts only. Creates a bean instance per HTTP request.

session: Web application contexts only. Creates a bean instance per HTTP session.

globalSession: Web portlet only. Creates a bean instance per global HTTP session.

thread*: Creates a bean instance per thread. Similar to request scope.

* Thread scope is not registered by default, and requires registration with the CustomScopeConfigurer bean.

To illustrate the behavior of prototype- and singleton-scoped beans, the next sample declares two beans of the same type, which differ only in scope.

The singleton-scoped bean should always return the same value, whereas the prototype-scoped bean is created anew on every request and so may return a different value each time (since the factory returns random colors).

The first listing shows the Spring configuration file and the second shows the main class.

ioc_basics.xml – Overriding Default Scope for Beans with XML Metadata

<beans…>
<bean id="randomeverytime" factory-bean="colorRandomizer" factory-method="randomColor"
scope="prototype"/>

<bean id="alwaysthesame" factory-bean="colorRandomizer" factory-method="randomColor"
scope="singleton"/>

</beans>

BasicIocMain.java – Simple For-Loop

public static void demonstrateScopes(ApplicationContext ctx) {
    for (int i = 0; i < 5; i++) {
        System.out.println("randomeverytime: " +
            ctx.getBean("randomeverytime", ColorEnum.class));
        System.out.println("alwaysthesame: " +
            ctx.getBean("alwaysthesame", ColorEnum.class));
    }
}

The output of this loop will emit text similar to this:

Output of Bean Scope Induced Behavior
randomeverytime: green
alwaysthesame: orange
randomeverytime: purple
alwaysthesame: orange
randomeverytime: violet
alwaysthesame: orange
randomeverytime: violet
alwaysthesame: orange
randomeverytime: green
alwaysthesame: orange
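The difference between the two scopes comes down to whether the container caches what the factory produced. The toy registry below is a sketch of that idea only: ScopeSketch, register and getBean are hypothetical names, not the Spring API, and the singleton is created lazily here purely to keep the code short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class ScopeSketch {

    private final Map<String, Supplier<Object>> factories = new HashMap<>();
    private final Map<String, Boolean> singletonFlags = new HashMap<>();
    private final Map<String, Object> singletonCache = new HashMap<>();

    // Registers a bean definition: an id, a scope flag and a way to create instances.
    void register(String id, boolean singleton, Supplier<Object> factory) {
        factories.put(id, factory);
        singletonFlags.put(id, singleton);
    }

    Object getBean(String id) {
        Supplier<Object> factory = factories.get(id);
        if (factory == null) {
            throw new IllegalArgumentException("No bean named " + id);
        }
        if (singletonFlags.get(id)) {
            // singleton: create once, then keep handing out the cached instance
            return singletonCache.computeIfAbsent(id, key -> factory.get());
        }
        // prototype: run the factory again on every request
        return factory.get();
    }
}
```

With a random-color factory registered under both scopes, the singleton id keeps printing the first color it drew, which is the behaviour seen for alwaysthesame in the output above.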

You register the thread scope, or any other custom scope, in XML by defining an org.springframework.beans.factory.config.CustomScopeConfigurer bean.

Pass the scope implementation to its scopes property, which is a map: each key provides the scope name and each value provides the scope’s implementation. Scopes registered in this fashion are also available to @Bean-annotated methods that use the @Scope annotation. That is, a scope, once registered, is active throughout the container and for all manners of configuration (see next).

ioc_basics.xml – Registering Custom Scopes in XML

<beans…>
<bean class="org.springframework.beans.factory.config.CustomScopeConfigurer">
<property name="scopes">
<map>
   <entry key="thread">
    <bean class="org.springframework.context.support.SimpleThreadScope"/>
   </entry>
</map>
</property>
</bean>
<bean id="threadColor" factory-bean="colorRandomizer" factory-method="randomColor"
scope="thread"/>
</beans>
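The effect of a thread scope can be approximated in plain Java with a ThreadLocal: each thread gets its own instance on first access and keeps seeing that same instance afterwards. The class below is a sketch in that spirit with hypothetical names, and it makes no claim to mirror how SimpleThreadScope is written.

```java
import java.util.function.Supplier;

class ThreadScopeSketch<T> {

    // One lazily created instance per thread.
    private final ThreadLocal<T> perThread;

    ThreadScopeSketch(Supplier<T> factory) {
        this.perThread = ThreadLocal.withInitial(factory);
    }

    // Returns the calling thread's own instance.
    T get() {
        return perThread.get();
    }

    // Helper for the demonstration: the instance as seen by a freshly started thread.
    T getFromNewThread() {
        final Object[] holder = new Object[1];
        Thread worker = new Thread(() -> holder[0] = get());
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        @SuppressWarnings("unchecked")
        T value = (T) holder[0];
        return value;
    }
}
```

Two calls to get() on one thread return the same object, while getFromNewThread() returns a different one. This is what scope="thread" promises for the threadColor bean.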

Umlet

http://www.umlet.com/ 

Finally, a cool, customizable and free UML tool that I really really appreciate for its immediate and simple approach.

Happy designing, everyone!

ETL Pattern

Thursday, 10 May 2012

jQuery notes

Some notes that finally helped me to properly integrate this powerful traversing/navigation/manipulation tool, covering things I couldn't find spelled out in any documentation I've read.

These lines come from a Java class that emits JavaScript when rendering components. They should be useful because the way single and double quotes have to be combined is not so obvious.

So let's start by retrieving the main DOM object to work on.

We know that the id of the element contains columnHeader:

    "        var strCH='\\'[id*=columnHeader]\\'';"+
 

// using the jQuery wrapper, retrieve the list of children (jQuery function http://api.jquery.com/children/)

// then retrieve the first DOM object of the array

    "        var tableCols=$(strCH).children().get(0);" +

// pay attention: this object is a standard DOM object and not a jQuery wrapper, so calling any jQuery API function on it gives an error.

// to work with a jQuery wrapper again, call $ once more

    "        var trc = $(tableCols).find('tr');"+

    "        var tdsCol = $(trc).children();" +
 

// for each column

    "        tdsCol.each(function(j){" +
    "            if (this._logger==null){" +
    "                this._logger = new Log(Log.DEBUG, Log.popupLogger);" +
    "            }    " +

    "            var tdc = $(this).get(0);" +
 

    "            tdc.style.borderBottomWidth='0px';" +
    "            tdc.style.borderBottomStyle='';" +
    "            tdc.style.borderRightWidth='0px';" +
    "            tdc.style.borderRightStyle='';" +
    "            tdc.style.borderTopWidth='0px';" +
    "            tdc.style.borderTopStyle='';" +
    
    
    "        });" +
 

Thursday, 3 May 2012

jQuery Selectors

Use the excellent W3Schools jQuery Selector Tester to experiment with the different selectors.

Selector	Example	Selects
*	$("*")	All elements
#id	$("#lastname")	The element with id=lastname
.class	$(".intro")	All elements with class="intro"
element	$("p")	All p elements
.class.class	$(".intro.demo")	All elements with the classes "intro" and "demo"

:first	$("p:first")	The first p element
:last	$("p:last")	The last p element
:even	$("tr:even")	All even tr elements
:odd	$("tr:odd")	All odd tr elements

:eq(index)	$("ul li:eq(3)")	The fourth element in a list (index starts at 0)
:gt(no)	$("ul li:gt(3)")	List elements with an index greater than 3
:lt(no)	$("ul li:lt(3)")	List elements with an index less than 3
:not(selector)	$("input:not(:empty)")	All input elements that are not empty

:header	$(":header")	All header elements h1, h2 ...
:animated	$(":animated")	All animated elements

:contains(text)	$(":contains('W3Schools')")	All elements which contains the text
:empty	$(":empty")	All elements with no child (elements) nodes
:hidden	$("p:hidden")	All hidden p elements
:visible	$("table:visible")	All visible tables

s1,s2,s3	$("th,td,.intro")	All elements with matching selectors

[attribute]	$("[href]")	All elements with a href attribute
[attribute=value]	$("[href='default.htm']")	All elements with a href attribute value equal to "default.htm"
[attribute!=value]	$("[href!='default.htm']")	All elements with a href attribute value not equal to "default.htm"
[attribute$=value]	$("[href$='.jpg']")	All elements with a href attribute value ending with ".jpg"
[attribute^=value]	$("[href^='jquery_']")	All elements with a href attribute value starting with "jquery_"

:input	$(":input")	All input elements
:text	$(":text")	All input elements with type="text"
:password	$(":password")	All input elements with type="password"
:radio	$(":radio")	All input elements with type="radio"
:checkbox	$(":checkbox")	All input elements with type="checkbox"
:submit	$(":submit")	All input elements with type="submit"
:reset	$(":reset")	All input elements with type="reset"
:button	$(":button")	All input elements with type="button"
:image	$(":image")	All input elements with type="image"
:file	$(":file")	All input elements with type="file"

:enabled	$(":enabled")	All enabled input elements
:disabled	$(":disabled")	All disabled input elements
:selected	$(":selected")	All selected input elements
:checked	$(":checked")	All checked input elements

Wednesday, 2 May 2012

Spring Integration Tutorial - part 1

An extract from the original, very interesting article: http://java.dzone.com/articles/spring-integration-hands

This tutorial is the first in a two-part series on Spring 
Integration. In this series we're going to build out a lead management 
system based on a message bus that we implement using Spring 
Integration. Our first tutorial will begin with a brief overview of 
Spring Integration and also just a bit about the lead management domain. 

After that we'll build our message bus. 

I've used Maven profiles to isolate the dependencies 
you’ll need if you’re running Java 5. 

The tutorials assume that you're 
comfortable with JEE, the core Spring framework and Maven 2. Also, 
Eclipse users may find the m2eclipse plug-in helpful.

To complete the tutorial you'll need an IMAP account, and you'll also need access to an SMTP server.

Let's begin with an overview of Spring Integration.

A bird's eye view of Spring Integration

Spring
 Integration is a framework for implementing a dynamically configurable 
service integration tier. 

The point of this tier is to orchestrate 
independent services into meaningful business solutions in a 
loosely-coupled fashion, which makes it easy to rearrange things in the 
face of changing business needs. 

The service integration tier sits just 
above the service tier as shown in figure 1.

Following the book 
Enterprise Integration Patterns by Gregor Hohpe and Bobby Woolf 
(Addison-Wesley), Spring Integration adopts the well-known pipes and 
filters architectural style as its approach to building the service 
integration layer. 

Abstractly, filters are information-processing units (any type of processing; it doesn’t have to be information filtering per se), and pipes are the conduits between filters.

In the context of integration, the network we’re building is a messaging infrastructure, a so-called message bus, and the pipes and filters are called message channels and message endpoints, respectively.

The network carries 
messages from one endpoint to another via channels, and the message is 
validated, routed, split, aggregated, resequenced, reformatted, 
transformed and so forth as the different endpoints process it.
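The pipes-and-filters idea itself needs nothing more than queues and functions. The following is a plain-Java sketch of the style and not Spring Integration code: Channel, pump and run are hypothetical names, and the two endpoints (trimming and upper-casing a lead's name) are made up for illustration.

```java
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

class PipesAndFiltersSketch {

    // A message channel (pipe): here simply an in-memory queue.
    static class Channel<T> {
        private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();

        void send(T message) {
            queue.add(message);
        }

        // Returns null when the channel is empty.
        T receive() {
            return queue.poll();
        }
    }

    // Runs one message through an endpoint (filter): take from in, process, put on out.
    static <I, O> boolean pump(Channel<I> in, Function<I, O> endpoint, Channel<O> out) {
        I message = in.receive();
        if (message == null) {
            return false;
        }
        out.send(endpoint.apply(message));
        return true;
    }

    // Wires two endpoints with three channels and carries one message across.
    static String run(String rawLead) {
        Channel<String> inbound = new Channel<>();
        Channel<String> cleaned = new Channel<>();
        Channel<String> outbound = new Channel<>();

        inbound.send(rawLead);
        pump(inbound, String::trim, cleaned);                                   // reformat
        pump(cleaned, s -> "LEAD:" + s.toUpperCase(Locale.ROOT), outbound);     // transform
        return outbound.receive();
    }
}
```

Because endpoints only know the channels on either side of them, inserting, removing or reordering a step means rewiring channels rather than changing the endpoints, which is the loose coupling the text describes.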

Figure 1. The service integration tier orchestrates the services below it.

That
 should give you enough technical context to work through the tutorial. 
Let’s talk about the problem domain for our sample integration, which is
 enrollment lead management in an online university setting.

Lead management overview

In
 many industries, such as the mortgage industry and for-profit 
education, one important component of customer relationship management 
(CRM) is managing sales leads. 

This is a fertile area for enterprise 
integration because there are typically multiple systems that need to 
play nicely together in order to pull the whole thing off. 

Examples 
include  

front-end marketing/lead generation websites,
external lead vendor systems,
intake channels for submitted leads,
lead databases,
e-mail systems (e.g., to accept leads, to send confirmation e-mails),
lead qualification systems,
sales systems and potentially others.

This tutorial and the next use Spring Integration to integrate several systems of the kind just mentioned into an overall lead management capability for a hypothetical online university. Specifically, we'll integrate the following:

    •    a CRM system that allows campus and call center staff to create leads directly, as they might do for walk-in or phone-in leads
    •    a Request For Information (RFI) form on a lead generation ("lead gen") marketing website
    •    a legacy e-mail based RFI channel
    •    an external CRM that the international enrollment staff uses to process international leads
    •    confirmation e-mails

Figure
 2 shows what it will look like when we’re done with both tutorials. For
 now focus on the big picture rather than the details.

Figure 2. This is the lead management system we'll build.

For
 this first tutorial we're simply going to establish the base staff 
interface, the (dummy) backend service that saves leads to a database, 
and confirmation e-mails. 

The second tutorial will deal with lead 
routing, web-based RFIs and e-mail-based RFIs.

Let's dive in. We’ll begin with the basic lead creation page in the CRM and expand out from there.

Building the core components

[You can download the source code for this section of the tutorial here] 
We’re
 going to start by creating a lead creation HTML form for campus and 
call center staff. That way, if walk-in or phone-in leads express an 
interest, we can get them into the system. This is something that might 
appear as a part of a lead management module in a CRM system, as shown 
in figure 3.

Figure 3. We'll build our lead management module with integration in mind from the beginning.

Because
 we’re interested in the integration rather than the actual app 
features, we’re not really going to save the lead to the database. 
Instead we’ll just call a createLead() method against a local 
LeadService bean and leave it at that. But we will use Spring 
Integration to move the lead from the form to the service bean.
Our first stop will be the domain model.

Create the domain model

We’ll need a domain object for 
leads, so listing 1 shows the one we’ll use. It’s not an 
industrial-strength representation, but it will do for the purposes of 
the tutorial.

Listing 1. Lead.java, a basic domain object for leads.

package crm.model;

import java.text.DateFormat;
import java.text.SimpleDateFormat;
... other imports ...

public class Lead {
    private static DateFormat dateFormat = new SimpleDateFormat();
    
    private String firstName;
    private String middleInitial;
    private String lastName;
    private String address1;
    private String address2;

    ... other fields ...
    
    public Lead() { }
    
    public String getFirstName() { return firstName; }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    ... other getters and setters, and a toString() method ...
}

There is nothing special happening here at all. 

So far the Lead class is
 just a bunch of getters and setters. You can see the full code listing 
in the download.
 

If you thought that was underwhelming, just wait until you see the LeadServiceImpl service bean in listing 2.

Listing 2. LeadServiceImpl.java, a dummy service bean. 

package crm.service;

import java.util.logging.Logger;
import org.springframework.stereotype.Service;
import crm.model.Lead;

@Service("leadService")
public class LeadServiceImpl implements LeadService {
    private static Logger log = Logger.getLogger("global");
    
    public void createLead(Lead lead) {
        log.info("Creating lead: " + lead);
    }
} 

This is just a dummy bean. In real life we’d save the lead to a 
database. The bean implements a basic LeadService interface that we've 
suppressed here, but it's available in the code download.
Now that we have our domain model, let’s use Spring Integration to create a service integration tier above it.

Create the service integration tier

If
 you look back at figure 3, you’ll see that the CRM app pushes lead data
 to the service bean by way of a channel called newLeadChannel. 

While 
it’s possible for the CRM app to push messages onto the channel 
directly, it’s generally more desirable to keep the systems you’re 
integrating decoupled from the underlying messaging infrastructure, such
 as channels. That allows you to configure service orchestrations 
dynamically instead of having to go into the code.

Spring 
Integration supports the Gateway pattern (described in the 
aforementioned Enterprise Integration Patterns book), which allows an 
application to push messages onto the message bus without knowing 
anything about the messaging infrastructure. Listing 3 shows how we do 
this.

Listing 3. LeadGateway.java, a gateway offering access to the messaging system.

package crm.integration.gateways;

import org.springframework.integration.annotation.Gateway;
import crm.model.Lead;

public interface LeadGateway {
    
    @Gateway(requestChannel = "newLeadChannel")
    void createLead(Lead lead);
}

We are of course using the Spring Integration @Gateway annotation to map
 the method call to the newLeadChannel, but gateway clients don’t know 
that. Spring Integration will use this interface to create a dynamic 
proxy that accepts a Lead instance, wraps it with an 
org.springframework.integration.core.Message, and then pushes the 
Message onto the newLeadChannel. 

The Lead instance is the Message body, 
or payload, and Spring Integration wraps the Lead because only Messages 
are allowed on the bus.
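If the dynamic proxy sounds mysterious, the following plain-Java sketch shows the general idea using java.lang.reflect.Proxy. It is not how Spring Integration is implemented internally (the source doesn't say), and everything in it is hypothetical: the SimpleMessage wrapper, the String-based LeadGateway stand-in, and the queue playing the role of the channel.

```java
import java.lang.reflect.Proxy;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustration only: a hand-rolled stand-in for what a gateway proxy does.
class GatewayProxySketch {

    // Hypothetical minimal message type: nothing but a payload wrapper.
    static final class SimpleMessage {
        final Object payload;
        SimpleMessage(Object payload) { this.payload = payload; }
    }

    // Hypothetical gateway interface; callers see only this.
    interface LeadGateway {
        void createLead(String lead);
    }

    // Builds a proxy that wraps the method argument in a message and
    // places the message on the given channel (here, a simple queue).
    static LeadGateway gatewayFor(BlockingQueue<SimpleMessage> channel) {
        return (LeadGateway) Proxy.newProxyInstance(
                LeadGateway.class.getClassLoader(),
                new Class<?>[] { LeadGateway.class },
                (proxy, method, args) -> {
                    if (!"createLead".equals(method.getName())) {
                        // The sketch ignores toString(), equals() and so on.
                        throw new UnsupportedOperationException(method.getName());
                    }
                    channel.add(new SimpleMessage(args[0]));
                    return null; // the gateway method is void
                });
    }

    public static void main(String[] args) {
        BlockingQueue<SimpleMessage> newLeadChannel = new LinkedBlockingQueue<>();
        LeadGateway gateway = gatewayFor(newLeadChannel);
        gateway.createLead("Jane Doe");
        System.out.println(newLeadChannel.peek().payload);
    }
}
```

The caller only ever touches the LeadGateway interface, so it has no compile-time dependency on messages or channels.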

We need to wire up our message bus. Listing 4 shows how to do that with an application context configuration file.

Listing 4. /WEB-INF/applicationContext-integration.xml message bus definition.

<?xml version="1.0" encoding="UTF-8"?>
<beans:beans xmlns="http://www.springframework.org/schema/integration"
    xmlns:beans="http://www.springframework.org/schema/beans"
    xmlns:p="http://www.springframework.org/schema/p"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
        http://www.springframework.org/schema/integration
        http://www.springframework.org/schema/integration/spring-integration-1.0.xsd">

    <gateway id="leadGateway"
        service-interface="crm.integration.gateways.LeadGateway" />
    
    <publish-subscribe-channel id="newLeadChannel" />
    
    <service-activator
        input-channel="newLeadChannel"
        ref="leadService"
        method="createLead" />

</beans:beans> 

The first thing to notice here is that we've made the Spring 
Integration namespace our default namespace instead of the standard 
beans namespace. 

The reason is that we're using this configuration file 
strictly for Spring Integration configuration, so we can save some 
keystrokes by selecting the appropriate namespace. 

This works pretty 
nicely for some of the other Spring projects as well, such as Spring 
Batch and Spring Security.

In this configuration we've created the 
three messaging components that we saw in figure 3. 

First, we have an 
incoming lead gateway to allow applications to push leads onto the bus. 

We simply reference the interface from listing 3; Spring Integration takes care of the dynamic proxy. Next we create a publish/subscribe ("pub-sub") channel called newLeadChannel.

This is the channel that the 
@Gateway annotation referenced in listing 3. 

A pub-sub channel can 
publish a message to multiple endpoints simultaneously. 

For now we have 
only one subscriber—a service activator—but we already know we're going 
to have others, so we may as well make this a pub-sub channel.
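As a rough mental model of pub-sub delivery, consider the toy channel below. It is plain Java, not the Spring Integration channel, and the PubSubChannelSketch name and its String messages are assumptions made for illustration. It also delivers to subscribers one after another on the calling thread, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustration only: a toy pub-sub channel, not the Spring Integration one.
class PubSubChannelSketch {

    private final List<Consumer<String>> subscribers = new ArrayList<>();

    void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    // Every subscriber receives every message sent to the channel.
    void send(String message) {
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(message);
        }
    }

    public static void main(String[] args) {
        PubSubChannelSketch newLeadChannel = new PubSubChannelSketch();
        newLeadChannel.subscribe(m -> System.out.println("service activator got: " + m));
        newLeadChannel.subscribe(m -> System.out.println("e-mail transformer got: " + m));
        newLeadChannel.send("lead: Jane Doe");
    }
}
```

Adding a second subscriber requires no change to the sender, which is why choosing a pub-sub channel up front costs us nothing.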

The
 service activator is an endpoint that allows us to bring our 
LeadServiceImpl service bean onto the bus. 

We're injecting the 
newLeadChannel into the input end of the service activator. 

When a 
message appears on the newLeadChannel, the service activator will pass 
its Lead payload to the leadService bean's createLead() method.
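Conceptually, a service activator unwraps the payload and calls the configured method on the referenced bean. The sketch below imitates that with plain reflection. It is an illustration under assumptions, not Spring Integration's implementation: SimpleMessage, the String-based LeadService and the activate() helper are all hypothetical.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Illustration only: imitates ref="leadService" method="createLead".
class ServiceActivatorSketch {

    // Hypothetical minimal message type.
    static final class SimpleMessage {
        final Object payload;
        SimpleMessage(Object payload) { this.payload = payload; }
    }

    // Hypothetical service that just remembers what it was asked to create.
    static class LeadService {
        final List<String> created = new ArrayList<>();
        public void createLead(String lead) { created.add(lead); }
    }

    // Looks up the named method and invokes it with the message payload.
    static void activate(Object service, String methodName, SimpleMessage message) {
        try {
            Method method = service.getClass()
                    .getMethod(methodName, message.payload.getClass());
            method.invoke(service, message.payload);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not invoke " + methodName, e);
        }
    }

    public static void main(String[] args) {
        LeadService leadService = new LeadService();
        activate(leadService, "createLead", new SimpleMessage("Jane Doe"));
        System.out.println(leadService.created);
    }
}
```

Note that the service itself never sees a message, only the payload, so it stays free of messaging concerns.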
Stepping
 back, we've almost implemented the design described by figure 3. The 
only part that remains is the lead creation frontend, which we'll 
address right now.

Create the web tier

Our user interface for creating new leads
 will be a web-based form that we implement using Spring Web MVC. The 
idea is that enrollment staff at campuses or call centers might use such
 an interface to handle walk-in or phone-in traffic. Listing 5 shows our
 simple @Controller.

Listing 5. LeadController.java, a @Controller to allow staff to create leads

package crm.web;

import java.util.Date;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import crm.integration.gateways.LeadGateway;
import crm.model.Country;
import crm.model.Lead;

@Controller
public class LeadController {
    
    @Autowired
    private LeadGateway leadGateway;
    
    @RequestMapping(value = "/lead/form.html", method = RequestMethod.GET)
    public void getForm(Model model) {
        model.addAttribute(Country.getCountries());
        model.addAttribute(new Lead());
    }
    
    @RequestMapping(value = "/lead/form.html", method = RequestMethod.POST)
    public String postForm(Lead lead) {
        lead.setDateCreated(new Date());
        leadGateway.createLead(lead);
        return "redirect:form.html?created=true";
    }
}

 

This isn't an industrial-strength controller, as it does neither HTTP parameter whitelisting (for example, via an @InitBinder method) nor form validation, both of which you would expect from a real implementation.

But the main pieces from a Spring Integration perspective are here. 

We're autowiring the gateway into the @Controller, and we have methods 
for serving up the empty form and for processing the submitted form. 

The getForm() method references a Country class that we've suppressed (it's in the code download); it just puts a list of countries on the model so the form can present a Country field to the staff member.

The 
postForm() method invokes the createLead() method on the gateway. 

This 
will pass the Lead to the dynamic proxy LeadGateway implementation, 
which in turn will wrap the Lead with a Message and then place the 
Message on the newLeadChannel.

There are a few other configuration
 files you will need to put in place, including web.xml, 
main-servlet.xml and applicationContext.xml. 

There's also a JSP for the 
web form. As none of these relates directly to Spring Integration, we 
won't treat them here. Please see the code download for details.

With that, we've established a baseline system. 

To try it out, run
 

  mvn jetty:run

against crm/pom.xml and point your browser at
 

  http://localhost:8080/crm/main/lead/form.html

You
 should see a very basic-looking web form for entering lead information.
 Enter some user information (it doesn't matter what you enter—recall 
that we don't have any form validation) and press Submit. 

The console 
should report that LeadServiceImpl.createLead() created a lead. 
Congratulations!

Even though we now have a working system, it 
isn't very interesting. From here on out (this tutorial and the next) 
we'll be adding some common features to make the lead management system 
more capable. 

Our first addition will be confirmation e-mails.

Adding confirmation e-mails

After
 an enrollment advisor (or some other staff member) creates a lead in 
the system, we want to send the lead an e-mail letting him know that 
that's happened. Actually—and this is a critical point—we really don't 
care how the lead was created. Anytime a lead appears on the 
newLeadChannel, we want to fire off a confirmation e-mail. I'm making 
the distinction because it points to an important aspect of the message 
bus: it allows us to control lead processing code centrally instead of 
having to chase it down in a bunch of different places. Right now 
there's only one way to create leads, but figure 2 revealed that we'll 
be adding others. No matter how many we add, they'll all result in 
sending a confirmation e-mail out to the lead.

Figure 4 shows the new bit of plumbing we're going to add to our message bus.

Figure 4. Send a confirmation e-mail when creating a lead.

To do this, we're going to need to make a few changes to the configuration and code.

POM changes

First we need to update the POM. Here's a summary of the changes; see the code download for details:
    •    Add a JavaMail dependency to the Jetty plug-in.
    •    Add an org.springframework.context.support dependency.
    •    Add a spring-integration-mail dependency.
    •    Set the mail.version property.

These changes will allow us to use JavaMail.

Expose JavaMail sessions through JNDI

We'll
 also need to add a /WEB-INF/jetty-env.xml configuration to make our 
JavaMail sessions available via JNDI. 

Once again, see the code download for details; I've included a /WEB-INF/jetty-env.xml.sample configuration for your convenience.

As mentioned previously, you'll need access to an
 SMTP server.
 

Besides creating jetty-env.xml, we'll need to update 
applicationContext.xml. 

Listing 6 shows the changes we need so we can 
use JavaMail and SMTP.

Listing 6. /WEB-INF/applicationContext.xml changes supporting JavaMail and SMTP

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:jee="http://www.springframework.org/schema/jee"
    xmlns:p="http://www.springframework.org/schema/p"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
        http://www.springframework.org/schema/context
        http://www.springframework.org/schema/context/spring-context-2.5.xsd
        http://www.springframework.org/schema/jee
        http://www.springframework.org/schema/jee/spring-jee-2.5.xsd">
    
    <jee:jndi-lookup id="mailSession"
        jndi-name="mail/Session" resource-ref="true" />
    
    <bean id="mailSender"
        class="org.springframework.mail.javamail.JavaMailSenderImpl"
        p:session-ref="mailSession" />
    
    <context:component-scan base-package="crm.service" />
</beans>

These changes look up the JavaMail session that Jetty exposes as a JNDI resource and make it available to our beans.

We've 
declared the jee namespace and its schema location, configured the JNDI 
lookup, and created a JavaMailSenderImpl bean that we'll use for sending
 mail.

We won't need any domain model changes to generate 
confirmation e-mails. We will however need to create a bean to back our 
new transformer endpoint.

Service integration tier changes

First,
 recall from figure 4 that the newLeadChannel feeds into a 
LeadToEmailTransformer endpoint. This endpoint takes a lead as an input 
and generates a confirmation e-mail as an output, and the e-mail gets 
piped out to an SMTP transport. 

In general, transformers transform given
 inputs into desired outputs. No surprises there.

Figure 4 is 
slightly misleading since it's actually the POJO itself that we're going
 to call LeadToEmailTransformer; the endpoint is really just a bean 
adapter that the messaging infrastructure provides so we can place the 
POJO on the message bus. 

Listing 7 presents the LeadToEmailTransformer 
POJO.

package crm.integration.transformers;

import java.util.Date;
import java.util.logging.Logger;
import org.springframework.integration.annotation.Transformer;
import org.springframework.mail.MailMessage;
import org.springframework.mail.SimpleMailMessage;
import crm.model.Lead;

public class LeadToEmailTransformer {
    private static Logger log = Logger.getLogger("global");
    
    private String confFrom;
    private String confSubject;
    private String confText;
    
    ... getters and setters for the fields ...

    @Transformer
    public MailMessage transform(Lead lead) {
        log.info("Transforming lead to confirmation e-mail: " + lead);
        
        String leadFullName = lead.getFullName();
        String leadEmail = lead.getEmail();
        MailMessage msg = new SimpleMailMessage();
        
        // Use "Full Name <address>" when a name is available
        msg.setTo(leadFullName == null ?
                leadEmail : leadFullName + " <" + leadEmail + ">");
        
        msg.setFrom(confFrom);
        msg.setSubject(confSubject);
        msg.setSentDate(new Date());
        msg.setText(confText);
        
        log.info("Transformed lead to confirmation e-mail: " + msg);
        return msg;
    }
}

 

Again, LeadToEmailTransformer is a POJO, so we use the @Transformer annotation to select the method that's performing the transformation.

We
 use a Lead for the input and a MailMessage for the output, and perform a
 simple transformation in between.

When defining backing beans for
 the various Spring Integration filters, it's possible to specify a 
Message as an input or an output. 

That is, if we want to deal with the 
messages themselves rather than their payloads, we can do that. (Don't 
confuse the MailMessage in listing 7 with a Spring Integration message; 
MailMessage represents an e-mail message, not a message bus message.) We
 might do that in cases where we want to read or manipulate message 
headers. In this tutorial we don't need to do that, so our backing beans
 just deal with payloads.
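The difference between a payload-oriented and a message-oriented backing method can be sketched in plain Java as follows. Nothing here is Spring Integration API: the SimpleMessage type, the "source" header name and both describe methods are hypothetical and exist only to show what access to headers buys you.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustration only: payload-level versus message-level handling.
class MessageHeadersSketch {

    // Hypothetical message type: immutable headers plus a payload.
    static final class SimpleMessage<T> {
        final Map<String, Object> headers;
        final T payload;

        SimpleMessage(Map<String, Object> headers, T payload) {
            this.headers = Collections.unmodifiableMap(new HashMap<>(headers));
            this.payload = payload;
        }
    }

    // Payload-only method: knows nothing about the messaging layer.
    static String describePayload(String lead) {
        return "lead=" + lead;
    }

    // Message-aware method: can also read headers such as "source".
    static String describeMessage(SimpleMessage<String> message) {
        return "lead=" + message.payload + ", source=" + message.headers.get("source");
    }

    public static void main(String[] args) {
        Map<String, Object> headers = new HashMap<>();
        headers.put("source", "web-form");
        SimpleMessage<String> message = new SimpleMessage<>(headers, "Jane Doe");

        System.out.println(describePayload(message.payload));
        System.out.println(describeMessage(message));
    }
}
```

The payload-only style keeps the bean reusable outside the bus, which is why it is a sensible default when headers don't matter.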

Now we'll need to build out our message 
bus so that it looks like figure 4. We do this by updating 
applicationContext-integration.xml as shown in listing 8.

Listing 8. /WEB-INF/applicationContext-integration.xml updates to support confirmation e-mails

<?xml version="1.0" encoding="UTF-8"?>
<beans:beans xmlns="http://www.springframework.org/schema/integration"
    xmlns:mail="http://www.springframework.org/schema/integration/mail"
    xmlns:beans="http://www.springframework.org/schema/beans"
    xmlns:context="http://www.springframework.org/schema/context"
    xmlns:p="http://www.springframework.org/schema/p"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.springframework.org/schema/integration/mail
        http://www.springframework.org/schema/integration/mail/spring-integration-mail-1.0.xsd
        http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
        http://www.springframework.org/schema/context
        http://www.springframework.org/schema/context/spring-context-2.5.xsd
        http://www.springframework.org/schema/integration
        http://www.springframework.org/schema/integration/spring-integration-1.0.xsd">

    <context:property-placeholder
        location="classpath:applicationContext.properties" />
    
    <gateway id="leadGateway"
        service-interface="crm.integration.gateways.LeadGateway" />
    
    <publish-subscribe-channel id="newLeadChannel" />
    
    <service-activator
        input-channel="newLeadChannel"
        ref="leadService"
        method="createLead" />
    
    <transformer input-channel="newLeadChannel" output-channel="confEmailChannel">
        <beans:bean class="crm.integration.transformers.LeadToEmailTransformer">
            <beans:property name="confFrom" value="${conf.email.from}" />
            <beans:property name="confSubject" value="${conf.email.subject}" />
            <beans:property name="confText" value="${conf.email.text}" />
        </beans:bean>
    </transformer>
    
    <channel id="confEmailChannel" />
    
    <mail:outbound-channel-adapter
        channel="confEmailChannel"
        mail-sender="mailSender" />

</beans:beans>

 

The property-placeholder configuration loads the various ${...} 
properties from a properties file; see 
/crm/src/main/resources/applicationContext.properties in the code 
download. 

You don't have to change anything in the properties file.

The
 transformer configuration brings the LeadToEmailTransformer bean into 
the picture so it can transform Leads that appear on the newLeadChannel 
into MailMessages that it puts on the confEmailChannel. 

As a side note, 
the p namespace way of specifying bean properties doesn't seem to work 
here (I assume it's a bug: 
http://jira.springframework.org/browse/SPR-5990), so I just did it the 
more verbose way.

The channel definition defines a point-to-point channel rather than a pub-sub channel. That means that each message placed on the channel is delivered to only one endpoint, even if several endpoints are attached to it.
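For contrast with the pub-sub case, here is a toy point-to-point channel in plain Java. It is a sketch under assumptions, not the Spring Integration channel (which may dispatch messages differently): the PointToPointChannelSketch name and its queue-based design are made up for illustration. The point is simply that once one consumer has taken a message, no other consumer can get it.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustration only: a toy point-to-point channel backed by a queue.
class PointToPointChannelSketch {

    private final Queue<String> queue = new ArrayDeque<>();

    void send(String message) {
        queue.add(message);
    }

    // Removes and returns the next message, or null if none is waiting.
    String receive() {
        return queue.poll();
    }

    public static void main(String[] args) {
        PointToPointChannelSketch confEmailChannel = new PointToPointChannelSketch();
        confEmailChannel.send("confirmation for Jane Doe");

        // Two would-be consumers compete for the single message.
        String firstConsumerGot = confEmailChannel.receive();
        String secondConsumerGot = confEmailChannel.receive();

        System.out.println("first consumer: " + firstConsumerGot);
        System.out.println("second consumer: " + secondConsumerGot);
    }
}
```

That behavior suits the confirmation e-mail: we want exactly one adapter to send each message, not several copies going out.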

Finally we have an 
outbound-channel-adapter that grabs MailMessages from the 
confEmailChannel and then sends them using the referenced mailSender, 
which we defined in listing 6.

That's it for this section. We should have working confirmation e-mails. Restart your Jetty instance and go again to
http://localhost:8080/crm/main/lead/form.html
 

Fill
 it out and provide your real e-mail address in the e-mail field. A few 
moments after submitting the form you should receive a confirmation 
e-mail. If you don't see it, you might check your SMTP configuration in 
jetty-env.xml, or else check your spam folder.

Summary

In this tutorial we've taken our first steps toward developing an integrated lead management system. Though the current bus configuration is simple, we've already seen some key Spring Integration features, including

   •   support for the Gateway pattern, allowing us to connect apps to the message bus without knowing about messages
   •   point-to-point and pub-sub channels
   •   service activators to allow us to place service beans on the bus
   •   message transformers
   •   outbound SMTP channel adapters to allow us to send e-mail
The second tutorial will continue elaborating what we've developed here, demonstrating the use of several additional Spring Integration features, including
   •   message routers (including content-based message routers)
   •   outbound web service gateways for sending SOAP messages
   •   inbound HTTP adapters for collecting HTML form data from external systems
   •   inbound e-mail channel adapters (we'll use IMAP IDLE, though POP and IMAP are also possible) for processing incoming e-mails