Yuan's SOA Blog: Mule DataWeave and DB insert Operation

With the recent release of Mule ESB 3.7, DataWeave is replacing DataMapper as the new mapping tool. DataMapper is deprecated in 3.7. You actually need install a plugin for the DataMapper to work on the stand alone Mule ESB.

DataWeave is so new there are few online resources available. I have found the following 3 online resources that helped me to get started.

https://www.youtube.com/watch?v=R5dUqDrvFvI

https://developer.mulesoft.com/docs/dataweave

https://github.com/mulesoft/anypoint-examples

As a beginner of DataWeave, I have learned a few things through trial and error. I’m posting up these notes to help myself in my learning process.

Comparing with previous DataMapper, I personally see some pros and cons:

DataWeave Pros:

3 data panels on the same screen, the output samples is updated simultaneously as you update the code/script (when it works).
You can switch between the output format spontaneously by modifying /java, /json, /csv etc
You can have multiple outputs with different formats
The mapping script/code is inside the same XML file, so you don’t have to track another file somewhere else
The syntax is much easier than the DataMapper complex groovy code
Other misc. benefits, such as selecting default value is a breeze

DataWeave Cons:

New release, limited learning resources available
limited GUI function, no drag and drop
You can type code directly in, if the output doesn’t match the output, then you are on your own.
Some bugs still need be flushed out.

Before I even start, let me clarify a trivial detail. When you fire up Anypoint Studio, there is no such thing as “DataWeave”, it is actually called “Transform Message”. With that out of the way, I am using “DataWeave” liberally in this post, so long you understand what I’m referring to.

One common usage of DataWeave is to transform data in a format that will go into database. Besides general notes on DataWeave, I’ll show a few things within the context of DB insert.

1. Manually declare the input/output data format:

In order for DataWeave to show the input (panel), processing script (panel), and the output (panel) correctly, it relies on DataSensor to sense the input and output payload formats correctly. Sometimes that format information is not available. For example, in the case the input/out XML strings do not have associated schemas. You can manually declare the input data format so DataSendor can pick up the formats: 1) click on the predecessor of Transform Message processor 2) select MetaData, 3) select the “output”, 4) select the editor button 4) pick your XSD.

By the same token, you can pick the output format by setting the input format for the subsequent processor.

2. DB insert data format:

In the Oracle SOA/OSB world, before invoking a DB insert operation, you must have a clearly defined XSD for the insert operation. So a natural question to ask is what is the input data format for Mule ESB DB insert operation? You noticed that I didn’t say “input XSD”, because, unlike SOA/OSB, Mule ESB doesn’t exclusively work with XML format.

Technically, the data in the insert statement can come from different sources (not just from the payload) in various formats.

Although there are various formats to use for DB insert, it is important to know that things work the best if the DB insert (non-bulk) is defined like this:

“insert (cola, colb, …) values (#[payload.a], #[payload.b], …)”.

This is equivalent to saying the DB processor “expects” (defines) the input to be a Java LinkedHasMap like

{a=’foo’, b=’bar’).

If you declare your insert statement this way, then DataSensor can pick up (sense) this format automatically.

There is not much compile time enforcement of the input data format with DataWeave/DB insert. If your insert input is not what is expected, then the DB operation will throw exception at runtime.

Technically, the DB insert can extract data from anywhere using any valid syntax. For example, I can have

“insert into … (acol, bcol, ccol) values (#[xpath3('/per:person/per:emplId')], #[flowVars.b], #[flowVars.c])”

The only thing the statement implies is the payload is XML and should have this element ‘/per:person/per:emplId'.

When you have insert statement like this, then DataSensor can’t help you at all. You are totally on your own to make sure the payload is formatted as expected. i.e “payload” has the correct XML data, and flowVars “b”, “c” are filled with correct values. If you drag a Transform Message in front of such a DB insert, the Transform Message won’t show any sample output, because it simply doesn’t handle complex data format like this. In this case, you’ll have to guarantee with your own eyes that the mapping scripts (text) are entered correctly.

To sum it up one more time, when the insert statement is coded, it “declares” where (and how, to certain degrees) the data is going to come from. That declaration becomes the “expected” input data format. If it comes from multiple input sources, you need to prepare those data items (payload, flowVars or whatever) accordingly.

When the insert statement is pure like “insert … (x, y, …) values (#[payload.a], #[payload.b],…)”, then you will get most help from the DataWeave as you type your mapping script/code.

3. More DB Insert Example

What if you have an insert statement like below?

“insert … (addr, city, zip, fname, lname, …) values (#[payload.addr], #[payload.city], #[payload.name.first], #[payload.name.last],…)”,

Then you need prepare your LinkedHasMap “payload” like this:

{addr=’111’, city=’DC’, zip=’000’, name { first=’foo’, last=’bar’}…}
Please note, even with this small variation in (expected) insert format, DataWeave struggles to fully support the output sample panel.

4. Now let’s take a look at the Bulk Insert operation.

Suppose you have “insert into … (addr1, addr2, addr3…) values (#[payload.addr1], #[payload.addr2], …), and “bulkMode=true”. Then the expected input format is

[

{addr11, addr12, addr13},

{addr21, addr22, addr23},

…

]

However, it is not always possible to translate the output into the above format with DataWeave. My trick is to do 2 steps:

…

address: payload.ns0#person.ns0#personContact.*ns0#address map {

address1: $.ns0#address1,

address2: $.ns0#address2,

...

}

Then followed by

<set-payload values=”#[payload.address]" doc:name="setAddrRecs" />

Now your input is an ArrayList of LinkedHashMap, which is the builkInsert expected format in this case.

5. Do not put DB to_date() inside DataWeave

It’s too tempting to do this in DataWeave script:

birthdate: "to_date('" ++ payload.ns0#person.ns0#demographics.ns0#birthDate ++ "', 'yyyy-mm-dd')"

then use it in the insert directly like: insert … #[payload.birthdate]…

That will fail miserably. Instead, do this:

DataWeave script:

birthdate: payload.ns0#person.ns0#demographics.ns0#birthDate

DB: insert … to_date(#[payload.birthdate], 'yyyy-mm-dd')

6. DataWeave default value

DataWeave script:

address: payload.ns0#person.ns0#personContact.*ns0#address map {

address1: $.ns0#address1 default ' ',

address2: $.ns0#address2 default ' ',

What does it do?

If input element <address1> doesn’t exist, it will fill in space for output “address1”. However, if the input has <address1/> (empty element), then the output “adddress1” will be “”. If you want output address1=” “ for both cases, then you need this:

address1: $.ns0#address1 when ($.ns0#address1? and $.ns0#address2 != "") otherwise ' ',

7. “zip” is reserved?

It appears to me “zip” has some special meaning internally.

I cannot put anything after a field with name “zip:”. I can’t put zip: blah default ‘99999. In fact, I can’t even put zip as the last field in the mapping, because that requires a “}” after “zip”, even that will cause dataWeave to fails. So either it’s a bug or it’s an undocumented internally reserved keyword.

Yuan's SOA Blog

Wednesday, August 19, 2015

Mule DataWeave and DB insert Operation

9 comments: