Friday, April 7, 2017

Replacing source files on the fly - Playing around with XML files

The following post came up after seeing that lot of people in the FDMEE community were asking "how can we manipulate the source file and replace by new one on the fly?"
In other words, how we can replace the source file selected by end user with a new file we create during the import process... or let's be clear... how can we cheat on FDMEE? :-)

I thought it was good idea to share with you a real case study we had from a customer. Their ERP system had a built-in process which was extracting data in XML format. Hold on, but can FDMEE import XML files? Not out the box, Yes with some imagination and scripting.

The Requirement
As stated above, FDMEE does not support all kind of formats out of the box. We usually have to ask our ERP admin (or IT) to create a file in a format that FDMEE can easily read, mainly delimited files such as CSV.

But what about Web Services like SOAP or REST? they mainly return XML or JSON responses. We need to be prepared for that in case we want our FDMEE integration to consume a WS. This is quite useful in FDMEE on-premise as I guess Data Management for the Cloud will include "some kind of JSON adapter" sooner or later in order to integrate with non-Oracle web services.

And what about any other integration where source files are in another format than fixed/delimited?

Luckily I had that one... we get an XML file from our IT and we need to convert to CSV so we can import through FDMEE.

High Level Solution
The main idea is to convert the XML file selected to a CSV file that FDMEE can understand. Now the questions are Where and How?

  • Where? It makes sense that we do the conversion before the actual import happens. BefImport event script?
  • How? FDMEE will be expecting a CSV file, how do we convert the XML to CSV? There are multiple methods: Python modules, Java libraries for XPath... I will show one of them.
The XML File
Image below doesn't show the real XML (confidentiality) but a basic one:
As you can see data is enclosed in <data> tag an lines are enclosed in <dataRow> tags. Besides, each dimension has a different tag. 
As an extra for this post, just to let you know that I usually use Notepad++ plugin XML Tools which allows me to perform multiple operations including XPath queries:
Before we move into more details. What do you think it would happen if we try to import the XML file with no customization? 
FDMEE rejects all records in the file. What were you thinking then? That's the reason I'm blogging about this (lol)

Import Format, Location and DLR for the CSV File
In this case, our source type is File. However, I usually define instances of File when I want FDMEE admins/users to see the real source system (this is optional):
The Import Format (IF)  has been defined to import a semicolon delimited file having numeric data only (you can use any delimiter):
I'm going to keep it simple. One-to-one mapping between XML dimension tags and HFM dimensions:
The Data Load Rule is using the IF we just defined. As you may know, we can have one location with multiple DLRs using different IFs when source is File.

BefImport Event Script
The conversion will done in the BefImport event script which is triggered before FDMEE imports the file the end-user selected when running the DLR.

We can split this script into two main steps:
  1. Create the new CSV file in the location's inbox folder
  2. Update database tables to replace the original file selected with the new one created in step 1
The final solution could be more sophisticated (create the CSV based on IF definition, parse null values, etc.). Today we will go for the simple one.

Let's dive into details.

Converting XML to CSV
There are multiple ways of converting an XML to CSV. To simplify, we could group them as:
  • Method A: parses the entire XML and convert to CSV
  • Method B: converts nodes into CSV lines as we iterate them
Method A would be good for small files. It's also quite useful if our XML structure is complex. However, if we have big files we may want to avoid loading file into memory before converting which is more efficient. Therefore, I have decided to implement Method B. Within all different options we have, I will show the event-style method using xml Python module.

Which Python modules I'm using?
  • xml module to iterate the XML nodes (iterparse)
  • csv module to create the CSV file
  • os to build the file paths
Let's import the modules:

# Import Section
try:
    from xml.etree.ElementTree import iterparse
    import csv
    import os
    fdmAPI.logInfo("Modules successfully imported")
except ImportError, err:
    fdmAPI.logFatal("Error importing libraries: " % err)

Then we need to build the different paths for the XML and CSV files. We will also create a file object for the CSV file. This object will be used to create the csv writer.
The XML file is automatically uploaded to the location's inbox folder when import begins. The CSV file will be created in the same folder.

# Get Context details
inboxDir = fdmContext["INBOXDIR"]
locName = fdmContext["LOCNAME"]
fileName = fdmContext["FILENAME"]
loadId = fdmContext["LOADID"]

# XML File
xmlFile = os.path.join(inboxDir, locName, fileName)
fdmAPI.logInfo("Source XML file: %s" % xmlFile)

# CSV file will be created in the inbox folder
csvFilename = fileName.replace(".xml", ".csv")
csvFilepath = os.path.join(inboxDir, locName, csvFilename)

# To avoid blank lines in between lines: csv file 
# must be opened with the "b" flag on platforms where 
# that makes a difference (like windows)
csvFile = open(csvFilepath, "wb")
fdmAPI.logInfo("New CSV file: %s" % csvFilepath)

The writer object for the CSV file must use semicolon as delimiter so it matches with our IF definition. We have also enclosed non-numeric values in quotes to avoid issues in case you define your import format as comma delimited:

try:
    # Writer
    writer = csv.writer(csvFile, delimiter=';', quoting=csv.QUOTE_NONNUMERIC)
except Exception, err:
    fdmAPI.logDebug("Error creting the writer: %s" % err)

Once the writer is ready, it's time to iterate the nodes and building our CSV. Before seeing the code, I'd like to highlight some points:
  • We just want to capture start tags so we only capture start event in iterparse
  • We can include event in the for statement for debugging purposes (we can print how the XML file is read)
  • Property tag returns the XML node name (<entity>...)
  • Property text returns the XML node text (<entity>EastSales</entity>) 
  • We know amount is the last XML tag so we will write the CSV line when it's found
  • The CSV writer generates the delimited line from list of node texts (row)
try: 
    # Iterate the XML file to build lines for CSV file
    for (event, node) in iterparse(xmlFile, events=['start']):
        
        # Ignore anything not being dimension tags
        if node.tag in ["data", "dataRow"]:            
            continue

        # For other nodes, get node value based on tag
        if node.tag == "entity":
            entity = node.text
        elif node.tag == "account":
            account = node.text
        elif node.tag == "icp":
            icp = node.text
        elif node.tag == "custom1":
            c1 = node.text
        elif node.tag == "custom2":
            c2 = node.text
        elif node.tag == "custom3":
            c3 = node.text
        elif node.tag == "custom4":
            c4 = node.text        
        elif node.tag == "amount":
            amount = node.text 
        
        # Build CSV row as a list (only when amount is reached)
        if node.tag == "amount":
            row = [entity, account, icp, c1, c2, c3, c4, amount] 
            fdmAPI.logInfo("Row parsed: %s" % ";".join(row))        
            # Output a data row
            writer.writerow(row)
        
except Exception, err:
    fdmAPI.logDebug("Error parsing the XML file: %s" % err)

The result of this step is the CSV file created in the same folder as the XML one:
If we open the file, we can see the 3 lines generated from the 3 XML dataRows:
Cool, first challenged completed. Now we need to make FDMEE to import the new file. Let's move forward.

Replacing the Source File on the fly
FDMEE stores the file name to be imported in several tables. It took to me some time and several tests to get which tables I had to update. Finally I got them:
  • AIF_PROCESS_DETAILS: to show the new file name in Process Details page
  • AIF_BAL_RULE_LOADS: to set the new file name for the current process
  • AIF_PROCESS_PERIODS: the file name is also used in the table where FDMEE stores periods processed by the current process
To update the tables we need 2 parameters: CSV file name and current Load Id (Process Id)

# ********************************************************************
# Replace source file in FDMEE tables
# ********************************************************************

# Table AIF_PROCESS_DETAILS
sql = "UPDATE AIF_PROCESS_DETAILS SET ENTITY_NAME = ? WHERE PROCESS_ID = ?"
params = [csvFilename, loadId]
fdmAPI.executeDML(sql, params, True)

# Table AIF_BAL_RULE_LOADS
sql = "UPDATE AIF_BAL_RULE_LOADS SET FILE_NAME_STATIC = ? WHERE LOADID = ?"
params = [csvFilename, loadId]
fdmAPI.executeDML(sql, params, True)

# Table AIF_PROCESS_PERIODS
sql = "UPDATE AIF_PROCESS_PERIODS SET IMP_ENTITY_NAME = ? WHERE PROCESS_ID = ?"
params = [csvFilename, loadId]
fdmAPI.executeDML(sql, params, True)

Let's have a look to the tables after they have been updated:
  • AIF_BAL_RULE_LOADS
  •  AIF_PROCESS_DETAILS
  •  AIF_PROCESS_PERIODS
At this point, FDMEE doesn't know anything about the original XML file. Maybe some references in the process log, but nothing important.

Let's give a try!
Ready to go. FDMEE user selects the XML file when running the DLR:
Import is happening... and... data imported! XML with 3 dataRows = 3 lines imported
Process details show the new file (although it's not mandatory to change it if you don't want to)

I'm going to leave it here for today. Processing XML files can be something very useful, not only when we have to import data but in other scenarios. For example, I'm sure some of you had some solutions in mind where the Intersection Check Report (FDMEE generates an XML file which is converted to PDF) had to be processed...

I hope you enjoy this post and find it useful for your current or future requirements.

Have a good weekend!

4 comments:

  1. As always Francisco, thank you very much for the great information.

    ReplyDelete
  2. Nice Idea!
    Just a thought you could insert the data of the csv manually into the TDataseg_T instead of altering the tables, works aswell.

    Best

    Tim

    ReplyDelete
    Replies
    1. thanks, the problem of inserting into TDATASEG_T is that we cannot leverage other standard functionality like import expressions or import scripts.

      Regards

      Delete