Tuesday, May 23, 2017

Code Snippet: merging files by writing chunks

Do you have multiple source files that have to be merged so you can import them as a single one? This requirement is quite common, especially in automated data loads.

For example, our ERP system exports two files, one for Balance Sheet and one for Profit & Loss. We want to import them as a single file under the same POV. Merging the source files is the solution.

The approach taken is to merge the files by writing chunks into the target file. In this way, we avoid memory issues with large source files.

Let's have a look!

Merging a list of files by writing chunks


'''
 Snippet:       Merge a list of files
 Author:        Francisco Amores
 Date:          23/05/2016
 Blog:          http://fishingwithfdmee.blogspot.com
 
 Notes:         This snippet can be pasted in any event script.
                Content of fdmContext object will be logged in the
                FDMEE process log (...\outbox\logs\)
                
 Instructions:  Set log level (global or application settings) to > 4 
 Hints:         Use this snippet to merge multiple single files into
                one.
                It writes chunks to avoid memory issues with large files
               
 FDMEE Version: 11.1.2.3 and later
 ----------------------------------------------------------------------
 Change:
 Author:
 Date:
'''

# initialize
srcFolder = r"C:\temp"
tgtFolder = r"C:\temp"
listSrcFilename = ["file1.txt", "file2.txt", "file3.txt"]
tgtFilename = "merge.txt"

# import section
import os
import shutil

try:
    # Open Target File in write mode
    tgtFilepath = os.path.join(tgtFolder, tgtFilename)
    tgtFile = open(tgtFilepath, "w")
    # Log
    fdmAPI.logInfo("File created: %s" % tgtFilepath)    
    
    # Loop source files to merge
    for srcFilename in listSrcFilename:
    
        # file path
        filepath = os.path.join(srcFolder, srcFilename)          
        # Log
        fdmAPI.logInfo("Merging file: %s" % filepath)                    
        # Open file in read mode
        srcFile = open(filepath, "r")        
        # Copy source file into target
        # in 10 MB chunks so large files are never fully loaded into memory
        shutil.copyfileobj(srcFile, tgtFile, 1024*1024*10)
        # Add a new line char in the target file
        # to avoid issues if the source file doesn't end with one
        # ("\n" is translated to the OS line ending in text mode)
        tgtFile.write("\n")
        # Close source file
        srcFile.close()
        # Log
        fdmAPI.logInfo("File merged: %s" % filepath)
        
    # Close target file
    tgtFile.close()
                    
except (IOError, OSError), err:
    raise RuntimeError("Error concatenating source files: %s" % err)
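
If you are curious about what shutil.copyfileobj does with that third argument, here is a minimal sketch of the same chunked copy written by hand. The function name copy_in_chunks and the in-memory demo data are my own illustration, not part of the snippet above. It is plain Python, so check it against your Jython version before pasting it into an event script.

```python
import io


def copy_in_chunks(src, tgt, chunk_size=1024 * 1024 * 10):
    """Copy src into tgt, reading at most chunk_size characters at a time.

    Returns the number of characters copied.
    """
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            # An empty read means we reached the end of the source
            break
        tgt.write(chunk)
        copied += len(chunk)
    return copied


# Demo with in-memory files (hypothetical data) and a tiny chunk size
src = io.StringIO(u"Account;Amount\n1000;250\n")
tgt = io.StringIO()
total = copy_in_chunks(src, tgt, chunk_size=8)
```

Only one chunk is held in memory at any time, which is why the size of the source files does not matter much.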

Code snippets for FDMEE can be downloaded from GitHub.

Wednesday, May 17, 2017

Universal Data Adapter for SAP HANA Cloud

Some time ago I covered SAP HANA integration through the Universal Data Adapter (UDA). You can see the details in the 3 parts I posted.
Now that everything is heading into the Cloud, why not play around with SAP HANA Cloud?

SAP HANA Cloud
When I first tried to get an SAP ECC training environment, I noticed that SAP offered nothing for free. Nowadays, things have changed a little bit. Luckily, they realized that you need to offer some trial/training sandbox if you want people to get closer to you.
For those who want to be part of the game, you can visit their Cloud site.

Why the Universal Data Adapter?
SAP HANA Cloud brings something called SAP Cloud Connector. Too complicated for me :-)
Luckily for me, I googled an easier way of extracting data from the Cloud. There is something called database tunnels, which allow on-premise systems to connect to the HANA DB in the cloud through a secure connection. It doesn't sound very straightforward, but it didn't take long to configure.

There are different ways of opening the tunnel. I have used the SAP Cloud Console Client, which you can download from SAP for free.

Once the database tunnel is opened from the FDMEE Server(s) to the SAP HANA Cloud DB, the Universal Data Adapter can be used in the same way as with an on-premise HANA DB.
Please note that, as I'm not using a productive cloud environment, I had to open the tunnel via the command line. This is good enough to complete my POC.
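
Before moving on to ODI, it can be handy to confirm from the FDMEE server that the tunnel is actually listening. The following is only a rough sketch: the function name tunnel_is_open is hypothetical, and the default host and port (localhost, 30015) are assumptions on my side, so use whatever values your tunnel reports when it opens.

```python
import socket


def tunnel_is_open(host="localhost", port=30015, timeout=3.0):
    """Return True if something accepts TCP connections on host:port.

    host and port defaults are assumed values; replace them with the
    ones shown when the database tunnel is opened.
    """
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Connection refused, unreachable host or timeout
        return False
    conn.close()
    return True
```

A True result only tells you that the local port accepts connections. It does not validate the HANA credentials, which are still checked when the ODI Data Server connects.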

My data in SAP HANA Cloud
I'm keeping this simple so I have a table in HANA Cloud with some dummy data:
Let's go through the configuration steps to bring that data into my application.

Importing data through FDMEE
As with any UDA configuration, we need to:
  • Configure ODI Topology for the physical connection, logical schema and context
  • Configure FDMEE (source system, source adapter, period mapping, etc.)
ODI
Data Server needs to point to the DB tunnel:
We use the same JDBC driver as for HANA on-premise:
As usual, I create a dedicated context for this new source system. That gives me more flexibility:

FDMEE
In FDMEE, nothing different. 
We first create the source system with the context we created in ODI:
Then, add the source adapter for the table we want to extract data from:
Time now to import the table definition, classify columns and generate the template package in ODI:
As you can see above, FDMEE could reverse the HANA Cloud table so I can now assign the columns to my dimensions and regenerate the ODI scenario:

I'm not going to show how to create a location and data load rule as I assume you are familiar with that process.

The final step is to run our data load rule and see how data is pulled from the SAP cloud and loaded into the on-premise HFM app through FDMEE :-)
I'm going to leave it here for today. As you can see, the Universal Data Adapter provides a simple and transparent way of connecting our on-premise system with heterogeneous source systems, including SAP HANA Cloud!

Cheers