Tuesday, May 23, 2017

Code Snippet: merging files by writing chunks

Do you have multiple source files that have to be merged so you can import them as a single one? This requirement is quite common, especially in automated data loads.

For example, our ERP system exports two files, one for Balance Sheet and one for Profit & Loss. We want to import them as a single file under the same POV. Merging the source files is the solution.

The approach is to merge the files by writing them into the target file in chunks. This way, a large source file is never loaded into memory all at once.
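To make the idea of "writing chunks" concrete, here is a minimal sketch in plain Python of the read/write loop that a chunked copy performs. The function name `copy_in_chunks` and the sample data are hypothetical and only for illustration; the snippet in this post relies on `shutil.copyfileobj`, which does this kind of loop for you.

```python
import io


def copy_in_chunks(src, dst, chunk_size=1024 * 1024):
    """Copy src into dst, reading at most chunk_size bytes at a time."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            # An empty read means the end of the source was reached
            break
        dst.write(chunk)


# In-memory streams stand in for real files (illustration only)
source = io.BytesIO(b"Account;Amount\n1000;250\n")
target = io.BytesIO()
copy_in_chunks(source, target, chunk_size=8)
```

Only one chunk is held in memory at any time, so the memory footprint depends on the chunk size and not on the size of the source file.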

Let's have a look!

Merging a list of files by writing chunks

r'''
 Snippet:       Merge a list of files
 Author:        Francisco Amores
 Date:          23/05/2016
 Blog:          http://fishingwithfdmee.blogspot.com
 Notes:         This snippet can be pasted in any event script.
                Content of fdmContext object will be logged in the
                FDMEE process log (...\outbox\logs\)
 Instructions:  Set log level (global or application settings) to > 4
 Hints:         Use this snippet to merge multiple single files into one.
                It writes chunks to avoid memory issues with large files
 FDMEE Version: and later
'''

# initialize
srcFolder = r"C:\temp"
tgtFolder = r"C:\temp"
listSrcFilename = ["file1.txt", "file2.txt", "file3.txt"]
tgtFilename = "merge.txt"

# import section
import os
import shutil

try:
    # Open Target File in write mode
    tgtFilepath = os.path.join(tgtFolder, tgtFilename)
    tgtFile = open(tgtFilepath, "w")
    # Log
    fdmAPI.logInfo("File created: %s" % tgtFilepath)
    # Loop source files to merge
    for srcFilename in listSrcFilename:
        # file path
        filepath = os.path.join(srcFolder, srcFilename)
        # Log
        fdmAPI.logInfo("Merging file: %s" % filepath)
        # Open file in read mode
        srcFile = open(filepath, "r")
        # Copy source file into target
        # 10 MB per chunk so a big file is never fully loaded into memory
        shutil.copyfileobj(srcFile, tgtFile, 1024*1024*10)
        # Add new line char in the target file
        # to avoid issues if the source file has no end of line char
        tgtFile.write("\n")
        # Close source file
        srcFile.close()
        # Debug
        fdmAPI.logInfo("File merged: %s" % filepath)
    # Close target file
    tgtFile.close()
except (IOError, OSError), err:
    raise RuntimeError("Error concatenating source files: %s" % err)
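The snippet's comments call for a new line after every source file, which leaves an empty line in the target whenever a source already ends with one. As a possible variation, the sketch below adds the new line only when it is missing. It is written for plain Python outside FDMEE (no `fdmAPI`, not tested under FDMEE's Jython), works in binary mode, and the names `merge_files`, `bs.txt` and `pl.txt` are hypothetical.

```python
import os
import tempfile


def merge_files(src_paths, tgt_path, chunk_size=1024 * 1024 * 10):
    """Concatenate src_paths into tgt_path, one chunk at a time."""
    tgt = open(tgt_path, "wb")
    try:
        for path in src_paths:
            src = open(path, "rb")
            try:
                last = b""
                while True:
                    chunk = src.read(chunk_size)
                    if not chunk:
                        break
                    tgt.write(chunk)
                    last = chunk
                # Add a new line only if the source did not end with one
                if last and not last.endswith(b"\n"):
                    tgt.write(b"\n")
            finally:
                src.close()
    finally:
        tgt.close()


# Usage with two small temporary files (illustration only)
work_dir = tempfile.mkdtemp()
bs_path = os.path.join(work_dir, "bs.txt")
pl_path = os.path.join(work_dir, "pl.txt")
merged_path = os.path.join(work_dir, "merge.txt")

f = open(bs_path, "wb")
f.write(b"1000;250")        # no end of line char
f.close()
f = open(pl_path, "wb")
f.write(b"4000;-75\n")      # already ends with a new line
f.close()

merge_files([bs_path, pl_path], merged_path, chunk_size=4)

f = open(merged_path, "rb")
merged = f.read()
f.close()
```

Note that the added new line is always a bare `\n`, so a source file with Windows line endings would get a mixed line ending at the join. Whether that matters depends on how the merged file is imported.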

Code snippets for FDMEE can be downloaded from GitHub.
