Loading data from files

There are two ways to load data from files into EasyMorph:
  • Load one or more files, creating a separate table for each loaded file
  • Load multiple uniform files as a single table

Loading a file

The easiest way to load a file is to drag it into EasyMorph. EasyMorph then automatically creates an import transformation that matches the file extension. Automatically recognized extensions: xls, xlsx, txt, csv, psv, tsv, qvd, sas7bdat. You may need to adjust the settings of the created import transformation to load the file correctly, e.g. choose a separator or pick columns.

Another way is to create an import transformation explicitly: select the appropriate import transformation on the Start screen, or go to the Main (or Design) menu and press the "Insert table or chart" button.

EasyMorph supports the following file formats:

File format                        Extensions
Delimited text file (e.g. CSV)     .csv, .psv, .tsv, .txt or other
Text with fixed-width columns      .txt or other
Excel spreadsheets                 .xls, .xlsx, .xlsm
XML files                          .xml
Qlik QVD files                     .qvd
SAS data files                     .sas7bdat
SQLite data files*                 .sqlite, .sqlite3 or other
* To load from an SQLite data file, create a database connector first. See "Loading from databases" for details.

Example: US Census 2012

Hint: To reduce clutter, move tables to different tabs. To move a table to another tab, create a new tab, then right-click the table tile bar and choose "Move to another tab", or press Ctrl+M.

To load several files, drag them into EasyMorph one at a time, or create a separate import transformation for each file. Each import transformation creates one table. If needed, you can later use the "Append" transformation to concatenate these tables into one.

Loading multiple uniform files as one table (version 3.6 and above)

To load several uniform files (e.g. files of the same type with the same set of column headers), use the multiple load mode, which is available in any file import transformation.

In this mode, multiple files from a particular folder are loaded and automatically concatenated into one table. The files to load can be specified in two ways:

  • Explicitly select particular files to load.
  • Select all files that match a search criterion such as a search string, wildcard, or regular expression.
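EasyMorph performs this selection and concatenation itself, so no coding is needed. Purely to illustrate the logic, here is a minimal Python sketch (not EasyMorph's implementation): select_files applies a substring, wildcard or regular-expression criterion to a list of file names, and concatenate_uniform appends CSV tables that share identical column headers. The function names, the mode parameter, and case-insensitive matching for substring and wildcard criteria are assumptions made for this illustration; CSV content is passed as strings to keep the sketch self-contained.

```python
import csv
import fnmatch
import io
import re


def select_files(names, criterion, mode="wildcard"):
    """Return the names matching a criterion (hypothetical modes: substring, wildcard, regex)."""
    if mode == "substring":
        # Assumption: substring matching ignores case.
        return [n for n in names if criterion.lower() in n.lower()]
    if mode == "wildcard":
        # Assumption: wildcard matching ignores case, as on Windows file systems.
        return [n for n in names if fnmatch.fnmatchcase(n.lower(), criterion.lower())]
    if mode == "regex":
        # Regular expressions are matched case-sensitively anywhere in the name.
        pattern = re.compile(criterion)
        return [n for n in names if pattern.search(n)]
    raise ValueError(f"unknown mode: {mode}")


def concatenate_uniform(sources):
    """Append CSV tables (given as text) that share identical column headers."""
    header, rows = None, []
    for text in sources:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            # Files with different headers are not uniform and cannot be stacked.
            raise ValueError("files are not uniform: column headers differ")
        rows.extend(reader)
    return header, rows
```

With real files, names would come from a folder listing (e.g. os.listdir), and each selected file would be read before its text is passed to concatenate_uniform.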

Advanced topics

Loading a particular subset of multiple files

When loading multiple files as described above, you can filter file names by a search criterion such as a substring, wildcard, or regular expression. However, some cases require more advanced filtering. A few examples:

  • Load only the latest 10 files based on file creation date
  • Extract timestamps from file names, load only the latest 10 files based on the timestamps
  • When multiple files per day exist, load only the latest file
  • Exclude files with zero length
  • Load only files whose timestamps haven't been loaded yet
  • Include files from subfolders
  • Extract timestamps from subfolder names
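Outside EasyMorph, several of the rules in this list boil down to simple operations on file names, sizes and timestamps. The Python sketch below illustrates a few of them. It assumes a hypothetical naming convention in which the timestamp is embedded as YYYYMMDD_HHMM (e.g. report_20240102_0800.csv), and all function names are made up for the illustration.

```python
import re
from datetime import datetime

# Hypothetical naming convention: timestamp embedded as YYYYMMDD_HHMM.
STAMP = re.compile(r"(\d{8})_(\d{4})")


def stamp_of(name):
    """Extract the timestamp from a file name, or return None if there is none."""
    m = STAMP.search(name)
    if m is None:
        return None
    return datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M")


def latest_n(names, n=10):
    """Keep only the n files with the most recent timestamps (oldest first)."""
    stamped = sorted((stamp_of(x), x) for x in names if stamp_of(x) is not None)
    ordered = [x for _, x in stamped]
    return ordered[max(len(ordered) - n, 0):]


def latest_per_day(names):
    """When several files share a day, keep only the latest one for that day."""
    best = {}
    for x in names:
        ts = stamp_of(x)
        if ts is None:
            continue
        day = ts.date()
        if day not in best or ts > best[day][0]:
            best[day] = (ts, x)
    return [best[day][1] for day in sorted(best)]


def non_empty(files):
    """files: iterable of (name, size_in_bytes) pairs; drop zero-length files."""
    return [name for name, size in files if size > 0]


def not_yet_loaded(names, loaded_stamps):
    """Keep files whose timestamp is not in the set of already loaded timestamps."""
    result = []
    for x in names:
        ts = stamp_of(x)
        if ts is not None and ts not in loaded_stamps:
            result.append(x)
    return result
```

File sizes for non_empty could be obtained with os.path.getsize; files without a recognizable timestamp are simply skipped by the timestamp-based rules.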

In EasyMorph, such cases are handled using iterations. Iterations are a somewhat advanced topic, explained later in this tutorial in the chapter "Loops and iterations". Briefly, it works as follows:

  • Create a project (A) that loads one file, and create a parameter in it
  • In the import transformation, replace the file name with the parameter; save the project
  • Create another project (B) that produces a list of files to load (you can use the "List of files" transformation to create such a list)
  • Apply the necessary filtering transformations to that list
  • Use the "Iterate" transformation in the "Iterate and Append" mode to run project A for every file in the list
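In EasyMorph this pattern is configured, not coded. As a rough analogy only, the two projects behave like a parameterized function applied to each item of a list, with the results appended. In the Python sketch below, load_one stands in for project A (its file_name argument plays the role of the parameter) and iterate_and_append stands in for project B with its "Iterate and Append" step. Both names, the injectable read function, and the extra "Source file" column that records the parameter value in each row are hypothetical.

```python
import csv
import io
from pathlib import Path


def load_one(file_name, read):
    """Stand-in for project A: load one CSV file whose name is passed as a parameter."""
    rows = list(csv.DictReader(io.StringIO(read(file_name))))
    for row in rows:
        row["Source file"] = file_name  # hypothetical column recording the parameter value
    return rows


def iterate_and_append(file_list, read=lambda p: Path(p).read_text(encoding="utf-8")):
    """Stand-in for project B: run load_one for every file and append the results."""
    table = []
    for name in file_list:
        table.extend(load_one(name, read))
    return table
```

Here file_list would normally be the already filtered list of files; by default each file is read from disk, and the read argument can be replaced for testing.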

See an example of loading multiple files: Iterations.zip (example #1)

Processing very big text files

If the files to load are too big to fit in memory (e.g. hundreds of gigabytes), you can first split them into chunks and then process one chunk at a time. The "File splitter" transformation splits text files into chunks based either on a number of rows (used for files with fixed-width columns) or on the unique values of a particular field (i.e. partitioning of delimited text files). It returns a list of the created files (the chunks), which can then be used for iterating.
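The "File splitter" transformation does this without any code. To illustrate the two splitting strategies, here is a Python sketch (not EasyMorph's implementation): split_by_rows streams a text file into chunks of a fixed number of lines, and partition_by_field routes each row of a delimited file into a chunk per unique value of one field. Both read the source line by line, so the file is never loaded into memory as a whole, and both return the list of created files. The function names and chunk naming scheme are assumptions; split_by_rows counts every line (including a header, if any) as a row, and partition_by_field keeps one output file open per distinct value, so it suits fields with a moderate number of unique values.

```python
import csv
import re
from pathlib import Path


def split_by_rows(src, out_dir, rows_per_chunk):
    """Split a text file into chunks of at most rows_per_chunk lines; return chunk paths."""
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    created, out = [], None
    try:
        with open(src, encoding="utf-8", newline="") as f:
            for i, line in enumerate(f):
                if i % rows_per_chunk == 0:  # start a new chunk
                    if out is not None:
                        out.close()
                    path = out_dir / f"{src.stem}_{len(created) + 1}{src.suffix}"
                    out = open(path, "w", encoding="utf-8", newline="")
                    created.append(path)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return created


def partition_by_field(src, out_dir, field, delimiter=","):
    """Write each row of a delimited file into one chunk per unique value of `field`."""
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    writers, handles, created = {}, [], []
    try:
        with open(src, encoding="utf-8", newline="") as f:
            reader = csv.DictReader(f, delimiter=delimiter)
            for row in reader:
                value = row[field]
                if value not in writers:
                    # Numbered prefix keeps names unique even if sanitized values collide.
                    safe = re.sub(r"[^\w.-]", "_", value)
                    path = out_dir / f"{src.stem}_{len(created) + 1}_{safe}{src.suffix}"
                    fh = open(path, "w", encoding="utf-8", newline="")
                    handles.append(fh)
                    writers[value] = csv.DictWriter(fh, fieldnames=reader.fieldnames, delimiter=delimiter)
                    writers[value].writeheader()
                    created.append(path)
                writers[value].writerow(row)
    finally:
        for fh in handles:
            fh.close()
    return created
```

The returned paths play the same role as the list produced by "File splitter": each chunk can then be loaded and processed in turn.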

Read next: Load data from database