Columnar In-Memory Data Transformation Engine

The unique user experience offered by EasyMorph is made possible by its innovative in-memory data transformation engine. Many databases and traditional ETL utilities inherited their technical concepts from the 1970s, when computer memory was very expensive and computers had little of it. These days RAM is cheap and keeps getting cheaper. This makes it possible to process datasets with millions of rows entirely in memory, which eliminates slow disk I/O during processing and gives instant access to full datasets. EasyMorph is a new type of data transformation application built to take advantage of these capabilities:

Aggressive data compression

Data compression allows EasyMorph to store full results of every transformation step in RAM, even if they contain millions of rows. Several smart optimizations help make it even more efficient:

  • Vocabulary compression is applied to each column independently.
  • Data compression is done on the fly and is highly parallelized.
  • A column of any length that holds a single value consumes practically no memory.
  • For each transformation, only the delta of changes in data is stored. Input + delta = output.
  • Column vocabularies are reused between datasets, where possible.
  • EasyMorph Server optimizes RAM usage even more by purging datasets from memory when they are no longer needed for further calculations.
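
The "input + delta = output" idea from the list above can be illustrated with a small sketch. It assumes a table is a plain mapping from column name to column data; the function `add_column` and the sample columns are hypothetical and are not EasyMorph's actual internals. The point is only that a step's output can share unchanged columns with its input instead of copying them.

```python
def add_column(table, name, values):
    """Return a new table that shares every existing column with `table`.

    Only the new column (the delta) is allocated; existing columns are
    referenced, not copied. This is an illustrative assumption about how
    delta storage could work, not the engine's real data structure.
    """
    output = dict(table)   # copies references to the columns, not the columns
    output[name] = values
    return output


# Result of a first (hypothetical) transformation step.
step1 = {"price": [10, 20, 30], "qty": [1, 2, 3]}

# The second step adds a calculated column.
amount = [p * q for p, q in zip(step1["price"], step1["qty"])]
step2 = add_column(step1, "amount", amount)

# Both results remain fully available, yet "price" and "qty" exist only once.
shares_price = step2["price"] is step1["price"]
```

Here `step1` is left untouched, so the full result of every step stays available while the extra memory cost of `step2` is just the `amount` column.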

Data compression adds very little performance overhead. Most transformations in EasyMorph operate directly on compressed data (which sometimes even speeds up calculations). Therefore data frequently stays compressed during all calculations and is only decompressed when exported to a database or file.
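
To make this more concrete, here is a minimal sketch of vocabulary (dictionary) compression and of a filter that works directly on the compressed form. The functions `encode` and `filter_rows` are hypothetical names chosen for illustration, and the layout (a list of distinct values plus one integer code per row) is an assumption, not a description of the engine's real format.

```python
def encode(values):
    """Dictionary-encode a column: keep each distinct value once,
    plus one small integer code per row pointing into the vocabulary."""
    index = {}                      # value -> position in the vocabulary
    vocabulary, codes = [], []
    for v in values:
        if v not in index:
            index[v] = len(vocabulary)
            vocabulary.append(v)
        codes.append(index[v])
    return vocabulary, codes


def filter_rows(vocabulary, codes, predicate):
    """Return the row numbers whose value satisfies `predicate`.

    The predicate is evaluated once per distinct value, not once per row,
    and the column is never decompressed.
    """
    keep = [predicate(v) for v in vocabulary]
    return [row for row, code in enumerate(codes) if keep[code]]


vocabulary, codes = encode(["US", "DE", "US", "US", "FR", "DE"])

calls = []                          # records every predicate evaluation

def not_us(value):
    calls.append(value)
    return value != "US"

matching_rows = filter_rows(vocabulary, codes, not_us)

# A column holding one repeated value has a one-entry vocabulary.
constant_vocabulary, constant_codes = encode(["active"] * 1000)
```

With six rows but only three distinct values, the predicate runs three times. This is one way that working on compressed data can be faster than working on the raw column. In the constant column every code is zero, so a real engine could plausibly reduce it to little more than the value and a row count.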

Mixed data typing

Databases and traditional ETL tools require data to be strongly typed, i.e. each column must be of one and only one data type. This requirement is historical and stems from the restrictions of databases created in the 1970s. Nowadays, a lot of information comes from semi-structured data sources such as spreadsheets, web-form data or XML files. Type restrictions make working with such data sources unnecessarily complicated.

Just as in Excel, in EasyMorph text and numbers can be mixed in the same column, which makes it a perfect tool for parsing semi-structured data.
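
As a rough illustration of why mixed typing helps with semi-structured data, the sketch below keeps text and numbers side by side in one column and extracts the numeric cells afterwards. It is written in Python rather than in EasyMorph's own expression language, and the helper `to_number` and the sample column are hypothetical.

```python
def to_number(cell):
    """Return the cell as a float if it is, or looks like, a number.

    Returns None for anything else, so text cells can stay in the column
    without breaking the calculation.
    """
    if isinstance(cell, bool):          # bool is a subclass of int; treat as non-numeric
        return None
    if isinstance(cell, (int, float)):
        return float(cell)
    if isinstance(cell, str):
        try:
            return float(cell.strip())
        except ValueError:
            return None
    return None


# A column as it might arrive from a spreadsheet: numbers, numeric text,
# a placeholder and an empty cell, all in the same column.
column = [12, "n/a", 7.5, " 15 ", ""]

numbers = [to_number(cell) for cell in column]
total = sum(n for n in numbers if n is not None)
```

In a strongly typed system the same column would have to be declared as text and converted up front, or the non-numeric rows rejected at load time.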

Smart auto-calculation

When you change parameters or properties of a transformation, the engine recalculates only the transformations that are affected by the change. It can do this because it stores the full results of all transformation steps in memory, so it can keep the results of unaffected transformations and recalculate only the affected ones. This eliminates the need to re-run everything from the beginning again and again. Recalculation is triggered automatically and takes place in the background, so you can keep working with the project in the meantime.
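
The selective recalculation described above can be sketched as a small dependency graph with cached results. The `Step` class and the sample steps are hypothetical and much simpler than a real engine (there is no background execution here); the sketch only shows how keeping every step's result makes it possible to recompute just the affected steps.

```python
class Step:
    """One transformation with a cached result and links to dependent steps."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = list(inputs)
        self.dependents = []
        self.cache = None
        self.dirty = True
        for step in self.inputs:
            step.dependents.append(self)

    def invalidate(self):
        """Mark this step and everything downstream of it as out of date."""
        self.dirty = True
        for step in self.dependents:
            step.invalidate()

    def set_func(self, func):
        """Change the step's logic, as when a user edits its properties."""
        self.func = func
        self.invalidate()

    def result(self, log):
        """Return the cached result, recomputing only if the step is dirty."""
        if self.dirty:
            args = [step.result(log) for step in self.inputs]
            self.cache = self.func(*args)
            self.dirty = False
            log.append(self.name)       # record which steps actually ran
        return self.cache


load = Step("load", lambda: [5, 12, 7, 30])
keep = Step("filter", lambda rows: [r for r in rows if r > 6], [load])
total = Step("sum", lambda rows: sum(rows), [keep])
count = Step("count", lambda rows: len(rows), [load])

log = []
first_total, first_count = total.result(log), count.result(log)
first_run = list(log)

# Edit the filter: only "filter" and "sum" depend on it.
log.clear()
keep.set_func(lambda rows: [r for r in rows if r > 10])
second_total, second_count = total.result(log), count.result(log)
second_run = list(log)
```

After the edit, `second_run` contains only the filter and the sum. The load step and the count are served from their cached results.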

Automatic parallelization

Most scripts (Visual Basic, Python, Qlik, SAS, etc.) are executed in a single thread, i.e. one operation at a time. Even when multi-threaded execution is possible, it is typically non-trivial and requires advanced programming skills. This is yet another consequence of old computer architectures from the times when most computers had only one CPU.

However, modern computers have multiple CPU cores and allow parallelized multi-threaded execution. The engine takes full advantage of this by analyzing the calculation logic and processing data in parallel wherever possible. As a result, computations are performed faster with better CPU utilization. Calculations in EasyMorph are parallelized automatically, with no pre-configuration required.
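
For comparison, this is roughly what running independent calculations in parallel looks like when written by hand. The sketch uses Python's standard `concurrent.futures` module; the function `column_stats` and the sample table are hypothetical. It assumes the per-column calculations do not depend on each other, which is the kind of fact an engine would have to establish by analyzing the calculation logic.

```python
from concurrent.futures import ThreadPoolExecutor


def column_stats(values):
    """A per-column calculation that needs no data from other columns."""
    return {"min": min(values), "max": max(values), "sum": sum(values)}


table = {
    "price": [10, 20, 30],
    "qty": [1, 2, 3],
    "discount": [0, 5, 0],
}

# Submit one task per column, then collect the results by column name.
with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(column_stats, values)
               for name, values in table.items()}
    stats = {name: future.result() for name, future in futures.items()}
```

Note that in standard CPython, threads do not speed up pure-Python CPU-bound work because of the global interpreter lock, so this shows the structure of the approach rather than a real speedup. It also illustrates the point made above: even a simple case requires explicit task management when done manually.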

Try EasyMorph. It's a new experience.
