Columnar In-Memory Data Transformation Engine

The "magical" user experience offered by EasyMorph is made possible by its state-of-the-art in-memory calculation engine. Many databases and traditional ETL utilities inherited their technical concepts from the 1970s, when computer memory was costly and computers therefore had little of it. These days RAM prices are very low and keep declining. EasyMorph leverages the abundance of RAM and fast multi-threaded CPUs, even on consumer-grade computers, to bring numerous benefits to its users:

Data compression enables fast in-memory processing

Aggressive data compression allows EasyMorph to store the full result of every transformation step in RAM, even when it deals with millions of rows. This makes it possible to process relatively large datasets entirely in memory, eliminating slow disk I/O operations altogether. Several smart optimizations help achieve significant compression ratios:

  • Data compression is done on the fly and is highly parallelized
  • Vocabulary compression is applied to each column
  • Columns with the same value in all rows consume practically no memory
  • For each transformation, only the delta of changes in data is stored
  • Column vocabularies are reused between datasets, where possible
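To make two of the points above concrete, here is a minimal sketch of vocabulary compression and of the constant-column case. It is an illustration under assumptions, not EasyMorph's actual implementation: the function names encode_column, decode_column, and compress_column are hypothetical, and the real engine's storage layout is not described on this page.

```python
def encode_column(values):
    """Vocabulary-encode a column: keep each distinct value once,
    plus one small integer index per row."""
    vocabulary = []
    positions = {}   # value -> position in the vocabulary
    indices = []
    for value in values:
        if value not in positions:
            positions[value] = len(vocabulary)
            vocabulary.append(value)
        indices.append(positions[value])
    return vocabulary, indices


def decode_column(vocabulary, indices):
    """Rebuild the original column from its vocabulary and row indices."""
    return [vocabulary[i] for i in indices]


def compress_column(values):
    """Hypothetical: a column with a single distinct value keeps only
    that value and the row count, so it needs almost no memory."""
    vocabulary, indices = encode_column(values)
    if len(vocabulary) == 1:
        return ("constant", vocabulary[0], len(values))
    return ("encoded", vocabulary, indices)


country = ["Canada", "Canada", "France", "Canada", "France"]
vocab, idx = encode_column(country)
constant = compress_column(["EUR"] * 1000)
```

In this sketch, the five-row column is stored as two strings and five small integers, and the 1000-row constant column collapses to one value and a count.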

Data compression in EasyMorph has very low overhead in terms of performance. Most actions in EasyMorph operate directly on compressed data, which speeds up calculations even more and offsets the compression overhead. Data remains compressed throughout all workflow steps and is only decompressed when exported into a file or external system.
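One way to see why operating directly on compressed data can be faster: with a vocabulary-encoded column, a per-value operation only has to touch each distinct value once, and a comparison can be done on integer codes instead of on the values themselves. The sketch below assumes a simple vocabulary-plus-indices layout; the variable names are illustrative and the actual engine may work differently.

```python
# A vocabulary-encoded column: 2 distinct values, 7 rows.
vocabulary = ["canada", "france"]
indices = [0, 0, 1, 0, 1, 1, 0]

# Transform: upper-case the column by transforming the vocabulary only.
# The string operation runs 2 times instead of 7; row indices are unchanged.
upper_vocabulary = [value.upper() for value in vocabulary]

# Filter: find rows equal to "france" by comparing integer codes,
# without decoding any row back to a string.
target_code = vocabulary.index("france")
matching_rows = [row for row, code in enumerate(indices) if code == target_code]

# Decompression happens only when the data leaves the engine (e.g. on export).
exported = [upper_vocabulary[code] for code in indices]
```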

The whitepaper "Modern data transformation rethought from scratch" describes in detail the reasons for creating EasyMorph as well as the application's technical design.

Relaxed data type system simplifies development

Databases and traditional ETL tools require data to be strongly typed, i.e., each column must be of one and only one data type. This requirement is historical and stems from restrictions of databases created in the 1970s. Nowadays, a lot of information comes from semi-structured data sources such as spreadsheets, web-form data, or XML files. Strong type restrictions make working with such data unnecessarily complicated.

Just as in Excel, in EasyMorph, text and numbers can be mixed in the same column, which makes it a perfect tool for parsing spreadsheets and semi-structured data.
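As a rough illustration of what a mixed-type column looks like in practice, the Python sketch below parses spreadsheet-style cells into numbers where possible and keeps everything else as text, all in the same column. The parse_cell helper and the sample values are hypothetical and are not part of EasyMorph.

```python
from decimal import Decimal, InvalidOperation


def parse_cell(raw):
    """Return a Decimal if the cell looks like a finite number, else the text."""
    text = raw.strip()
    try:
        number = Decimal(text)
    except InvalidOperation:
        return text
    # Treat "NaN" / "Infinity" spellings as text rather than numbers.
    return number if number.is_finite() else text


# One column holding both numbers and text, as often found in spreadsheets.
column = [parse_cell(cell) for cell in ["120", "n/a", "75.5", "TBD", "4"]]

# Numeric rows can still be aggregated; text rows are simply skipped here.
total = sum(value for value in column if isinstance(value, Decimal))
text_rows = [value for value in column if isinstance(value, str)]
```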

Smart auto-calculation saves time

When an action's parameters or properties are changed, the engine smartly recalculates only the actions that are affected by the change, thus avoiding costly full recalculations. It can do this because it stores in memory the full results of all steps of a workflow. Therefore it can keep the results of the unaffected steps intact and recalculate only the affected ones. This eliminates the need to re-run the entire workflow from the beginning again and again as you would have to do in other ETL tools or with scripts. Meanwhile, you can even keep working as the recalculation occurs in the background and is performed automatically, very much like in Excel.
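The idea of recalculating only affected steps can be sketched as follows. This is a simplified model under assumptions: the Workflow class is hypothetical, the workflow is strictly linear, and the background execution mentioned above is not modeled.

```python
class Workflow:
    """Hypothetical linear workflow that caches the result of every step."""

    def __init__(self, source, steps):
        self.source = source
        self.steps = list(steps)
        self.results = [None] * len(self.steps)
        self.first_stale = 0   # index of the first step whose cache is outdated
        self.executed = 0      # number of step executions, for illustration

    def replace_step(self, index, step):
        """Change one step; it and everything downstream become stale."""
        self.steps[index] = step
        self.first_stale = min(self.first_stale, index)

    def run(self):
        """Recalculate only the stale steps, reusing cached upstream results."""
        for i in range(self.first_stale, len(self.steps)):
            upstream = self.source if i == 0 else self.results[i - 1]
            self.results[i] = self.steps[i](upstream)
            self.executed += 1
        self.first_stale = len(self.steps)
        return self.results[-1] if self.steps else self.source


workflow = Workflow(
    [1, 2, 3, 4],
    [
        lambda rows: [r * 10 for r in rows],       # step 0: scale
        lambda rows: [r for r in rows if r > 10],  # step 1: filter
        lambda rows: sum(rows),                    # step 2: aggregate
    ],
)
first = workflow.run()                        # all 3 steps execute
workflow.replace_step(2, lambda rows: max(rows))
second = workflow.run()                       # only step 2 executes again
```

Because steps 0 and 1 are untouched by the change, their cached results are reused and only one additional step execution takes place.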

Automatic parallelization speeds things up

Despite the proliferation of multi-core CPUs, most scripts (written in Visual Basic, Python, Qlik, SAS, etc.) are executed in a single thread only — i.e., one operation at a time, thus wasting time and resources. Even when multi-threaded execution is possible, it's typically non-trivial to arrange and requires advanced programming skills.

The EasyMorph engine takes full advantage of multiple CPU cores by analyzing calculation logic and processing data in parallel, when possible. As a result, computations are performed faster with better CPU utilization. Calculations in EasyMorph are parallelized automatically, with no pre-configuration required.
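For comparison, here is roughly what arranging parallel work by hand looks like in a script. The sketch submits one independent calculation per column to a thread pool; the table and the column_total function are made up for illustration. Note that in standard CPython, threads do not run pure-Python computations on several cores at once, so this shows the scheduling pattern only and says nothing about how the EasyMorph engine is built.

```python
from concurrent.futures import ThreadPoolExecutor


def column_total(values):
    """An independent per-column calculation."""
    return sum(values)


table = {
    "price": [10, 20, 30],
    "quantity": [1, 2, 3],
    "discount": [0, 5, 0],
}

# The columns do not depend on each other, so each one can be
# submitted as a separate task and the results collected afterwards.
with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(column_total, values)
               for name, values in table.items()}
    totals = {name: future.result() for name, future in futures.items()}
```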

Fixed-point decimal arithmetic ensures accuracy

The vast majority of data transformation applications treat numeric values as 64-bit floating-point numbers that were originally intended for scientific calculations. However, floating-point numbers are prone to rounding errors in business calculations. Most of the errors remain unnoticed and end up in reports and databases. For instance, the condition 0.1 + 0.2 = 0.3 incorrectly returns FALSE in many popular data tools because they use 64-bit floats.

EasyMorph processes numbers as 128-bit fixed-point decimals instead of 64-bit floats. Fixed-point decimals have been specifically designed for business calculations. They completely avoid the type of errors caused by floating-point arithmetic. The condition mentioned above correctly returns TRUE in EasyMorph. In addition, using 128 bits instead of 64 doubles the width of each number, which significantly reduces or entirely eliminates rounding and roll-up errors.
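The difference is easy to reproduce in Python. The sketch below uses Python's built-in decimal module as a stand-in for decimal arithmetic in general; it is an arbitrary-precision decimal type, not the 128-bit format EasyMorph uses, so it illustrates the principle rather than the engine.

```python
from decimal import Decimal

# 64-bit binary floats cannot represent 0.1, 0.2, or 0.3 exactly,
# so the sum differs from 0.3 in the last bits.
float_check = (0.1 + 0.2 == 0.3)

# Decimal arithmetic stores these values exactly, so the comparison holds.
decimal_check = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

print(float_check)    # False
print(decimal_check)  # True
```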

Try next-generation data preparation today.
