Profiling data

After loading data from a file or database, you may want to run a few checks to confirm that the data makes sense and has no quality issues. If you do find data quality issues, you may want to investigate further to understand the scale and patterns of the problem. This process is commonly called data profiling. EasyMorph provides comprehensive means to profile data:

  • The "Cell Metadata" dialog is used to profile cell values.
  • The "Column Profiler" dialog is for profiling individual columns.
  • Finally, the "Analysis View" is a powerful tool for instant filtering and exploratory data analysis.

Cell Metadata

The "Cell Metadata" dialog is invoked by right-clicking a cell and choosing "Cell metadata". It displays the data type of the cell value and additional metadata. For instance, in the screenshot below, the dialog reveals that the cell value is actually text, not a number.

Cell profiler

Note that the dialog is floating, so you can keep it open while clicking different cells.
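The text-versus-number distinction mentioned above can be illustrated outside EasyMorph with a minimal Python sketch. The helper `describe_cell` and its output fields are hypothetical names chosen for this illustration; they are not part of EasyMorph and do not reproduce what the "Cell Metadata" dialog actually reports.

```python
def describe_cell(value):
    """Return a small metadata record for one cell value (hypothetical helper)."""
    if value is None or value == "":
        kind = "empty"
    elif isinstance(value, bool):
        kind = "boolean"
    elif isinstance(value, (int, float)):
        kind = "number"
    else:
        kind = "text"

    # A text value may still *look* like a number, which is the typical
    # reason a column fails to sum or sort as expected.
    looks_numeric = False
    if kind == "text":
        try:
            float(value)
            looks_numeric = True
        except (ValueError, TypeError):
            pass

    return {"value": value, "type": kind, "looks_numeric": looks_numeric}


print(describe_cell(1250))    # a genuine number
print(describe_cell("1250"))  # text that merely looks like a number
```

The two values display identically in a table, but only the first one is a number. That is the kind of discrepancy a cell-level check is meant to expose.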

Column Profiler

The Column Profiler is invoked by double-clicking a column header. Alternatively, right-click the column header and choose "Filter/Profile".

Column profiler

The "Values" tab shows a list of unique values in the column. The list is searchable. You can also select particular values and create a filter action from them with a single click, right from the profiler.

The "Profile" tab shows various counts and metadata that help you understand what kinds of values are present in the column.

Column profiler

Note that dates are numbers in EasyMorph (the type system of EasyMorph is explained later in the tutorial). Therefore, the Profiler lists the counts of possible dates among the number counts. Each count/metadata metric has a button for quick filtering.
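For readers who want to see what a column profile amounts to, here is a rough Python sketch of the two ideas described above: a searchable list of unique values and a set of per-type counts. The function names (`kind_of`, `unique_values`, `profile`) and the sample column are assumptions made for this illustration; the metrics EasyMorph actually reports may differ.

```python
from collections import Counter


def kind_of(value):
    """Classify a value as 'empty', 'number', or 'text' (illustrative rules)."""
    if value is None or value == "":
        return "empty"
    if isinstance(value, (int, float)):
        return "number"
    return "text"


def unique_values(values, search=""):
    """Unique values in first-seen order, optionally narrowed by a substring."""
    seen = []
    for v in values:
        if v not in seen:
            seen.append(v)
    needle = search.lower()
    return [v for v in seen if needle in str(v).lower()]


def profile(values):
    """Count how many values of each kind the column contains."""
    return dict(Counter(kind_of(v) for v in values))


# Hypothetical column: note that 1250 and "1250" are different values.
column = [1250, "1250", 980, None, 980, "n/a"]

print(unique_values(column))        # all unique values
print(unique_values(column, "12"))  # unique values matching a search
print(profile(column))              # counts per kind
```

A count of text values in a column that should be purely numeric is usually the first hint that some cells need cleaning.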

Hint: The "Column Profiler" dialog is floating too. When the header of another column is clicked, the column metadata is automatically displayed in the Profiler window.

Watch a relevant video in Advanced topics.

Analysis View

When you maximize a table, EasyMorph automatically switches to the Analysis View. It's like zooming into one particular table of your workflow. To maximize a table, simply double-click its title bar or click the "Maximize" button (see below).

Maximize table

The Analysis View is a powerful tool for data analysis and profiling. In the Analysis View, you can instantly filter the result of any action without inserting a filtering action. Use instant filters to explore relationships in data, identify data quality issues, and find table records.

Analysis View

Hint: In the Analysis View you can still add/remove actions, and edit action properties in the left sidebar (collapsed by default).

To create an instant filter for a column, click the column header and drag it into the filter pane above. If the filter pane is hidden, you can show it by pressing the "Filter pane" button (visible in the screenshot above). Instant filters are searchable and sortable, and they retain selections when you switch between actions in the table. Conveniently, the filters show not only the column values included in the current selection but also the excluded ones, which makes them well suited to exploratory data analysis and profiling.

Dataset filtering
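The idea of a filter that shows both included and excluded values can be sketched in a few lines of Python. This is only a conceptual model under stated assumptions: the functions `apply_filters` and `filter_pane`, the sample rows, and the rule that a value counts as "included" when it still occurs in the filtered rows are all illustrative choices, not a description of how EasyMorph implements instant filters.

```python
def apply_filters(rows, selections):
    """Keep rows whose value in every filtered column is among the selected values."""
    return [
        row for row in rows
        if all(row[col] in allowed for col, allowed in selections.items())
    ]


def filter_pane(rows, column, selections):
    """Split the values of one column into those still present and those filtered out."""
    all_values = {row[column] for row in rows}
    remaining = {row[column] for row in apply_filters(rows, selections)}
    return {
        "included": sorted(remaining),
        "excluded": sorted(all_values - remaining),
    }


# Hypothetical table with two columns.
rows = [
    {"region": "North", "status": "paid"},
    {"region": "North", "status": "overdue"},
    {"region": "South", "status": "paid"},
    {"region": "West", "status": "paid"},
]

# Select only overdue records, then see which regions remain.
selections = {"status": {"overdue"}}
print(filter_pane(rows, "region", selections))
print(filter_pane(rows, "status", selections))
```

Seeing the excluded values next to the included ones answers questions such as "which regions have no overdue records?" without building a separate query.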

Table metadata

To see the table metadata summary, press the "Table metadata" button in the ribbon menu of the Analysis View. The summary shows the column profiles (the same as in the Column Profiler described above) for all columns in the current dataset. Note that all numbers in the metadata summary table are clickable. When you double-click a number (or Ctrl + double-click for exclusion), the corresponding selection is immediately applied in the respective instant filter.

Table metadata summary
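As a rough analogy, a table-wide summary is a per-column profile computed for every column, and clicking a number corresponds to selecting (or excluding) the rows behind that count. The Python sketch below models this; `classify`, `table_summary`, `select_by_metric`, the three metric names, and the sample rows are hypothetical and chosen only for illustration.

```python
def classify(value):
    """Classify a value as 'empty', 'number', or 'text' (illustrative rules)."""
    if value is None or value == "":
        return "empty"
    if isinstance(value, (int, float)):
        return "number"
    return "text"


def table_summary(rows):
    """Build {column: {metric: count}} for every column in the table."""
    summary = {}
    for row in rows:
        for column, value in row.items():
            counts = summary.setdefault(column, {"number": 0, "text": 0, "empty": 0})
            counts[classify(value)] += 1
    return summary


def select_by_metric(rows, column, metric, exclude=False):
    """Rows behind one count in the summary; exclude=True inverts the selection."""
    return [row for row in rows if (classify(row[column]) == metric) != exclude]


# Hypothetical table: the "amount" column mixes numbers, text, and an empty cell.
rows = [
    {"id": 1, "amount": 1250},
    {"id": 2, "amount": "1250"},
    {"id": 3, "amount": None},
    {"id": 4, "amount": 980},
]

print(table_summary(rows))
print(select_by_metric(rows, "amount", "text"))                # rows with text amounts
print(select_by_metric(rows, "amount", "text", exclude=True))  # everything else
```

Here `exclude=True` plays the role that Ctrl + double-click plays in the summary: it keeps everything except the rows behind the chosen count.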

To exit the Analysis View, press the "Exit Analysis View" button in the ribbon menu, or simply double-click the table's title bar again.