Structuring workflow

Giving your workflow a structure

When your workflow grows, the workspace may become cluttered with overlapping tables. Typically, this is a sign that some tables should be moved into one or more other groups, where each group occupies a separate tab. One tab represents one group.

Data preparation workflows naturally have logical stages (or phases) where the results of one stage become the inputs of another. For instance, loading data can be one stage. Cleansing the loaded data is another. Aggregating the clean data and calculating metrics is a third stage, and so on. It is usually a good idea to keep each stage of your workflow in a separate group (tab). That way, you can focus on one stage at a time. You can always see the full workflow in the workflow diagram (explained in the Advanced topics below).
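The idea of stages feeding one another can be sketched outside EasyMorph as well. The Python snippet below is only a conceptual illustration: each function stands in for one group (tab), and the function names and sample rows are hypothetical, not part of EasyMorph.

```python
# Conceptual sketch only: each function plays the role of one group (tab).
# Function names and sample rows are hypothetical.

def fetch_raw_data():
    # Stage 1: load data (hard-coded here instead of reading a real source).
    return [
        {"region": "East", "amount": "100"},
        {"region": "West", "amount": ""},
        {"region": "East", "amount": "50"},
    ]

def clean_up(rows):
    # Stage 2: drop rows with a missing amount and convert text to numbers.
    return [
        {"region": r["region"], "amount": int(r["amount"])}
        for r in rows
        if r["amount"] != ""
    ]

def aggregate(rows):
    # Stage 3: total amount per region.
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + r["amount"]
    return totals

# The result of each stage becomes the input of the next one.
result = aggregate(clean_up(fetch_raw_data()))
print(result)
```

Because each stage only consumes the output of the previous one, any stage can be inspected or changed on its own, which is the same benefit that separate groups give in the workspace.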

To create a new group, simply create a new tab in the workspace (just like you create a new sheet in Excel). When a new group is created, you can either create new tables in it by adding new data, or move tables from other groups (tabs). To move a table into another group, right-click the table's title bar and select "Move to group...". It is also possible to derive a new table in a new tab.

Actions in a table in one group may reference tables from other groups. The order of tabs doesn't affect the order of workflow execution. Tabs can be moved left or right by dragging them, or by using the group context menu invoked by right-clicking a tab.
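One way to picture why tab order is irrelevant is to assume that execution order follows only the references between tables. The Python sketch below illustrates that assumption; it does not describe how EasyMorph schedules a workflow internally. The table names, group names, and the run_order helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: table name -> (group/tab, tables it references).
# The entries are deliberately listed in "reverse" tab order.
tables = {
    "Export to database": ("Export", ["Customer metrics weekly"]),
    "Customer metrics weekly": ("Calculate metrics", ["Orders"]),
    "Orders": ("Fetch raw data", []),
}

def run_order(tables):
    # Only the references matter here; the group (tab) is ignored entirely.
    graph = {name: refs for name, (_group, refs) in tables.items()}
    # static_order() yields every table after the tables it references.
    return list(TopologicalSorter(graph).static_order())

order = run_order(tables)
print(order)
```

In this sketch, "Orders" always comes before "Customer metrics weekly", which comes before "Export to database", no matter how the entries (or tabs) are arranged.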

Hint: To quickly switch between groups, press Ctrl+Tab, or Ctrl+Shift+Tab.

By organizing actions into tables, and tables into groups, you can logically structure and comfortably navigate complex workflows with hundreds of actions.

Arranging tables for clarity

In EasyMorph, data is an integral part of the workflow. This makes the workflow logic very transparent, provided that tables don't overlap. Therefore, to keep the workflow logic clearly visible, keep tables relatively small and avoid overlapping them. If you need a closer look at a table's content, you can always maximize the table and switch to the Analysis View, which is designed specifically for table analysis. Don't put more than 5–6 tables in one group (unless some of them are minimized).

When tables don't overlap, you can clearly see the arrows that show dependencies between tables and actions. Also, since actions in tables are executed left to right, it's typically a good idea to arrange the tables in a group left to right as well.

Hint: Instead of scrolling, pan across the workspace by dragging the gray background between tables.

Naming groups and tables

For better readability, give tables and groups descriptive names. A group name should describe the workflow stage in that group. A few examples of good names for groups (notice the imperative naming style):

  • Fetch raw data
  • Check data quality
  • Clean up
  • Calculate attributes
  • Aggregate
  • Calculate metrics
  • Export

Naming tables can be a bit tricky because they can represent a data entity or an operation, or both. When a table is referenced in an action, such as "Merge", the action consumes the result (final state) of that table. Therefore, it's usually a good idea to name tables after their outputs, not inputs. A few examples of good names for tables:

  • Orders
  • Orders with addresses
  • Region lookup table
  • Customer metrics weekly
  • Export

If the purpose of a table is to perform an operation (especially an external one), it may not have a meaningful data entity as its output. In this case, name the table after the high-level operation performed by the actions in that table. A few examples of good names for such tables:

  • Export to database
  • Upload modified files
  • Send email notification
  • Validate parameters

Hint: To quickly rename a table, press F2. Press Ctrl+F2 to select the next table.

Advanced topics

Workflow diagram

The workflow diagram is available in the "Diagram" menu and provides a high-level overview of the current workflow. The diagram displays:

  • Groups
  • Tables and charts in groups
  • Dependencies between tables
  • Dependencies between groups
  • External dependencies on files, connectors, and applications
  • Workflow dependencies on other modules and projects

When a table is selected, its dependencies are highlighted in the diagram. The tables on which the selected table depends are highlighted in green. The tables (and charts) that depend on the selected table are highlighted in blue.
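The two highlight colors correspond to the two directions of a dependency. The Python sketch below illustrates the distinction with a hypothetical dependency map. It assumes that only direct dependencies are highlighted, which the text above does not specify, and the highlight helper is not an EasyMorph function.

```python
# Hypothetical dependency map: table -> tables it reads from.
depends_on = {
    "Orders": [],
    "Region lookup table": [],
    "Orders with addresses": ["Orders", "Region lookup table"],
    "Customer metrics weekly": ["Orders with addresses"],
    "Export to database": ["Customer metrics weekly"],
}

def highlight(selected, depends_on):
    # Green: tables the selected table depends on (direct only, an assumption).
    green = set(depends_on[selected])
    # Blue: tables that depend on the selected table (direct only, an assumption).
    blue = {t for t, sources in depends_on.items() if selected in sources}
    return green, blue

green, blue = highlight("Orders with addresses", depends_on)
print("green:", sorted(green))
print("blue:", sorted(blue))
```

Selecting "Orders with addresses" in this sketch marks "Orders" and "Region lookup table" as green and "Customer metrics weekly" as blue.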

Hint: Right-click a table and choose "Open in Analysis View" to invoke the Analysis View explained earlier.

Hint: Switch between modules while staying in the diagram view mode.

Hint: Open files and file locations by clicking on external file dependencies.
