Supported Transformations
Last updated
Data transformation is the process of changing and manipulating your data so that it is structured and formatted correctly. Generally, data transformation is essential for converting information from the original source system into the data format required by the destination system, or for downstream analysis.
Phiona supports numerous methods to transform your data's structure and values, join multiple datasets together, or enrich information from outside API data sources. We are continually adding new transformations, and we would love to hear feedback on what would make Phiona 10x better: please share your thoughts at support@phiona.com.
Using Phiona, you can look up information from any external service API that uses basic or API key authentication. This transform is useful when you want to enrich your customer or prospect database with additional information, or integrate your dataset with machine learning models.
The API enrichment transform requires no coding knowledge whatsoever, but it is helpful to be familiar with API documentation in general as well as for the specific API you intend to use.
The first time that you enter the API enrichment transform, you'll need to set up a new API connection. Once you've set up this API connection, you can use it for any dataset in the future.
When you click the setup new connection link, the screen below will appear. The first step is to name your API connection so that it's easy to reference in the future, and to add the API link found in the API documentation.
The next step is to determine whether you need to perform a GET request or a POST request. A GET request looks up information from the API and returns it, while a POST request sends information to the API in exchange for a response. Your API documentation will detail the type of request you need to use.
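Conceptually, the difference between the two request types looks like this Python sketch using the standard library. The URLs and body are placeholders, and no request is actually sent here:

```python
import urllib.request

# GET: retrieve information from the API (no request body).
# The URL is a made-up example, not a real Phiona endpoint.
get_req = urllib.request.Request(
    "https://api.example.com/countries/Norway", method="GET"
)

# POST: send information to the API in the request body and receive a response.
post_req = urllib.request.Request(
    "https://api.example.com/convert",
    data=b'{"amount": 100}',
    method="POST",
)

print(get_req.method, post_req.method)  # GET POST
```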
Once you have determined the type of connection that is required, you will need to determine whether authentication headers are needed (essentially, permission) for you to access the API.
There are two types of authentication that Phiona supports: basic authentication and API key authentication. Please note: we do not yet support OAuth 2.0 or any other authentication method beyond basic and API key.
If your API documentation requires you to use basic authentication, you will need to put in the following information:
The Header Name field will be filled in with "Authorization" (no quotes)
The Header Value field will be filled in with "Basic {encodedpassword}" (no quotes). The {encodedpassword} value is a base64-encoded version of your API credentials in the form "username:password". Normally, the API documentation will tell you the username and password, and you will need to encode them into base64 yourself. You can do this for free at https://www.base64encode.org. For more information on Basic Authentication, please read the Wikipedia article here: https://en.wikipedia.org/wiki/Basic_access_authentication.
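If you'd rather build the header value yourself, the encoding step can be sketched in Python with the standard library. The credentials here are placeholders:

```python
import base64

# Basic auth joins the username and password with a colon, then
# base64-encodes the result. These credentials are made up.
username, password = "apiuser", "s3cret"
encoded = base64.b64encode(f"{username}:{password}".encode()).decode()
header_value = f"Basic {encoded}"

print(header_value)

# Decoding the value recovers the original credential pair
assert base64.b64decode(encoded).decode() == "apiuser:s3cret"
```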
If your API documentation requires you to use API key authentication, you will need to put in the following information:
The Header Name field will be filled in with the API Key Header value provided in your API documentation, for example: x-api-key
The Header Value field will be filled in with the API key provided, usually a string of random letters and numbers
Once you have entered your authentication information, you can add the parameters that the API requires to GET or POST your information.
In the case below, we're going to use an API that returns information for a specified country. The URL and authentication have already been added, and now parameters are required to access the API. The name of the parameter we're using will be "country_name" and we'll put in a sample value of "Norway" to see what returns. Referring back to the API documentation, this API expects parameters as part of the URL path ( / in front of the value) as opposed to a query parameter ( ?= in front of the value).
If there is an API that requires the same value for every single API lookup (for example: if you're converting currencies, USD would always be the same even if the amount changes), you would select the "Always a Static Value" option.
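The two parameter styles can be illustrated with a short Python sketch; the base URL and parameter name are hypothetical:

```python
# Hypothetical endpoint and value; your API documentation
# specifies which parameter style the API expects.
base_url = "https://api.example.com/countries"
value = "Norway"

# URL-path style: the value becomes part of the path (a "/" before the value)
path_style = f"{base_url}/{value}"

# Query-parameter style: the value is passed as ?name=value
query_style = f"{base_url}?country_name={value}"

print(path_style)   # https://api.example.com/countries/Norway
print(query_style)  # https://api.example.com/countries?country_name=Norway
```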
Once you are all set adding the parameters, click the "Test" button to return a sample response.
When the response returns, hopefully you will see a response with multiple values below, like this:
If the request fails, you might see an error message in the response. In that case, refer back to your documentation to make sure you've entered your authorization headers and/or parameters in the correct format.
Once you see the correct API information, you can select any value with an orange dot next to it to bring it into the dataset view. As you select values, they'll populate at the top of the sample box.
When you have the correct values that you want to save for your API, click the "Save" button to use this API enrichment.
After you return to the main API Enrichment transform focus area, you can then select your API connection, and the value columns that you selected earlier will automatically populate in blue to the right of the last column in your dataset, like the below image:
You then will select the column where your parameter values will be located (in this case, it is the "country" column), and then click "Show Preview".
If the preview worked, you'll see the values populate in the blue columns automatically.
If there is an issue with the API, Phiona will pop up a message saying that there was an error receiving responses. Generally, this means that we're trying to get information more quickly than the API supports. To fix that issue, select the "Use Low Request Mode" option.
Once you are done enriching your preview data with the API, click "Save" to add it as an action to the blueprint. You can also delete your API connections at any time from this screen as well by clicking the "Edit Existing Connections" button.
Working with APIs can be very tricky. If you continually run into issues, please feel free to reach out to support@phiona.com for additional help.
We found that our users were often confused by these combine concepts, depending on the previous tools they were using, so we separate joins and appends to make the different possible outcomes easier to understand.
Phiona allows you to combine multiple streams of data via Inner Join, Left Join, or Append. These techniques, although seemingly similar, yield different results: when datasets are appended, additional rows are added to the dataset, whereas when datasets are joined, additional columns are added.
Phiona supports three types of combining data: appends, inner joins, and left joins.
This stacks two datasets on top of each other (i.e., the rows are added to the end). Imagine two datasets with identical columns, but different data (e.g. transaction data for the same retail store from two different months). Using Append, you can combine these into a single dataset.
This merges two sets of data horizontally (i.e., the columns are added to the end) based on a common value, sometimes called a "key." For example, imagine the same monthly transaction dataset from above, that includes all of the customer email addresses used for the transaction. You also have another dataset that has customer information for all of your store's loyalty members, which includes email addresses. If you were looking to understand all of the loyalty members who made a transaction in a given month, you would inner join the two files on the customer email address (the "key") and the join would return only the transactions and customer details for email addresses that matched.
Left Joins are similar to Inner Joins, but will also include all of the information from the first dataset regardless of match. Like the Inner Join, the result of combining data using a Left Join is a data set that is typically extended horizontally by adding columns of data.
To continue the example: if you performed a Left Join on the data, you would still have all of the rows from the transaction dataset (even for transactions that weren't made by loyalty members), while customer information from the loyalty dataset would be filled in only for the rows that matched.
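The three combine operations above can be sketched in plain Python; the datasets, column names, and key are invented for illustration:

```python
# Two months of hypothetical transaction data with identical columns
january = [{"email": "a@x.com", "amount": 10},
           {"email": "b@x.com", "amount": 20}]
february = [{"email": "c@x.com", "amount": 30}]

# Append: stack the rows of both datasets into one
appended = january + february

# Loyalty members keyed by email address (the join "key")
loyalty = {"a@x.com": {"name": "Ada"}}

# Inner join: keep only transactions whose email matches a loyalty member
inner = [{**row, **loyalty[row["email"]]}
         for row in january if row["email"] in loyalty]

# Left join: keep every transaction; unmatched customer fields become None
left = [{**row, **loyalty.get(row["email"], {"name": None})}
        for row in january]

print(len(appended), inner, left)
```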
An Append function stacks two datasets on top of each other. Ideally, the datasets that you append have the same number of columns, and those columns have matching names and data types.
Select Append once you select the Combine Data Sets transform option.
The dataset in the current dataset view will be your primary dataset. Select the second dataset to append to it. Please ensure that the column headers for each dataset are exactly the same.
You will see a preview of the append when you scroll to the bottom of the screen.
Click the Save button to add the append transformation to your blueprint.
Select Inner Join or Left Join on the Combine menu.
The dataset that you initiated the Combine option on will be the initial dataset in the join. Select the field that will be the "key" value to join.
When joining tables, the fields that you join on must have the same data type.
Select the second dataset that you would like to join with the first.
Select the common key with unique values that will connect the first dataset to the second dataset.
Preview your joined data and make any adjustments to the key values if necessary. You will see filled in preview columns in orange if a match is possible, and if not, the orange columns will be blank and you may need to adjust the key values.
Preview is limited to 100 rows. Select Save to add the Combine step to the blueprint, and the full join will occur when you save and run your blueprint.
The Find and Replace transform allows you to identify a string of text values, numbers, or blank cells in a column, and then replace those values with the values of your choice. This transform makes it easy to manipulate information all at once.
Select a column where you want to search for a string of values.
Enter the value string that you'd like to search, or select the "Find Blank Values" option to search for blank cells within a column.
Enter your replacement value, or replace with blank values (useful for getting rid of extra information in the cell that isn't necessary).
You can see a preview of each of the cells with changes (in blue) within the dataset view, similar to below:
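Conceptually, Find and Replace works like this Python sketch; the column values and replacements are made up:

```python
# A hypothetical column containing a blank cell and an inconsistent value
column = ["USA", "U.S.A.", "", "Norway"]

# "Find Blank Values": replace empty cells with a chosen value
filled = [v if v != "" else "Unknown" for v in column]

# Find one string value and replace it with another
replaced = ["United States" if v == "U.S.A." else v for v in filled]

print(replaced)  # ['USA', 'United States', 'Unknown', 'Norway']
```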
The Group transform is helpful for when you want to count certain values by a common category or find the sum, average, maximum, or minimum of a group of values in a common category.
The first step is to select the columns you'd like to aggregate. You can group by up to two columns (for example, continent and country), and then select as many value columns as you'd like.
Select the columns with values that you'd like to group together. If you would like to add additional value columns, you can add more below the first value column selection.
If you select a column with non-numerical data, you will be able to group by total count of values or total distinct count of values.
If you select a numerical data column, you will be able to group by sum of values, total count of values, total distinct count of values, minimum value in the group, maximum value in the group, or the average value in a group.
Once you select your grouping columns and value column(s), you will see a preview of your values.
Please note that these preview values are not your final values! The final values will be reflected when you save and run your blueprint.
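The grouping logic resembles this plain-Python sketch; the categories and values are invented:

```python
# Hypothetical rows: (continent, country, value)
rows = [("Europe", "Norway", 5), ("Europe", "France", 7), ("Asia", "Japan", 4)]

totals = {}
counts = {}
for continent, _country, value in rows:
    totals[continent] = totals.get(continent, 0) + value   # sum of values
    counts[continent] = counts.get(continent, 0) + 1       # total count

print(totals)  # {'Europe': 12, 'Asia': 4}
print(counts)  # {'Europe': 2, 'Asia': 1}
```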
The Math transform enables you to create calculated values out of multiple number columns within a dataset. The Math transform currently supports the following operations (and their corresponding keyboard symbols):
Addition +
Subtraction -
Multiplication *
Division /
Parentheses ()
Create a new column name for your calculated value.
Enter your equation in the Column Equation field. To access a column header name, use the "@" symbol and a list of valid header name values will pop up (number and currency fields only).
If the equation is valid, the equation field will highlight in green. If it is invalid, it will highlight in red.
You will also see a preview of your valid equation results in the dataset view.
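A calculated column built from an equation like "@price * @quantity" conceptually works like this Python sketch; the column names are hypothetical:

```python
# Hypothetical number columns in a dataset
rows = [{"price": 2.5, "quantity": 4}, {"price": 10.0, "quantity": 2}]

# The equation is evaluated row by row to fill the new column
for row in rows:
    row["total"] = row["price"] * row["quantity"]

print([r["total"] for r in rows])  # [10.0, 20.0]
```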
The Merge Column Transform makes it easy to combine the information from two cells together, similar to a CONCATENATE function in Excel or Google Sheets. This step is helpful if you want to create a list of multiple items within the same cell or create a unique value that isn't duplicated.
Select the columns with the values that you would like to merge together.
Select the character that you'd like to separate the merged values by, or select the Custom option to define your own separation values.
Now fill in the new column name, and you'll see a preview of your newly merged column in the dataset view.
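Merging two columns with a separator resembles this Python sketch, much like CONCATENATE in a spreadsheet; the sample names are invented:

```python
# Two hypothetical columns to merge
first_names = ["Ada", "Grace"]
last_names = ["Lovelace", "Hopper"]
separator = " "  # the character chosen to separate the merged values

full_names = [separator.join(pair) for pair in zip(first_names, last_names)]
print(full_names)  # ['Ada Lovelace', 'Grace Hopper']
```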
The Phiona Pivot / Unpivot Data transform is very similar to what you might find in Microsoft Excel. It allows you to morph how your data is presented based on the values within existing columns.
The Pivot function allows you to roll up your information into values broken out by rows, columns, and values.
The Unpivot function takes multiple columns, merges them into a single column, and creates rows out of the unique values.
To pivot your data, select the Pivot Column option at the top.
Once you've selected the Pivot option, you'll be able to transform your data by selecting the columns, rows, and values to manipulate. We'll use this example dataset:
In this case, we'd like to see all of the product category amounts broken out by payment types. So, we would select "product" as the row, "payment_type" as the column, and then "amount" as the value (with sum as the value type). If we had multiple values (like number of items sold, etc) we could also add additional columns for those as well.
For text fields in the values columns, you can select total count, unique count, and actual values.
For number fields in the values columns, you can select total count, unique count, actual values, sum, average, minimum and maximum.
Once you put in those values, the dataset will look like this:
To Unpivot your data, you'll do something similar to the Pivot function. Here's the example dataset we'll use:
Once you select the Unpivot option, you'll then select the columns that you'd like to Unpivot and then name the unpivoted column that will be created.
Now, you should see your existing profit and loss columns rolled up under a new column, called "results".
You can also choose to keep any blank values that may have been in those columns for continuity.
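The Unpivot operation can be sketched in plain Python; the "profit" and "loss" column names mirror the example above but are illustrative:

```python
# Hypothetical table with two value columns to unpivot
table = [{"store": "A", "profit": 100, "loss": 20},
         {"store": "B", "profit": 80, "loss": 15}]

# Each selected column becomes a row under the new "results" column,
# duplicating the rest of the row once per column
unpivoted = []
for row in table:
    for col in ("profit", "loss"):
        unpivoted.append({"store": row["store"], "results": col, "value": row[col]})

print(len(unpivoted))  # 4 rows: one per (store, column) pair
```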
The Search & Group transform makes it easy to categorize value strings from a certain column into a new or existing column (Search and Group by Values), or to bucket your numerical values into different categories (Search and Group by Range).
For a Search and Group by Values example, you can find all of the countries in Europe, and categorize them as "Europe", or extract certain email domains you want to exclude out of a list of email addresses.
For a Search and Group by Range example, you can create a range of prices that in your price list that would be considered "low cost", say anything under $5, and that would be categorized in a new column as Low Cost.
First, select the Values or Range option.
For Search and Group by Values, select the column that you want to search for a value string, or set of value strings.
Select a column that you would like to add a new category value to, or create a new column. If you create a new column, you'll be asked to name it.
Enter the values that you would like to search for (it may be helpful to scroll through your dataset to figure out which values exist in the preview). You can enter multiple values to search for by separating each string with a comma.
Enter the category value that you would like each of those phrases to have assigned to them. When you add the Group Value, you'll see a preview of your grouping in the dataset view.
For Search and Group by Range, once you've selected the column you'd like to categorize, whether you'd like to insert the result into a new column or existing column, and what the new column name should be, you'll be able to enter in a value for the beginning or end of the range.
If you leave the beginning of the range field blank, all of the values up to the end of the range will be categorized. Similarly, if you leave the end of the range blank, all of the values higher than the beginning of the range will be categorized.
The range function works the same for currency values as it does for numbers, but you'll need to put the appropriate currency sign in front of the number (ex: $5).
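Search and Group by Range conceptually works like this Python sketch; the prices and the $5 threshold mirror the "Low Cost" example above:

```python
# Hypothetical price column
prices = [3.99, 12.50, 4.25, 7.00]

# A blank "beginning of range" means everything up to the end value
# qualifies; here, anything under 5 is categorized as "Low Cost"
categories = ["Low Cost" if p < 5 else "" for p in prices]

print(categories)  # ['Low Cost', '', 'Low Cost', '']
```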
The Split Column Transform makes it easy to separate the contents of a single cell into multiple columns. This step is helpful if you want to separate out first and last names, addresses, or identification numbers.
Select the column with the values that you would like to split.
Select the character that you'd like to separate the values by, or select the Custom option to define your own separation values.
Now fill in the new column name, and you'll see a preview of your newly separated column in the dataset view.
If you want your initial column name to be different, select Rename Column from the header menu to add that step afterwards.
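Splitting a column on a separator character resembles this Python sketch; the sample names are invented:

```python
# A hypothetical column of full names, split on the space character
full_names = ["Ada Lovelace", "Grace Hopper"]

first = [name.split(" ", 1)[0] for name in full_names]
last = [name.split(" ", 1)[1] for name in full_names]

print(first, last)  # ['Ada', 'Grace'] ['Lovelace', 'Hopper']
```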
The Unwrap JSON (JavaScript Object Notation) arrays and objects transform is useful when you're working with non-relational data. When unwrapping JSON to a tabular structure, there are two ways to do it: expand to columns (helpful when the object is something like an address with multiple pieces of information) or expand to rows (helpful when the values are something like comments, of which there may be many for a particular blog post, restaurant, etc.). Below is an example of what JSON data might look like:
First, select the column that you would like to unwrap.
The Keep Original Column option allows you to keep the original, non-unwrapped JSON dictionary/list as part of your dataset.
You can then choose between two ways to unwrap your JSON. When you unwrap to columns, if there are multiple values within the object, a new column will be created for each of those values. When you unwrap to rows, the row will be duplicated wherever multiple values exist. Both will be previewed before you continue.
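The two unwrap modes can be sketched in Python with the standard library; the "address" and "comments" field names are hypothetical:

```python
import json

# A hypothetical record: "address" is a nested object, "comments" is a list
record = json.loads('{"id": 1, "address": {"city": "Oslo", "zip": "0150"}, '
                    '"comments": ["great", "ok"]}')

# Expand to columns: each key in the object becomes its own column
columns = {f"address.{k}": v for k, v in record["address"].items()}

# Expand to rows: the row is duplicated once per item in the list
rows = [{"id": record["id"], "comment": c} for c in record["comments"]]

print(columns)    # {'address.city': 'Oslo', 'address.zip': '0150'}
print(len(rows))  # 2
```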