Supported Clean Functions
Data cleaning is one of those things that everyone does but no one really talks about, yet proper data cleaning can make or break your project. Data professionals typically spend a large portion of their workflow, often around 40-70% of their time, on this cleaning work.
Why? Because of a simple truth: Better data = Better output.
In other words: garbage in, garbage out. For this reason, we created Phiona to save ourselves a ton of headaches down the road on our projects. We have more features on the way, and we would love to hear your feedback on what would make Phiona 10x better. Please share your thoughts at: support@phiona.com
Inconsistent Data
Description
Detects structural errors introduced during measurement, data transfer, or other types of "poor housekeeping". Phiona detects the type of every column, and if certain values in a column do not match the type of the large majority of its records, it automatically flags those values as "invalid".
Remedy
Null Invalid Values (replaces each invalid value with a blank) or Replace Invalid Values (replaces each invalid value with a standard value).
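The column-type check above can be illustrated with a minimal Python sketch. It is not Phiona's implementation: the helper names (`infer_type`, `flag_invalid`, `clean_column`) are hypothetical, and the sketch assumes a coarse two-type model in which a value is either a "number" (anything `float()` accepts) or "text". Values whose type differs from the column's majority type are flagged, then either nulled or replaced.

```python
from collections import Counter
from typing import Optional


def infer_type(value: str) -> str:
    """Hypothetical, coarse classifier: 'number' if float() accepts it, else 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def flag_invalid(column: list[str]) -> list[bool]:
    """Flag values whose inferred type differs from the column's majority type."""
    if not column:
        return []
    types = [infer_type(v) for v in column]
    majority, _ = Counter(types).most_common(1)[0]
    return [t != majority for t in types]


def clean_column(column: list[str], replacement: Optional[str] = None) -> list[Optional[str]]:
    """Null invalid values (replacement=None) or replace them with a standard value."""
    flags = flag_invalid(column)
    return [replacement if bad else value for value, bad in zip(column, flags)]
```

For example, in a column of ages `["34", "27", "n/a", "41"]`, the value `"n/a"` is flagged; `clean_column(ages)` returns `["34", "27", None, "41"]`, while `clean_column(ages, "0")` substitutes `"0"` instead.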
Missing Data
Description
Detects missing values within rows, as missing values may indicate faulty data entry. Missing values can also make it difficult to build machine learning models from your dataset.
Remedy
Delete Rows with Missing Values
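A minimal sketch of detecting and deleting rows with missing values, assuming rows are Python dictionaries and that `None` or an empty/whitespace-only string counts as missing. Both assumptions, and the function names, are illustrative rather than Phiona's.

```python
from collections import Counter
from typing import Optional

# Hypothetical row shape: column name -> raw cell value (None if absent).
Row = dict[str, Optional[str]]


def is_missing(value: Optional[str]) -> bool:
    """Assumption: None and empty or whitespace-only strings count as missing."""
    return value is None or value.strip() == ""


def missing_counts(rows: list[Row]) -> dict[str, int]:
    """Detection: number of missing values per column."""
    counts: Counter = Counter()
    for row in rows:
        for column, value in row.items():
            if is_missing(value):
                counts[column] += 1
    return dict(counts)


def drop_rows_with_missing(rows: list[Row]) -> list[Row]:
    """Remedy: keep only the rows in which every field has a value."""
    return [row for row in rows if not any(is_missing(v) for v in row.values())]
```

Reporting per-column counts before deleting anything makes it easier to see whether a single sparse column is responsible for most of the dropped rows.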
Duplicate Rows
Description
Detects duplicate rows in your dataset, that is, rows that match exactly across all columns.
Remedy
Remove every copy of a duplicated row, or keep a single copy of each.
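The two duplicate-row remedies can be sketched as follows, assuming each row is a tuple of cell values so that an exact match across all columns is simply tuple equality. The function names are hypothetical.

```python
from collections import Counter

# Assumption: a row is a tuple of cell values, one per column.
Row = tuple[str, ...]


def find_duplicates(rows: list[Row]) -> set[Row]:
    """Detection: rows that appear more than once (exact match on every column)."""
    counts = Counter(rows)
    return {row for row, n in counts.items() if n > 1}


def remove_all_duplicates(rows: list[Row]) -> list[Row]:
    """Remedy 1: drop every copy of any duplicated row."""
    dupes = find_duplicates(rows)
    return [row for row in rows if row not in dupes]


def keep_single_copy(rows: list[Row]) -> list[Row]:
    """Remedy 2: keep the first occurrence of each row, preserving order."""
    return list(dict.fromkeys(rows))
```

`keep_single_copy` relies on `dict.fromkeys`, which preserves insertion order, so the first occurrence of each row stays in its original position.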
Inconsistent Dates
Description
Detects inconsistent date formats within a column. Example: YYYY-MM-DD vs. DD/MM/YY
Remedy
The available options depend on which date formats are present in the file.
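A minimal sketch of detecting which date formats a column contains and normalizing them to YYYY-MM-DD. The candidate list holds only the two formats from the example above, written as `strptime` patterns; the helper names are hypothetical, and Phiona's actual detection may differ.

```python
from collections import Counter
from datetime import datetime
from typing import Optional

# Only the two formats from the example above; a real file may contain others.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%y"]


def detect_format(value: str) -> Optional[str]:
    """Return the first candidate format that parses `value`, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    return None


def formats_present(column: list[str]) -> Counter:
    """Count how many values use each detected format (None = unrecognized)."""
    return Counter(detect_format(v) for v in column)


def normalize(column: list[str], target: str = "%Y-%m-%d") -> list[Optional[str]]:
    """Rewrite recognized dates in the target format; unrecognized values become None."""
    out: list[Optional[str]] = []
    for value in column:
        fmt = detect_format(value)
        out.append(datetime.strptime(value, fmt).strftime(target) if fmt else None)
    return out
```

Order matters when formats are ambiguous: a value like `05/03/24` could be DD/MM/YY or MM/DD/YY, and a sketch like this simply takes the first candidate that parses.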
Invalid Email Values
Description
Detects invalid email values by matching them against a regular expression.
Remedy
Null Invalid Email Values (replaces each invalid email with a blank) or Replace Invalid Email Values (replaces each invalid email with a standard value).
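A sketch of regex-based email validation with both remedies. The pattern is a deliberately simple, hypothetical one (the expression Phiona actually uses is not stated here), and it rejects some unusual addresses that RFC 5322 technically allows.

```python
import re
from typing import Optional

# Hypothetical pattern, not Phiona's: a local part, "@", then a domain
# containing at least one dot and ending in a TLD of two or more letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def is_valid_email(value: str) -> bool:
    """True if the whole (trimmed) value matches the pattern."""
    return EMAIL_RE.fullmatch(value.strip()) is not None


def clean_emails(column: list[str], replacement: Optional[str] = None) -> list[Optional[str]]:
    """Null invalid emails (replacement=None) or replace them with a standard value."""
    return [value if is_valid_email(value) else replacement for value in column]
```

Using `fullmatch` rather than `search` matters here: it stops a value such as `"contact: ana@example.com"` from passing just because it contains a valid address somewhere inside it.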