Supported Clean Functions

Data cleaning is one of those things that everyone does but no one really talks about. Proper data cleaning can make or break your project. Professional data citizens usually spend a very large portion of their data workflow (about 40-70% of their time) on this cleaning work.

Why? Because of a simple truth: Better data = Better output.

In other words, garbage in gets you garbage out. For this reason, we created Phiona to save ourselves a ton of headaches down the road on projects. We have more features on the way, but would love to hear more feedback on what would make Phiona 10x better. Please share your thoughts at: support@phiona.com

Inconsistent Data

Description

Detects structural errors introduced during measurement, data transfer, or other types of "poor housekeeping". Phiona detects the type of every column, and if certain values do not match the large majority of records in that column, it automatically flags these values as "invalid".

Remedy

Null Invalid Values (replaces value with a blank record) and Replace Invalid Values (with a standard record value).
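
To illustrate the idea, here is a minimal Python sketch of majority-type detection and the two remedies. It is not Phiona's implementation: the function names (infer_type, flag_invalid, null_invalid, replace_invalid) and the three-way int/float/str classification are assumptions made for the example.

```python
from collections import Counter

def infer_type(value):
    """Classify a raw value as 'int', 'float', 'str', or 'missing' (blank)."""
    text = str(value).strip()
    if text == "":
        return "missing"
    try:
        int(text)
        return "int"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        return "str"

def flag_invalid(column):
    """Return one boolean per value: True if it differs from the majority type."""
    kinds = [infer_type(v) for v in column]
    counts = Counter(k for k in kinds if k != "missing")
    if not counts:
        return [False] * len(column)
    majority = counts.most_common(1)[0][0]
    # Blank values are left for the missing-data check, not flagged here.
    return [k not in (majority, "missing") for k in kinds]

def null_invalid(column):
    """Remedy 1 (hypothetical helper): replace flagged values with a blank."""
    return ["" if bad else v for v, bad in zip(column, flag_invalid(column))]

def replace_invalid(column, standard):
    """Remedy 2 (hypothetical helper): replace flagged values with a standard value."""
    return [standard if bad else v for v, bad in zip(column, flag_invalid(column))]

ages = ["34", "27", "n/a", "41", "19"]
nulled = null_invalid(ages)
replaced = replace_invalid(ages, "0")
```

In this example the column is mostly integers, so "n/a" is the only value flagged; nulling blanks it out, while replacing substitutes the chosen standard value.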

Missing Data

Description

Detects missing values within rows, as missing values may indicate faulty data entry. Missing values can also create difficulty in building machine learning models using your dataset.

Remedy

Delete Rows with Missing Values
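
As a rough sketch of this remedy (not Phiona's own code), the Python below treats None and blank strings as missing and drops any row containing one. The helper names and the definition of "missing" are assumptions for illustration.

```python
def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing (an assumption)."""
    return value is None or str(value).strip() == ""

def drop_rows_with_missing(rows):
    """Keep only the rows in which every field has a value."""
    return [row for row in rows if not any(is_missing(v) for v in row.values())]

rows = [
    {"name": "Ada", "city": "London"},
    {"name": "Grace", "city": ""},
    {"name": None, "city": "Paris"},
]
cleaned = drop_rows_with_missing(rows)
```

Only the first row survives here, since the other two each have one missing field.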

Duplicate Rows

Description

Detects duplicate rows within your dataset, with an exact match across all of the columns.

Remedy

Remove all duplicate rows, or keep a single copy of each duplicated row.
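
The two remedies can be sketched in Python as follows. This assumes "remove all" means dropping every copy of a duplicated row, which is one reading of the option; the function names are hypothetical and rows are modeled as lists of column values.

```python
from collections import Counter

def keep_single_copy(rows):
    """Keep the first occurrence of each row and drop later exact matches."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row)  # exact match across all columns
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

def remove_all_duplicated(rows):
    """Drop every row that appears more than once, including its first copy."""
    counts = Counter(tuple(row) for row in rows)
    return [row for row in rows if counts[tuple(row)] == 1]

rows = [["a", 1], ["b", 2], ["a", 1], ["c", 3]]
single = keep_single_copy(rows)
unique_only = remove_all_duplicated(rows)
```

With this input, keeping a single copy leaves three rows, while removing all duplicated rows leaves only the two that were never repeated.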

Inconsistent Dates

Description

Detects inconsistent date formats within a column. Example: YYYY-MM-DD vs DD/MM/YY

Remedy

The options offered depend on the date formats present in the file.
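
A minimal Python sketch of the idea, assuming only the two formats from the example above are candidates: it reports which formats appear in a column and rewrites every recognized date into one target format. The candidate list and function names are assumptions, not Phiona's actual behavior.

```python
from datetime import datetime

# Candidate formats (assumed): YYYY-MM-DD and DD/MM/YY.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%d/%m/%y"]

def detect_format(text):
    """Return the first candidate format that parses the text, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return fmt
        except ValueError:
            continue
    return None

def formats_present(column):
    """Set of candidate formats found in the column."""
    return {fmt for fmt in map(detect_format, column) if fmt is not None}

def normalize_dates(column, target="%Y-%m-%d"):
    """Rewrite recognized dates in the target format; leave other values as-is."""
    result = []
    for value in column:
        fmt = detect_format(value)
        if fmt is None:
            result.append(value)
        else:
            result.append(datetime.strptime(value, fmt).strftime(target))
    return result

dates = ["2021-03-04", "05/03/21", "not a date"]
found = formats_present(dates)
normalized = normalize_dates(dates)
```

A string like 05/03/21 is ambiguous between day-first and month-first order, which is one reason the available choices would depend on which formats are detected in the file.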

Invalid Email Values

Description

Detects invalid email values based on a regular expression string.

Remedy

Null Invalid Email Values (replaces value with a blank record) and Replace Invalid Email Values (with a standard record value).
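
The regular expression Phiona uses is not given here, so the Python sketch below uses a deliberately simple pattern (something@something.something, with no spaces or extra @ signs) as an assumption. The helper names are also hypothetical.

```python
import re

# Simplified pattern (an assumption): local part, "@", domain containing a dot.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def is_valid_email(value):
    """True if the whole trimmed value matches the simplified pattern."""
    return EMAIL_PATTERN.fullmatch(str(value).strip()) is not None

def null_invalid_emails(column):
    """Replace invalid emails with a blank value."""
    return [v if is_valid_email(v) else "" for v in column]

def replace_invalid_emails(column, standard):
    """Replace invalid emails with a standard value."""
    return [v if is_valid_email(v) else standard for v in column]

emails = ["ada@example.com", "bad-address", "x@y", "two@@at.com"]
nulled = null_invalid_emails(emails)
replaced = replace_invalid_emails(emails, "unknown@example.com")
```

A pattern this loose accepts some addresses that a real mail server would reject, so a stricter expression may be appropriate depending on the dataset.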
