Supported Clean Functions
Data cleaning is one of those things that everyone does but no one really talks about. Proper data cleaning can make or break your project. Professional data citizens usually spend a very large portion of their data workflow (about 40-70% of their time) on this cleaning work.
Why? Because of a simple truth: Better data = Better output.
In other words... garbage in gets you garbage out. For this reason, we created Phiona to help save ourselves a ton of headaches down the road on projects. We have more features on the way, but would love to hear more feedback on what would make Phiona 10x better. Please share your thoughts at:
Detects structural errors that arise during measurement, data transfer, or other types of "poor housekeeping". Phiona detects the type of every column, and if certain values in a column do not match the large majority of its records, it automatically flags these values as "invalid".
Null Invalid Values (replaces value with a blank record) and Replace Invalid Values (with a standard record value).
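The idea of flagging values that disagree with the majority of a column can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not Phiona's actual implementation: the function names (`infer_type`, `flag_invalid`, `null_invalid`, `replace_invalid`) and the three-way type check (int, float, string) are hypothetical.

```python
from collections import Counter


def infer_type(value):
    """Classify a raw string cell as 'empty', 'int', 'float', or 'str'."""
    text = value.strip()
    if text == "":
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(text)
            return name
        except ValueError:
            continue
    return "str"


def flag_invalid(column):
    """Return the indexes of cells whose type differs from the majority type."""
    types = [infer_type(v) for v in column]
    counts = Counter(t for t in types if t != "empty")
    if not counts:
        return []  # nothing but blanks, so nothing to compare against
    majority = counts.most_common(1)[0][0]
    return [i for i, t in enumerate(types) if t not in (majority, "empty")]


def null_invalid(column):
    """Assumed 'Null Invalid Values' behavior: blank out flagged cells."""
    bad = set(flag_invalid(column))
    return ["" if i in bad else v for i, v in enumerate(column)]


def replace_invalid(column, standard):
    """Assumed 'Replace Invalid Values' behavior: swap in a standard value."""
    bad = set(flag_invalid(column))
    return [standard if i in bad else v for i, v in enumerate(column)]


ages = ["34", "27", "n/a", "41", "19"]
print(flag_invalid(ages))
print(null_invalid(ages))
print(replace_invalid(ages, "0"))
```

Blank cells are left alone here because missing values are a separate check; a real tool would likely recognize more types (dates, booleans) than this sketch does.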
Detects missing values within rows, as missing values may indicate faulty data entry. Missing values can also create difficulty in building machine learning models using your dataset.
Delete Rows with Missing Values
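As a rough sketch of what this check and its option could look like, the snippet below treats each row as a dictionary. The set of markers counted as "missing" (`MISSING_MARKERS`) and the function names are our own assumptions for illustration; Phiona's real rules may differ.

```python
# Hypothetical list of strings treated as missing, in addition to None.
MISSING_MARKERS = {"", "na", "n/a", "null", "none"}


def is_missing(value):
    """True when a cell is None or one of the assumed missing markers."""
    return value is None or str(value).strip().lower() in MISSING_MARKERS


def rows_with_missing(rows):
    """Return the indexes of rows that contain at least one missing cell."""
    return [i for i, row in enumerate(rows)
            if any(is_missing(v) for v in row.values())]


def delete_rows_with_missing(rows):
    """Assumed 'Delete Rows with Missing Values' behavior."""
    return [row for row in rows
            if not any(is_missing(v) for v in row.values())]


rows = [
    {"name": "Ana", "city": "Lima"},
    {"name": "Bo", "city": ""},
    {"name": None, "city": "Oslo"},
]
print(rows_with_missing(rows))
print(delete_rows_with_missing(rows))
```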
Detects duplicate rows within your dataset, meaning rows that match exactly across all of the columns.
Remove all duplicate rows, or keep a single copy of each duplicated row.
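The two options can be illustrated with a small sketch. We are assuming here that "remove all" drops every row that has an exact twin, while the other option keeps the first occurrence; that reading, and the function names `keep_single` and `remove_all`, are our own and not confirmed Phiona behavior.

```python
from collections import Counter


def keep_single(rows):
    """Keep the first occurrence of each row and drop later exact copies."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # exact match across all columns
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def remove_all(rows):
    """Drop every row that appears more than once, keeping no copy of it."""
    counts = Counter(tuple(row) for row in rows)
    return [row for row in rows if counts[tuple(row)] == 1]


rows = [("a", 1), ("b", 2), ("a", 1), ("c", 3)]
print(keep_single(rows))
print(remove_all(rows))
```

Comparing whole rows as tuples is what makes this an exact-match check; near-duplicates (different casing, stray whitespace) would not be caught.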
Detects inconsistent date formats within a column. Example: YYYY-MM-DD vs DD/MM/YY
Options depend on the different date formats present in the file.
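One way to picture this check: try each value against a list of known formats, count which formats show up, and then rewrite everything into a single target format. The sketch below uses only Python's `datetime` module. The candidate list (`CANDIDATE_FORMATS`) covers just the two formats from the example above, and the function names are hypothetical, not Phiona's API.

```python
from collections import Counter
from datetime import datetime

# Assumed candidates: YYYY-MM-DD and DD/MM/YY, as in the example above.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%y")


def detect_format(value):
    """Return the first candidate format that parses the value, else None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(value.strip(), fmt)
            return fmt
        except ValueError:
            continue
    return None


def count_formats(column):
    """Count how many cells match each format (None means unrecognized)."""
    return Counter(detect_format(v) for v in column)


def normalize(column, target="%Y-%m-%d"):
    """Rewrite recognized dates in the target format; leave the rest as-is."""
    result = []
    for value in column:
        fmt = detect_format(value)
        if fmt is None:
            result.append(value)
        else:
            parsed = datetime.strptime(value.strip(), fmt)
            result.append(parsed.strftime(target))
    return result


dates = ["2023-01-15", "16/01/23", "not a date"]
print(count_formats(dates))
print(normalize(dates))
```

Note that some formats are ambiguous by nature (DD/MM/YY vs MM/DD/YY for a value like 03/04/23), which is probably why the available options depend on what is actually in the file.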
Detects invalid email values by checking each value against a regular expression.
Null Invalid Email Values (replaces value with a blank record) and Replace Invalid Email Values (with a standard record value).
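For a feel of how regex-based email checking works, here is a minimal sketch. The pattern below is a deliberately simple one that we picked for illustration; it is not the expression Phiona uses, and it will reject some unusual but technically valid addresses. The function names are likewise hypothetical.

```python
import re

# Simplified illustrative pattern: local part, "@", domain, dot, 2+ letter TLD.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def is_valid_email(value):
    """True when the trimmed value matches the simplified email pattern."""
    return bool(EMAIL_RE.match(value.strip()))


def null_invalid_emails(column):
    """Assumed 'Null Invalid Email Values' behavior: blank out bad values."""
    return [v if is_valid_email(v) else "" for v in column]


def replace_invalid_emails(column, standard):
    """Assumed 'Replace Invalid Email Values' behavior: use a standard value."""
    return [v if is_valid_email(v) else standard for v in column]


emails = ["ana@example.com", "bob@example", "not-an-email"]
print(null_invalid_emails(emails))
print(replace_invalid_emails(emails, "unknown@example.com"))
```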