Added a Perl script to clean up some common broken-encoding artifacts

Community Data Science Collective