Creating Perl Modules for Data Quality

By Kim Ryan.

Published by The Open Source Developers' Conference Papers

Format: Article (Electronic). Price: Free download.

A brief history of how I started to create Perl modules for data quality.

A run-through of the design and functionality of modules for parsing a person's name. A description of creating a formal grammar with Parse-RecDescent. Using regression testing to validate changes.

How hard was it to develop CPAN modules:
Amount of time needed to create modules and keep them up to date. Dealing with complexity.

Building on the work of others:
How to find the best CPAN modules that can save you from reinventing the wheel. Researching current designs and algorithms.

Making sure it works:
Building up a base of users who can assist with testing and supply sample data. Handling requests for bug fixes and enhancements. Keeping control of the scope.

How does it stack up against commercial software:
Comparing the features and accuracy of the Perl modules with their commercial equivalents.

What's next:
Other data quality modules that still need to be developed, GUIs, integration with other tools, data warehousing, and ETL.

Keywords: Perl, CPAN, Data Quality, Parsing, Software Design

Article: Electronic (PDF File; 503.063KB).

Kim Ryan

I have authored five CPAN modules, mainly in the areas of data quality and text parsing. For details see http://search.cpan.org/~kimryan/