CRISP-DM and GIS: the value of superior project documentation

As a GIS consultant, I am often working on many different projects for many different clients at any given time. While I like to think of myself as someone with a good head on his shoulders, there is only so much I can remember. So when a client contacts me with a detailed question about a GIS task I did four months ago as part of a large project, I am glad to have good documentation to refer back to. Time for adequate technical documentation should be written into every GIS project scope.

Given the increasing complexity of GIS workflows, one would think that technical documentation would be the rule rather than the exception, but that has not been my experience. I rarely see technical documentation included in a project delivery unless the documentation was the project itself, as in a needs assessment or master plan. Extensive documentation serves two purposes. First, as mentioned earlier, it gives you something to refer back to if the client asks follow-up questions or additional work is needed. Second, it gives the client something to consult when they have questions of their own.

Have you ever started a new position and been put on an existing project, only to find that the person you replaced did not document anything they did? Worse yet, half the files needed for the project were still on their laptop, which IT wiped and handed to you. Think of the wasted time and money, not to mention the frustration, that appropriate documentation could have avoided.

Likewise, thorough documentation provided to the client saves extra work, since a lack of documentation is often the cause of repeated questions and explanations. After a project is completed, it is not uncommon for people who were not involved in it to start asking why something was done the way it was. Good documentation should answer "why" something was done in addition to "what" was done. Data science has adopted an industry standard to ensure projects are thoroughly documented because, as we in the GIS industry know, working with data involves a multitude of decisions and processes.

In data science, the CRISP-DM (Cross-Industry Standard Process for Data Mining) methodology is the standard for planning data mining projects. Until I took a deep dive into a formal data science education, I had never heard of CRISP-DM. After some exposure to it, I think this methodology, or something very similar, would also serve as an excellent standard for the GIS industry, because many elements of a data mining project are also elements of a GIS project. This is not surprising, as GIS and data mining share the goal of turning raw data into useful information. The following steps outline the CRISP-DM process. For more detail on each element, Data Science Project Management (https://www.datascience-pm.com/crisp-dm-2/) has an excellent write-up.

CRISP-DM Steps

1. Business Understanding
    A. Determine Business Objectives
        a. Background
        b. Objectives
        c. Success criteria
    B. Project Assessment
        a. Inventory of resources
        b. Requirements, assumptions and constraints
        c. Risks and contingencies
        d. Terminology
        e. Costs and benefits
    C. Determine Goals
        a. Data goals
        b. Data success criteria
    D. Project Plan
        a. Project Plan
        b. Tools/techniques assessment

2. Data Understanding
    A. Data collection
        a. Data collection report
    B. Data description
        a. Data description report
    C. Data exploration
        a. Data exploration report
    D. Data QC

3. Data Preparation
    A. Data Selection
        a. Include/exclude data
    B. Data cleaning
        a. Data cleaning report
    C. Data creation
        a. Database design
    D. Data integration
        a. Merged data

4. Modeling
    A. Select modeling techniques
        a. Modeling techniques and assumptions
    B. Test design creation
        a. Test design
    C. Build model
        a. Parameters
        b. Steps
        c. Descriptions

5. Evaluation
    A. Evaluate results
        a. Assessment of results with relation to the success criteria
    B. Process Review
        a. Review of the process
    C. Next steps
        a. List of possible next steps

6. Deployment
    A. Deployment plan
        a. Steps for deployment
    B. Maintenance plan
        a. Monitor and maintenance plan
    C. Final project report
        a. Final report/presentation
    D. Project review
        a. Experience documentation
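One way to write the outline above into a project scope is to keep it as a checklist and track which deliverables are finished. The Python sketch below is a hypothetical illustration, not part of CRISP-DM itself: `CRISP_DM_DELIVERABLES` is an abridged selection of deliverables transcribed from the outline, and `ProjectDocs` is an invented helper that reports outstanding items grouped by phase.

```python
from dataclasses import dataclass, field

# Hypothetical, abridged checklist: deliverables per phase, transcribed from
# the CRISP-DM outline above. Extend it to match your own project scope.
CRISP_DM_DELIVERABLES: dict[str, list[str]] = {
    "Business Understanding": [
        "Success criteria",
        "Risks and contingencies",
        "Project Plan",
    ],
    "Data Understanding": [
        "Data collection report",
        "Data description report",
        "Data exploration report",
    ],
    "Data Preparation": ["Data cleaning report", "Database design"],
    "Modeling": ["Modeling techniques and assumptions", "Test design"],
    "Evaluation": ["Review of the process", "List of possible next steps"],
    "Deployment": [
        "Monitor and maintenance plan",
        "Final report/presentation",
        "Experience documentation",
    ],
}


@dataclass
class ProjectDocs:
    """Hypothetical tracker for the documentation deliverables of one GIS project."""

    name: str
    completed: set[str] = field(default_factory=set)

    def missing(self) -> dict[str, list[str]]:
        """Return outstanding deliverables grouped by phase; fully documented phases are omitted."""
        gaps: dict[str, list[str]] = {}
        for phase, items in CRISP_DM_DELIVERABLES.items():
            todo = [item for item in items if item not in self.completed]
            if todo:
                gaps[phase] = todo
        return gaps


if __name__ == "__main__":
    project = ProjectDocs("Parcel update", {"Success criteria", "Project Plan"})
    for phase, items in project.missing().items():
        print(f"{phase}: {', '.join(items)}")
```

Keeping the checklist as plain data rather than hard-coding it into the logic makes it easy to adapt the phases and deliverables to each client's scope.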

CRISP-DM Process

As you can see, the CRISP-DM process maps onto a GIS project workflow very well. In fact, I have been producing many of these same elements as part of GIS project documentation for years. The main difference between a GIS workflow and a data mining workflow is the model itself; however, the methodology still applies. The other main difference is that CRISP-DM does not call out metadata.

Project planning and workflow documentation are great, but let’s not forget about the data itself. In GIS, the most basic form of documentation is metadata, which is simply data that describes other data. How many times have you opened the metadata for a GIS dataset only to find a vast wasteland of empty elements? I would venture to say more often than not. Metadata is hugely important for understanding the data you are working with, and it is an important resource for others. Metadata should always be on the project checklist, and those responsible for data quality control should check that it is complete. To learn more about GIS metadata, check out https://www.fgdc.gov/metadata.
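Part of that completeness check can be automated. The sketch below assumes FGDC CSDGM metadata stored as XML; the element paths in `REQUIRED_ELEMENTS` are an assumed minimal subset, not the full standard, and ISO-style metadata uses a different structure that would need different paths. `empty_metadata_elements` is a hypothetical helper that flags required elements that are missing or left blank.

```python
import xml.etree.ElementTree as ET

# Assumed minimal subset of FGDC CSDGM element paths, relative to the
# <metadata> root. Adjust to whatever your organization treats as required.
REQUIRED_ELEMENTS: tuple[str, ...] = (
    "idinfo/citation/citeinfo/title",
    "idinfo/descript/abstract",
    "idinfo/descript/purpose",
    "metainfo/metd",
)


def empty_metadata_elements(
    xml_text: str, required: tuple[str, ...] = REQUIRED_ELEMENTS
) -> list[str]:
    """Return the required element paths that are missing or contain only whitespace."""
    root = ET.fromstring(xml_text)
    problems = []
    for path in required:
        element = root.find(path)
        # A missing element and a blank one are both documentation gaps.
        if element is None or not (element.text or "").strip():
            problems.append(path)
    return problems


if __name__ == "__main__":
    sample = """<metadata>
      <idinfo>
        <citation><citeinfo><title>Parcels</title></citeinfo></citation>
        <descript><abstract></abstract><purpose>Tax mapping</purpose></descript>
      </idinfo>
    </metadata>"""
    print(empty_metadata_elements(sample))
    # prints ['idinfo/descript/abstract', 'metainfo/metd']
```

Treating an element that exists but is blank the same as one that is absent targets the "vast wasteland" problem directly: an empty element documents nothing.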
