Sundre Forest Products Computer Model

The model picks out very fine details from the terrain
A continuous prediction model allows planners to make very sophisticated estimates of the archaeological costs associated with developing any site
Close-up showing the superb resolution of the model

THE TASK

Enable Sundre Forest Products to predict areas within their FMA in which Historical Resources Impact Assessment costs are likely, so that they can be avoided, by developing a predictive computer model based on known sites.

SPECIAL CONSIDERATIONS

A unique and exciting aspect of this project was the availability of extremely high-resolution LiDAR elevation data for the entire area. We also had detailed site data for about 200 known historical sites within the modelling area. Finally, the area to be modelled was huge; the FMA is about 553,000 ha (5,530 km²). This all translates into large files and long computational times.

OUR SOLUTION

We followed the best practices of model building by dividing our known sites into testing and training sets. We used the training set to build the model and then we tested its predictive power by seeing how many of the known sites in the testing set the model could predict on its own.
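
A minimal sketch of such a split is shown here. The 25% testing fraction, the fixed random seed, and the function name `split_sites` are illustrative assumptions, not details of the actual project.

```python
import random

def split_sites(site_ids, test_fraction=0.25, seed=42):
    """Shuffle the known sites and split them into training and testing sets."""
    shuffled = list(site_ids)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Stand-in IDs for the roughly 200 known sites (hypothetical data).
site_ids = list(range(200))
train_ids, test_ids = split_sites(site_ids)
print(len(train_ids), len(test_ids))
```

The testing sites are held back entirely while the model is built, so the later check on them is an honest measure of predictive power.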

We decided to build our model with 10 meter by 10 meter cells. Given the accuracy of the LiDAR data, we thought we could really capture some fine features in the model that would be lost with 30 or 60 meter grids. This decision resulted in over 141 million cells for each variable and very time-consuming computations!
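
To give a feel for the trade-off, the short calculation below scales the quoted 10 meter cell count to coarser grids. It assumes only that the cell count falls with the square of the cell size over the same extent; the function name `cells_at` is hypothetical.

```python
CELLS_AT_10M = 141_000_000  # cell count quoted in the text for the 10 m grid

def cells_at(cell_size_m, base_cells=CELLS_AT_10M, base_size_m=10):
    """Cell count over the same extent scales with the inverse square of cell size."""
    return base_cells * (base_size_m / cell_size_m) ** 2

for size in (10, 30, 60):
    print(f"{size} m cells: about {cells_at(size) / 1e6:.1f} million")
```

Under that assumption a 30 meter grid needs roughly 15.7 million cells per variable and a 60 meter grid roughly 3.9 million, so the 10 meter grid costs 9 to 36 times more computation for each variable.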

Many modellers use multiple regression to make predictions, but it is not an ideal method for this kind of modelling because it assumes that the input variables are independent of one another. Many modellers ignore this important assumption and use multiple regression anyway, even though they know their input variables are correlated. We decided to take advantage of some exciting tools from the field of machine learning and employ a "support vector machine" algorithm to make our predictions. One major advantage of support vector machines is that they neither require nor assume that the inputs are independent. The other advantage is that a support vector machine makes really accurate predictions with our data.
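
For readers unfamiliar with the technique, here is a minimal sketch of a linear support vector machine trained by stochastic subgradient descent on the hinge loss. It is a teaching example only: the toy data, the learning rate, and the regularisation strength are assumptions, and production work would normally rely on an established SVM library, possibly with a non-linear kernel. The two input features are deliberately correlated to show that this causes no difficulty.

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=200):
    """Fit weights w and bias b by subgradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularisation term shrinks the weights on every step.
            w = [wi * (1 - lr * lam) for wi in w]
            if margin < 1:
                # Sample is misclassified or inside the margin: move the boundary away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 (site likely) or -1 (site unlikely) for one feature vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical standardised values for two correlated terrain variables.
samples = [(1.0, 1.2), (1.5, 1.4), (0.8, 1.0), (1.2, 0.9),
           (-1.0, -1.1), (-1.4, -1.2), (-0.9, -0.8), (-1.2, -1.5)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]  # +1 = known site, -1 = non-site

w, b = train_linear_svm(samples, labels)
print([predict(w, b, x) for x in samples])
```

The classifier only looks for the boundary with the widest margin between the two classes, which is why correlation between the inputs does not violate any assumption of the method.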

As part of the modelling process, we started with over 25 candidate variables that we thought would help us predict the location of unknown sites. As the model evolved, we removed the variables that did not help predict sites.
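
One common way to prune variables is greedy backward elimination, sketched below. This is an assumed procedure for illustration, not a description of how the variables were actually removed. The scoring function is a toy stand-in, and the variable names "aspect" and "soil_type" are hypothetical.

```python
def backward_eliminate(variables, score):
    """Drop variables one at a time for as long as doing so does not lower the score."""
    kept = list(variables)
    best = score(kept)
    improved = True
    while improved and len(kept) > 1:
        improved = False
        for name in list(kept):
            trial = [v for v in kept if v != name]
            trial_score = score(trial)
            if trial_score >= best:
                # Removing this variable costs nothing, so leave it out.
                kept, best = trial, trial_score
                improved = True
                break
    return kept

# Toy stand-in for "fraction of testing sites predicted" (hypothetical).
USEFUL = {"slope", "relative_elevation"}

def toy_score(variables):
    return len(USEFUL & set(variables)) / len(USEFUL)

candidates = ["slope", "aspect", "relative_elevation", "soil_type"]
selected = backward_eliminate(candidates, toy_score)
print(selected)
```

In real use the `score` callable would retrain the model on the training set with the trial variables and report its performance on held-out sites.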

Our model met its performance goals after only one month of development. We decided to assess its performance during the 2012 field season and make refinements as necessary.

RESULTS

Our final model classifies about 10% of the land area as high potential and correctly predicts about 80% of the withheld sites in the testing set.
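
One widely used way to summarise these two numbers is Kvamme's gain statistic, which compares the share of land flagged with the share of sites captured. The calculation below is an added illustration and reads the 10% figure as the share of land classed as high potential and the 80% figure as the share of testing-set sites captured; the function name is hypothetical.

```python
def kvamme_gain(area_fraction, site_fraction):
    """Kvamme's gain: 1 - (share of area flagged / share of sites captured)."""
    return 1.0 - area_fraction / site_fraction

gain = kvamme_gain(area_fraction=0.10, site_fraction=0.80)
print(f"{gain:.3f}")
```

Gain values close to 1 indicate a model that captures most sites while flagging little land; a value of 0 means the model does no better than chance.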

The 10 meter cell size and the high-resolution LiDAR data helped us build a model that picks out areas of high potential for containing historical resources without blanketing huge areas of the map as high potential.

While the model does like to predict high potential areas near major streams, it doesn't just blanket everything within x meters of water as high potential. Our model is much more nuanced than that. It takes slope, solar irradiation, vertical distance to water, local terrain roughness, and several other factors into account when making its predictions.

We delivered a binary predictive model to Sundre Forest Products (each cell is either high potential or not), but our software actually produces continuous predictions from 0 to 100%. A skilled and experienced archaeologist such as Kurtis can use this additional information to make very fine-grained decisions during assessments and so reduce costs for our client.
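
The relationship between the continuous and binary outputs can be sketched as a simple cutoff. The example below picks the cutoff so that roughly a chosen share of cells is flagged; the scores, the 10% target, and the function names are assumptions made for illustration.

```python
def threshold_for_area(scores, area_fraction):
    """Return the score cutoff that flags roughly the top `area_fraction` of cells."""
    ranked = sorted(scores, reverse=True)
    n_flagged = max(1, round(len(ranked) * area_fraction))
    return ranked[n_flagged - 1]

def to_binary(scores, cutoff):
    """True marks a high potential cell, False everything else."""
    return [score >= cutoff for score in scores]

# Hypothetical per-cell predictions, in percent.
scores = [5, 12, 40, 91, 3, 67, 22, 8, 15, 30]
cutoff = threshold_for_area(scores, area_fraction=0.10)
high_potential = to_binary(scores, cutoff)
print(cutoff, high_potential)
```

Keeping the continuous scores alongside the binary map lets an archaeologist see which cells fell just below the cutoff. Cells with tied scores at the cutoff are all flagged, so the flagged share can slightly exceed the target.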

Sundre Forest Products began using our model in the summer of 2011 to guide decision making and reduce costs associated with actual archaeological field testing and assessment.

This model is just a first step in Sundre Forest Products' quest to reduce costs associated with archaeology. They plan to have us refine the model in an iterative fashion as we discover more sites, the performance of computer hardware improves, and modelling techniques advance and mature.

Model program code

MODEL FEATURES

  • only archaeological predictive model we could find that is built using a "Support Vector Machine" machine learning algorithm
  • 10 meter cells, where other models commonly use 30 or 90 meter cells
  • the use of high-resolution LiDAR elevation data
  • we wrote our own software to:
    • move data between our GIS software and our support vector machine software
    • perform data transformations
    • make optimizations
    • perform data validation checks
    • automate repetitive workflows
    • optimize the model
  • because of the large computational demands of this project, we made our software use all the cores in a multi-core computer to drastically reduce computational times
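
The multi-core approach in the last item can be sketched by cutting a raster into bands of rows and handing each band to a separate worker process. This is a minimal illustration using Python's standard library, not the software described above; `process_band` is a hypothetical placeholder for the real per-band work.

```python
import math
import os
from concurrent.futures import ProcessPoolExecutor

def split_rows(n_rows, n_chunks):
    """Divide row indices 0..n_rows-1 into contiguous (start, stop) bands."""
    size = math.ceil(n_rows / n_chunks)
    return [(start, min(start + size, n_rows)) for start in range(0, n_rows, size)]

def process_band(band):
    """Placeholder for the per-band work; here it only counts the rows."""
    start, stop = band
    return stop - start

def process_raster(n_rows, workers=None):
    """Process every band in parallel, using one worker per core by default."""
    n_workers = workers or os.cpu_count() or 1
    bands = split_rows(n_rows, n_chunks=n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(process_band, bands))
```

A call such as `process_raster(n_rows=10_000)` should be made from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Because each band is independent, the run time falls roughly in proportion to the number of cores until disk or memory becomes the bottleneck.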

MODEL VARIABLES

  • proximity to major hydrology
  • proximity to minor hydrology
  • vertical distance to major hydrology
  • slope
  • total solar irradiation on the shortest day of the year
  • total solar irradiation on the longest day of the year
  • relative elevation
  • local terrain roughness
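
As an illustration of how two of these variables can be derived from an elevation grid, the sketch below computes slope from central differences and local roughness as the standard deviation of elevation in a 3 by 3 window. Both definitions are common choices assumed here for illustration; the definitions used in the actual model are not stated above.

```python
import math
import statistics

CELL_SIZE_M = 10.0  # cell size used in the model

def slope_degrees(dem, row, col, cell_size=CELL_SIZE_M):
    """Slope at an interior cell, from central differences of elevation."""
    dz_dx = (dem[row][col + 1] - dem[row][col - 1]) / (2 * cell_size)
    dz_dy = (dem[row + 1][col] - dem[row - 1][col]) / (2 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))

def roughness(dem, row, col):
    """Standard deviation of elevation in the 3x3 window around an interior cell."""
    window = [dem[r][c]
              for r in (row - 1, row, row + 1)
              for c in (col - 1, col, col + 1)]
    return statistics.pstdev(window)

# Toy elevation grid in metres, rising 10 m per cell towards the east.
dem = [[1000.0 + 10.0 * c for c in range(3)] for _ in range(3)]
slope = slope_degrees(dem, 1, 1)
rough = roughness(dem, 1, 1)
print(round(slope, 1), round(rough, 2))
```

On this toy grid the ground rises 10 m over every 10 m cell, so the slope works out to 45 degrees.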