joshbrew/Agroforestry_Suitability_Mapping_with_iNaturalist

iNaturalist + global GIS species suitability mapping with eta-squared and XGBoost/ExtraTrees

See preliminary results below.

This software lets you make individual and combined suitability maps for any species captured in the iNaturalist dataset by pulling a rich set of climate, geological, and topographic data from other freely available datasets. The maps show possible habitat ranges for plants; the idea is that you may want to develop land with many kinds of species, e.g. for permaculture or industrial agroforestry.

This project requires ~1 TB of data to run yourself:

  • bulk global iNaturalist species occurrences via GBIF.org
  • TerraClimate 12-month global climate summaries (latest or multi-year)
  • SoilGrids 2.0 global grids
  • Global TWI
  • Global DEM via HydroSHEDS + derived tiles (scripts supplied)
  • Global GLiM lithology map
  • MCD12Q1 land use classification map (latest)

This system is memory-optimized to run on laptops. I ran it from a USB drive to make it extra slow for myself and force a focus on optimization, so much of the data needs to be converted to the faster COG (Cloud Optimized GeoTIFF) format or indexed for quick CSV lookup. All scripts are provided, including an enrichment script that lets you define any taxa levels from iNaturalist and enrich cumulative CSVs to run through our suitability mapping programs. This all needs documentation; in the meantime, you can feed the files into a good LLM to get the workflow spelled out for you.
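As a rough illustration of what "indexing for quick CSV lookup" can mean, here is a minimal sketch that records the byte offset of every row per species, so that only the needed rows are read later instead of loading a multi-gigabyte file into memory. This is not the repository's indexing script: the function names are hypothetical, the column names follow GBIF's occurrence export (`species`, `decimalLatitude`, `decimalLongitude`) and are assumed here, and the parser assumes no quoted field contains a line break.

```python
import csv
import os
import tempfile


def parse_line(raw):
    """Parse one raw CSV line (bytes) into a list of fields."""
    return next(csv.reader([raw.decode("utf-8")]), [])


def build_offset_index(path, key_column):
    """Map each key value to the byte offsets of its rows.

    Assumes one record per physical line (no embedded newlines).
    """
    index = {}
    with open(path, "rb") as f:
        header = parse_line(f.readline())
        key_pos = header.index(key_column)
        while True:
            offset = f.tell()          # byte position of the row about to be read
            raw = f.readline()
            if not raw:
                break
            fields = parse_line(raw)
            if len(fields) > key_pos:  # skip blank or truncated lines
                index.setdefault(fields[key_pos], []).append(offset)
    return header, index


def read_rows(path, header, offsets):
    """Seek directly to the stored offsets and return the rows as dicts."""
    rows = []
    with open(path, "rb") as f:
        for offset in offsets:
            f.seek(offset)
            rows.append(dict(zip(header, parse_line(f.readline()))))
    return rows


if __name__ == "__main__":
    # Tiny made-up file standing in for a bulk occurrence export.
    with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
        tmp.write("species,decimalLatitude,decimalLongitude\n")
        tmp.write("Alnus rubra,44.05,-123.09\n")
        tmp.write("Pseudotsuga menziesii,44.10,-122.50\n")
        tmp.write("Alnus rubra,45.52,-122.68\n")
    header, index = build_offset_index(tmp.name, "species")
    print(read_rows(tmp.name, header, index["Alnus rubra"]))
    os.remove(tmp.name)
```

The index itself stays small (one integer per row), so it can be kept in memory or saved to disk while the source CSV stays on slow storage.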

Docs TBD.

Look at Collection.md and each folder for how to start gathering.

Look at the iNaturalistOccurrences and Suitability folders for how to start processing that data; lots of assembly required.

Initial Results

Note that our results use an "artist's touch": we manually adjust the eta-squared priors and manually control the blending between the ML and eta-squared results. The actual realism is species-dependent and requires more survey data, but this is already a great result for fairly naive assumptions.

Fairbanks North Star Borough: White Spruce (green) and Poplar/Cottonwood (red) blended suitability map, mixed in QGIS. The more limited poplar range is likely due to the sparsity of iNaturalist records, as poplars are found everywhere in town.

[image: raster]

Where you find good mixes of these, you find one of the best edible mushrooms :P We validated parts of this map by asking local foragers about their previous season. Not bad for a shot in the dark.

Blending eta-squared with the classifier probabilities gives the most convincing results; e.g. the deserts truly are not suitable for most of the plants here:

[image]
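For readers who want a concrete picture of this kind of blend, here is a minimal sketch. It is not the repository's blending code: the function names, the `ml_weight` parameter, and the two blend modes are assumptions chosen for illustration, and both inputs are assumed to be already scaled to the range 0 to 1.

```python
def blend_suitability(eta_score, ml_prob, ml_weight=0.5, mode="linear"):
    """Blend an eta-squared-style score with a classifier probability.

    ml_weight is the manual "dial": 0.0 uses only the eta-squared score,
    1.0 uses only the classifier probability.
    """
    for name, value in (("eta_score", eta_score), ("ml_prob", ml_prob), ("ml_weight", ml_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    if mode == "linear":
        # Weighted average: a high score from one source can offset a low one.
        return (1.0 - ml_weight) * eta_score + ml_weight * ml_prob
    if mode == "geometric":
        # Weighted geometric mean: a near-zero score from either source
        # pulls the blend toward zero (e.g. a desert cell stays unsuitable).
        return (eta_score ** (1.0 - ml_weight)) * (ml_prob ** ml_weight)
    raise ValueError(f"unknown mode: {mode}")


def blend_grid(eta_grid, ml_grid, ml_weight=0.5, mode="linear"):
    """Apply the blend cell by cell to two equally shaped nested lists."""
    return [
        [blend_suitability(e, m, ml_weight, mode) for e, m in zip(eta_row, ml_row)]
        for eta_row, ml_row in zip(eta_grid, ml_grid)
    ]


if __name__ == "__main__":
    eta = [[0.05, 0.70], [0.90, 0.40]]   # made-up eta-squared-style scores
    ml = [[0.60, 0.80], [0.95, 0.10]]    # made-up classifier probabilities
    print(blend_grid(eta, ml, ml_weight=0.5, mode="linear"))
    print(blend_grid(eta, ml, ml_weight=0.5, mode="geometric"))
```

The geometric mode is one way to keep a cell unsuitable when either source scores it near zero; whether that matches the behaviour shown in the map above depends on the species and on how the weight is tuned.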

Eta-squared-style, empirically weighted suitability scoring, with stress and reliability modifiers, compared to actual habitat ranges. We next added a machine learning habitat classifier to blend with this.

[image] [image] [image] [image]
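As background, eta-squared is the share of a variable's total variance that is explained by group membership: the between-group sum of squares divided by the total sum of squares. The sketch below computes it for presence versus background samples of each environmental variable and uses the result as a weight in a simple per-cell score. It is only an illustration of the idea under stated assumptions: the Gaussian closeness term, the function names, and the example values are hypothetical, and the stress and reliability modifiers mentioned above are not reproduced.

```python
import math


def eta_squared(groups):
    """Return SS_between / SS_total for a dict of {label: [values]}."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    if ss_total == 0:
        return 0.0  # the variable is constant, so it explains nothing
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
        if values
    )
    return ss_between / ss_total


def weighted_suitability(cell, presence_stats, weights):
    """Score a cell in [0, 1] as an eta-squared-weighted mean of closeness terms.

    cell: {variable: value}
    presence_stats: {variable: (mean, sd)} measured at occurrence sites
    weights: {variable: eta_squared} so discriminating variables count more
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    score = 0.0
    for name, weight in weights.items():
        mean, sd = presence_stats[name]
        z = (cell[name] - mean) / sd if sd > 0 else 0.0
        score += weight * math.exp(-0.5 * z * z)  # 1.0 at the presence mean
    return score / total_weight


if __name__ == "__main__":
    # Made-up samples: annual precipitation (mm) and elevation (m).
    precip = {"presence": [900.0, 1100.0, 1000.0], "background": [200.0, 1500.0, 300.0, 250.0]}
    elev = {"presence": [100.0, 900.0, 500.0], "background": [50.0, 950.0, 480.0, 520.0]}
    weights = {"precip": eta_squared(precip), "elev": eta_squared(elev)}
    stats = {"precip": (1000.0, 100.0), "elev": (500.0, 400.0)}
    print(weights)
    print(weighted_suitability({"precip": 950.0, "elev": 300.0}, stats, weights))
```

A variable whose presence and background distributions look alike gets a weight near zero and barely affects the score, which is the sense in which the weighting is empirical.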

Initial results overlap well with known habitat, using well-known Oregon species as our test case. Future results will show "agroforestry" profiles where we have valid overlap for dozens of useful cultivable species.

XGBoost/ExtraTrees result:

[image]

Model performance:

[image]

The low F1 scores here are due more to the extrapolation than to the original classification accuracy, as we are deliberately weakening the classifier to get a larger suitability area.
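To see why widening the predicted area lowers F1 even when the classifier ranks sites well, consider this small sketch. It assumes, purely for illustration, that the widening is done by lowering the probability threshold; the actual mechanism in the repository may differ, and the labels and probabilities are made up.

```python
def f1_score(y_true, y_prob, threshold):
    """F1 for binary labels, treating prob >= threshold as a positive prediction."""
    tp = fp = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = prob >= threshold
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1   # cell marked suitable with no recorded occurrence
        elif truth:
            fn += 1   # occurrence the model missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 1 = occurrence site, 0 = background site (hypothetical values).
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_prob = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.1, 0.05]
    for threshold in (0.5, 0.25):
        print(threshold, round(f1_score(y_true, y_prob, threshold), 3))
```

In this toy data the ranking is perfect, yet the lower threshold adds three background cells to the predicted area. Each counts as a false positive, so precision and F1 fall, even though some of those cells may be genuinely suitable habitat that simply has no observations.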

Leaky XGBoost model that overfits around the actual observation sites (minus coordinates), useful for blending better from occurrence data ground truth:

[image]

This was our first attempt, and it didn't do background sampling correctly; however, it still has some usefulness as a narrower model.
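For context on background sampling, a presence-only classifier needs contrast points drawn from the wider study area. One common approach, sketched below, draws random points inside a bounding box and rejects any that fall within a minimum distance of a known occurrence. This is a generic illustration and not a description of what either model in this repository does: the function names, the 5 km buffer, and the bounding box are all assumptions.

```python
import math
import random


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def sample_background(occurrences, bbox, n, min_km=5.0, seed=0, max_tries=100000):
    """Draw up to n background points at least min_km from every occurrence.

    occurrences: list of (lat, lon)
    bbox: (south, west, north, east) in degrees
    Note: uniform sampling in degrees is not equal-area; that is acceptable
    for a small regional box but not for global sampling.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    south, west, north, east = bbox
    points = []
    tries = 0
    while len(points) < n and tries < max_tries:
        tries += 1
        lat = rng.uniform(south, north)
        lon = rng.uniform(west, east)
        if all(haversine_km(lat, lon, olat, olon) >= min_km for olat, olon in occurrences):
            points.append((lat, lon))
    return points


if __name__ == "__main__":
    occurrences = [(44.05, -123.09), (44.56, -123.26)]  # made-up sites in western Oregon
    background = sample_background(occurrences, (43.0, -124.0, 45.0, -122.0), n=10)
    for lat, lon in background:
        print(round(lat, 4), round(lon, 4))
```

If background points are drawn too close to the occurrences, or only from already-sampled areas, the classifier learns a tight envelope around the observation sites, which is consistent with the narrow behaviour described above.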

Leaky model (overfits around observation sites), very high F1 due to less extrapolation:

[image]

We also created a community model on top of this to look at multi-species probabilities; we are still testing it.

Numeric comparisons:

[image] [image]

Raw occurrence data; you'll see that the ML model has very strong overlap with the clusters here:

[image]

Pseudotsuga menziesii (Douglas Fir) is the most generalized, Alnus rubra (Red Alder) likes the hills, Arbutus menziesii (Pacific Madrone) favors the savannahs in the Willamette Valley, and Kopsiopsis strobilacea is parasitic on the Pacific Madrone but found only in southwestern Oregon and northwestern California. Overall we get great overlap with their actual ranges and habitat preferences, with distinct hill- and valley-favoring species, and the parasitic species occupying a subsection of the wider Madrone range.

Contribute

This is free to use for any reason under an MIT License. It's not trivial to set up by any means, but we may streamline it more as we continue exploring this framework. If you are interested in using or improving it, feel free to fork or contribute to this repository. We're all in this together!

About

Map iNaturalist observations to global climate, geology, and topography data and see combined species habitat or cultivation potential.