ERS Cooper Trade Data Automation

A comprehensive data automation and harmonization pipeline developed to support empirical and scenario-based analysis of how Free Trade Agreements (FTAs) and tariff changes influence U.S. agricultural exports.

This project was developed as part of a research effort connected to an ERS/USDA cooperative research project and later refined to support a graduate thesis on the impact of FTAs on U.S. agricultural trade. The work integrates international trade, macroeconomic, and policy datasets from multiple global sources into a structured and reusable research workflow.

Project Overview

The purpose of this repository is to automate the collection, cleaning, standardization, and merging of country-level trade reference data from multiple international sources. These automated scripts reduce manual data handling, improve reproducibility, and create a reliable input pipeline for downstream trade and policy analysis.

The project supports a broader analytical framework used to:

quantify the impact of U.S. Free Trade Agreements on U.S. agricultural exports,
evaluate how tariff changes influence export performance,
integrate macroeconomic, trade, and policy variables into a unified dataset,
support gravity-model estimation and tariff scenario analysis.

Research Background

The original research proposal focused on quantifying trade creation and trade diversion due to Free Trade Agreements, with particular attention to U.S. FTAs and their effect on agricultural commodities.

As the research developed, the final project evolved into a broader and more applied framework centered on:

automated trade-data integration,
FTA and tariff impact analysis,
scenario-based modeling of U.S. agricultural exports.

The final thesis framework examined how trade agreements, tariffs, distance, GDP, exchange rates, and other trade-related factors shape export outcomes. In addition to supporting empirical estimation, the automation pipeline created a repeatable process for integrating data from diverse international institutions.

Key Objectives

Build an automated framework for collecting country and trade reference data from multiple international sources.
Standardize country names and ISO codes across inconsistent datasets.
Reduce manual data collection errors in international trade research.
Produce harmonized outputs that can be used in econometric and scenario analysis workflows.
Support the analysis of how FTAs and tariff changes affect U.S. agricultural exports.

Data Sources

The automation workflow uses data from multiple sources, including:

UN Comtrade
USDA Foreign Agricultural Service (FAS)
USDA Production, Supply and Distribution (PSD)
World Bank
WITS (World Integrated Trade Solution)
IMF DataMapper
FAO / FishBase
U.S. Census Bureau
CEPII GeoDist Database
WTO RTA Database

Repository Structure

ERS-Cooper-Trade-Data-Automation/
│
├── README.md
├── requirements.txt
├── .gitignore
│
├── 01_country_reference/
│   ├── README.md
│   ├── config.py
│   ├── fetch_un_comtrade.py
│   ├── fetch_fas_psd.py
│   ├── fetch_worldbank_wits.py
│   ├── fetch_imf_reporters.py
│   ├── fetch_fao_census.py
│   └── merge_country_datasets.py
│
├── 02_trade_flows/
│   ├── README.md
│   ├── fetch_wits_imports.py
│   ├── fetch_wits_exports.py
│   ├── fetch_wto_trade_data.py
│   ├── fetch_fas_agricultural_exports.py
│   ├── fetch_fas_gats_exports.py
│   ├── fetch_worldbank_agriexports.py
│   ├── fetch_worldbank_agrimports.py
│   └── fetch_trademap_exports.py
│
├── 03_commodities/
│   ├── README.md
│   ├── fetch_fas_commodities.py
│   ├── fetch_wits_commodities.py
│   └── process_fas_commodity_groups.py
│
├── 04_gravity_variables/
│   ├── README.md
│   ├── gdp/
│   ├── exchange_rate/
│   ├── population/
│   └── geography/
│
└── 05_trade_policy/
    ├── README.md
    ├── tariffs/
    └── trade_agreements/

Module Descriptions

File	Description
`config.py`	Central configuration file for API keys, environment variables, and global settings such as timeouts.
`fetch_un_comtrade.py`	Extracts and cleans M49-based country reference data from the UN Comtrade source.
`fetch_fas_psd.py`	Connects to USDA FAS and PSD APIs to retrieve and standardize country reference data.
`fetch_worldbank_wits.py`	Processes World Bank JSON responses and WITS XML data for country-level integration.
`fetch_imf_reporters.py`	Retrieves country data from IMF DataMapper and UN Comtrade reporters reference files.
`fetch_fao_census.py`	Uses headless Selenium automation to scrape country data from FAO FishBase and the U.S. Census Bureau.
`merge_country_datasets.py`	Merges all generated outputs into a harmonized country reference dataset and Excel export.

Workflow Summary

The automation workflow follows these general steps:

Collect country reference data from APIs, websites, and structured online sources.
Clean the extracted data by selecting relevant fields and standardizing variable names.
Normalize ISO country codes and naming conventions across datasets.
Merge the cleaned source files into a unified harmonized dataset.
Export final outputs for use in downstream trade, econometric, and scenario analysis.

Professional Features

This repository was designed to reflect a professional and research-oriented data engineering workflow.

Highlights

Modular script design for maintainability and clarity
Environment-based configuration for secure API key handling
Error handling to prevent source-specific failures from breaking the entire pipeline
Headless browser automation for websites that do not offer direct APIs
Data harmonization logic to reconcile inconsistent country naming conventions
Reusable outputs suitable for econometric modeling and policy analysis

Installation

Install the required dependencies with:

pip install -r requirements.txt

If you do not yet have a requirements.txt, the main dependencies used in this project include:

pip install requests pandas selenium tabulate lxml openpyxl

Configuration

Some scripts require an API key for USDA FAS/PSD access.

Set the API key as an environment variable:

export FAS_API_KEY="your_key_here"

On Windows PowerShell:

$env:FAS_API_KEY="your_key_here"

The scripts are designed to read this key through the configuration module.

How to Run

You can run each fetcher individually to update specific sources, then run the merge script to create the final harmonized output.

Examples:

Step	Command	Purpose
1	`python 01_country_reference/fetch_un_comtrade.py`	Fetches and prepares UN Comtrade country reference data.
2	`python 01_country_reference/fetch_fas_psd.py`	Retrieves and standardizes USDA FAS and PSD country reference data.
3	`python 01_country_reference/fetch_worldbank_wits.py`	Processes World Bank and WITS country-level reference data.
4	`python 01_country_reference/fetch_imf_reporters.py`	Retrieves country data from IMF DataMapper and UN Comtrade reporter references.
5	`python 01_country_reference/fetch_fao_census.py`	Scrapes and prepares country data from FAO FishBase and the U.S. Census Bureau.
6	`python 01_country_reference/merge_country_datasets.py`	Merges all cleaned source outputs into the final harmonized dataset.

Outputs

Typical outputs generated by this workflow include:

cleaned country reference CSV files
harmonized merged datasets
Excel exports for research use
standardized ISO-based lookup tables for downstream analysis

These outputs are intended to support broader trade research workflows involving:

gravity-model estimation
tariff scenario analysis
FTA impact evaluation
agricultural export analysis

Research Relevance

This repository contributes to research on international trade policy by creating a reproducible data preparation workflow for studying the relationship between:

Free Trade Agreements
tariffs
macroeconomic conditions
U.S. agricultural export performance

The automation pipeline improves research efficiency and helps reduce errors associated with manual international dataset collection and integration.

Thesis Connection

This repository supports a larger research project titled:

Quantifying the Impact of Free Trade Agreements on U.S. Agriculture Exports: A Data-Driven Scenario Analysis

The final analytical framework emphasized:

the positive role of FTAs in promoting exports
the negative effect of tariffs and distance on trade flows
the importance of automated data integration for reliable research outputs

Future Improvements

Potential next steps for the repository include:

adding automated logging with Python’s logging module
converting output generation into a configurable pipeline
adding unit tests for data validation
building commodity-level modules for imports, exports, tariffs, GDP, and exchange rates
integrating notebook-based or dashboard-based result visualization

Author

Douglas Akwasi Kwarteng
Graduate Researcher | Cybersecurity & Data Automation
Master’s Research in Agribusiness, International Trade, and Applied Data Analysis

License

This repository is intended for academic, research, and professional portfolio purposes unless otherwise specified.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ERS Cooper Trade Data Automation

Project Overview

Research Background

Key Objectives

Data Sources

Repository Structure

Module Descriptions

Workflow Summary

Professional Features

Highlights

Installation

pip install -r requirements.txt

pip install requests pandas selenium tabulate lxml openpyxl

Configuration

Set the API key as an environment variable:

export FAS_API_KEY="your_key_here"

$env:FAS_API_KEY="your_key_here"

How to Run

Outputs

Research Relevance

Thesis Connection

Future Improvements

Author

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 47 Commits
01_country_reference		01_country_reference
02_trade_flows		02_trade_flows
03_commodities		03_commodities
04_gravity_variables		04_gravity_variables
05_trade_policy		05_trade_policy
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

ERS Cooper Trade Data Automation

Project Overview

Research Background

Key Objectives

Data Sources

Repository Structure

Module Descriptions

Workflow Summary

Professional Features

Highlights

Installation

pip install -r requirements.txt

pip install requests pandas selenium tabulate lxml openpyxl

Configuration

Set the API key as an environment variable:

export FAS_API_KEY="your_key_here"

$env:FAS_API_KEY="your_key_here"

How to Run

Outputs

Research Relevance

Thesis Connection

Future Improvements

Author

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages