ESG Automation System

Project Overview

Large enterprises manually process thousands of utility bills each year for ESG reporting. Manual processing brings high labor costs, human error, and slow reporting cycles, while AI-only alternatives are expensive ($10-20 per 1,000 bills). This system automates the entire workflow, from bill upload to GRI-compliant PDF reports, reducing processing time from hours to seconds and costs by 95%.

Key Performance Metrics

Technical Skills Demonstrated

AI Integration & Architecture

Data Engineering & Processing

Production Application Development

ESG & Compliance Domain Knowledge

Python & Software Engineering

System Architecture

The ESG Automation System uses a 3-tier extraction strategy that tries free local methods first and falls back to a paid cloud API only when they fail, keeping costs low while maintaining high accuracy:

Tier 1: Docling (Local Processing)

IBM's open-source document AI processes text-based PDFs locally at zero cost. Handles 85% of standard utility bills with 85-90% accuracy in 2-3 seconds.

Tier 2: Tesseract OCR (Local Processing)

Open-source OCR processes scanned/image PDFs locally at zero cost. Handles 10% of bills with 70-85% accuracy in 3-5 seconds.

Tier 3: Claude Vision API (Cloud Fallback)

Anthropic's Claude Vision API handles complex layouts when local methods fail. Processes 5% of bills at ~$0.01-0.02 per bill with 95%+ accuracy in 2-4 seconds.
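
The tiered fallback described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the Extraction type, the extract_with_fallback function, the 0.8 confidence threshold, and the three stub extractors are all hypothetical stand-ins. A real implementation would wrap Docling, Tesseract, and the Claude API behind the same callable interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Extraction:
    tier: str          # which tier produced the result
    fields: dict       # extracted bill fields
    confidence: float  # 0.0-1.0, as estimated by the extractor (assumed)


# Hypothetical interface: an extractor takes PDF bytes and returns a result or None.
Extractor = Callable[[bytes], Optional[Extraction]]


def extract_with_fallback(
    pdf_bytes: bytes,
    tiers: List[Tuple[str, Extractor]],
    min_confidence: float = 0.8,  # assumed threshold, tune per deployment
) -> Optional[Extraction]:
    """Try tiers cheapest-first; return the first result that clears the threshold.

    If no tier clears it, return the most confident result seen (or None).
    """
    best: Optional[Extraction] = None
    for _name, extractor in tiers:
        try:
            result = extractor(pdf_bytes)
        except Exception:
            # Broad catch for the sketch only: a failed tier just hands over
            # to the next, more expensive one.
            result = None
        if result is None:
            continue
        if result.confidence >= min_confidence:
            return result
        if best is None or result.confidence > best.confidence:
            best = result
    return best


# Stub extractors standing in for the three tiers (illustrative only).
def docling_stub(pdf: bytes) -> Optional[Extraction]:
    return None  # pretend the PDF has no text layer


def tesseract_stub(pdf: bytes) -> Optional[Extraction]:
    return Extraction("tesseract", {"kwh_usage": 1200}, confidence=0.55)


def claude_stub(pdf: bytes) -> Optional[Extraction]:
    return Extraction("claude", {"kwh_usage": 1250}, confidence=0.97)


demo = extract_with_fallback(
    b"%PDF-...",
    [("docling", docling_stub), ("tesseract", tesseract_stub), ("claude", claude_stub)],
)
```

Ordering the list cheapest-first is what keeps the paid tier a fallback: the cloud extractor is only called when both local tiers return nothing or a low-confidence result.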

Processing Pipeline

  1. Upload & Validation: PDF uploaded via Streamlit interface, validated for format and size
  2. Intelligent Routing: System selects optimal extraction tier based on document characteristics
  3. Data Extraction: Utility name, account number, billing period, kWh usage extracted with structured JSON output
  4. Quality Validation: Completeness checks, rate sanity verification, hallucination detection
  5. Emissions Calculation: EPA eGRID factors applied based on selected region
  6. Report Generation: GRI 305-2 compliant PDF with full methodology documentation
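
Steps 4 and 5 of the pipeline can be illustrated with a short sketch. The field names, the total_cost field, and the rate bounds are assumptions made for illustration. The emission factor in the usage example is a placeholder, not a real EPA eGRID value. eGRID publishes output emission rates in lb/MWh, so the calculation converts kWh to MWh and pounds to kilograms.

```python
from typing import List

# Assumed field names for the extracted JSON (hypothetical).
REQUIRED_FIELDS = ("utility_name", "account_number", "billing_period", "kwh_usage")

LB_TO_KG = 0.45359237  # exact definition of the avoirdupois pound in kilograms


def validate_bill(fields: dict, min_rate: float = 0.03, max_rate: float = 1.00) -> List[str]:
    """Return a list of issues; an empty list means the bill passed.

    min_rate and max_rate ($/kWh) are illustrative sanity bounds, not
    values taken from the project.
    """
    issues = []
    for name in REQUIRED_FIELDS:
        if fields.get(name) in (None, ""):
            issues.append(f"missing field: {name}")

    kwh = fields.get("kwh_usage")
    if isinstance(kwh, (int, float)) and kwh <= 0:
        issues.append("kwh_usage must be positive")

    # Rate sanity check, only possible when a bill total was also extracted.
    cost = fields.get("total_cost")
    if isinstance(kwh, (int, float)) and kwh > 0 and isinstance(cost, (int, float)):
        rate = cost / kwh
        if not (min_rate <= rate <= max_rate):
            issues.append(f"implausible rate: {rate:.3f} $/kWh")
    return issues


def scope2_emissions_kg(kwh: float, factor_lb_per_mwh: float) -> float:
    """Location-based electricity emissions in kg CO2 for one bill."""
    return kwh / 1000.0 * factor_lb_per_mwh * LB_TO_KG


bill = {
    "utility_name": "Example Power Co.",
    "account_number": "000-000",
    "billing_period": "2024-01",
    "kwh_usage": 1000,
    "total_cost": 150.0,
}
problems = validate_bill(bill)
# 800.0 lb/MWh is a placeholder factor, NOT an actual eGRID subregion rate.
emissions_kg = scope2_emissions_kg(bill["kwh_usage"], factor_lb_per_mwh=800.0)
```

A bill that fails validation is a natural candidate for re-extraction at a higher tier rather than for emissions calculation, since a wrong kWh figure propagates directly into the reported total.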

Business Impact

This system demonstrates practical application of AI to solve real enterprise problems:

Tools & Technologies

Key Takeaways

This project demonstrates several critical principles in production AI systems:

  1. Cost-Effective AI Architecture: Strategic use of local processing (free) before cloud APIs (paid) reduces costs by 95% while maintaining quality
  2. Intelligent Fallback Systems: Multi-tier extraction ensures reliability: when cheaper methods fail, more sophisticated (and more expensive) methods take over
  3. Production-Ready Design: Comprehensive error handling, validation, and audit trails make this system enterprise-grade
  4. Domain Knowledge Integration: Understanding ESG compliance requirements (GRI standards, EPA factors) is as important as technical implementation
  5. User-Centric Development: Clean interface, real-time feedback, and clear documentation make complex AI systems accessible to non-technical users
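
The 95% cost figure can be checked with simple arithmetic from the tier shares and per-bill prices quoted in the System Architecture section. The sketch below counts API fees only (local compute and labor are ignored), and the function name is hypothetical.

```python
from typing import List, Tuple


def blended_cost_per_1000(tiers: List[Tuple[float, float]]) -> float:
    """Expected API cost in USD for 1,000 bills.

    tiers is a list of (share_of_bills, cost_per_bill_usd); shares must sum to 1.
    """
    total_share = sum(share for share, _ in tiers)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("tier shares must sum to 1.0")
    return 1000 * sum(share * cost for share, cost in tiers)


# Shares and prices from the tier descriptions: 85% and 10% local (free),
# 5% sent to the cloud API at $0.01-0.02 per bill.
low = blended_cost_per_1000([(0.85, 0.0), (0.10, 0.0), (0.05, 0.01)])
high = blended_cost_per_1000([(0.85, 0.0), (0.10, 0.0), (0.05, 0.02)])

# Compare against the AI-only baseline of $10-20 per 1,000 bills,
# pairing low with low and high with high.
savings_low = 1 - low / 10.0
savings_high = 1 - high / 20.0

print(f"tiered cost: ${low:.2f}-${high:.2f} per 1,000 bills")
print(f"savings: {savings_low:.0%} to {savings_high:.0%}")
```

Under these numbers the tiered system costs about $0.50-1.00 per 1,000 bills, which is a 95% reduction when like endpoints are paired. Mixing the endpoints (cheapest tiered cost against the most expensive baseline, or the reverse) gives a range of roughly 90% to 97.5%.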

Potential Enterprise Enhancements

If moving this system into production, key improvements would include: