Writing Studio Analytics

Problem

University writing centers collect session data through platforms like Penji, but analyzing that data means handling FERPA-protected student records. Cloud-based tools introduce compliance risk. Manual analysis is slow and inconsistent. The Writing Studio needed a way to generate standardized analytics reports without student data ever leaving the machine.

Solution

A self-contained Streamlit application that cleans session exports, anonymizes student identifiers, and generates standardized PDF reports entirely on the local machine, with an optional local AI chat assistant.

Key Features

Report Generation

Privacy & Security

AI Chat Assistant

Analytics Engine

Over 1,300 lines of metrics calculations covering scheduled sessions, walk-in sessions, and location-based analytics.

Tech Stack

Layer           Technology                            Role
UI              Streamlit                             Web interface, file uploads, tab navigation
Data            pandas, NumPy, DuckDB                 Cleaning, transformation, fast SQL-like queries
Visualization   Matplotlib, Seaborn, Altair           Charts for PDF reports and interactive display
PDF             ReportLab                             Multi-page report generation with embedded charts
Privacy         cryptography, hashlib                 Anonymization and encrypted codebook
AI              llama-cpp-python, Gemma 3 4B          Local LLM inference, no external API calls
Distribution    Embedded Python 3.11, batch scripts   Portable package, no installation required

Architecture

The project follows a modular structure with clear separation of concerns:

app.py                          Entry point (Streamlit app, ~900 lines)

src/
  core/
    data_cleaner.py             10-step scheduled session cleaning pipeline
    walkin_cleaner.py           Walk-in session cleaning pipeline
    metrics.py                  Scheduled session analytics (1,384 lines)
    walkin_metrics.py           Walk-in specific metrics
    privacy.py                  PII detection, SHA256 anonymization, Fernet encryption
    location_metrics.py         Location-based analytics

  ai_chat/
    chat_handler.py             Orchestrates queries, validation, code execution
    llm_engine.py               Gemma model loading and inference wrapper
    safety_filters.py           Input validation + response filtering
    code_executor.py            Sandboxed pandas/DuckDB code execution
    query_engine.py             Natural language to SQL translation via DuckDB
    prompt_templates.py         System prompts built from data context
    setup_model.py              Model download and system requirements check

  visualizations/
    report_generator.py         Scheduled session PDF report (11 sections)
    walkin_report_generator.py  Walk-in session PDF report (8 sections)
    charts.py                   Reusable chart functions

  utils/
    academic_calendar.py        Semester detection from dates

Design Decisions

Local-only inference

Student data never leaves the machine. The LLM runs on CPU via llama-cpp-python rather than calling an external API, which eliminates FERPA concerns around data transmission.

Deterministic anonymization

SHA256 hashing produces the same anonymous ID for the same email across runs, so longitudinal analysis works without storing raw identifiers.
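
A minimal sketch of the idea (the real privacy.py also handles PII detection and the encrypted codebook; the salt handling and ID length here are illustrative):

```python
import hashlib

def anonymize_email(email: str, salt: str = "studio-salt") -> str:
    # Normalize so "Student@edu" and " student@edu " hash identically.
    normalized = email.strip().lower()
    # SHA256 is deterministic: the same input always yields the same
    # digest, so the anonymous ID is stable across report runs.
    digest = hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()
    return digest[:12]  # short ID for reports

a = anonymize_email("Student@example.edu")
b = anonymize_email("student@example.edu  ")
assert a == b  # stable across runs and formatting differences
```

Because the mapping is one-way, longitudinal joins work on the hashed IDs while the raw emails never need to be stored.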

Graceful degradation

Every optional feature (AI chat, walk-in mode, GPU acceleration) disables itself cleanly if its dependencies are missing, rather than crashing. The app always works at its core: upload data, get a report.
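
The pattern can be sketched as a guarded import behind a feature flag (the module name mirrors the real dependency, but the flag and function names are illustrative):

```python
# Optional AI chat: if the local LLM runtime is missing,
# disable the feature, not the app.
try:
    from llama_cpp import Llama  # heavy optional dependency
    AI_CHAT_AVAILABLE = True
except ImportError:
    Llama = None
    AI_CHAT_AVAILABLE = False

def render_chat_tab() -> str:
    if not AI_CHAT_AVAILABLE:
        # In the real app this would surface as an info message
        # in the chat tab rather than a crash at import time.
        return "AI chat is unavailable: install llama-cpp-python to enable it."
    return "chat ready"
```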

Portable Python

The target user has no development tools and limited admin access on a university-managed machine. Embedding Python and all dependencies in the distribution folder avoids installation entirely.

Weighted safety scoring

Rather than a simple keyword blocklist, the AI input validator uses a scoring system (+2 for data-relevant terms, -3 for off-topic, -5 for harmful) so that a query mentioning both data terms and an incidental flagged word isn't falsely rejected.
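
A toy version of that scoring scheme (the term lists and threshold are invented for illustration; the real filter lives in safety_filters.py):

```python
DATA_TERMS = {"sessions", "tutor", "appointments", "semester", "course"}
OFF_TOPIC = {"weather", "recipe", "sports"}
HARMFUL = {"password", "exploit"}

def score_query(query: str) -> int:
    score = 0
    for word in query.lower().split():
        if word in DATA_TERMS:
            score += 2   # evidence the query is about the data
        elif word in OFF_TOPIC:
            score -= 3   # probably unrelated to the Writing Studio
        elif word in HARMFUL:
            score -= 5   # strong signal to reject
    return score

def is_allowed(query: str, threshold: int = 0) -> bool:
    return score_query(query) > threshold

# A data question with one incidental flagged word still passes:
# "how many tutor sessions mention weather delays" → +2 +2 -3 = +1
```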

Distribution

The app ships as a portable folder (~600 MB zipped, no model) that runs on any Windows machine without Python installed:

WritingStudioAnalytics/
  python/          Embedded Python 3.11 + all dependencies
  src/             Source code
  models/          Empty — user downloads AI model here (~3 GB, optional)
  app.py           Main application
  launch.bat       Double-click to start

The AI model is hosted on S3 and can be downloaded from within the app with a single click, with a progress bar tracking the download. The full analytics and reporting features work without it.
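
The in-app download can be sketched as a streamed copy with progress reporting (the function name, chunk size, and callback wiring are illustrative; in the real app the source would be the HTTP response body for the S3 object and the callback would update a Streamlit progress bar):

```python
import io
from typing import BinaryIO, Callable

def download_with_progress(
    src: BinaryIO,
    dst: BinaryIO,
    total_bytes: int,
    on_progress: Callable[[float], None],
    chunk_size: int = 1024 * 1024,
) -> int:
    """Copy src to dst in chunks, reporting fraction complete."""
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        on_progress(min(copied / total_bytes, 1.0))
    return copied

# Demo with in-memory buffers standing in for the network stream.
payload = b"x" * (3 * 1024 * 1024)
updates: list[float] = []
n = download_with_progress(io.BytesIO(payload), io.BytesIO(),
                           len(payload), updates.append)
```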

Key Takeaways

This project demonstrates several critical technical capabilities:

  1. Privacy Engineering: Implementing robust PII detection, anonymization, and encryption to ensure FERPA compliance while maintaining data utility
  2. System Architecture: Designing a modular, maintainable codebase with clear separation between data processing, visualization, AI components, and utilities
  3. End-User Focus: Creating a solution that runs without installation or technical knowledge, addressing real user constraints and institutional environments
  4. Local AI Integration: Successfully integrating local LLM inference with safety systems, code execution capabilities, and graceful degradation patterns
  5. Analytics Depth: Building a comprehensive analytics engine with 1,300+ lines of metrics calculations covering diverse dimensions of writing center operations