Phase 4 Preview

Retrieval PoC: Document Generation

AI-powered sustainability report generation using template-driven content hydration, Workiva data retrieval, and conversational editing via Microsoft 365 Copilot.

Overview

The Retrieval PoC enables analysts to generate complete sustainability report sections through a conversational interface. Using a declarative template language, the system dynamically populates content by combining LLM-generated narratives with live data from Workiva factbooks.

Template-Driven

LaTeX templates with embedded tags for dynamic content injection

Workiva-Connected

Direct cell references hydrate templates with factbook data

Copilot Interface

Natural language commands via Microsoft 365 Copilot chat

Template Language Specification

The template language uses a tag-based syntax embedded in LaTeX documents. Tags are resolved at generation time, with content either retrieved from Workiva or generated by the LLM based on specification files.

Tag Syntax Reference

Tag Pattern                      | Type           | Description
{@content:section-id}            | LLM Generated  | Generates content using the spec file at /specs/section-id.yaml
{@content:section-id.subsection} | LLM Generated  | Nested content generation with hierarchical spec lookup
{$workiva:sheet_name:cell_ref}   | Data Reference | Retrieves live value from Workiva spreadsheet cell
{$workiva:sheet_name:range}      | Data Reference | Retrieves a range of cells (e.g., A1:D10)
{#if condition}...{/if}          | Conditional    | Conditionally include content based on Workiva values
{#each range as item}...{/each}  | Iterator       | Iterate over Workiva rows to generate repeated sections
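
To make the tag patterns concrete, the sketch below scans a template for the two substitution tags, {@content:...} and {$workiva:...}. It is a minimal illustration, not the PoC's parser: the names Tag, TAG_RE, and find_tags are hypothetical, and the characters allowed inside an id or reference are assumptions.

```python
import re
from typing import NamedTuple


class Tag(NamedTuple):
    kind: str    # "content" or "workiva"
    target: str  # spec id, or "sheet:ref" for a data reference


# Assumed character sets: ids use letters, digits, "_", "-", "."; references
# are a sheet name followed by one or more ":"-separated parts.
TAG_RE = re.compile(
    r"\{@content:(?P<content>[A-Za-z0-9_.-]+)\}"
    r"|\{\$workiva:(?P<workiva>[A-Za-z0-9_]+:[A-Za-z0-9_:]+)\}"
)


def find_tags(template: str) -> list[Tag]:
    """Return every content and data tag in document order."""
    tags = []
    for match in TAG_RE.finditer(template):
        if match.group("content") is not None:
            tags.append(Tag("content", match.group("content")))
        else:
            tags.append(Tag("workiva", match.group("workiva")))
    return tags


sample = r"""\subsection{About {$workiva:Factbook:B2}}
{@content:env-intro}
{#if {$workiva:C3:G4} == "TRUE"}
{@content:transition-plan.adopted}
{/if}"""

found = find_tags(sample)
```

Data references nested inside an {#if ...} condition are picked up as well, so a single pass could collect every cell the template depends on before any Workiva call is made.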

Example: Environmental Section Template

templates/environmental-section.tex
% VSME Environmental Section - Generated Template
\section{Environmental Disclosures}

% Pull company name from Workiva factbook
\subsection{About {$workiva:Factbook:B2}}

% LLM-generated introduction based on spec file
{@content:env-intro}

% Climate transition plan status from factbook
\subsection{Climate Transition Status}
{#if {$workiva:C3:G4} == "TRUE"}
    The undertaking operates in high-impact sectors and has adopted
    the following transition measures:

    {@content:transition-plan.adopted}
{/if}

{#if {$workiva:C3:G4} == "FALSE"}
    {@content:transition-plan.not-applicable}
{/if}

% KPI display with live data
\begin{kpicard}
    \kpivalue{{$workiva:Metrics:C12}}
    \kpilabel{GHG Emissions Reduction}
    \kpichange{{$workiva:Metrics:D12} YoY}
\end{kpicard}

% Pollutant emissions table - iterate over data rows
\subsection{Pollutant Emissions}
{#each {$workiva:B4:B5:B23} as row}
    \emissionrow{{row.pollutant}}{{row.air}}{{row.water}}{{row.soil}}
{/each}
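
The conditional blocks in this template could be resolved roughly as sketched here. The sketch assumes the only condition form is the one the template uses ({$workiva:...} == "VALUE"), that blocks are not nested, and that cell values arrive as strings in a dictionary keyed by "sheet:ref". The render function and the sample values are hypothetical.

```python
import re

# Matches {#if {$workiva:REF} == "VALUE"} ... {/if} (non-nested, assumed form).
IF_RE = re.compile(
    r'\{#if \{\$workiva:([^}]+)\} == "([^"]*)"\}(.*?)\{/if\}',
    re.DOTALL,
)
REF_RE = re.compile(r"\{\$workiva:([^}]+)\}")


def render(template: str, cells: dict[str, str]) -> str:
    """Keep or drop each {#if} body, then substitute remaining data tags."""

    def keep_or_drop(match: re.Match) -> str:
        ref, expected, body = match.groups()
        return body if cells.get(ref) == expected else ""

    text = IF_RE.sub(keep_or_drop, template)
    return REF_RE.sub(lambda m: cells[m.group(1)], text)


# Illustrative values only; real values would come from the factbook.
cells = {"C3:G4": "TRUE", "Metrics:C12": "18"}
template = "\n".join([
    r'{#if {$workiva:C3:G4} == "TRUE"}',
    r"Transition measures adopted.",
    r"{/if}",
    r'{#if {$workiva:C3:G4} == "FALSE"}',
    r"Not applicable.",
    r"{/if}",
    r"\kpivalue{{$workiva:Metrics:C12}}",
])
rendered = render(template, cells)
```

Conditions are evaluated before data tags are substituted, so a reference inside a dropped branch is never looked up.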

Content Specification System

Each {@content:id} tag is backed by a YAML specification file that defines the generation parameters. Specs include style guidelines, VSME framework references, tone, and context.

specs/env-intro.yaml
id: env-intro
type: narrative
vsme_reference: B1-B7
paragraph_guidance: "29-56"

style:
  tone: professional, formal
  voice: third-person
  length: 150-250 words
  brand_guidelines: ./brand/kpmg-style.yaml

context:
  # Quoted so that YAML reads each tag as a string, not a flow mapping
  company_name: "{$workiva:Factbook:B2}"
  reporting_year: "{$workiva:Factbook:B5}"
  sector: "{$workiva:Factbook:B8}"

prompt: |
  Write an introductory paragraph for the Environmental section of a
  VSME sustainability report. The paragraph should:
  - Introduce the company's environmental commitment
  - Reference the reporting period and framework (EU VSME)
  - Set the context for detailed disclosures that follow
  - Maintain consistency with prior year reports if available

cache:
  enabled: true
  invalidate_on:
    - workiva:Factbook:B2  # Company name change
    - workiva:Factbook:B5  # Reporting year change
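
One way a spec like this could be turned into an LLM prompt is sketched below. The spec is shown as an already-parsed dictionary with an abbreviated prompt, and the cell values are illustrative. The helper names hydrate_context and build_prompt, and the layout of the assembled prompt, are assumptions rather than the PoC's actual behavior.

```python
import re

# A context value that consists entirely of one data tag.
REF_RE = re.compile(r"^\{\$workiva:([^}]+)\}$")


def hydrate_context(context: dict[str, str], cells: dict[str, str]) -> dict[str, str]:
    """Replace each data tag in the context with its cell value."""
    resolved = {}
    for key, value in context.items():
        match = REF_RE.match(value)
        resolved[key] = cells[match.group(1)] if match else value
    return resolved


def build_prompt(spec: dict, cells: dict[str, str]) -> str:
    """Assemble style rules, hydrated context, and the spec prompt."""
    context = hydrate_context(spec["context"], cells)
    style = spec["style"]
    header = [
        f"Tone: {style['tone']}",
        f"Voice: {style['voice']}",
        f"Length: {style['length']}",
    ]
    facts = [f"{key}: {value}" for key, value in context.items()]
    return "\n".join(header + facts + ["", spec["prompt"].strip()])


# Subset of specs/env-intro.yaml, already parsed (prompt abbreviated).
spec = {
    "style": {"tone": "professional, formal", "voice": "third-person",
              "length": "150-250 words"},
    "context": {"company_name": "{$workiva:Factbook:B2}",
                "reporting_year": "{$workiva:Factbook:B5}"},
    "prompt": "Write an introductory paragraph for the Environmental section.\n",
}
cells = {"Factbook:B2": "Acme Industries Ltd", "Factbook:B5": "2024"}
prompt = build_prompt(spec, cells)
```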

Specification Directory Structure

Project Structure
specs/
├── env-intro.yaml                 # Environmental introduction
├── transition-plan/
│   ├── adopted.yaml              # For companies with plans
│   └── not-applicable.yaml       # For non-applicable cases
├── emissions/
│   ├── scope1.yaml
│   ├── scope2.yaml
│   └── scope3.yaml
├── pollutants/
│   └── summary.yaml
└── brand/
    └── kpmg-style.yaml           # Shared style guidelines

templates/
├── environmental-section.tex
├── social-section.tex
└── governance-section.tex

cache/
└── content_cache.json           # Generated content cache
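
Given this layout, a dotted content id such as transition-plan.adopted appears to map onto a nested path under specs/. The sketch below shows that mapping under the assumption that each dot in the id corresponds to one directory level; spec_path is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def spec_path(content_id: str, root: str = "specs") -> PurePosixPath:
    """Map a content id to its spec file, one directory level per dot."""
    parts = content_id.split(".")
    if not all(parts):
        raise ValueError(f"empty segment in content id: {content_id!r}")
    return PurePosixPath(root, *parts[:-1], parts[-1] + ".yaml")


flat = spec_path("env-intro")
nested = spec_path("transition-plan.adopted")
```

With the tree above, {@content:env-intro} would resolve to specs/env-intro.yaml and {@content:transition-plan.adopted} to specs/transition-plan/adopted.yaml.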

Workiva Data Integration

The system reads directly from Workiva factbooks through the existing API integration. Each cell reference names a sheet, followed by a cell, a range, or a named field within it. Framework sheets carry their VSME code as the sheet name (for example, C3 for Climate).

Sample Workiva Factbook Structure (C3 - Climate)

Key    | Framework | Instruction            | Value
C3-1-4 | C3        | High impact sector?    | TRUE
C3-1-5 | C3        | Transition plan status | Not adopted...
C3-1-6 | C3        | Plan description       | Freight transport...
C3-1-7 | C3        | Adoption date (Year)   | 2025

Cell Reference Syntax

Reference                        | Returns           | Example
{$workiva:C3:G4}                 | Single cell value | "TRUE"
{$workiva:C3:G4:G10}             | Range (array)     | ["TRUE", "Not adopted...", ...]
{$workiva:B4:B5:B23}             | Row iterator      | Pollutant data rows for {#each}
{$workiva:Factbook:company_name} | Named field       | Uses mapper.yaml field definitions
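
The reference forms in this table can be told apart syntactically, as the following sketch suggests. It assumes cell addresses are uppercase column letters followed by a row number and that named fields are lowercase identifiers. CellRef and parse_ref are hypothetical names. A range and a row iterator share the same syntax, so the sketch reports both as "range" and leaves the distinction to the surrounding tag.

```python
import re
from dataclasses import dataclass
from typing import Optional

CELL = r"[A-Z]+[0-9]+"  # assumed A1-style cell address
REF_RE = re.compile(
    rf"^(?P<sheet>[A-Za-z0-9_]+):"
    rf"(?:(?P<start>{CELL})(?::(?P<end>{CELL}))?|(?P<name>[a-z_][a-z0-9_]*))$"
)


@dataclass(frozen=True)
class CellRef:
    sheet: str
    kind: str                    # "cell", "range", or "field"
    start: Optional[str] = None  # first (or only) cell
    end: Optional[str] = None    # last cell of a range
    name: Optional[str] = None   # named field, resolved via mapper.yaml


def parse_ref(ref: str) -> CellRef:
    """Parse the part of a data tag that follows 'workiva:'."""
    match = REF_RE.match(ref)
    if match is None:
        raise ValueError(f"unrecognised reference: {ref!r}")
    if match["name"]:
        return CellRef(match["sheet"], "field", name=match["name"])
    if match["end"]:
        return CellRef(match["sheet"], "range", match["start"], match["end"])
    return CellRef(match["sheet"], "cell", match["start"])
```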

Document Generation Flow

1. User Request: Analyst requests section via Copilot
2. Template Load: System loads LaTeX template
3. Tag Resolution: Parse and identify all tags
4. Data Fetch: Retrieve Workiva cell values
5. Content Gen: LLM generates narrative
6. Compile: Build PDF output
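
The steps of this flow can be sketched as a single function whose collaborators are passed in as callables. Everything here is a stand-in: generate_section and its parameters are hypothetical names, the stubs replace the real template store, Workiva client, LLM call, and LaTeX compiler, and conditionals and iterators are left out for brevity.

```python
import re
from typing import Callable

TAG_RE = re.compile(
    r"\{@content:(?P<content>[^}]+)\}|\{\$workiva:(?P<workiva>[^}]+)\}"
)


def generate_section(
    section: str,
    load_template: Callable[[str], str],
    fetch_cell: Callable[[str], str],
    generate_content: Callable[[str], str],
    compile_pdf: Callable[[str], bytes],
) -> bytes:
    template = load_template(section)                        # Template Load
    tags = list(TAG_RE.finditer(template))                   # Tag Resolution
    data = {m["workiva"]: fetch_cell(m["workiva"])           # Data Fetch
            for m in tags if m["workiva"]}
    text = {m["content"]: generate_content(m["content"])     # Content Gen
            for m in tags if m["content"]}

    def substitute(m: re.Match) -> str:
        return data[m["workiva"]] if m["workiva"] else text[m["content"]]

    return compile_pdf(TAG_RE.sub(substitute, template))     # Compile


# Stubbed run; the byte string stands in for a compiled PDF.
output = generate_section(
    "environmental",
    load_template=lambda name: "About {$workiva:Factbook:B2}: {@content:env-intro}",
    fetch_cell=lambda ref: {"Factbook:B2": "Acme Industries Ltd"}[ref],
    generate_content=lambda spec_id: f"[generated:{spec_id}]",
    compile_pdf=lambda latex: latex.encode("utf-8"),
)
```

Passing the collaborators in as arguments keeps each step testable on its own, without Workiva or LLM access.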

Caching Strategy
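
The caching details given in this document are the cache block of specs/env-intro.yaml (enabled, invalidate_on) and the cache/content_cache.json file. One possible reading, sketched below purely as an assumption, is that each cached entry is keyed by the content id plus the current values of the cells listed under invalidate_on, so a change in any watched cell produces a new key. The cache_key function is a hypothetical name.

```python
import hashlib
import json


def cache_key(content_id: str, invalidate_on: list[str], cells: dict[str, str]) -> str:
    """Hash the content id together with the watched cell values."""
    watched = {ref: cells[ref] for ref in sorted(invalidate_on)}
    payload = json.dumps({"id": content_id, "watched": watched}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative cell values keyed the way invalidate_on lists them.
cells = {
    "workiva:Factbook:B2": "Acme Industries Ltd",
    "workiva:Factbook:B5": "2024",
    "workiva:Factbook:B8": "Freight transport",
}
watch = ["workiva:Factbook:B2", "workiva:Factbook:B5"]

key_now = cache_key("env-intro", watch, cells)
key_other_cell = cache_key("env-intro", watch, {**cells, "workiva:Factbook:B8": "Retail"})
key_new_year = cache_key("env-intro", watch, {**cells, "workiva:Factbook:B5": "2025"})
```

Under this scheme a change to an unwatched cell (B8 here) reuses the cached narrative, while a new reporting year in B5 forces regeneration.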

Microsoft 365 Copilot Interface

The primary interface for document generation is the Microsoft 365 Copilot chat. Analysts use natural language commands to generate, edit, and refine report sections.

Example Copilot Interactions

Copilot Chat Session
Analyst: Generate the environmental section for the 2024 report

Copilot: I'll generate the Environmental section using the current Workiva
         factbook data. Let me fetch the latest values...

         Reading from Workiva:
         - Company: Acme Industries Ltd
         - Reporting Period: Jan 1 - Dec 31, 2024
         - High Impact Sector: Yes
         - Transition Plan Status: Adopted with targets

         Generating content...

         Here's your draft Environmental section. Key highlights:
         - Introduction paragraph (150 words)
         - Climate transition status with targets
         - 3 KPI cards with live metrics
         - Pollutant emissions table (12 rows)

         [Preview PDF] [Edit in Word] [Approve & Save]

---

Analyst: The tone in the introduction is too informal. Make it more formal.

Copilot: I'll regenerate the introduction with a more formal tone while
         keeping the same factual content.

         Updated introduction:
         "Acme Industries Ltd hereby presents its environmental disclosures
         in accordance with the EU Voluntary SME Sustainability Reporting
         Standard (VSME) for the fiscal year ended December 31, 2024..."

         [Accept Changes] [Try Another Version]

---

Analyst: Add a section about our water usage from the B4 data

Copilot: I'll add a water usage subsection. Let me check what data is
         available in the B4 (Pollutants) section of your factbook...

         Found water-related data:
         - B4-6 through B4-12: Water pollutant emissions
         - Metrics available for Nitrogen, Phosphorus, BOD

         [Generate Water Section] [Show Available Data]

Supported Commands

Command Pattern                         | Action
"Generate [section] for [year]"         | Creates full section from template with current Workiva data
"Show me the data for [framework code]" | Displays relevant Workiva factbook values
"Regenerate [content block]"            | Forces fresh LLM generation, ignoring cache
"Make it more [tone]"                   | Adjusts style/tone of most recent generated content
"Add [topic] from [data source]"        | Incorporates additional Workiva data into section
"Preview as PDF"                        | Compiles current draft to PDF for review

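The command patterns above could be matched with simple regular expressions, as in the sketch below. This is illustrative only: in practice Copilot would presumably handle intent recognition itself. The action names, the COMMANDS list, and the route function are hypothetical.

```python
import re
from typing import Optional

# One (action, pattern) pair per row of the Supported Commands table.
COMMANDS = [
    ("generate", re.compile(
        r"^generate (?:the )?(?P<section>.+?) for (?:the )?(?P<year>\d{4})(?: report)?$",
        re.IGNORECASE)),
    ("show_data", re.compile(r"^show me the data for (?P<code>\S+)$", re.IGNORECASE)),
    ("regenerate", re.compile(r"^regenerate (?P<block>.+)$", re.IGNORECASE)),
    ("adjust_tone", re.compile(r"^make it more (?P<tone>.+)$", re.IGNORECASE)),
    ("add_topic", re.compile(r"^add (?P<topic>.+?) from (?P<source>.+)$", re.IGNORECASE)),
    ("preview_pdf", re.compile(r"^preview as pdf$", re.IGNORECASE)),
]


def route(message: str) -> Optional[tuple[str, dict[str, str]]]:
    """Return (action, parameters) for the first matching pattern, else None."""
    text = message.strip().rstrip(".")
    for action, pattern in COMMANDS:
        match = pattern.match(text)
        if match:
            return action, match.groupdict()
    return None
```

For example, route("Generate the environmental section for the 2024 report") yields the generate action with the section and year extracted, while an unrecognised message returns None and could fall through to free-form handling.
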
System Architecture

User Layer: M365 Copilot, Word/PDF Preview
Processing: Template Engine, Tag Resolver, Content Cache, LaTeX Compiler
Generation: Azure OpenAI, Spec Loader, Style Enforcer
Data Layer: Workiva API, Factbook Reader, Cell Mapper

Technical Reference

Implementation Base

The PoC builds on the existing document generation prototype at experiments/test_doc_gen.py. Key components to extend:

API Endpoints Required

New API Routes
POST /api/generate/section
  Body: { template: "environmental", year: 2024 }
  Returns: { pdf_url, word_url, content_hash }

POST /api/generate/regenerate
  Body: { content_id: "env-intro", force: true }
  Returns: { new_content, previous_version }

GET  /api/workiva/cells/{sheet}/{range}
  Returns: { cells: [...], last_updated }

POST /api/preview/pdf
  Body: { template, data_overrides }
  Returns: { pdf_url, warnings }
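
As an illustration of the first route, the sketch below builds (but does not send) a POST /api/generate/section request using only the standard library. BASE_URL is a placeholder and build_generate_request is a hypothetical helper. The JSON body mirrors the fields listed above, while the Content-Type header is an assumption.

```python
import json
import urllib.request

BASE_URL = "https://example.invalid"  # placeholder host, not a real deployment


def build_generate_request(template: str, year: int) -> urllib.request.Request:
    """Prepare the request for POST /api/generate/section."""
    body = json.dumps({"template": template, "year": year}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/generate/section",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generate_request("environmental", 2024)
```

Sending the request with urllib.request.urlopen would, per the route description, return a JSON object containing pdf_url, word_url, and content_hash.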

Ready to Build the PoC?

This specification outlines Phase 4 capabilities. Implementation follows the existing architecture patterns.