Learning Design · AI

Proof of Concept for an AI-Enabled Assessment Product

Exploring how AI-enabled interactions, visible rubrics, and authentic workplace tasks can create more adaptive, feedback-rich assessment experiences than traditional pre-configured testing.

Role Lead Instructional Designer, Experience Designer, Rubric Designer, Prompt Strategist
Status Active proof of concept / product direction in development
Tools Next.js, TypeScript, Tailwind CSS, Supabase, OpenAI API
Audience Hiring managers, L&D leaders, learning experience designers
View Live Demo →

Project Summary

Situation

Many digital assessments still depend on multiple-choice questions, recall checks, or static answer logic. Those formats are efficient, but they rarely reflect how people actually communicate or perform at work.

Task

I'm building a proof of concept for an AI-enabled assessment product that explores a different model: authentic workplace tasks, visible rubric criteria, optional drafting support, and structured feedback that feels more like coaching than grading.

Action

The current prototype asks learners to write an all-staff phishing warning email in response to a workplace scenario. They can review the same rubric the evaluator uses, optionally access drafting support, submit their response for analysis, and receive criterion-level feedback plus a suggested rewrite.

Result

This proof of concept demonstrates a credible alternative to traditional pre-configured assessment logic: authentic learner performance, transparent scoring criteria, traceable AI behavior, and a reviewer-facing system trace that makes the product direction easier to inspect and discuss.

Project Background

This project starts from an instructional design problem rather than a technology-first brief. In many workplace learning contexts, assessments still over-index on multiple choice, short answer recall, or static "correct answer" logic. Those formats are easy to scale, but they often fail to capture whether someone can produce a real workplace artifact under realistic constraints.

I'm exploring a better pattern: what if the learner completed an authentic communication task, and AI was used not to replace the instructional design, but to operationalize a visible rubric, return criterion-level feedback, and make open-ended practice more scalable?

The use case is intentionally narrow and credible: writing an all-staff phishing warning email. It is a realistic workplace task, quick to understand, and rich enough to evaluate clarity, tone, structure, and action guidance.

Authenticity, Transparency, and Support in Tension

This is not simply a UI exercise or a generic "AI feedback" demo. The core challenge is designing an assessment experience that balances three competing needs. The design problem is not "how do I add AI?" — it is "how do I structure AI so it strengthens the assessment experience without distorting what the assessment is supposed to measure?"

Design tensions and responses:

  • Authenticity — the learner needs to write a realistic workplace artifact, not answer a disguised quiz. Response: a scenario-based writing task (an all-staff phishing warning email) with a plausible context and clear performance expectations, not a pre-programmed "correct answer".
  • Transparency — if AI is going to score the response, the scoring logic cannot be mysterious. Response: the learner sees the same rubric the evaluator uses before writing; the rubric is the explicit source of truth, not hidden rules or model judgment.
  • Support — AI drafting support can undermine the assessment if coaching and scoring are blended together carelessly. Response: the system is deliberately split into two separate workflows, an Assist flow (coaching, drafting) and an Evaluation flow (rubric-based scoring), so coaching can be helpful and flexible without contaminating the scoring logic.

Assessment-First, Product-Minded Design

I'm approaching this as both an instructional design problem and a product concept — starting from what the learner needs to demonstrate, then building the AI layer to serve that structure.

01

Start With the Performance Task

I'm defining the assessment around a concrete workplace output: an all-staff phishing warning email. From there, I'm clarifying the learner objective, task requirements, expected strengths, and expected weaknesses before treating UI or prompting as the main problem.

02

Engineer the Rubric First

Instead of treating feedback as a loose AI text-generation problem, I'm using the rubric as the backbone of the system. That gives the product a stable scoring frame and makes the learner-facing expectations explicit:

  • Understanding of Email Communication
  • Appropriate Response to the Situation
  • Advice on Phishing
  • Presentation and Writing Style
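
To illustrate what "rubric as the backbone of the system" could mean in practice, here is a minimal TypeScript sketch of the four criteria as a typed structure. The `id` values, descriptions, and the 0–5 per-criterion scale are illustrative assumptions, not the project's actual schema.

```typescript
// Hypothetical sketch: the four rubric criteria as one typed source of truth.
// IDs, descriptions, and the 0-5 scale per criterion are assumptions.

interface RubricCriterion {
  id: string;
  label: string;
  description: string;
  maxScore: number; // assumed 0-5 scale per criterion
}

const phishingEmailRubric: RubricCriterion[] = [
  { id: "email-communication", label: "Understanding of Email Communication",
    description: "Clear subject line, audience-appropriate greeting, scannable structure.", maxScore: 5 },
  { id: "situation-response", label: "Appropriate Response to the Situation",
    description: "Addresses the specific phishing incident with accurate urgency and context.", maxScore: 5 },
  { id: "phishing-advice", label: "Advice on Phishing",
    description: "Concrete, actionable guidance on spotting and reporting phishing.", maxScore: 5 },
  { id: "writing-style", label: "Presentation and Writing Style",
    description: "Professional tone, correct mechanics, appropriate length.", maxScore: 5 },
];

// The same object can render the learner-facing rubric view and
// parameterize the evaluation prompt, keeping one source of truth.
const maxTotal = phishingEmailRubric.reduce((sum, c) => sum + c.maxScore, 0);
```

Holding the rubric as data rather than prose is what lets the learner view and the scoring prompt stay in sync automatically.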
03

Separate Coaching From Evaluation

I'm deliberately splitting the system into two workflows: an Assist flow (outline, improve draft, and full draft modes) and an Evaluation flow (scores the final learner response against the visible rubric). That separation matters instructionally. Coaching support can be helpful and flexible without contaminating the scoring logic.
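
The split described above can be enforced at the type level. The sketch below is a hypothetical illustration: the mode names and payload shapes are assumptions, but the point is that coaching requests and evaluation requests never share a prompt or an output contract.

```typescript
// Hypothetical sketch: Assist and Evaluation as separate request types.
// Mode names and payload shapes are illustrative assumptions.

type AssistMode = "outline" | "improve" | "full-draft";

interface AssistRequest {
  kind: "assist";
  mode: AssistMode;
  draft: string; // learner's current text; may be empty for "outline"
}

interface EvaluationRequest {
  kind: "evaluate";
  submission: string; // final response, scored against the visible rubric
}

type AssessmentRequest = AssistRequest | EvaluationRequest;

// One dispatcher can route requests, but the prompts never mix:
// the coaching flow never sees the scoring rules, and vice versa.
function routeRequest(req: AssessmentRequest): string {
  switch (req.kind) {
    case "assist":
      return `assist:${req.mode}`; // routed to the coaching prompt
    case "evaluate":
      return "evaluate:rubric"; // routed to the rubric-scoring prompt
  }
}
```

A discriminated union like this makes it a compile-time error to pass coaching input into the evaluation path, which is one way to keep the "contamination" risk out of the code rather than relying on convention.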

04

Make the AI Traceable

I'm designing the AI layer to be inspectable rather than magical. That helps the current prototype feel less like speculative UI and more like a serious systems concept:

  • Server-side model calls only
  • Structured JSON outputs for assist and evaluation
  • Schema validation against defined response contracts
  • Retry handling for invalid model responses
  • Attempt persistence with prompt version, model name, and timestamp
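
The validate-and-retry behavior in that list can be sketched as follows. This is a hand-rolled illustration under assumed field names; a real implementation might use a schema library instead of manual checks.

```typescript
// Hypothetical sketch of schema validation plus retry for evaluation output.
// Field names and the hand-rolled validator are assumptions.

interface CriterionScore { criterionId: string; score: number; comment: string; }
interface EvaluationResult { scores: CriterionScore[]; overallFeedback: string; }

// Returns the parsed result, or null if the model's response
// violates the contract (bad JSON, missing fields, out-of-range scores).
function parseEvaluation(raw: string): EvaluationResult | null {
  try {
    const data = JSON.parse(raw);
    if (!Array.isArray(data.scores) || typeof data.overallFeedback !== "string") return null;
    for (const s of data.scores) {
      if (typeof s.criterionId !== "string" ||
          typeof s.score !== "number" || s.score < 0 || s.score > 5 ||
          typeof s.comment !== "string") return null;
    }
    return data as EvaluationResult;
  } catch {
    return null; // not valid JSON at all
  }
}

// Retry wrapper: re-call the model until the output satisfies the contract.
async function evaluateWithRetry(
  callModel: () => Promise<string>,
  maxAttempts = 3,
): Promise<EvaluationResult> {
  for (let i = 0; i < maxAttempts; i++) {
    const parsed = parseEvaluation(await callModel());
    if (parsed) return parsed;
  }
  throw new Error("Model failed to return a valid evaluation");
}
```

The design choice here is that an invalid model response is never shown to the learner: it either gets retried or surfaces as an explicit error, which is what makes the AI layer inspectable rather than magical.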

Frameworks and methods: Rubric-Driven Assessment Design · Authentic Performance Tasks · Structured AI Output Design · Kolb's Experiential Learning Cycle · Knowles' Andragogy · Deliberate Practice (Ericsson)

A Working Prototype for an AI-Enabled Assessment Product

The current prototype is a single-scenario web application designed to feel polished enough for a portfolio reviewer to experience directly while still clearly reading as an early product direction.

Learner Flow

  1. Open the assessment and read a short workplace scenario
  2. Review the visible rubric
  3. Draft an all-staff phishing warning email in a clean response editor
  4. Optionally use AI drafting support
  5. Submit the response for analysis
  6. Receive criterion-level scores, overall feedback, strengths, improvements, and a suggested rewrite

The reviewer flow matters just as much. A separate System Trace view exposes saved attempts, total scores, model name, prompt version, and timestamps — so the concept can be discussed as a product system rather than just a surface-level interface.
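
A persisted attempt that could back such a trace view might look like the sketch below. The column names are illustrative assumptions, not the actual Supabase schema.

```typescript
// Hypothetical shape of a stored attempt row behind the System Trace view.
// Field names are illustrative, not the project's actual database schema.

interface AttemptRecord {
  id: string;
  scenarioId: string;
  submission: string;                       // the learner's final text
  criterionScores: Record<string, number>;  // keyed by rubric criterion id
  totalScore: number;                       // computed server-side
  modelName: string;                        // which model produced the evaluation
  promptVersion: string;                    // which prompt produced the feedback
  createdAt: string;                        // ISO timestamp
}

// A trace view might summarise each attempt in one line for reviewers.
function traceSummary(a: AttemptRecord): string {
  return `${a.createdAt} · ${a.modelName} · prompt ${a.promptVersion} · total ${a.totalScore}`;
}
```

Storing the model name and prompt version alongside each attempt is what makes a later question like "which prompt produced this feedback?" answerable rather than guesswork.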

The prototype remains intentionally narrow: one assessment scenario, one visible rubric, one end-to-end learner flow, one reviewer trace view. Rather than simulating an entire LMS or enterprise assessment platform, the focus is on making one scenario feel coherent, inspectable, and believable.

Critical Choices Shaping the Product Direction

Visible Rubric as the Source of Truth

The learner sees the same rubric the evaluator uses. This is the most important design decision in the project. It improves transparency, gives the learner a fair frame for the task, and prevents the AI from being positioned as a mysterious scoring authority.

Authentic Writing Task Over Multiple Choice

I'm using a workplace email instead of a quiz because the product direction is stronger when the learner produces a real artifact. The task is more representative of workplace communication and creates space for better feedback than a pre-authored answer key would allow.

Separate Assist and Analyze Modes

The product distinguishes between "help me draft" and "analyze response." That separation makes the learning experience more credible. Coaching can support the learner, while final analysis remains bounded by the rubric and the submitted text.

Structured Outputs Over Loose AI Prose

Rather than accepting free-form model output, the system requires structured JSON, validates it against schemas, retries if needed, and computes the total score server-side from criterion scores. This makes the prototype more reliable and easier to explain in both product and technical terms.
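
The server-side total described above can be sketched in a few lines. The 0–5 per-criterion scale and the criterion keys are assumptions for illustration.

```typescript
// Minimal sketch: the total is derived on the server from validated
// criterion scores, never trusted from the model's own arithmetic.
// Criterion keys and the 0-5 scale are assumptions.

function totalScore(criterionScores: Record<string, number>): number {
  return Object.values(criterionScores).reduce((sum, s) => sum + s, 0);
}

// Example: four criteria scored 4, 3, 5, and 4.
const example = totalScore({
  "email-communication": 4,
  "situation-response": 3,
  "phishing-advice": 5,
  "writing-style": 4,
});
```

Computing the total in application code, rather than asking the model to add, removes one class of silent arithmetic errors from the scoring path.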

System Trace Instead of Hidden Plumbing

I'm using a reviewer-facing trace view to make the system legible. In portfolio work, this matters: it shows not just the learner UI, but the underlying logic, saved attempts, and traceability of the AI workflow.

What the Current Prototype Is Proving

Because this is an active proof of concept, I'm framing impact in terms of product validation rather than business KPIs.

Instructional Value

The current prototype shows that AI can support a more authentic performance task without abandoning clarity or structure. Learners produce a realistic workplace artifact, guided by a visible rubric, and receive feedback that is more specific and educational than a typical auto-graded quiz response.

Product Value

This product direction is currently validating a plausible pattern for AI-enabled assessment:

  • Visible criteria instead of hidden scoring logic
  • Separate coaching and evaluation flows
  • Structured and traceable model outputs
  • Persistence that supports reviewer trust and future analytics

Portfolio Value

For hiring managers and collaborators, the prototype makes several capabilities concrete:

  • Translating an instructional problem into a product concept
  • Designing a rubric suitable for AI-supported evaluation
  • Shaping prompts and system rules around instructional intent
  • Building a narrow but credible MVP instead of an over-scoped concept deck

Technical Validation

The deployed prototype already demonstrates that the concept operates end to end:

  • Learner input is captured
  • AI assist and evaluation happen server-side
  • Model outputs are schema-validated
  • Attempts are stored in Supabase
  • Reviewer-facing trace data is available for inspection

What I'm Learning While Building It

This project keeps reinforcing that AI in learning design is most useful when it is tightly bounded by instructional intent. The value is not coming from adding a chatbot or generating generic feedback. It comes from defining a real performance task, writing a transparent rubric, and designing the AI layer to support that structure rather than bypass it.

It is also reinforcing the importance of scope discipline. Building one scenario well is a stronger move than pretending to have a full assessment platform. The narrow scope creates space to think carefully about prompt behavior, error handling, reviewer trust, and learner experience.

The work is also surfacing important limitations. This should not be positioned as a high-stakes scoring engine. Reliability, calibration, fairness review, and human moderation patterns would all matter much more in a production or certification context than they do in an early-stage proof of concept.

As I continue building the product direction, the next steps are likely to include:

  • Additional workplace scenarios
  • Side-by-side human and AI scoring comparisons
  • Instructor review or override
  • Revision-quality analytics across multiple attempts
  • Richer reviewer analytics around scoring consistency and feedback patterns
