Hermyon AI

Understanding Human Signals — Intelligent Emotion Recognition for Psychotherapy

Multimodal emotion recognition designed for clinical psychotherapy. Hermyon AI fuses face, voice, and text signals through Mamdani fuzzy inference to produce a transparent, explainable emotional assessment.


Live Demo

Try Hermyon AI directly in your browser. Register with your email for a 60-second trial session.


3 Modalities
28 GoEmotions
14+ Fuzzy Rules
Session Tracking

Multimodal Perception

Three complementary modalities capture the full spectrum of emotional expression in therapy sessions.

🧑

Face — 7-Class Emotion

Facial expression analysis with 7 basic emotion categories (happy, sad, angry, fearful, surprised, disgusted, neutral). Real-time detection of micro-expressions and emotional transitions.

🎙

Voice — 8-Class Emotion

Speech emotion recognition across 8 categories. Analyzes prosody, pitch, energy, speaking rate, and pause patterns to capture emotional state from the acoustic signal.

📝

Text — 28-Class GoEmotions

Fine-grained text emotion analysis using the GoEmotions taxonomy. Maps patient speech transcripts to 28 nuanced emotion categories for deep emotional profiling.

Fuzzy Fusion Engine

Mamdani-style fuzzy inference system for transparent, interpretable multimodal emotion fusion.

🧩

Mamdani Inference

Classic Mamdani-style fuzzy system with linguistically interpretable rules. Each rule maps input modality confidence levels to fused emotional output.

📐

Trapezoidal Membership Functions

Input and output variables defined with trapezoidal membership functions (low, medium, high), providing smooth transitions and robust handling of uncertainty.
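As a rough illustration of the idea above, a trapezoidal membership function can be written in a few lines of Python. The breakpoints for low, medium, and high below are hypothetical values on a 0 to 1 confidence scale, chosen only for this sketch; the page does not publish the ones Hermyon uses.

```python
from typing import Dict, Tuple

Trapezoid = Tuple[float, float, float, float]  # (a, b, c, d): feet a, d and shoulders b, c


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Membership degree of x: 0 outside [a, d], 1 on [b, c], linear in between."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


# Hypothetical breakpoints (assumption, not taken from Hermyon).
TERMS: Dict[str, Trapezoid] = {
    "low": (0.0, 0.0, 0.2, 0.4),
    "medium": (0.3, 0.45, 0.55, 0.7),
    "high": (0.6, 0.8, 1.0, 1.0),
}


def fuzzify(confidence: float) -> Dict[str, float]:
    """Map one crisp confidence value to a degree for each linguistic term."""
    return {name: trapezoid(confidence, *params) for name, params in TERMS.items()}


if __name__ == "__main__":
    print(fuzzify(0.65))  # partly "medium" and partly "high", not "low"
```

Because neighbouring trapezoids overlap, a confidence of 0.65 belongs partly to medium and partly to high, which is what gives the smooth transitions mentioned above.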

📋

14+ Fuzzy Rules

A carefully designed rule base of 14+ rules covering agreement, disagreement, and mixed-signal scenarios across all three modalities.
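A minimal sketch of how such a rule base could be evaluated in the Mamdani style follows: each rule's strength is the minimum of its antecedent memberships, the consequent set is clipped at that strength, clipped sets are combined with max, and the result is defuzzified by centroid. The four rules, the term breakpoints, and the names `RULES` and `infer` are illustrative assumptions; the actual Hermyon rule base is not published on this page.

```python
from typing import Dict, List, Optional, Tuple

Trapezoid = Tuple[float, float, float, float]
Rule = Tuple[Dict[str, str], str]  # ({modality: term, ...}, consequent term)

# Hypothetical breakpoints on a [0, 1] scale (assumption).
TERMS: Dict[str, Trapezoid] = {
    "low": (0.0, 0.0, 0.2, 0.4),
    "medium": (0.3, 0.45, 0.55, 0.7),
    "high": (0.6, 0.8, 1.0, 1.0),
}

# Hypothetical rules covering agreement, disagreement, and a mixed signal.
RULES: List[Rule] = [
    ({"face": "high", "voice": "high", "text": "high"}, "high"),
    ({"face": "low", "voice": "low", "text": "low"}, "low"),
    ({"face": "high", "text": "low"}, "medium"),
    ({"voice": "medium", "text": "medium"}, "medium"),
]


def membership(x: float, term: str) -> float:
    a, b, c, d = TERMS[term]
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def infer(inputs: Dict[str, float], steps: int = 200
          ) -> Tuple[Optional[float], List[Tuple[Rule, float]]]:
    """Return (fused score, fired rules with strengths); score is None if nothing fires."""
    fired: List[Tuple[Rule, float]] = []
    for rule in RULES:
        antecedents, _ = rule
        # AND is modelled with min over the antecedent memberships.
        strength = min(membership(inputs[m], t) for m, t in antecedents.items())
        if strength > 0.0:
            fired.append((rule, strength))
    # Clip each consequent at its rule strength, aggregate with max,
    # then take the centroid over a sampled output universe [0, 1].
    num = den = 0.0
    for i in range(steps + 1):
        y = i / steps
        mu = max((min(s, membership(y, rule[1])) for rule, s in fired), default=0.0)
        num += y * mu
        den += mu
    return (num / den if den > 0.0 else None), fired


if __name__ == "__main__":
    score, trace = infer({"face": 0.9, "voice": 0.9, "text": 0.9})
    print(score, trace)
```

The list of fired rules returned alongside the score is the kind of record a reasoning trace can be built from: it names each rule that contributed and how strongly.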

Why Fuzzy Logic?

In clinical settings, transparency is critical. Fuzzy inference provides full reasoning traces — clinicians can see exactly why Hermyon reached a particular emotional assessment. Every rule activation, membership degree, and defuzzified output is fully traceable and explainable.

Topic-Emotion Linking

Understanding what patients talk about and how they feel about it.

🔑

KeyBERT Extraction

Automatic keyword and keyphrase extraction from patient speech using KeyBERT. Identifies the core topics and themes discussed during sessions.

📊

BERTopic Clustering

Topic modeling with BERTopic groups related themes into coherent clusters, revealing the underlying structure of patient discourse.

🔗

Topic-Emotion Maps

Links detected topics to their associated emotions, building a rich map of which subjects trigger which emotional responses in each patient.
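One simple way to picture a topic-emotion map is as a count of emotion labels per topic. The sketch below assumes topics and an emotion label have already been attached to each utterance by upstream steps; the record layout, the function names, and the sample data are all invented for illustration.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, List, Optional


def build_topic_emotion_map(utterances: List[Dict]) -> DefaultDict[str, Counter]:
    """Count how often each emotion co-occurs with each topic."""
    topic_map: DefaultDict[str, Counter] = defaultdict(Counter)
    for utterance in utterances:
        for topic in utterance["topics"]:
            topic_map[topic][utterance["emotion"]] += 1
    return topic_map


def dominant_emotion(topic_map: Dict[str, Counter], topic: str) -> Optional[str]:
    """Most frequent emotion for a topic, or None if the topic was never seen."""
    counts = topic_map.get(topic)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Invented sample records, not real session data.
    session = [
        {"topics": ["work"], "emotion": "nervousness"},
        {"topics": ["work", "family"], "emotion": "nervousness"},
        {"topics": ["work"], "emotion": "disappointment"},
        {"topics": ["family"], "emotion": "caring"},
        {"topics": ["family"], "emotion": "caring"},
    ]
    topic_map = build_topic_emotion_map(session)
    for topic in topic_map:
        print(topic, dict(topic_map[topic]), dominant_emotion(topic_map, topic))
```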

Longitudinal Tracking

Track emotional evolution across therapy sessions to identify patterns, progress, and regression.

📈

Session Evolution

Track emotional trajectories within individual sessions. Identify emotional peaks, valleys, and transitions to understand the therapeutic process.
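Peaks and valleys within a session can be approximated as local extrema of a per-window intensity series. This is a deliberately simple sketch: the function name and the strict-neighbour definition of an extremum are assumptions, and a production system would likely smooth the signal first.

```python
from typing import List, Sequence, Tuple


def find_extrema(series: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Return (peak indices, valley indices) using strict comparison with both neighbours."""
    peaks: List[int] = []
    valleys: List[int] = []
    for i in range(1, len(series) - 1):
        prev, cur, nxt = series[i - 1], series[i], series[i + 1]
        if cur > prev and cur > nxt:
            peaks.append(i)
        elif cur < prev and cur < nxt:
            valleys.append(i)
    return peaks, valleys


if __name__ == "__main__":
    # Invented fused-intensity values, one per time window.
    intensity = [0.2, 0.5, 0.3, 0.1, 0.4, 0.9, 0.6]
    print(find_extrema(intensity))  # ([1, 5], [3])
```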

📅

Patient Timeline

Cross-session analysis reveals long-term emotional patterns. Monitor treatment effectiveness and identify breakthrough or regression moments.

🔍

Pattern Detection

Automatic detection of recurring emotional patterns, topic sensitivities, and emotional volatility trends across the patient timeline.
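As one possible reading of "emotional volatility trends", the sketch below measures volatility as the population standard deviation of fused scores within each session, then fits a least-squares slope across sessions. Both choices, and the function names, are assumptions made for illustration rather than Hermyon's documented method.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def session_volatility(scores: Sequence[float]) -> float:
    """Spread of fused scores within one session (population standard deviation)."""
    return pstdev(scores)


def trend_slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    xs = list(range(n))
    x_mean, y_mean = mean(xs), mean(values)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # Invented fused scores for three consecutive sessions.
    sessions: List[List[float]] = [
        [0.1, 0.9, 0.2, 0.8],
        [0.3, 0.7, 0.4, 0.6],
        [0.45, 0.55, 0.5, 0.5],
    ]
    volatility = [session_volatility(s) for s in sessions]
    print(volatility, trend_slope(volatility))
```

A negative slope over the per-session volatilities would indicate that swings are getting smaller from session to session; a positive slope would indicate the opposite.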

Clinical Dashboard

A web-based real-time interface designed for clinicians.

🖥

Real-Time Display

Live emotion visualization during therapy sessions. Face, voice, and text emotion streams displayed alongside the fused output and confidence scores.

📊

Session Reports

Automated post-session reports with emotional summary, topic-emotion maps, key moments, and fuzzy rule activation traces for full transparency.

👤

Patient Profiles

Comprehensive patient profiles with full session history, emotional baselines, topic sensitivities, and longitudinal trend analysis.

28-Emotion GoEmotions Space

Fine-grained emotion taxonomy for deep clinical understanding.

Category: Emotions
Positive: Admiration, Amusement, Approval, Caring, Desire, Excitement, Gratitude, Joy, Love, Optimism, Pride, Relief
Negative: Anger, Annoyance, Disappointment, Disapproval, Disgust, Embarrassment, Fear, Grief, Nervousness, Remorse, Sadness
Ambiguous: Confusion, Curiosity, Realization, Surprise
Neutral: Neutral
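The grouping in the table above can be expressed as a lookup that rolls the 28 fine-grained scores up into the four categories. The lowercase label spelling and the names `GOEMOTIONS_GROUPS` and `group_scores` are assumptions for this sketch.

```python
from typing import Dict, List

# Grouping copied from the table above; lowercase spelling is an assumption.
GOEMOTIONS_GROUPS: Dict[str, List[str]] = {
    "positive": ["admiration", "amusement", "approval", "caring", "desire",
                 "excitement", "gratitude", "joy", "love", "optimism",
                 "pride", "relief"],
    "negative": ["anger", "annoyance", "disappointment", "disapproval",
                 "disgust", "embarrassment", "fear", "grief", "nervousness",
                 "remorse", "sadness"],
    "ambiguous": ["confusion", "curiosity", "realization", "surprise"],
    "neutral": ["neutral"],
}

# Reverse lookup: fine-grained label -> category.
LABEL_TO_GROUP: Dict[str, str] = {
    label: group for group, labels in GOEMOTIONS_GROUPS.items() for label in labels
}


def group_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Sum fine-grained emotion scores into the four top-level categories."""
    totals = {group: 0.0 for group in GOEMOTIONS_GROUPS}
    for label, score in scores.items():
        totals[LABEL_TO_GROUP[label]] += score
    return totals


if __name__ == "__main__":
    print(len(LABEL_TO_GROUP))  # 28
    print(group_scores({"grief": 0.5, "sadness": 0.25, "relief": 0.25}))
```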

Architecture Overview

Input streams (video, audio, text transcript) are processed by modality-specific models (CNN/ViT for face, Wav2Vec2 for voice, GoEmotions-RoBERTa for text). Confidence vectors feed into the Mamdani fuzzy inference engine, which produces fused emotion assessments with full reasoning traces. Results flow to the clinical dashboard and longitudinal storage for cross-session analysis.
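The data flow described above can be sketched as a small pipeline in which each modality model is a callable returning emotion confidences and the fusion step is pluggable. Everything here is hypothetical scaffolding: the type names, `run_pipeline`, and especially `mean_fuse`, which is a plain averaging stand-in used only to make the sketch runnable and is not the Mamdani engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Scores = Dict[str, float]  # emotion label -> confidence in [0, 1]


@dataclass
class Assessment:
    per_modality: Dict[str, Scores]
    fused: Scores
    trace: List[str] = field(default_factory=list)


def mean_fuse(per_modality: Dict[str, Scores]) -> Assessment:
    """Placeholder fusion: average each label over the modalities that report it."""
    fused: Scores = {}
    trace: List[str] = []
    labels = sorted({label for scores in per_modality.values() for label in scores})
    for label in labels:
        sources = {m: s[label] for m, s in per_modality.items() if label in s}
        fused[label] = sum(sources.values()) / len(sources)
        trace.append(f"{label}: mean of {sorted(sources)}")
    return Assessment(per_modality, fused, trace)


def run_pipeline(inputs: Dict[str, object],
                 models: Dict[str, Callable[[object], Scores]],
                 fuse: Callable[[Dict[str, Scores]], Assessment] = mean_fuse) -> Assessment:
    """Run each modality-specific model on its input stream, then fuse the results."""
    per_modality = {name: models[name](payload) for name, payload in inputs.items()}
    return fuse(per_modality)


if __name__ == "__main__":
    # Stub models returning fixed, invented confidences in place of real networks.
    stub_models = {
        "face": lambda frame: {"sadness": 0.8, "neutral": 0.2},
        "voice": lambda audio: {"sadness": 0.6},
        "text": lambda transcript: {"sadness": 0.7, "grief": 0.3},
    }
    result = run_pipeline({"face": None, "voice": None, "text": "..."}, stub_models)
    print(result.fused)
    print(result.trace)
```

Keeping fusion behind a callable means a fuzzy inference engine can replace the averaging stand-in without changing how the modality models are invoked, and the `trace` field gives the dashboard and storage layers something to display or persist.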