The Hidden Entity Layer of ChatGPT: From Named Entities to Products
A deep dive into ChatGPT’s hidden semantic layer. By analyzing its SSE streams, we uncover how OpenAI’s web client structures entities, moderates outputs, and connects to a product graph that mirrors Google Shopping feeds.
What Lies Beneath ChatGPT’s Interface
In my previous exploration, The OpenAI Emerging Semantic Layer, I began examining how GPT-5 organizes knowledge beneath the surface, not through keywords, but through entities.
This time, I took a closer look at the Server-Sent Event (SSE) streams that power ChatGPT’s responses, inspired by the insights shared by the team at PromptWatch.
By recording, parsing, and analyzing these real-time data flows, I uncovered a hidden layer of entity infrastructure that extends far beyond language understanding, now encompassing products, organizations, people, and even moderation logic.

How ChatGPT Streams Knowledge: The SSE Architecture
ChatGPT’s web interface doesn’t operate like an API call that returns a block of text.
Instead, it maintains a persistent streaming connection with the server via the text/event-stream protocol, known as Server-Sent Events (SSE).
Every time ChatGPT “types,” the client receives a flow of structured events.
Each event contains not only the visible tokens of the reply but also a rich metadata payload.
First insight: the payload returned by the OpenAI APIs differs from the metadata structure available in the ChatGPT web UI, and it also varies between free and paid accounts.
A simplified view of the process:
| Phase | Endpoint | Purpose |
|---|---|---|
| 1️⃣ Moderation | /backend-api/moderations | Checks for policy violations before generation |
| 2️⃣ Generation Stream (SSE) | /backend-api/conversation | Streams assistant responses and internal state |
| 3️⃣ Output Moderation | /backend-api/moderations | Re-evaluates AI output for compliance |
Unlike the public API (/v1/chat/completions), which streams only plain text tokens, the private web interface streams complex JSON objects — containing message IDs, metadata, content types, and hidden annotations.
This is where the “semantic layer” lives.
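To give a sense of the difference, a single event on the web stream looks roughly like the line below. The shape is simplified and reconstructed from the identifiers discussed later in this article; it is not an official schema, and the real payloads are considerably larger.

```
data: {"message_id": "…", "conversation_id": "…", "end_turn": false, "content": {"parts": ["…"]}}
```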
The Method: Reverse-Engineering the Conversation Stream
I built a custom Playwright-based recorder in Python to intercept and log ChatGPT’s internal /backend-api/conversation SSE stream.
Each event is captured as raw text, decoded, and stored for analysis.
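A minimal sketch of that recorder follows (not the full tool, and with only basic error handling). The chatgpt.com URL, the two-minute recording window, and the helper names are my own choices; the endpoint filter matches the conversation URL described above.

```python
from playwright.sync_api import sync_playwright

def extract_data_payloads(body: str):
    """Yield the payload of every "data:" line in an SSE body."""
    for line in body.splitlines():
        if line.startswith("data:"):
            yield line[len("data:"):].strip()

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()

    # Collect every response hitting the conversation endpoint; bodies are read later
    sse_responses = []
    page.on("response", lambda r: sse_responses.append(r)
            if "/backend-api/conversation" in r.url else None)

    page.goto("https://chatgpt.com")
    page.wait_for_timeout(120_000)  # log in and chat manually while traffic is recorded

    events = []
    for response in sse_responses:
        try:
            events.extend(extract_data_payloads(response.text()))
        except Exception:
            pass  # the body may no longer be retrievable after navigation

    browser.close()

print(f"Captured {len(events)} SSE data payloads")
```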
Using pattern recognition and entity extraction, I was able to identify multiple layers of structured data hidden in these streams (a rough classification sketch follows the list), including:
- System Entities: Internal identifiers such as `message_id`, `conversation_id`, and `end_turn`, used by the React frontend.
- Moderation Entities: Safety classifications (e.g., self-harm, violence, sexual) retrieved from the moderation endpoint.
- Named Entity Recognition (NER) Layer: Lightweight in-text annotations for `Person`, `Organization`, `Event`, and `Location`.
- Product Entities: Structured commerce nodes emitted in product-related conversations.
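Given the payloads collected by the recorder, a rough first pass at this classification is simply to scan each payload for the markers listed above. A minimal sketch, assuming the `events` list produced by the recorder in the previous snippet:

```python
import re
from collections import Counter

# Markers discussed in this article; the patterns are illustrative, not an official schema
ENTITY_TYPE = re.compile(r"\b\w+_entity\b")
SYSTEM_KEYS = ("message_id", "conversation_id", "end_turn")

def classify_payloads(payloads):
    """Count entity-type markers and system identifiers seen in the captured payloads."""
    counts = Counter()
    for payload in payloads:
        if payload == "[DONE]":        # terminator emitted at the end of a stream
            continue
        counts.update(ENTITY_TYPE.findall(payload))
        counts.update(key for key in SYSTEM_KEYS if f'"{key}"' in payload)
    return counts

# Example: print(classify_payloads(events).most_common(10))
```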
Importantly, these data structures do not appear in:
- the OpenAI API (`api.openai.com/v1/chat/completions`),
- the free ChatGPT tier (which uses a different backend).
They are exclusive to the ChatGPT web interface on paid tiers, where richer data pipelines are active.
Inside the Entity Layer
At the heart of the stream, I found that every conversational turn embeds typed entity placeholders.

Each placeholder is then replaced dynamically by an object of a specific type —
for instance, organization_entity, person_entity, or event_entity.
This allows ChatGPT to build context-aware memory anchors across turns. When you ask, “What is WordLift?” and then follow with “Who founded it?”, the system already has a reference to the organization entity WordLift in memory.
ChatGPT Entity Classes
| Type | Description | Purpose |
|---|---|---|
| `entity` | Generic placeholder for untyped entity references | Lightweight mention linking |
| `person_entity` | Individuals, founders, public figures | Used in biographies and relationships |
| `organization_entity` | Companies, institutions | Used for corporate and contextual queries |
| `event_entity` | Events, conferences, historic moments | Temporal grounding |
| `product_entity` | Products, devices, commercial items | Structured product layer for commerce |
| `moderation_entity` | Policy categories (violence, hate, etc.) | Internal safety classification |
Each component serves a specific purpose in how ChatGPT organizes, filters, and enriches its responses. Beyond inference, these entities may also hold value for pre-training tasks, forming the semantic layer that characterizes every conversation.
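To make the taxonomy concrete, here is a hypothetical Python representation of an entity reference. The type names come from the table above, while the field names are illustrative assumptions rather than an observed schema.

```python
from dataclasses import dataclass
from typing import Literal, Optional

# Entity classes observed in the stream (see the table above)
EntityType = Literal[
    "entity",
    "person_entity",
    "organization_entity",
    "event_entity",
    "product_entity",
    "moderation_entity",
]

@dataclass
class EntityReference:
    """Hypothetical shape of a cross-turn entity anchor; field names are assumptions."""
    entity_type: EntityType
    name: str                          # surface form, e.g. "WordLift"
    message_id: Optional[str] = None   # turn that introduced the entity
```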
The Product Knowledge Graph of ChatGPT
If you follow our work on agentic commerce and product feeds, this section will feel like the next logical step. Building on that work, I analyzed the `product_entity` objects. They behave differently from other entity types: instead of just being named references, they carry fully structured product metadata, much like a JSON-LD Product object.
Here’s a simplified example extracted from the stream:
```json
{
  "id": "2997526925583449256",
  "title": "Bialetti Moka Express (classic size)",
  "price": "€23.90",
  "rating": 4.7,
  "num_reviews": 5900,
  "merchants": "Unieuro + others",
  "featured_tag": "classic everyday model",
  "image_urls": ["https://...jpg"],
  "metadata_sources": ["p2"]
}
```
Key findings:
- Product IDs are 18–20 digit numeric codes — matching Google Shopping catalog IDs, not GTINs.
- All product URLs are empty strings — ChatGPT renders them internally, without external navigation.
- The provider field (`"p2"`) is consistent, suggesting a single, centralized product data source.
These objects represent a hidden product graph, connecting user intent to structured commerce data — effectively turning ChatGPT into a semantic front-end for product discovery.
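To show how closely these nodes track standard product markup, here is a minimal sketch that maps the captured `product_entity` fields onto a schema.org Product object. The mapping is my own interpretation of the payload above, not a schema published by OpenAI.

```python
def product_entity_to_schema_org(node: dict) -> dict:
    """Map a captured product_entity payload onto a schema.org Product dict (illustrative)."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "identifier": node.get("id"),          # catalog-style numeric ID, not a GTIN
        "name": node.get("title"),
        "image": node.get("image_urls", []),
        "offers": {
            "@type": "Offer",
            "price": node.get("price"),
            "seller": {"@type": "Organization", "name": node.get("merchants")},
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": node.get("rating"),
            "reviewCount": node.get("num_reviews"),
        },
    }
```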
Why These Entities Matter
The presence of entity annotations and product schemas inside the ChatGPT stream indicates a shift from generative text toward structured reasoning.
This architecture enables:
- Context persistence across turns (via entity IDs).
- Memory-level grounding (via internal references).
- Hybrid search + reasoning (via entity-typed nodes).
- Commerce experiences (via the `product_entity` layer).
In other words, ChatGPT’s semantic layer functions like a private, evolving knowledge graph, updated in real time as users interact with the model.
The Limits of Access: Why You Won’t See It in the API
The public OpenAI API — even with streaming enabled — provides only lightweight text deltas (choices[].delta.content).
It does not expose any of the rich metadata, entity references, or moderation categories visible in ChatGPT’s private SSE stream.
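For comparison, here is a minimal sketch of consuming the public streaming API with the official Python client (the model name is just an example); it shows how little arrives besides text.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

stream = client.chat.completions.create(
    model="gpt-4o",                # example model name
    messages=[{"role": "user", "content": "What is WordLift?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="")       # plain text tokens: no entity, product, or moderation metadata
```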
This means:
- Developers using the API are working with surface-level tokens, not the structured entity layer underneath.
- The richer layer is reserved for OpenAI’s own interfaces, such as ChatGPT and GPTs.
- The free ChatGPT version also lacks this infrastructure; the entity layer is observable only in paid or “Pro” tiers.
This reinforces that ChatGPT is not just an interface to a model — it’s an orchestrated system combining models, moderation services, entity linkers, and product indexes.
From Knowledge Graphs to Product Graphs
We now see the semantic layer evolving in real time: from basic NER to structured product representation.
The pattern is unmistakable:
OpenAI is aligning its internal knowledge representation with web-standard ontologies like Schema.org and GS1, creating a private, operational Product Graph.
This has deep implications for SEO, e-commerce, and digital marketing.
For brands and publishers, the only way to appear in this new conversational economy is to publish structured, machine-readable data.
WordLift’s mission, to make content and products understandable by machines, has never been more relevant.
Conclusion: The Rise of the Semantic Infrastructure
ChatGPT’s entity layer is not theoretical; it’s operational.
From people to products, it builds a real-time knowledge graph beneath every conversation.
And just like the web, it relies on structured identifiers, metadata, and linked data patterns to function.
This new layer is invisible in the API, absent in the free version, and yet central to how modern AI systems perceive, reason, and recommend.
It’s the missing semantic infrastructure that connects human language to machine understanding, and soon, to transactions.
We are witnessing the birth of the reasoning web, where every entity, from a person to a product, becomes a structured node in a machine-readable conversation.