
Steven Hart
Knowledge Architect
Design blueprint for a knowledge graph-driven intelligence platform
A knowledge graph design for a private equity intelligence platform, modelling the relationships between investors, fund managers, funds, deals, people, events, and editorial content, then showing what those relationships make possible in the product.
Developed while contracted to PEI Group as Senior Information Architect.

Overview
This project explored how rethinking the underlying information model of a platform could unlock richer insights and better market intelligence.
By modelling the entities and relationships of a private equity asset class as a knowledge graph, I showed how the platform could move beyond static lists of disconnected data or editorial content, instead revealing genuine network insights.
The work demonstrates how information architecture, conceptual modelling and graph-based data design can enable richer discovery, exploration and AI-assisted analysis.
Designing for AI systems as well as human intelligence
While LLMs are powerful at interpreting text, they perform far better when operating over clearly defined entities and relationships. Systems that rely purely on documents or loosely structured data often struggle to provide reliable answers or explanations.
Knowledge graphs provide a complementary layer: a structured representation of the entities and relationships that define a domain.
In that sense, designing the structure of knowledge is becoming an increasingly important part of designing modern data products.

The starting point: an underlying structural problem
The PEI platform holds a large amount of valuable information about the private equity ecosystem: general partners, limited partners, funds, deals, strategies, sectors and regions.
But this information exists largely as isolated records in a content management system.
Firms appear in one place, deals in another, investors in yet another. Articles reference these entities, but the connections between them are not consistently modelled and rely on manual tagging.
As a result, the platform behaves more like a library of documents than an intelligence system.
Users can retrieve individual pieces of information, but understanding the relationships between them requires manual effort.
The questions investors naturally ask are relational:
Which managers specialise in infrastructure strategies in Northern Europe?
Which investors frequently co-invest together?
Which firms have recently shifted strategic focus?
These are difficult or impossible to answer when relationships are not explicitly represented. The information exists, but the platform cannot connect it.

The design insight
In analysing the domain, I noticed a pattern in how users think about investment markets.
They rarely begin with a single entity. Instead, they think in terms of intersecting dimensions:
Strategy – what type of investment activity is being pursued
Sector – the industries involved
Region – the geographic focus
These three dimensions — strategy, sector and region (SSR) — form a natural conceptual anchor for understanding private equity activity.
Instead of organising the platform primarily around documents or isolated entities, the system could organise knowledge around these intersecting dimensions of the market.
This insight provided the conceptual foundation for the knowledge graph and the starting point for the interaction framework: each could be designed in relation to the other from the beginning.

The graph
The schema I designed represents nineteen distinct entity types — general partners, limited partners, funds, deals, portfolio companies, individual assets, people, events, articles, and others — connected through sixty named relationship types. The relationships aren't just links; they carry properties. A commitment relationship records the amount, the date, the fund vintage. A participation relationship records the role a person held at the time of an event. The graph knows not just that two things are connected, but how, when, and in what capacity.
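To make that concrete, here is a minimal Cypher sketch of a commitment; the labels, relationship name and property names are illustrative stand-ins rather than the production schema:

  // A single commitment, with its context stored on the relationship itself
  MATCH (lp:LimitedPartner {name: 'Example Pension Fund'})
  MATCH (f:Fund {name: 'Example Infrastructure Fund III'})
  MERGE (lp)-[c:COMMITTED_TO]->(f)
  SET c.amountUsd = 150000000,
      c.commitmentDate = date('2024-03-15'),
      c.fundVintage = 2024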
A 127-term controlled vocabulary — covering strategies, sectors, and regions — is enforced through the graph structure itself. Terms are nodes, not tags. An entity can only be classified using a term that exists in the graph, which makes the vocabulary consistent across every content type without relying on editorial discipline to maintain it.
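The enforcement can be as simple as matching the term node rather than merging it, so classification fails quietly when a term is not in the vocabulary. A sketch, again with illustrative names:

  // A term must already exist as a node before anything can be classified with it;
  // using MATCH (not MERGE) on the term means an out-of-vocabulary string
  // matches nothing and no relationship is created
  MATCH (f:Fund {name: 'Example Infrastructure Fund III'})
  MATCH (s:Strategy {term: 'Infrastructure Debt'})
  MERGE (f)-[:HAS_STRATEGY]->(s)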
The graph was implemented in Neo4j and populated with realistic synthetic data — modelled carefully on the domain — to make the intelligence it could generate legible to stakeholders. Abstract schema diagrams don't demonstrate value. A query that returns fourteen LPs with active infrastructure intentions in Europe, ranked by capital signal strength, does.
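A sketch of what that query might look like in Cypher; the labels, relationship names and the capitalSignal property are assumptions chosen for readability, not the real schema:

  // LPs with an active intention matching infrastructure strategies in Europe,
  // ranked by the strength of their capital signal
  MATCH (lp:LimitedPartner)-[:HAS_INTENTION]->(i:Intention {stage: 'active'}),
        (i)-[:TARGETS_STRATEGY]->(:Strategy {term: 'Infrastructure'}),
        (i)-[:TARGETS_REGION]->(:Region {term: 'Europe'})
  RETURN lp.name AS investor, i.capitalSignal AS signal
  ORDER BY signal DESC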

Key modelling decisions
Strategy drift detection
Every fund has a declared focus — the strategy, sector, and region it says it will invest in. Every fund also has an actual behaviour — the strategies, sectors, and regions where its portfolio companies actually operate. These two things are often different, and the difference is meaningful: it reveals whether a fund is drifting from its stated mandate, entering new territory quietly, or shifting focus in response to market conditions.
The graph models both. A fund's declared focus is captured as explicit relationships to strategy, sector, and region nodes. Portfolio companies' actual operations are captured separately. Querying the difference between the two makes strategy drift detectable — something that is effectively impossible in a relational database without complex joins, but is a natural graph traversal.
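A sketch of the drift traversal, with hypothetical labels and relationship names:

  // Sectors where a fund's portfolio companies actually operate
  // but which do not appear in the fund's declared focus
  MATCH (f:Fund {name: 'Example Fund IV'})-[:INVESTED_IN]->(:PortfolioCompany)-[:OPERATES_IN]->(actual:Sector)
  WHERE NOT (f)-[:DECLARED_FOCUS]->(actual)
  RETURN DISTINCT actual.term AS undeclaredSector

Because WHERE NOT tests a relationship pattern directly, the comparison stays a single traversal rather than a chain of joins.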

Investment intention signals
Investor appetite is not a binary state. An LP moves through a sequence: general appetite for a strategy, active intention to invest in a specific fundraise, formal RFP process, committed capital. Each stage has different implications for a GP running a fundraise — an LP at the intention stage is worth a different kind of conversation than one at the appetite stage.
Modelling this pipeline as discrete nodes — rather than as properties on a relationship — makes the lifecycle queryable. You can ask: which LPs have moved from appetite to active intention in the last quarter? Which have stalled at RFP without committing? Which are new entrants whose intention signals suggest a strategic shift? The answers are not lookup queries; they are pattern queries, and the graph makes them natural.
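As a sketch, the first of those questions might read as follows, assuming a PROGRESSED_TO relationship and stage values that are illustrative rather than taken from the schema:

  // LPs whose signal progressed from general appetite to active intention
  // within the last quarter
  MATCH (lp:LimitedPartner)-[:HAS_SIGNAL]->(a:Signal {stage: 'appetite'})
  MATCH (a)-[p:PROGRESSED_TO]->(:Signal {stage: 'active_intention'})
  WHERE p.date >= date() - duration('P90D')
  RETURN lp.name, p.date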

Editorial-to-data bridging
An article about infrastructure debt in European markets is connected, through the graph, to the same controlled vocabulary terms used to classify funds, investors, and deals. This makes relevance computable: an LP whose declared focus includes infrastructure debt in Europe is a relevant reader of that article — a fact the graph can state explicitly, not one that requires a text-matching algorithm to approximate.
The same bridge works in reverse. A GP reading an article can see which LPs match the article's themes and what their current capital signals indicate.
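One direction of the bridge, sketched with illustrative names:

  // LPs whose declared focus overlaps an article's vocabulary terms,
  // ranked by the size of the overlap
  MATCH (a:Article {id: 'example-article'})-[:TAGGED_WITH]->(t)
  MATCH (lp:LimitedPartner)-[:DECLARED_FOCUS]->(t)
  RETURN lp.name, collect(t.term) AS sharedThemes, count(t) AS overlap
  ORDER BY overlap DESC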

Keeping dual roles explicit
Some firms in the private equity market act as both fund managers and investors — committing capital to other funds while raising their own. The temptation in schema design is to merge these into a single node type for simplicity. The graph keeps them as distinct types, with an explicit relationship handling dual-role cases.
The reason is query clarity. The relationship patterns around a firm acting as a GP are different from those around the same firm acting as an LP. Collapsing the types would make those patterns harder to query and easier to misread. A small increase in schema complexity in exchange for a significant increase in analytical precision is usually the right trade.
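A sketch of the dual-role pattern; the SAME_FIRM_AS relationship name is an assumption:

  // Two role-specific nodes for the same firm, linked explicitly;
  // each role keeps its own relationship vocabulary and query patterns
  MATCH (gp:GeneralPartner {name: 'Example Capital'})
  MATCH (lp:LimitedPartner {name: 'Example Capital'})
  MERGE (gp)-[:SAME_FIRM_AS]->(lp)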

What the graph makes possible
Nine page templates were designed to show the graph's intelligence in practice — not as abstract capability, but as specific things a user can see and do.
A GP dashboard, configured by strategy, sector, and region, surfaces the LPs most likely to be interested in a fundraise in that market: ranked by capital signal strength, filtered by active intention, flagged if they're new entrants. It tells a GP not just who is out there, but who is worth calling this week.
A GP profile moves beyond descriptive data — AUM, fund list, headquarters — to show the firm's actual behaviour in the market: the gap between its declared strategy focus and where its portfolio companies operate, the LPs who have committed repeatedly as a signal of relationship depth, the competitor GPs with overlapping focus.
An article page connects editorial content to the market it describes. The graph identifies which LPs match the article's themes — not because they're mentioned in the text, but because their declared focus aligns with the article's SSR classification. Relevance is computed from structure, not inferred from language.
A people and events hub surfaces relationship implications from executive movements across the industry. A newly appointed investment director at a major LP is an outreach opportunity. A senior departure from a competitor GP may signal strategic shift. Events are scored by LP density and capital signal concentration.
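The event scoring reduces to a simple aggregation. A sketch, assuming illustrative ATTENDING relationships and a capitalSignal property:

  // Rank events by the number of attending LPs and the concentration
  // of their capital signals
  MATCH (e:Event)<-[:ATTENDING]-(lp:LimitedPartner)
  WHERE lp.capitalSignal > 0
  RETURN e.name AS event, count(lp) AS lpDensity, sum(lp.capitalSignal) AS signalConcentration
  ORDER BY lpDensity DESC, signalConcentration DESC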
Instead of static lists of content, the graph shows a market — LPs who match your strategy, the signals they're sending, events you can meet them at, and the themes driving activity. It's the difference between a directory and an intelligence system.
Reflection
This project sits at the point where information architecture, data modelling, and product design overlap. The schema decisions — what to model, what to leave out, how to distinguish declared behaviour from actual behaviour — are design decisions as much as technical ones. They require understanding of how users think about the domain, not just how the data is structured.
A few things I'd approach differently in a production context: I'd want user research to validate the SSR anchor against actual query behaviour before committing to the full schema. I'd want cross-team governance for the controlled vocabulary from the start — editorial, data, and product aligned on terminology before implementation begins.
And I'd want an entity resolution strategy for live source data, since the synthetic dataset sidesteps the hardest practical problem: the same GP firm appearing under six different names across four source systems.
The graph was just the first step, though. The hard work of connecting it to production reality is what comes next.
