Part 9: Graph structure, process computation, and dimensional reduction
There’s surely an uncountable number of atoms in a beam holding up your roof, and it’s clear that we don’t need to refer to each atom to talk about the role of the beam. The beam is a superagent; it has its own story, on a much larger scale than the atoms. It relies compositionally on the atoms within it, but they don’t play a direct role in the function of the beam. You could replace the specific atoms with something else, and it would still work. There is a separation of phenomena across scales — effective descriptions dominate the stories we tell. When we shift perspective, from one scale to another, eliminating unwanted degrees of freedom, we both reduce the dimensionality of the problem, and we reorganize computation to achieve a sufficient answer. In the process, we go from something “uncountable” to “a manageable few”. In this post, I want to discuss the foundations for such effective pictures, coarse-graining, and renormalization of models, when computing answers from graphs.
In studying graphs, we’re not just looking for shortest paths, or to how get from one point to another in the quickest time: we may actually want to know where the weakest points in a network are, or find alternative (even hidden) routes, backdoors and, covert channels. We might be looking for the optimum place to establish a new node, in order to serve others, or to depend on a critical source. We might want to turn a timeline of events into a map of knowledge, turning happenings (“X reads a book”) into invariant state (“X knows the book”). Finally, we may look for co-activation clusters to find correlated events, items, or relationships!
Graphs are maps, processes, and calculators all in one. The first machine learning graph was the Internet routing infrastructure: RIP, OSPF, BGP, etc. Today, we have begun associate graphs with machine learning particularly for“AI” (Artificial Intelligence) in a number of ways: the strengths of causal connections can be used to record history and predict priorities; the clusters that form large scale structure can identify patterns of similarity for inference. Graphs and networks are now basic tools of “AI”. In all these cases, how we identify process scales is key to finding the answers we seek.
However, when we look at a huge graph, we are looking at more than just a database — it’s filled with both the remnants of old and harbingers of new processes. We might be interested in querying its large scale patterns, single paths, or topographical features — but this requires iterative interaction with the many alternatives through it. A one-off query statement, such as we’re used to from archival (e.g. SQL) databases, usually won’t cut it. Few other kinds of database can easily show patterns in data in the same way that a graph database can, but that doesn’t mean it’s easy. This whole topic makes graphs part of an unfolding story about knowledge representation that probably begins and ends with neuroscience (a story for another time).
An atom inside a structural beam has no individual significance — it merely plays a role in a larger structure, thus we look mainly to that larger structure for answers. Unless the single atom is a critical part of another process (say holding together the mouth of a crack), we don’t want to deal with it explicitly. We work with the beam on the scale of the role it plays.
The concept of being inside and outside of some place is awkward in the Euclidean model of space because there are no natural boundaries, but it plays a major role in graph semantics. In a network, boundaries are easily drawn and influence from outside a region can only pass along channels that are links or edges in the graph.
An outside influence may imprint changes throughout the interior of the region it touches, not just at the edge or point of contact by propagation, or iterative flow. These external influences can thus be learned and distributed throughout the interior configuration of links and nodes. This is how a network can act as memory. In society, it’s often the habitual knowledge, built into ongoing process relationships between people and groups, that keeps social practices alive, not so much the passive knowledge documented inside books or agents. For example, we sometimes call this distributed process memory “muscle memory” when it refers to ourselves.
So, the impression left by exterior “influences” is recorded on some level by the strengths and types of bonds between things or steps in a process. This is how cognitive systems learn. A graph is a learning network. The links between atoms in the metal beam, like the bonds in living tissue (e.g. bone, muscle, and skin) update their configurations dynamically in response to exterior stress information too. They learn the pattern of stress by adapting to it. Holograms and paths worn across grassy lawns do the same. Learning is a cognitive process — not unique to humans. It’s everywhere!
To make use of cognitive processes, wherever they may be, we have to see salient features at the right scale. A useful scale for learning is usually much bigger than the scale of the smallest change in a system, because small details are changing too quickly — atoms and beams. We need persistence to make sense of ideas. Thus statistical averages play a key role in processes of interest, because they separate persistent signal from fluctuating noise. Averaging is a way of finding effective invariants, i.e. the stable message within a body of data. It does so by summing over the “random” variations, which change faster than the persistent scale we’re interested in, hoping for something more persistent to emerge.
Renormalizing graphs and “summarization”
The technique of integrating out unwanted details (known as degrees of freedom) goes by different names in different fields of science and engineering: dimensional reduction, coarse-graining, pooling, renormalization, etc. It’s the basis for averaging, statistical analysis, the classical measurement of quantum processes, and also for Artificial Neural Networks and now Graph Neural Networks.
In network routing, summarization is the process by which IP addresses are replaced by their common prefixes to reduce the dimension of information about the network for signposting.
When we want to eliminate detail, we look for patterns that express equivalences, and try to collect them under a common umbrella. Then we replace the collection with a single identifier — something simpler. To do this, without losing too much of their meaning, we have to look to the local semantics of agents and nodes, i.e. their specific functional roles in the processes of interest, and try to preserve causality on a composite level. This is just the coarse graining described in the previous post (part 8), where we looked for “scattering regions” with common inputs and outputs. It always has to happen relative to a process. Recognition of a pattern is always based on some underlying concept of process semantics because it’s processes that memories meaning not things. The devil, of course, is in the details — how we decide to encode as much or as little of that pattern as we need.
In figure 1, for instance, we start out with a graph of distinguishable atomic components, which EXPRESS properties with names like H and O. They are bound together into super agents that CONTAIN the parts H and O. The superagents can be labelled with a new name H²0, which contains less information than the original graph, but EXPRESSes new semantics on the scale of the molecule. All we did was to integrate out some local detail in the semantic neighbourhood of the molecular parts. If we now coarse grain the whole system graph for this pattern, we can identify a single component superagent which we rename as “water”.
Figure 1: Renormalization of a graph by coarse graining leads to different semantic interpretations — a separation of scales, where we tell distinct stories.
In each case there is a dimensional reduction associated with the semantic scale of the data representation. How should we go about doing this? There are many possible approaches. With an underlying basis for semantics, using the four spacetime types, we could calculate this simply as we did in the last episode.
But not everyone models graphs with clear semantics. Then we need to extract a discipline indirectly. This is what graph embeddings try to do. I’ll not go into that topic here, because it’s orthogonal to the semantic spacetime approach and requires significant processing to stand a chance at recognizing roles. When that’s done, one still has to assign new names to the clusterings and divine their meanings. I’ll be returning to this point in the next post too, to show how we can use this as a practical technique.
Flow gives us a way to generate calculations
In physics, the generator of the proper time is a rule for transforming the state of a physical system into a new state — a flow of “energy” (associated with the Hamiltonian of the system). This is a graph process, as physicist Richard Feynman showed by representing processes as Feynman Diagrams. Analytical applications — in which graph relationships represent on-going dynamical processes — are effectively flow systems too. The effective adjacency matrix of the graph, at different scales, has the role of a propagator, and the flow becomes a way to compute outcomes.
Everything is a process. Even when we think something is static, there’s a process associated with observing it that takes time and effort! Whether a flow is data passing through a pipeline, money passed by transactional exchange, histories over citations in journal publications, or just conversations exchanged between employees in an organization, in each case there is a kind of “vote” that gets passed from node to node in a network. Directed graphs are thus the circuitry that expresses this voting flow. The lucky “winners” (or perhaps losers) in this view are those nodes which receive most “votes” during a process’s history.
A simple way to calculate and summarize this score in a steady state graph is by using the Frobenius-Perron result for this principal eigenvector of a graph’s adjacency matrix! It allows us to compute eigenvector centrality. Or, in plain language, the effective score for votes, summed over the whole graph. If we base current characteristics on statistics, it’s a technique that’s related to the technique of Principal Component Analysis. Some examples, where this is useful for finding bottlenecks and Single Points of Failure:
Data pipelines, dependencies and interoperability.
Human resources and social influencer networks — voting or reliance.
Production/collaboration delivery networks, i.e. service delivery and dependency.
Voting is only a numerical way of counting importance, but it’s an easy one to understand. Semantics on graph nodes and links allow us to supplement this with qualitative criteria.
Example calculation: node importance ranking
The network centrality or importance story begins with the concept of flow and equilibrium in a network. There are several measures of how “central” or “important” a node is. The one I find most useful is called eigenvector centrality. The most important nodes are the ones with the most important neighbours. If we take the “weight” of a node and carry it along the links it connects, then it flows mainly towards certain centres. The graph computers the value automatically just by following the flow. That’s leads to a circular definition which can be solved by an eigenvector equation:
A v = k I v,
where A is the adjacency matrix for the graph, I is the identity matrix, and k is a constant (eigenvalue). That leaves v as a vector whose solution components are the relative ranks of centrality or importance for each node (see figure 2).
Figure 2: Importance by graph connectivity: eigenvector centrality. Nodes that are very well connected to nodes that are very well connected are most important, or most “central” in the thick of things.
A node’s centrality score also simply indicates the points of fragility for a system, because the impact of a failure will be larger closer to a very central node. This is why military strategy targets key infrastructure to disable an enemy. Semantics become more important when network weightings refer to:
Citation networks (assumed fair and representative)
Dependency networks (“supply chain”) etc of components
Smart building systems
Centrality is a numerical ranking, not automatically designed to account for semantics. It’s also not designed to work for directed graphs. These are issues that can be dealt with (and which I’ve written a lot about over the years). When applied to directed graphs, eigenvector centrality is a subtle matter. In a directed graph, a flow never reaches an equilibrium (i.e. a common “sea level” of votes), it just follows the arrows and runs down the drain. There are various “sink remedies” to this problem — some of which will alas still be patented for a few more years. The simplest remedy is to just remove the arrow direction from all links, then it closes and becomes stable to eigenvectors under the Perron-Frobenius theorem.