Universal Data Analytics as Semantic Spacetime (Part 11)

Part 11: In search of answers, beyond this horizon

In this series, I’ve discussed at length — and with explicit examples — what we can do with the principles of Semantic Spacetime to make sense of generalized data analytics, using just a couple of tools on a personal computer. For some applications, indeed for exploring and learning, this combination will take you quite far. It’s not necessary to jump into the deep end of High Performance Computing, Big Data, or Deep Learning to find answers. Some problems are certainly larger though: the long tail of the data processing power-law has plenty of hurdles and challenges, and I’ll return to these at a later date. In this final installment I want to summarize what we’ve accomplished using these basics.

Graphs want to be “alive”

In the series, I’ve tried to show how graphs are the natural data representation for active processes. Graphs are built by processes, they lay out the circuitry of flow processes, and interacting with graphs is an ongoing process, not a random access story. Graphs are not merely frozen archives of passive data: graph circuitry remains an active spatio-temporal mirror of the world, with embedded directionality and unresolved inline choices that capture complex causality. Every graph is a computer as well as a model of state. Thus graphs are samples of spacetime, with semantics.

Today, we have a battery of methods developed to calculate certain properties of unlabelled graphs, at least when their nodes are homogeneous and memoryless. We sum up weighted contributions, e.g. in Artificial Neural Networks, or run “entire graph” algorithms such as eigenvector centrality (as in PageRank), to expose certain structures just from the topology. But the most fundamental resource for machine learning and causal computation lies in the variable data held within the graph: its vertices or nodes, and their labelled edges or links. Advanced machine learning is accomplished by memory processes with strong semantics, by traversing both symbolic and quantitative information.
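To make the “topology only” style of calculation concrete, here is a minimal sketch of eigenvector centrality by repeated summation of neighbour scores (power iteration). The four-node graph and the function name are invented for illustration, and the method assumes a connected, non-bipartite, undirected graph so that the iteration settles down. Note that nothing in it looks at labels or data inside the nodes.

```python
def eigenvector_centrality(adjacency, iterations=100):
    """Estimate eigenvector centrality by power iteration.

    `adjacency` maps each node to a list of its neighbours (undirected).
    Scores are rescaled each round so that the largest score is 1.0.
    """
    scores = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        # Each node's new score is the sum of its neighbours' current scores.
        updated = {
            node: sum(scores[neighbour] for neighbour in adjacency[node])
            for node in adjacency
        }
        largest = max(updated.values()) or 1.0
        scores = {node: value / largest for node, value in updated.items()}
    return scores


# Illustrative graph: a triangle a-b-c with a tail node d hanging off c.
example_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
centrality = eigenvector_centrality(example_graph)
most_central = max(centrality, key=centrality.get)
```

Here `most_central` comes out as the node with the most well-connected neighbours, purely from the wiring of the graph.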

Graphs resist all attempts to be frozen into static snapshots or dead archives. Standardized (one might even say “authoritarian”) hierarchies, like taxonomies or ontologies, tables of contents, etc., try to define spaces, like a fixed coordinate system in Euclidean space. These are common but fail to capture the nuances of real world complexity, because these coordinate systems are only ad hoc spanning trees, i.e. incidental route maps overlaid onto a snapshot of an evolving system. As we learned from Einstein, relativity of viewpoint and circumstance forces us to change perspective. Every spanning tree yields a unique “view” or partitioning of data, but usually the deeper invariant semantics within nodes don’t fall into these ad hoc treelike hierarchies. This is why hierarchical filesystems still need symbolic links to patch them up into a usable state.

The “liveness” of network data makes graphs a key part of the animated storytelling of the world around us, from embedded edge computing within local ecosystems to the routing of signals and resources for control purposes. This characteristic also makes graph databases quite different (i.e. highly non-linear) compared to their static SQL cousins. The process of querying, and working with, graph data for the realtime evolution of a system requires very different languages and more active patterns of analysis than mere retrieval from a tabular archive.

Don’t forget the Internet

Internet routing protocols were the first large scale machine learning graph databases. They enabled the “living” Internet of changing pathways and services we enjoy today to grow and evolve. Today, the data structures are highly distributed and the Internet graph spans the world, but everything began in closets in a few universities before multiplying and spreading virally, as part of their own growth process.

The Internet was designed to be a hierarchical mesh network: an “ecology” of connections, resilient to failure. If you like, it was by observing spacetime principles that we ended up with an emergent ecosystem of technologies and regional human organizations to bring about the information sharing and social network of our times. The sum of all those small patches appears incoherent and noisy, even though the reality of it spans the globe according to a highly ordered set of principles.

There’s a lesson there about how processes form networks, and how networks perpetuate processes to explore them. I occasionally wonder whether we have forgotten these lessons in our contemporary obsession with Artificial Neural Networks, Deep Learning, and large scale graph based computation such as Pregel and Apache Giraph.

A graph in semantic spacetime is a representation of a process by an inhomogeneous graph, in which context and specialization are encoded by hard-coded labels and other data discriminators. A Deep Learning Neural Network is a simulation of a process over a relatively homogeneous surrogate network, in which inhomogeneity has to be imprinted to capture different instances, by learning the relative weights of software links, in a neutral manner, to generate an interference pattern.

Smart use of spacetime

We can all do data science with just a couple of tools. Don’t be ashamed of your laptop — not every problem needs to be crushed by brute force mega-clusters. Google envy — the dream of commanding an army of computers that can crush computations at the push of a RETURN key — might excite us on an adolescent level, but science and engineering are more subtle adversaries, which don’t necessarily respond to the threat of force. There are ethical concerns too in the use of brute force: flaunting brute force calculations in massive datacentres flouts global warming concerns. The IT industry is every bit as noxious as the airline or car industry, make no mistake, when you trace its dependency graph to the energy source! Cloud operational expenditures are actually quite high too.

Of course, there remain a few large scale problems, e.g. protein folding and other scale-dependent phenomena, where brute force may be the only option for the time being. We don’t know enough of the underlying principles to reduce these problems yet, but that will change in the future. GPUs and other specialized chips to parallelize data processing pipelines can temporarily exploit the spacetime of the computation for greater efficiency, but these still require a deep understanding of the structure of spacetime to mimic the processes. The key is always to exploit the structure of space and time in each of its localized meanings for best results.

In the previous post, I showed how two competing models of graph computation have emerged:

The direct use of graphs with weighted nodes and links to memorize process state, attaching semantics to nodes and links through precise encoding. This is what you might store in your graph database. Learning is accumulated like the well trodden path across virgin territory by updating scalar weights. One can reason by direct graph relationships (FOLLOWS, CONTAINS, EXPRESSES), as well as cache similarity measures (effective distance) on a pairwise basis (NEAR). This can be computed efficiently on a need to know basis for inference, but not necessarily for extrapolation. This is the routing algorithm model of graph analytics.
Then there’s the “Stanford” approach of transforming an entire graph into a multi-dimensional Euclidean embedding as a speculative vector space, using Deep Learning, and using the resulting Pythagorean distance as a similarity measure. You hand the whole thing over to a cloud provider, because the process is on an industrial scale. The results are computed up front, in their entirety, by brute force, and they remain somewhat magical and inscrutable. This embedding of processes as an over-complete Euclidean space makes a certain kind of speculative prediction and extrapolation possible. The algorithms are made one-by-one to work on specially curated kinds of graph data, and require multi-stage processing in one gigantic effort. The results are “out of control”, but could be impressive, even amazing, because they’re unexpected.
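As a rough illustration of the first, routing-style model, the sketch below keeps scalar weights on labelled links and reinforces a link every time a process traverses it, like wearing in a path. It is only an in-memory stand-in for a graph database: the class name, the method names, and the reading of a `(src, "FOLLOWS", dst)` link as “dst follows src” are assumptions made for the example, not taken from the earlier parts of the series.

```python
from collections import defaultdict


class WeightedGraph:
    """Hypothetical in-memory graph: links keyed by (src, relation, dst)."""

    def __init__(self):
        self.weights = defaultdict(float)

    def learn(self, src, relation, dst, increment=1.0):
        """Reinforce a link each time the process traverses it."""
        self.weights[(src, relation, dst)] += increment
        return self.weights[(src, relation, dst)]

    def neighbours(self, src, relation):
        """Return (node, weight) pairs one hop away, strongest first."""
        found = [
            (dst, weight)
            for (s, r, dst), weight in self.weights.items()
            if s == src and r == relation
        ]
        return sorted(found, key=lambda pair: -pair[1])


graph = WeightedGraph()
# Assumed convention: (src, "FOLLOWS", dst) means "dst follows src".
for src, dst in [("home", "work"), ("home", "work"), ("home", "shop")]:
    graph.learn(src, "FOLLOWS", dst)

well_trodden = graph.neighbours("home", "FOLLOWS")
```

The answer is computed only when asked for, from local link weights, which is the “need to know” character of this model. The embedding approach would instead process the whole graph up front before any question could be answered.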

Computation is expensive. I think we’re also fast approaching an age in which we’ll have no other choice than to think much more of utilizing the ubiquitous edge devices we’ve spent decades acquiring, letting the edge processes themselves compute their answers in real time. Those origin graphs can be viewed as the accumulation of many small steps, distributed over space and time. Perhaps we can look forward to a new age in which small is once again considered beautiful, and the common ecosystem of virtual information circuitry surrounding us is what will define our human experiences.

The problem with brute force computation, marshalled as a bulk operation, is that it’s really expensive unless one exploits the graph of process interrelationships directly. Forget about simulating everything later, do it in real time. Part of this pain is self-inflicted. The culture of simulation doesn’t lead us to approach problems with an eye to making them efficient. Computing culture is to throw brute force at problems.

Instead of using multidimensional vector spaces as a paradigm, could we instead use bit operations to optimize similarity and discrimination with AND and OR? Interferometry (see part 8) is a powerful tool for merging parallel processes — analogous to methods of quantum computing. If a calculation is interrupted, could we pick it up where it left off, or do we have to start the whole thing again? Calculations can also be treated as “living evolving state” rather than snapshot batch jobs. These are all problems for carefully engineered data pipelines, and an extended caching hierarchy.
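One way to picture the AND/OR idea is to pack the features of each item into an integer bitmask, then count shared bits (AND) against bits present in either item (OR). This is the standard Jaccard overlap measure, swapped in here as a simple stand-in rather than the interferometry method of part 8. The vocabulary and feature sets are invented for the example.

```python
def feature_bits(features, vocabulary):
    """Pack a set of features into an integer, one bit per vocabulary entry."""
    mask = 0
    for position, name in enumerate(vocabulary):
        if name in features:
            mask |= 1 << position
    return mask


def overlap_similarity(a, b):
    """Jaccard-style similarity: shared bits (AND) over all set bits (OR)."""
    union = a | b
    if union == 0:
        return 1.0  # two empty feature sets are treated as identical
    return bin(a & b).count("1") / bin(union).count("1")


# Illustrative vocabulary and items (assumptions, not data from the series).
vocabulary = ["train", "station", "ticket", "passenger"]
journey = feature_bits({"train", "station", "ticket"}, vocabulary)
booking = feature_bits({"train", "ticket", "passenger"}, vocabulary)

similarity = overlap_similarity(journey, booking)
```

Here two of the four features in play are shared, so `similarity` is 0.5. The comparison costs a couple of machine-level bit operations, with no floating point vectors or training phase involved.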

It’s really pipelines all the way down.

Digital twins and data centralization

As an example, consider the digital twin. Now, if ever, is surely the age of the digital twin. Digital twins are supposed to represent facsimiles, shadows, or cybernetic ghosts of physical devices — connecting a real world to a virtual world. They live somewhere in “the cloud”, while the source devices haunt the edge of reality. Good old fashioned monitoring dashboards are, in a sense, poor-man’s examples of digital twins. They collect the numbers, but we’ve still a long way to go to make good use of the data semantics.

One can easily kill off the remnants of semantics, gleaned from edge sources, by transforming data into a dead knowledge representation — like fixed schema databases, designed for random access. Processes are not random access structures: they have causal topology.

For example, if one has a process driving changes, the events are typically captured event-by-event as log files, at the point of origin as timeseries. Later, these may be merged into a single muddled thread, in which origin is lost. Today, logs are typically uploaded into relational SQL databases, or thrown at Elasticsearch, for post hoc brute force searching. Such random access stores can’t describe the process timeline without relying on artificial timestamps or auto-incrementing keys, which have no invariant meaning.

A graph database would encode such a proper-time series using FOLLOWS edges, and separate annotations about events with EXPRESSES, CONTAINS, and NEAR links. Auto-incrementing numerical keys are okay, until a process bifurcates into parallel threads, or merges from several parallel threads, at which point all bets are off. The original causality then gets eliminated by projecting the original causal graph into “universal continuum time” (this is the problem with Euclidean spacetime: it has only a single average timeline, and thus gets muddled by generating entropy). A graph can easily bifurcate to avoid this. A series of timestamps may also pass through several timezones, and derive from independent clocks that are potentially unsynchronized, as processes migrate from host to host, so that “Euclidean” exterior time ceases to have any meaning. Semantic spacetime explains how to deal with these issues. Graphs are the key.

A graph database can retain the original causal relationships (of proper interior time) for each process, as well as fork and abstract these in order to relate them to invariants for context and semantic inference (see parts 7 and 8), to model events.
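The difference between a single numbered timeline and a causal graph can be sketched in a few lines. Below, each event records which events it directly FOLLOWS, so a fork into parallel threads and a later merge are both representable, and “happened before” is answered by reachability rather than by comparing timestamps. The class, its method names, and the event names are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


class EventGraph:
    """Hypothetical store of events linked by FOLLOWS relationships."""

    def __init__(self):
        # event -> set of events that it directly follows
        self.follows = defaultdict(set)

    def record(self, event, *predecessors):
        """Register an event and the events it directly follows."""
        self.follows[event].update(predecessors)
        for predecessor in predecessors:
            self.follows.setdefault(predecessor, set())

    def happened_before(self, earlier, later):
        """True if `earlier` is reachable by walking FOLLOWS links back from `later`."""
        stack = list(self.follows[later])
        seen = set()
        while stack:
            event = stack.pop()
            if event == earlier:
                return True
            if event not in seen:
                seen.add(event)
                stack.extend(self.follows[event])
        return False


log = EventGraph()
log.record("start")
log.record("a1", "start")        # thread A forks from start
log.record("b1", "start")        # thread B forks from start
log.record("a2", "a1")
log.record("merge", "a2", "b1")  # the two threads join again

ordered = log.happened_before("start", "merge")
concurrent = not log.happened_before("a1", "b1") and not log.happened_before("b1", "a1")
```

Events `a1` and `b1` have no causal order between them, which the graph states honestly. An auto-incrementing key or a wall-clock timestamp would have forced one of them to appear “first”.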

Ultimately, there’s the issue of where to keep the data for digital twins. It seems like an odd idea, to manage a thing by making a copy of it somewhere else, a bit like honouring celebrities with weird Madame Tussaud’s waxworks. You either feel proud or embarrassed about the likeness, but they are dead things, not living representations. Yet, we try to climb the ladder of representation and comprehension, from mere data capture, via perspectives and interpretations, to an eventual understanding (see figure 1), and twins are the contemporary expression of that.

Figure 1: Layers of processing take us up the DIKW ladder from Data coming out of the blue, twinned in storage, to interpreted Information, then experiential Knowledge, and hopefully later Wisdom. But what is the penalty for climbing this ladder in time and resources?

These are the well-known problems with trying to move data around and rule the world by remote control:

Synchronization of data over long distances (equilibrium) involves latency, and clocks can’t be trusted to reflect real processes. Determinism relies on speed, and timing issues expose the fundamental flaws of technologies based on instantaneous responses.
Collecting data from multiple components doesn’t automatically provide a representation of the whole story. If we lose the context of the measurement, and the interrelationships, there’s no way to reconstruct them later. We need a scaled approach that blends dynamics with semantics (see figure 4 below).
Finally, the relevance of the answers, inferred from past data, for the “here and now” depends on a chain of invariances linked by causation, each of which needs to be questioned. There is uncertainty in every story we try to tell with data.

From a spacetime perspective, it’s completely clear why there is no single concept of time in a distributed system (see Smart Spacetime), and thus no precise deterministic control on any timescale. Sending data over a network incurs delays, loses context, and perhaps loses the chain of causality, or even the data itself, unless special measures are taken to capture them all. So why then would we centralize in cloud datacentres?

We can centralize just as much as we have to in order to calibrate information to a common standard, so that semantics are the same for all. Then, we preferentially keep data as close to the point of application as possible, in order to minimize delays and distortions. It may sound simple, but it’s fraught with subtleties. The bottom line is:

The first rule of distributed computing: don’t send anything over a network unless you have to.

Knowledge representations as cybernetic control systems

As I mentioned at the start of the series, today we’re using data as much for cybernetic feedback and control as for scientific discovery. Consider then a data pipeline, pumping information through space and time. Think of any computational process or digital conversation passing through a mobile communications network between endpoints that could be in motion. Maintaining the flow of data, at a certain rate, is one problem, especially when spatial relationships might be changing — this is a cybernetic challenge of interest in 5G or 6G services and beyond. Maintaining the meaning of the results is a bigger issue, especially at scale.

A lot has been made of the revolution in edge computing: the Internet of Things, RFID chips, 5G network coverage, and smart “embedded” services. The promise of rich contextual edge-services is still a work in progress, but advancing by the day. With data sources in relative motion, the systems we rely on are constantly seeking paths through “Internet spacetime”, negotiating handovers from cell tower to cell tower in a “telco spacetime” graph, and coordinating physical locations with available services at the “edge”. By separating events (logs) from invariants, using the principles I’ve described, we can manage and interact with active processes as a chain of simple coincidences (part 7). These are similar to RDF’s triples, but have spacetime semantics.

PersonLocation(roomID,personID)
UserService(userID,serviceID)
RoomIPprefix(roomID,IPprefix)
Train(engine,list carriageIDs)
TrainStation(trainID,stationID)
PassengerTicket(name,ticketID)
TicketTrain(ticketID,trainID)
PassengerTrain(name,trainID)
PassengerStation(name,stationID)
etc.

These coincidence function annotations quickly generate a rich and easily searchable graph, from easily understandable spacetime events, in an idempotent or invariant way. From a database viewpoint, details that are interior to agents in this description can be modelled as document details that don’t need to be exposed as graph structure. This is why I chose ArangoDB as my tool of choice.
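A minimal sketch of how such coincidence functions might behave is shown below, using plain Python sets in place of a real database. The function names mirror `PassengerTrain` and `TrainStation` from the list above, but the choice of a CONTAINS link for both, the key format, and the `passengers_at` query are assumptions made for the example. Interior details of an agent live in a per-node document rather than as extra graph structure.

```python
class CoincidenceGraph:
    """Hypothetical in-memory stand-in for a document-plus-graph store."""

    def __init__(self):
        self.nodes = {}     # node key -> document of interior details
        self.links = set()  # (from_key, relation, to_key)

    def node(self, kind, ident, **details):
        """Create or update a node; repeating the call changes nothing new."""
        key = f"{kind}/{ident}"
        self.nodes.setdefault(key, {}).update(details)
        return key

    def link(self, src, relation, dst):
        """Add a link; a set makes repeated registration idempotent."""
        self.links.add((src, relation, dst))


def passenger_train(g, name, train_id):
    """Assumed reading of PassengerTrain: the train CONTAINS the passenger."""
    g.link(g.node("train", train_id), "CONTAINS", g.node("passenger", name))


def train_station(g, train_id, station_id):
    """Assumed reading of TrainStation: the station CONTAINS the train."""
    g.link(g.node("station", station_id), "CONTAINS", g.node("train", train_id))


def contained_in(g, container):
    return {dst for (src, rel, dst) in g.links if src == container and rel == "CONTAINS"}


def passengers_at(g, station_id):
    """Two-hop search: station CONTAINS train CONTAINS passenger."""
    found = set()
    for train in contained_in(g, f"station/{station_id}"):
        found |= contained_in(g, train)
    return found


g = CoincidenceGraph()
passenger_train(g, "alice", "T1")
passenger_train(g, "alice", "T1")  # the same event reported twice
train_station(g, "T1", "Oslo")
at_oslo = passengers_at(g, "Oslo")
```

Reporting the same coincidence twice leaves the graph unchanged, so event sources can repeat themselves safely, and the question “who is at this station?” becomes a short traversal.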

Notice how there are several possible spaces in play in a semantic spacetime model: coordinates over the Earth’s surface, building addresses in a city map, train stations on a railway network, etc. For spatial reference, IP addresses or street addresses are meaningless to geo-space, and geospatial coordinates are meaningless in mobile communications, in Internet space, or to postal delivery. They may need to be mapped to one another by virtual processes of discovery (say a postman’s travels). These mapping searches are themselves non-static! Processes involve types of virtual motion on many incompatible levels.

Edge computing isn’t just about smart devices in the home either: it’s about wiring together these completely different models of space and time, through data pipelines, with levels of understanding and sophistication to complete a puzzle of staggering complexity. Well chosen data representations are needed to process and curate these into well chosen knowledge representations at the right speed to anticipate the answers to future questions. It’s data pipelines all the way down.

Figure 2: Meaning isn’t instantaneous. It’s inferred by processes that proceed at rates less than or equal to the rate of data arrival, mixing past and present in the causality cone. Rich answers rely on elapsed time to add value to data, e.g. in Machine Learning. The ability to access answers quickly from a rich model depends on the sophistication of the knowledge representation after processing.

Data processing pipelines (figure 2) are becoming the circuitry of the cybernetic world, every bit as important as our utility networks. Asimov got this part wrong: the cybernetic future doesn’t begin with isolated humanoid robots, but rather with a giant silicon ecosystem, a digital rainforest of diversity to explore the intrinsic complexity of human civilization. Today, it’s still held together with coffee, VPNs, and bug fixes. If we’re going to make progress in the future, it needs to become as robust and commoditized as electronic circuitry is today. There’s still a long way to go, because the problem of data circuitry spans many layers of virtualization, each treated as distinct — but waiting to be unified by underlying spacetime principles.

Intrinsic scales and choosing a data store