Universal Data Analytics as Semantic Spacetime (Part 6)

Part 6. Applying reusable SST patterns for graph data modelling

With the basic tools and conceptual issues in place, we can begin to apply them more seriously to build database-backed models of the world, including (machine) learning structures. The purpose of storing data is then not just to build an archaeological record of events, but to structure observations in ways that offer future value to possibly related processes. This is where Semantic Spacetime helps, but there are other practical considerations too. In this installment, I’ll walk through the Semantic Spacetime library package for Go to explain what’s going on.

Semantic Spacetime treats every imaginable process realm as a knowledge representation in its own right. It could be in human infrastructure space, sales product space, linguistic word space, protein space, or all of these together. In part 5, I showed how certain data characteristics naturally belong at nodes (locations) while others belong to the links in between (transitions).

How could we decide this rationally? Promise Theory offers one approach, and explains quite explicitly how models can be a “simple” representation of process causality.

Choosing Node and Link data fields

When we observe the world and try to characterize it, we have to distinguish between modelling things and modelling their states. These are two different ideas. Trying to think about objects led to the troubles in Object Oriented Analysis mentioned in the previous post. If we think about state, some information is qualitative or symbolic in nature (definitive and descriptive, and based on fixed-point semantics), whereas other information measures quantitative characteristics, degrees, extents, and amplitudes of phenomena. The latter obey some algebra: they may combine additively or multiplicatively, for instance.

In data analysis, and machine learning in particular, probabilistic methods stand out as a popular way of modelling. Probabilities are an even mixture of qualitative and quantitative values. The probability of a qualitative outcome is a quantitative estimate. Using probability is partly a cultural issue — scientists traditionally feel more comfortable if there are quantitative degrees to work with, because it makes uncertainty straightforward to incorporate. Thus one tends to neglect the detailed role of the spanning sets of outcomes. In probabilistic networks, people might refer to such outcomes in terms of convergence points in a belief network. But it’s important to understand the sometimes hidden limitations of probabilities in modelling. Sometimes we need to separate qualitative and quantitative aspects in a rational way, without completely divorcing them!

In IT, there are databases galore, vying for our attention and trying to persuade users with impressive features. Yet, you shouldn’t head straight for a particular database and start playing with its query language expecting to find answers. Wisdom lies in understanding how to model and to capture processes with integrity and high or low fidelity. My choice of tools for this series (ArangoDB and Go) was based on trial and error, after five other attempts.

Promise Theory, nodes and links

Our guide, behind the scenes, is Promise Theory. Promise Theory applies an extended form of labelled graph to model cooperative scenarios, with a few insights added. It was introduced originally for modelling distributed control structures, but has since been generalized into an independent tool for analyzing network processes of any kind. Promise Theory sees the world in terms of agents (which are nodes in a graph) and promises, which are self-declarations by the agents. Promises don’t form links by themselves, but can combine, like chemical bonds, to lead to cooperative links. Promise Theory is a minimal model that unifies many ideas about interaction. It’s also the theoretical foundation for Semantic Spacetime.

Promise Theory tells us that, if some information (a property) comes from inside one agent alone, then that agent is its sole source, and thus the property can be promised by that agent alone, using data from within that node. Even if the data were passed to it by another agent, in a previous interaction, it’s only what is in this source agent at the moment of interaction that matters. This is true on a point-to-point basis, transaction by transaction.

Conversely, if a property is a characteristic of some process that’s on-going between two nodes (a kind of entanglement, or co-dependence) then such a property is shared information, and thus the agent it arises from is the combination (entangled superagent) of the two — so the information belongs between them. The obvious place is then the directed link between the agents, which is represented as an “edge” in a graph database. A graph link symbolizes several promises and captures the idea of a directional process entanglement. There can be many links of different types of edge between the same pair of nodes (idempotently accumulated so that there are no identical duplicates), so we can model qualitative issues by type and quantitative issues by embedded weight or value.
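To make the division of labour concrete, here is a minimal in-memory sketch, not the SST library itself. The type and function names (Node, LinkKey, Graph, AddLink, IncrementLink) and the link type strings are assumptions chosen for illustration. Properties that one agent can promise alone live in the node; shared properties live on the link, with the qualitative part carried by the link type and the quantitative part by a weight. Keying links by (from, type, to) gives the idempotent accumulation described above.

```go
package main

import "fmt"

// Node holds properties that a single agent can promise on its own.
type Node struct {
	Key         string  // unique name of the agent
	Description string  // qualitative, descriptive data
	Weight      float64 // quantitative data intrinsic to the node
}

// LinkKey identifies a link by its endpoints and its type.
// The type carries the qualitative meaning of the relationship.
type LinkKey struct {
	From, Type, To string
}

// Graph is a toy in-memory stand-in for a graph database.
type Graph struct {
	Nodes map[string]Node
	Links map[LinkKey]float64 // the value is the quantitative weight
}

// NewGraph returns an empty graph with initialized maps.
func NewGraph() *Graph {
	return &Graph{Nodes: map[string]Node{}, Links: map[LinkKey]float64{}}
}

// AddNode stores a node under its key; repeating it overwrites, never duplicates.
func (g *Graph) AddNode(n Node) {
	g.Nodes[n.Key] = n
}

// AddLink is idempotent: repeating the same (from, type, to) triple
// sets the weight again instead of creating a second identical link.
func (g *Graph) AddLink(from, linkType, to string, weight float64) {
	g.Links[LinkKey{From: from, Type: linkType, To: to}] = weight
}

// IncrementLink combines a quantitative observation additively
// and returns the accumulated weight.
func (g *Graph) IncrementLink(from, linkType, to string, delta float64) float64 {
	k := LinkKey{From: from, Type: linkType, To: to}
	g.Links[k] += delta
	return g.Links[k]
}

func main() {
	g := NewGraph()
	g.AddNode(Node{Key: "home", Description: "where the day starts", Weight: 1})
	g.AddNode(Node{Key: "office", Description: "where the work happens", Weight: 1})

	g.AddLink("home", "Near", "office", 1)
	g.AddLink("home", "Near", "office", 1) // same triple: no duplicate

	g.IncrementLink("home", "Follows", "office", 512)
	g.IncrementLink("home", "Follows", "office", 256)

	fmt.Println(len(g.Links))                                              // 2
	fmt.Println(g.Links[LinkKey{From: "home", Type: "Follows", To: "office"}]) // 768
}
```

Two different link types between the same pair of nodes coexist, while repeated observations of the same type are folded into a single weighted link.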

Object modelling sometimes fails to model such cases because it has no concept of what could be “between objects”. That can be worked around by modelling shared entanglements, but it may lead to an unwanted explosion of types.

Sorting out the mechanics of where to put data fields — of how to update and retain information in a learning structure — is the first order of business with a data model. Ultimately, we’re looking for a dimensional reduction of concepts, to span a large number of ad hoc data with a small number of modelled concepts. All the programming intricacies shouldn’t obscure that goal.

So, now that we have all the issues in place, let’s solve them using our Go and ArangoDB tools, building on their low-level programming APIs to make something a little more user friendly for scientists.

Turning principles into a template: pkg/SST

Take a look at the pkg directory in the SST github repository. This is where you’ll find all the code for this blog series. Part of that code is a Go package or library. Follow the “destructions” in the README file (or in part 2 of this series) to connect it to your GOPATH and you’ll be able to start using and modifying this proto-SST library for your own projects. The remainder of this post gives a quick overview of the package and how to use it.

The example code below shows a template for what every data application could look like.

SemanticSpaceTime/InitializeSST.go at main · markburgess/SemanticSpaceTime (github.com)

The setting up of a graph model on top of the underlying database is relatively complicated, due to the potentially large number of options involved. So this OpenAnalytics() function conceals a lot of details so that we can live with the overhead. It simplifies by creating a single graph with a number of fixed interior collections. The Analytics data type has three types of nodes called Fragments, Nodes, and Hubs — which can be used to model three levels of node aggregation. We often want to identify clusters and clusters of clusters, etc.

Later, if we want to add new sets of Nodes and Links for temporary usage, we need to call new functions, AddNodeCollection() and AddLinkCollection(), which are in the SST library. I’ll come back to that when talking about matrix multiplications in part 9.

In the code example above, everything at the beginning can be pasted into any program, and after the “do your own stuff” line of demarcation, we can get down to defining nodes and links (vertices and edges in graph parlance). There are functions for adding Nodes, Fragments, Hubs, and Links between them. The code is designed to be easy to understand, not to impress hard-nosed software developers, so there is deliberate repetition. You can decide to keep it or throw it away. I’ve added just a few data fields (descriptions and weighting values) to each kind of node and link. It’s enough to do a lot already, but you can add more (see below). Let’s run through the structure of what’s going on.

The code is packaged as a toolkit on github. You can browse the repository, download as a zip file, or use the “git” tool on Linux to copy the whole project using “git clone”.

First: opening everything up

As we’ve seen in past installments, we have to open a database as a service, and then create and open various collections every time we call an application. The library bakes all of this into an “analytics” object. Then the OpenAnalytics() function proceeds to:

Open/create Database Service connection
Open/create a specific database project by name
Open/create collections for fragments, nodes, and hubs (3 layers)
Open/create collections for edges in each of the four ST types.
Make all of the above available through a single reference “g” for “graph”.
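The “open/create” idea in the steps above can be illustrated with a toy in-memory stand-in. This is only a sketch of the pattern, not the library’s code: the Database and Collection types, the OpenOrCreate() and OpenAll() helpers, and the collection names are all hypothetical. The point is that running the same start-up sequence twice is harmless, because existing collections are simply reopened.

```go
package main

import "fmt"

// Collection is a toy stand-in for an open collection handle.
type Collection struct {
	Name string
}

// Database is a toy in-memory stand-in for a database service.
type Database struct {
	collections map[string]*Collection
}

// OpenOrCreate returns the existing collection of that name, or creates it.
// The boolean reports whether a new collection had to be created.
func (db *Database) OpenOrCreate(name string) (*Collection, bool) {
	if c, ok := db.collections[name]; ok {
		return c, false
	}
	c := &Collection{Name: name}
	db.collections[name] = c
	return c, true
}

// OpenAll opens (or creates) every named collection and
// reports how many of them were newly created.
func OpenAll(db *Database, names []string) int {
	created := 0
	for _, name := range names {
		if _, isNew := db.OpenOrCreate(name); isNew {
			created++
		}
	}
	return created
}

func main() {
	db := &Database{collections: make(map[string]*Collection)}

	// Hypothetical names: three node layers and four link types.
	names := []string{"Fragments", "Nodes", "Hubs", "Follows", "Contains", "Expresses", "Near"}

	fmt.Println(OpenAll(db, names)) // first run creates everything: 7
	fmt.Println(OpenAll(db, names)) // second run only reopens: 0
}
```

With a real database the same check-then-create logic applies at each level (service, database, collections, graph), which is the bookkeeping the library hides.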

The code to set everything up is just this:

S.InitializeSmartSpaceTime()

var dbname string = "SemanticSpacetime"
var url string = "http://localhost:8529"
var user string = "root"
var pwd string = "mark"

g := S.OpenAnalytics(dbname, url, user, pwd)

Notice the use of the prefix “S.” to call functions in the library. Notice also that we assign the result to a variable “g” that we are now going to use to pass to graph management functions. It contains references to all the structures and their open handles. The full structure is defined in the pkg/SST.go file, and it looks like this:

type Analytics struct {

    // Container db
    S_db A.Database

    // Graph model
    S_graph A.Graph

    // 3 levels of nodes and supernodes
    S_frags A.Collection
    S_nodes A.Collection
    S_hubs  A.Collection

    // 4 primary link types
    S_Follows   A.Collection
    S_Contains  A.Collection
    S_Expresses A.Collection
    S_Near      A.Collection

    // Chain memory
    previous_event_key Node
}

This contains an open database handle S_db, an open graph handle S_graph, open node collections S_frags, S_nodes, and S_hubs which describe three levels of node nesting to capture multidimensional structures, and open collection handles for the four spacetime types of link. From time to time, we might want to access these open handles directly to extend the library code.

Finally, at the end of the data analytics structure, I’ve added a reference to the “previous” event node (previous_event_key) so that the graph structure is stateful with respect to process chains too. This is used in the passport example below, as well as in the next post, to automatically create an incremental “log” of activity with a proper timeline. An automated chain makes it easy to build narratives without carrying around a lot of bloat.

Go Note: when using the previous_event_key value and its interface NextDataNode(), we have to pass the g variable by reference to enable updating. In Go, this means passing &g instead of g, and subsequent references in the child functions have to use *g rather than g.
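The effect of passing by value versus by reference can be seen in a small stand-alone sketch. The Analytics type here is a pared-down, hypothetical stand-in with a single string field, and nextByValue() and nextByReference() are illustrative helpers, not the library’s NextDataNode().

```go
package main

import "fmt"

// Analytics is a pared-down stand-in holding only the chain memory.
type Analytics struct {
	previousEventKey string
}

// nextByValue receives a copy of g, so the update to the
// chain memory is lost when the function returns.
func nextByValue(g Analytics, key string) string {
	prev := g.previousEventKey
	g.previousEventKey = key // modifies the local copy only
	return prev
}

// nextByReference receives a pointer, so the caller's structure is updated.
// It returns the key of the event that came before.
func nextByReference(g *Analytics, key string) string {
	prev := (*g).previousEventKey // explicit dereference; g.previousEventKey also works
	(*g).previousEventKey = key
	return prev
}

func main() {
	var g Analytics

	nextByValue(g, "event1")
	fmt.Printf("%q\n", g.previousEventKey) // "" : the chain memory was not updated

	nextByReference(&g, "event1")
	prev := nextByReference(&g, "event2")
	fmt.Println(prev, g.previousEventKey) // event1 event2
}
```

For field access Go dereferences pointers automatically, so the explicit *g is only needed when the whole structure is passed on to a function that expects a value.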

Adding nodes and links