Cautiously Merge

We have developed techniques for adding new information sources without disrupting the work we have already done. In essence, we use queries of partially integrated data to understand how more useful relations can be established.

Adding a new source follows a series of steps that might take hours of individual analysis, or days when the expertise of others must be engaged to fully understand the relationships between new and existing nodes.

Steps

Load the new data into the graph without regard to correctly relating it to existing nodes.

Write queries that measure the degree to which the new data agrees with data already in the graph.

Improve the matching of old and new as encoded in the algorithms used in the transform step.

Query again to measure improvement in matching old and new data. Iterate, alternating between improving and measuring.
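The loop of loading, measuring, improving, and measuring again can be sketched in a few lines. This is only an illustration under stated assumptions, not our actual transform code: the in-memory lists stand in for the graph, and the node names, the "Service" label, and the functions match_rate, exact, and normalized are all hypothetical.

```python
from typing import Callable

# Hypothetical stand-in for the graph: each node is a label plus a name.
existing = [
    {"label": "Service", "name": "billing-api"},
    {"label": "Service", "name": "checkout"},
    {"label": "Service", "name": "search"},
]

# Step 1: load new records as they come, with no attempt to relate them
# to existing nodes. They carry a provisional label of their own.
incoming = [
    {"label": "KCG", "name": "Billing-API"},
    {"label": "KCG", "name": "checkout "},
    {"label": "KCG", "name": "inventory"},
]

def match_rate(old: list, new: list, key: Callable[[str], str]) -> float:
    """Steps 2 and 4: fraction of new nodes whose key matches an existing node."""
    known = {key(n["name"]) for n in old}
    hits = sum(1 for n in new if key(n["name"]) in known)
    return hits / len(new) if new else 0.0

def exact(name: str) -> str:
    # First attempt: names must agree character for character.
    return name

def normalized(name: str) -> str:
    # Step 3: an improved (assumed) matching rule that ignores case and padding.
    return name.strip().lower()

before = match_rate(existing, incoming, exact)
after = match_rate(existing, incoming, normalized)
print(f"exact: {before:.2f}, normalized: {after:.2f}")
```

The unmatched remainder (here, "inventory") is what the next iteration studies: it may be a genuinely new entity, or it may need a further matching rule.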

When we first load a new source, we often have a description of related entities without fully understanding whether they are the same as, or merely related to, existing nodes. In this case we first translate them into nodes labeled with provisional three-letter codes, such as KCG for Kafka consumer group or NGK for NerdGraph Key.

Provisional label names bring the data into the realm of our query language without committing us to update existing queries.
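A small sketch shows why a provisional label is safe. It assumes a toy graph held as a Python list and a hypothetical helper, nodes_with_label, standing in for a label-based query; the "Team" and "Topic" labels and all node names are invented for illustration.

```python
def nodes_with_label(graph: list, label: str) -> list:
    """Hypothetical stand-in for a query that selects nodes by label."""
    return [n for n in graph if n["label"] == label]

# An assumed graph as it stands before the new source arrives.
graph = [
    {"label": "Team", "name": "payments"},
    {"label": "Topic", "name": "orders"},
]

# Result of an existing query, recorded before loading anything new.
baseline = nodes_with_label(graph, "Topic")

# Load the new source under provisional three-letter labels.
graph.append({"label": "KCG", "name": "orders-consumer"})
graph.append({"label": "NGK", "name": "abc123"})

# The existing query returns exactly what it did before the load.
unchanged = nodes_with_label(graph, "Topic") == baseline

# New queries can already reach the provisional nodes.
provisional = nodes_with_label(graph, "KCG")
print(unchanged, [n["name"] for n in provisional])
```

Because no existing query mentions KCG or NGK, the new nodes stay invisible to established work until we decide how they relate to it.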

See Match with Heuristics