The goal of privacy-preserving record linkage (PPRL) techniques is to link horizontally partitioned data referring to the same person. Such techniques can be used whenever person data reside in different data sources and need to be merged for analysis or, more basically, deduplicated to avoid duplicate entries. Application domains include the medical sciences, e.g., when patient data have been captured in a multi-center clinical or epidemiological study in which each medical center uses a different pseudonym for the same patient.
We have created two RESTful services. An encoder service transforms identification data into bit vectors (one-way hashes using Bloom filters). This service is typically deployed within an institution or organization and is fed by a Master Patient Index (MPI). Hence, no identifying data cross institutional borders, only bit vector pseudonyms. These pseudonyms are then used by the second service, the match service. This service computes the similarity (or distance) between two bit vectors and returns how similar (or different) they are. Taking this similarity (or difference) into account, we can probabilistically derive whether the corresponding persons are the same.
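The idea behind both services can be illustrated with a minimal, in-process sketch. It is not the actual service code: the bigram tokenization, the SHA-256 based hashing, the vector size, the number of hash functions, and the Dice coefficient as the metric are all assumptions chosen for illustration, and the function names `bigrams`, `encode`, and `dice_similarity` are hypothetical.

```python
import hashlib


def bigrams(value: str) -> set[str]:
    """Split a normalized string into padded character bigrams (assumed tokenization)."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(value: str, size: int = 1024, num_hashes: int = 4) -> int:
    """Encode a string as a Bloom filter, stored as an int used as a bit vector.

    size and num_hashes are illustrative defaults, not the services' settings.
    """
    bits = 0
    for gram in bigrams(value):
        for seed in range(num_hashes):
            # Derive one bit position per (seed, bigram) pair from a one-way hash.
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            position = int.from_bytes(digest[:8], "big") % size
            bits |= 1 << position
    return bits


def dice_similarity(a: int, b: int) -> float:
    """Dice coefficient of two bit vectors: 2 * |a AND b| / (|a| + |b|)."""
    ones_a = bin(a).count("1")
    ones_b = bin(b).count("1")
    if ones_a + ones_b == 0:
        return 1.0  # two empty vectors are treated as identical
    common = bin(a & b).count("1")
    return 2 * common / (ones_a + ones_b)


if __name__ == "__main__":
    # Similar spellings share most bigrams and therefore most set bits.
    print(dice_similarity(encode("smith"), encode("smyth")))
    print(dice_similarity(encode("smith"), encode("jones")))
```

Because similar strings share most of their bigrams, their bit vectors share most of their set bits, which is what allows approximate matching on the pseudonyms alone.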
A Showcase User Interface
We have built a simple showcase application called Basic Privacy Preserving Person Entity Linker (BP3-Link). The goal of BP3-Link is a) to show how the services work and b) to let you play with both PPRL services, the encoder and the match service. The showcase app uses both services as shown in the overall architecture below. In contrast to real application scenarios, the showcase user interface replaces the Master Patient Index components: it lets you enter identification data for both institutions and thus simulate the match process with artificial data.
This showcase takes person identification data, which are sent to the encoder services in parallel (1). Both encoder services (for Organization A and B) generate a bit vector for each person (left and right side in the UI) and send it back to the showcase app (2). The app then sends these bit vectors (3) to the independent match service, which computes the confidence between the two given bit vectors using a selected metric. Finally, the results are shown below the data input panel.
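The three numbered steps can be sketched as a small orchestration function. This is only an outline under stated assumptions: the real app talks to the services over REST, whereas here the encoders and the matcher are passed in as plain callables, and `run_showcase`, `toy_encoder`, and `toy_jaccard` are hypothetical placeholders that do not reproduce the services' Bloom filter encoding or metrics.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Encoder = Callable[[str], int]         # identification data -> bit vector
Matcher = Callable[[int, int], float]  # two bit vectors -> confidence


def run_showcase(person_a: str, person_b: str,
                 encode_a: Encoder, encode_b: Encoder,
                 match: Matcher) -> float:
    """Mimic the showcase flow with injected stand-ins for the services."""
    # Steps (1) and (2): call both encoders in parallel and collect the
    # bit vectors they return.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(encode_a, person_a)
        future_b = pool.submit(encode_b, person_b)
        vector_a, vector_b = future_a.result(), future_b.result()
    # Step (3): only the bit vectors are handed to the matcher.
    return match(vector_a, vector_b)


def toy_encoder(value: str) -> int:
    """Placeholder encoder: sets one bit per alphanumeric character."""
    bits = 0
    for ch in value.lower():
        if ch.isalnum():
            bits |= 1 << (ord(ch) % 64)
    return bits


def toy_jaccard(a: int, b: int) -> float:
    """Placeholder metric: Jaccard index of the two bit vectors."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


if __name__ == "__main__":
    print(run_showcase("Jane Doe", "Jane Doe",
                       toy_encoder, toy_encoder, toy_jaccard))
```

Passing the encoders and the matcher in as parameters keeps the sketch independent of any concrete endpoint and mirrors the separation in the architecture: the matcher never sees the identification data itself.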