A Modern Data Platform for Precision Medicine
Posted by: Tony Loeser 01/27/2014

If you set out to build a data platform to support modern personalized medicine, what would it look like? This is the challenge we face in developing the “back end” software at Syapse. The challenge has many aspects, including user interface design and scalability. In this blog post I discuss how we organize and query the biomedical data that we handle.

From my point of view, the fundamental characteristic of our customers’ undertaking is its complexity. The knowledge and data sets that medical researchers and doctors work with every day are fantastically large and intricate. Put in software terms, you could say that the human body is an integrated system that makes your average enterprise application suite appear, well, modest. Biologists and physicians are reverse engineering Mother Nature’s crazy masterpiece, building the manual as they go.

Enter Syapse: we make that complexity manageable. Our answer is to leverage semantic technologies to build a powerful and flexible data platform. Through our user interface and API, we present a set of usable concepts for working with precise, structured data. Under the covers, the implementation uses components based on W3C semantic data standards. For example, our users view data as records, organized by templates. Internally, the data is stored as RDF triples, organized by OWL-compatible ontologies. Similarly, users construct queries either visually or in SyQL, our query language, which is a simplified form of the SPARQL queries that run internally against the RDF triple store.
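To make the records-to-triples idea concrete, here is a minimal Python sketch. It is not Syapse’s code: the `record_to_triples` and `match` functions, the `http://example.org/` namespace, and the sample patient fields are all hypothetical. It flattens a record into triples, with the record’s template becoming an `rdf:type`, and then answers a query the way a SPARQL basic graph pattern does, by finding variable bindings that satisfy every pattern at once.

```python
from typing import Dict, Iterable, List, Tuple

# A triple is (subject, predicate, object). Everything is a plain string here,
# a simplification of real RDF, which distinguishes IRIs from typed literals.
Triple = Tuple[str, str, str]

# Hypothetical namespace for illustration only; not Syapse's vocabulary.
EX = "http://example.org/"


def record_to_triples(record_id: str, template: str, fields: Dict[str, str]) -> List[Triple]:
    """Flatten a user-facing record into triples; its template becomes an rdf:type."""
    subject = EX + record_id
    triples: List[Triple] = [(subject, "rdf:type", EX + template)]
    for prop, value in fields.items():
        triples.append((subject, EX + prop, value))
    return triples


def match(patterns: List[Triple], store: Iterable[Triple]) -> List[Dict[str, str]]:
    """Naive basic graph pattern matching, the core of a SPARQL WHERE clause.

    Terms starting with '?' are variables; every pattern must match with
    consistent variable bindings.
    """
    store = list(store)
    solutions: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        extended: List[Dict[str, str]] = []
        for binding in solutions:
            for triple in store:
                candidate = dict(binding)
                ok = True
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind a new variable, or check an existing binding.
                        if candidate.setdefault(term, value) != value:
                            ok = False
                            break
                    elif term != value:
                        ok = False
                        break
                if ok:
                    extended.append(candidate)
        solutions = extended
    return solutions


store = record_to_triples("patient/1", "Patient", {"diagnosis": "melanoma", "age": "54"})
store += record_to_triples("patient/2", "Patient", {"diagnosis": "glioma", "age": "61"})

# Roughly: SELECT ?p WHERE { ?p rdf:type ex:Patient . ?p ex:diagnosis "melanoma" }
rows = match([("?p", "rdf:type", EX + "Patient"), ("?p", EX + "diagnosis", "melanoma")], store)
print(rows)  # [{'?p': 'http://example.org/patient/1'}]
```

A production triple store indexes its triples instead of scanning them, but the matching semantics follow the same idea.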

How is this approach suited to address the biomedical data domain?

Biomedical information itself, with the structure and vocabularies we use to represent it, is extremely intricate, so the high expressivity of an ontology-based data model is critical. This model is also very flexible, which matters because nobody has figured out the definitive schema for biology just yet. We organize our ontologies in customizable layers because each of our customers sees the world differently. We expect our customers to change their ontologies frequently, since medical knowledge is constantly changing and improving, and our platform supports that.
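One way to picture customizable layers is the sketch below, which assumes a deliberately simple model where each layer can add classes, change a class’s parent, or attach properties, and a customer layer is stacked on top of a shared base. The `OntologyLayer` type, `stack_layers`, `properties_of`, and the sample class names are invented for illustration; real OWL ontologies are far more expressive.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class OntologyLayer:
    """A hypothetical ontology layer: class -> parent class, class -> declared properties."""
    name: str
    parents: Dict[str, Optional[str]] = field(default_factory=dict)
    properties: Dict[str, Set[str]] = field(default_factory=dict)


def stack_layers(layers: List[OntologyLayer]) -> Tuple[Dict[str, Optional[str]], Dict[str, Set[str]]]:
    """Merge layers in order: later layers may add classes, re-parent classes, or add properties."""
    parents: Dict[str, Optional[str]] = {}
    props: Dict[str, Set[str]] = {}
    for layer in layers:
        parents.update(layer.parents)
        for cls, declared in layer.properties.items():
            # Copy into a fresh set so the input layers are never mutated.
            props.setdefault(cls, set()).update(declared)
    return parents, props


def properties_of(cls: str, parents: Dict[str, Optional[str]], props: Dict[str, Set[str]]) -> Set[str]:
    """Properties declared on cls plus everything inherited from its ancestors."""
    result: Set[str] = set()
    seen: Set[str] = set()
    current: Optional[str] = cls
    while current is not None and current not in seen:  # 'seen' guards against cycles
        seen.add(current)
        result |= props.get(current, set())
        current = parents.get(current)
    return result


# A shared base layer plus one customer's extension (class names are made up).
base = OntologyLayer(
    "base",
    parents={"Sample": None, "TumorSample": "Sample"},
    properties={"Sample": {"collectionDate"}, "TumorSample": {"tumorType"}},
)
customer = OntologyLayer(
    "customer",
    parents={"FFPESample": "TumorSample"},
    properties={"TumorSample": {"tumorPurity"}, "FFPESample": {"blockId"}},
)

parents, props = stack_layers([base, customer])
print(sorted(properties_of("FFPESample", parents, props)))
# ['blockId', 'collectionDate', 'tumorPurity', 'tumorType']
```

Because the customer layer only adds to the base, changing it later does not require touching the shared definitions underneath.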

Any work with biomedical data needs to integrate with the vast amount of knowledge that is publicly available. It is a connected world, after all, and the most powerful systems are the ones that take full advantage of that. Because of its flexibility and loose commitment to schema, RDF is the ideal model for data integration. Whether we are pulling in vocabulary or taxonomy from one of the many public biomedical ontologies (e.g. ones found in NCBO’s BioPortal repository), or integrating with data from a public service such as NCBI’s dbSNP or ClinVar, we can easily relate the concepts from those sources to our users’ custom ontology.
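As a sketch of how a single mapping statement can relate a customer’s concept to a public one, the example below assumes the link is expressed with SKOS `exactMatch`, which the SKOS specification defines as symmetric and transitive, so linked terms form equivalence classes. The helper functions and the example.org IRIs are placeholders, not real BioPortal identifiers or any particular product’s API.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# skos:exactMatch is symmetric and transitive, so linked terms form equivalence classes.
EXACT_MATCH = "skos:exactMatch"


def equivalence_classes(triples: List[Triple]) -> Dict[str, Set[str]]:
    """Group terms connected by exactMatch links, using union-find."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == EXACT_MATCH:
            root_s, root_o = find(s), find(o)
            parent[root_s] = root_o  # union the two classes

    groups: Dict[str, Set[str]] = defaultdict(set)
    for term in list(parent):
        groups[find(term)].add(term)
    return {term: groups[find(term)] for term in list(parent)}


def facts_about(term: str, triples: List[Triple]) -> List[Triple]:
    """Everything any source says about term or about terms mapped to it."""
    equivalent = equivalence_classes(triples).get(term, {term})
    return [t for t in triples if t[0] in equivalent and t[1] != EXACT_MATCH]


# Placeholder IRIs, not real ontology identifiers.
local = "http://example.org/site/Melanoma"
external = "http://example.org/public-ontology/D0001"

graph: List[Triple] = [
    (local, "rdfs:label", "Melanoma (site term)"),   # customer's own ontology
    (local, EXACT_MATCH, external),                   # the one linking statement
    (external, "rdfs:label", "melanoma"),             # imported public vocabulary
    (external, "rdfs:subClassOf", "http://example.org/public-ontology/D0000"),
]

for triple in facts_about(local, graph):
    print(triple)
```

Since RDF has no fixed schema, merging the two sources is just a union of triples; the mapping does the rest.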

In biomedicine we have specific, precise requirements when analyzing data; in most cases approximate results are not good enough. Syapse’s structured platform provides the necessary level of granularity. Our search facility enables highly targeted queries that combine anything from process QC data to clinical patient information, at a level of detail that is tuned to each customer’s needs. Meanwhile, because we track the provenance of every individual value, the RDF data store supports detailed auditing.
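Per-value provenance can be modeled by attaching a source and a timestamp to each individual assertion, in the spirit of RDF named graphs or reification. The sketch below is a hypothetical append-only store, not Syapse’s audit mechanism; the `Assertion` and `AuditedStore` names, field names, and example values are made up, and all timestamps are assumed to be timezone-aware UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Assertion:
    """One value plus its provenance: who or what supplied it, and when."""
    subject: str
    predicate: str
    obj: str
    source: str            # e.g. a user, an instrument, or an import job (hypothetical)
    recorded_at: datetime  # assumed timezone-aware (UTC)


class AuditedStore:
    """Append-only store: values are never overwritten, so every change stays auditable."""

    def __init__(self) -> None:
        self._log: List[Assertion] = []

    def assert_value(self, subject: str, predicate: str, obj: str,
                     source: str, when: Optional[datetime] = None) -> None:
        self._log.append(Assertion(subject, predicate, obj, source,
                                   when or datetime.now(timezone.utc)))

    def history(self, subject: str, predicate: str) -> List[Assertion]:
        """All values ever recorded for (subject, predicate), oldest first."""
        matches = [a for a in self._log if a.subject == subject and a.predicate == predicate]
        return sorted(matches, key=lambda a: a.recorded_at)

    def current(self, subject: str, predicate: str) -> Optional[Assertion]:
        """The most recently recorded value, or None if there is none."""
        past = self.history(subject, predicate)
        return past[-1] if past else None


store = AuditedStore()
t0 = datetime(2014, 1, 20, 9, 0, tzinfo=timezone.utc)
t1 = datetime(2014, 1, 21, 14, 30, tzinfo=timezone.utc)
store.assert_value("sample/42", "tumorPurity", "0.6", source="pathologist:jdoe", when=t0)
store.assert_value("sample/42", "tumorPurity", "0.7", source="pathologist:asmith", when=t1)

print(store.current("sample/42", "tumorPurity").obj)  # 0.7
for a in store.history("sample/42", "tumorPurity"):
    print(a.recorded_at.isoformat(), a.source, a.obj)
```

Keeping the full log rather than updating in place is what lets an auditor answer “who changed this value, and when?” for any single field.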

The value of this precision is clearest when working with genomics-based diagnostic reports. Generally, these reports are built by taking knowledge bases of curated public and private data and applying them to a patient’s genomic results and clinical data. Syapse’s data platform is built to manage any or all parts of this process, from collecting and managing the curated knowledge base (CKB), to presenting the report to a clinician and capturing feedback. In particular, Syapse provides a facility for generating the report based on a set of semantic rules that describe how records in the CKB relate to the patient’s data. There can be no surprises in this report generation process; every step must be precise, transparent, and auditable. Syapse’s semantic data layer is the ideal platform for implementing such a careful process.
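To illustrate what a transparent, rule-based matching step might look like, here is a small sketch in which a rule is a set of triple patterns joining patient data to CKB entries. The rule format, predicate names such as `hasVariant`, and every identifier (`GENE1:v1`, `drug:T1`, and so on) are hypothetical placeholders, not real clinical content or Syapse’s rule language. The point is that each generated report line carries the rule name and the exact bindings, including the CKB entry, that produced it.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]
# A rule: (name, conditions as triple patterns with ?variables, report template).
Rule = Tuple[str, List[Triple], str]


def unify(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so pattern matches triple, or return None on a conflict."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def solve(patterns: List[Triple], graph: List[Triple], binding: Binding) -> List[Binding]:
    """All bindings that satisfy every pattern (simple depth-first search)."""
    if not patterns:
        return [binding]
    results: List[Binding] = []
    for triple in graph:
        extended = unify(patterns[0], triple, binding)
        if extended is not None:
            results.extend(solve(patterns[1:], graph, extended))
    return results


def generate_report(rules: List[Rule], graph: List[Triple]) -> List[Tuple[str, str, Binding]]:
    """Each report line keeps the rule name and full binding that produced it, for auditing."""
    lines = []
    for name, conditions, template in rules:
        for binding in solve(conditions, graph, {}):
            text = template.format(**{var[1:]: val for var, val in binding.items()})
            lines.append((text, name, binding))
    return lines


# Hypothetical rule and identifiers; real curation logic would be far richer.
RULE: Rule = (
    "therapy-for-variant-in-diagnosis",
    [
        ("?patient", "hasVariant", "?variant"),
        ("?patient", "hasDiagnosis", "?disease"),
        ("?entry", "variant", "?variant"),   # CKB entry about the same variant...
        ("?entry", "disease", "?disease"),   # ...in the patient's disease context
        ("?entry", "therapy", "?therapy"),
    ],
    "{patient}: {variant} in {disease} matches {entry} (therapy: {therapy})",
)

graph: List[Triple] = [
    ("patient:1", "hasVariant", "GENE1:v1"),
    ("patient:1", "hasDiagnosis", "disease:D1"),
    ("ckb:entry-7", "variant", "GENE1:v1"),
    ("ckb:entry-7", "disease", "disease:D1"),
    ("ckb:entry-7", "therapy", "drug:T1"),
    ("ckb:entry-8", "variant", "GENE1:v1"),
    ("ckb:entry-8", "disease", "disease:D2"),  # different disease: should not fire
    ("ckb:entry-8", "therapy", "drug:T2"),
]

for text, rule_name, binding in generate_report([RULE], graph):
    print(text)
    print("  produced by", rule_name, "from", binding["?entry"])
```

Here only `ckb:entry-7` fires, because `ckb:entry-8` concerns a different disease, and the report line records exactly why.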

These examples illustrate the strengths of Syapse’s semantic platform. Other aspects will be subjects of future blog posts. For now, suffice it to say that our data platform’s customizability, ease of integration, and detailed structure make it a tool that is uniquely suited for precision medicine.