“From when I was very very little, I always dreamed of developing crazy ideas and making them a reality.”
Alessia’s current work is in developing a Clinical Decision Support System (CDSS). This isn’t a new concept, as she highlights, but one with plenty of room for improvement and development. So, how long does a patient spend with a doctor, on average?
With the exception of countries that self-report, the average was around 9 minutes per encounter. Alessia notes that after speaking with doctors and nurses, the reality is closer to 5 minutes. Now imagine everything the doctor needs to accomplish in those 5 minutes. This is a tall order, one that can lead to missing information, misdiagnosis or, at worst, a recommended treatment that causes adverse effects for the patient.
Let’s look at what using a tool like this might look like in the field:
“A computer system that emulates, or acts in all respects, with the decision making capabilities of a human expert.” - Professor Edward Feigenbaum — Stanford University
What are the Components of an Expert System?
in bringing expert systems to the medical decision domain?
As a concept this has been around a while. In the medical domain, however, there is much left to be desired. In the 1970s a few expert systems were built as clinical decision support systems, such as De Dombal’s original in 1970, focused on acute abdominal pain. Another, built in 1974, was a general internal medicine support system. While De Dombal’s system used naive Bayes as a guide for the reasoning, MYCIN, built in 1976, was the first rule-based CDSS.

It is remarkable that between MYCIN in 1976 and the turn of the century, there was almost no development done to improve or innovate in this space. In the last 20 years interest has grown, and a newfound effort has gone into empowering medical recommendations with technology. Today, we have just one CDSS in production. However, in the medical field there are few to no examples or projects using a knowledge graph for such systems.
Quickly..."What is a Knowledge Graph?"
"...knowledge graphs have a flexible structure: the ontology can be extended and revised as new data arrives. This makes it convenient to store and manage data in a knowledge graph if you have use cases where regular updates and data growth are important, particularly when data is arriving from diverse, heterogenous sources. A knowledge graph can support a continuously running data pipeline that keeps adding new knowledge to the graph, refining it as new information arrives." - (hackernoon 2017)
Here we get a look at the data flow within SOPHIA. As Alessia is developing this CDSS in partnership with Medas, their medical technology products are used for passing patient data into SOPHIA. The three products used are:
These reports are then combined with the NLP output of the medical guidelines. In the future, Alessia would like to incorporate existing, publicly available datasets such as PubMed or Wolters Kluwer. These would be used to match the bibliographic data from the guidelines against those datasets.
For the end user SOPHIA outputs three types of data:

Identify and Align to Terms:
Terminology can cause problems in a system that uses multiple sources of data in the same domain: a headache and a migraine, for example. Alignment is necessary before building the Grakn schema.
Define Concepts:
Take these aligned-to terms and solidify the hierarchy of concepts that are necessary for the system. This often takes some thinking about what questions the end user will end up asking of the system, as well as how the concepts relate to each other in their real world context.
Descriptions, Rules, Decisions, Plans and Agent:
These components are where the “fun” starts. Now that we have some NLP output and a concept-level Grakn schema, how do we go about matching, that is, finding relations between, the medical guidelines and the patient data?
1. Identify Input Data to be Used
The first step is to crystallise what data you plan on migrating into Grakn. These can be structured and unstructured data, output from an NLP pipeline, patient demographic data, etc. Alessia uses a combination of all these:

2. Identify Text Mining Tool
The next step is Alessia’s favourite. It’s important to assess the available NLP tools against the use case you are working on. Tools like Stanford CoreNLP and spaCy are useful for general-domain use cases. Stanford CoreNLP requires a custom named entity recognition model, which for the medical domain would take a lot of work to make effective.

Two other tools were assessed for SOPHIA. One of them does some impressive things but does come with a financial investment. The other, Apache cTAKES, is an open source system widely used in the medical community and one that has seen its popularity grow in the wake of the COVID-19 pandemic. ApacheCon this past year dedicated an entire track to the tool. The training work that makes CoreNLP and spaCy too resource intensive to set up isn’t required for either of these systems, making them ideal solutions for the medical and life sciences domains.

Ultimately, Alessia chose Apache cTAKES for the NLP work within SOPHIA. However, Alessia did note that the desktop application wasn’t helpful given she works in web apps. We can see below a selection of the key pipelines that the tool can handle:

3. Mine Text
Having annotated the entities and dropped some free text (Italian medical guidelines, in Alessia’s case) into Apache cTAKES, the output is then transformed into low-level JSON. This output can then be used in other ways.

As you can see in the code snippet below, the text annotation is divided into categories: signs-symptom-mention, anatomical-site-mention and disease-disorder-mention.

{
  "disease-disorder-mention": {
    "NEOPLASIA": [
      "start: 2325",
      "end: 2334",
      "polarity: 1",
      "[codingScheme: SNOMEDCT_US, code: 1008369006, cui: C0027651, tui: T191]"
    ],
    "CARCINOMA": [
      "start: 3732",
      "end: 3741",
      "polarity: 1",
      "[codingScheme: SNOMEDCT_US, code: 68453008, cui: C0007097, tui: T191]"
    ]
  },
  "signs-symptom-mention": {
    "TEST": [
      "start: 95",
      "end: 99",
      "polarity: 1",
      "[codingScheme:...]"
    ]
  },
  "anatomical-site-mention": {
    "NIPPLE": [
      "start: 2453",
      "end: 2459",
      "polarity: 1",
      "[codingScheme:...]"
    ],
    "UTERO": [
      "start: 5652",
      "end: 5657",
      "polarity: 1",
      "[codingScheme:...]"
    ],
    "SKIN": [
      "start: 2437",
      "end: 2441",
      "polarity: 1",
      "[codingScheme:...]"
    ]
  }
}
Each of these categories holds a set of keys, and each key is an entity that has been found in a medical dictionary, that is, a list of all possible medical terms within a particular specialty. For each term you can find several values: start and end give the position of the term in the text; polarity represents the context in which the word was found, with a value of -1 when the word was found in a negative context and 1 when it was found in the affirmative; codingScheme is the medical dictionary where the match was found.
4. Model Ontology
The last step is modelling the ontology, and this part is really simple… In order to map together all the data coming in from SOPHIA’s NLP pipeline, you need to model the data against Grakn’s knowledge model.

Alessia provided screenshots (image below) from Grakn Workbase, Grakn’s IDE, visualising her schema.
Let’s break it down quickly, and then look at a Graql code snippet of Alessia’s schema below. In the image above we can see the entities: a guideline has thematic-sections, which are further broken down into sentences. These entities are related to each other through the relations guideline-contains-thematic-section and thematic-section-contains-sentences.

The attributes are used to provide the specific pieces of raw text: guideline-raw-text, thematic-section-raw-text, sentence-raw-text.

Continuing from the previous slide, sentence is connected to a token. This token matches the entity recognised by Apache cTAKES and is linked via a relation with the sub types of the entity medical-entity, where each plays the role of mined-token. In this way we can see that Grakn uses type inheritance, allowing all sub types to inherit the attributes assigned to the parent entity: start, end, polarity, etc.

How Does This Look in Graql?
define

# Parent type: every sub type below inherits these attributes and this role
medical-entity sub entity,
    owns start,
    owns end,
    owns polarity,
    owns [attribute-name],
    plays ctakes-named-entity-recognition:mined-token;

token sub entity,
    plays [relation-name]:[role-name];

start sub attribute,
    value string;
end sub attribute,
    value string;
polarity sub attribute,
    value string;

# Links a token in a sentence to the entity recognised by cTAKES
ctakes-named-entity-recognition sub relation,
    relates mined-token,
    relates [role-name];

# Sub types of medical-entity, one per cTAKES annotation category
anatomical-site-mention sub medical-entity;
medication-mention sub medical-entity;
disease-disorder-mention sub medical-entity;
date-annotation sub medical-entity;
drug-change-status-annotation sub medical-entity;
fraction-strength-annotation sub medical-entity;
measurement-annotation sub medical-entity;
strength-annotation sub medical-entity;
frequency-unit-annotation sub medical-entity;
sign-symptom-mention sub medical-entity;
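As a rough illustration of how mined entities could be loaded against the schema above, the following Python sketch builds a Graql insert statement as a plain string. It is an assumption-laden sketch, not Alessia's migration code: `graql_insert` and `SUBTYPES` are hypothetical names, only the attributes spelled out in the schema (start, end, polarity) are used, and the resulting string has not been run against a Grakn server here.

```python
# Sub types of medical-entity, copied from the schema above.
SUBTYPES = {
    "anatomical-site-mention",
    "medication-mention",
    "disease-disorder-mention",
    "date-annotation",
    "drug-change-status-annotation",
    "fraction-strength-annotation",
    "measurement-annotation",
    "strength-annotation",
    "frequency-unit-annotation",
    "sign-symptom-mention",
}


def graql_insert(category, start, end, polarity):
    """Build a Graql insert statement for one mined entity (hypothetical helper)."""
    if category not in SUBTYPES:
        raise ValueError(f"unknown medical-entity sub type: {category}")
    # The schema declares start, end and polarity as string attributes,
    # so every value is written as a quoted string.
    return (
        f"insert $m isa {category}, "
        f'has start "{start}", has end "{end}", has polarity "{polarity}";'
    )


query = graql_insert("disease-disorder-mention", 3732, 3741, 1)
print(query)
```

Because every sub type inherits its attributes from medical-entity, the same helper works for any category in the list without per-type code.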
cui - concept unique identifier: assigns a concept code to the medical-entity, which is then shared among all the synonyms of the word
tui - semantic type unique identifier: the semantic type of the word found

Remember the problem:

How can a doctor provide a comprehensive and accurate recommendation to a patient, based on the doctor’s experience, patient history and up-to-date guidelines from the medical field, when they have, on average, 5 minutes in the room with the patient?

Meet Angela, a 45-year-old patient with breast cancer. Angela goes to the doctor, where she describes her symptoms (as in the slide above) and is examined. The doctor enters this information into the application, miFort, and goes through the signature approval steps before sending the report to SOPHIA as free text.

The report is then read and queried across Grakn to look for any matches. If such a match is found, a recommendation of action is provided to the doctor, with an explanation of the links between the patient’s situation, represented by the medical report, and the knowledge base, represented by the guidelines or medical dictionaries. This is given as a bibliography for the doctor to reference.
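The matching step can be pictured with a small Python sketch that compares concept codes from a report with concept codes attached to guideline sentences. In SOPHIA this lookup is a query against Grakn; the in-memory version below is only an approximation of the idea, and `GuidelineSentence`, `match_report`, the guideline names and the sentence texts are all hypothetical placeholders. The two concept codes are the ones shown in the cTAKES output earlier.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class GuidelineSentence:
    source: str            # which guideline the sentence comes from (bibliography entry)
    text: str              # raw sentence text
    cuis: FrozenSet[str]   # concept codes mined from the sentence


def match_report(report_cuis, sentences):
    """Return (sentence, shared concept codes) pairs for every sentence that matches."""
    matches = []
    for sentence in sentences:
        # Compare concept codes, never the surface text.
        shared = sorted(report_cuis & sentence.cuis)
        if shared:
            matches.append((sentence, shared))
    return matches


# Placeholder knowledge base; names and texts are invented for illustration.
knowledge_base = [
    GuidelineSentence("guideline-A", "(sentence mentioning carcinoma)", frozenset({"C0007097"})),
    GuidelineSentence("guideline-B", "(sentence mentioning neoplasia)", frozenset({"C0027651"})),
]

report_cuis = frozenset({"C0007097"})
matches = match_report(report_cuis, knowledge_base)
for sentence, shared in matches:
    print(sentence.source, shared)
```

Returning the shared codes alongside each sentence keeps the explanation with the match, which is what lets the doctor see why a given guideline was suggested.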
Ambiguous Entities
Ambiguous entities in the graph can bring about a host of other challenges; however, these are avoided by reasoning not on the text itself but on the entity identification code, as shown above. Alessia never looks for a link between text but between concept codes (highlighted in red in the above slide). This ensures specificity and accuracy in her queries.

If you’re nervous about being precise or can’t find a code for what you need, here are 20 strange medical concept codes for coding diagnoses and symptoms.

Entities with the Same Meaning
This is relatively simple, as again, Alessia is reasoning not on the text itself but on the cui, never looking for a link between text but between values of the cui attribute, so that when you query you are able to get the synonyms you want.

Concluding her talk, Alessia welcomed any contributions in the form of GraphQL integrations, self awareness of concepts within the knowledge graph or general interest in furthering the work of patient care through technology. Special thank you to Alessia for her inspired work, contribution to the community and for always bringing joy into her work.

You can find the full presentation on the Grakn Labs YouTube channel.

Previously published.