Creating and Exploiting Knowledge Graphs
Building the knowledge graph framework
We are building a knowledge graph that integrates research and clinical electronic health record (EHR) data, and in parallel developing new graph prediction algorithms that can use both types of information to infer unknown properties of nodes, such as new (post-marketing) side effects of drugs. The graph includes public data such as the known adverse drug reactions (ADRs), indications and target proteins of drugs, as well as ADRs extracted from the free text of CRIS records using NLP. These concepts are linked to biomedical ontologies, which enable reasoning with Description Logics (OWL) to support searching, visualisation and exploratory data analysis.
Exploiting (knowledge) graphs
The edge prediction algorithm we are developing for the drug knowledge graph is applicable to any graph structure, so we are also working to increase the scalability of our machine learning method to graphs with billions of elements.
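To make the idea of edge prediction concrete, the following is a minimal sketch and not the algorithm under development here. It assumes a toy graph in which each drug has a set of known features (for example targets or indications) and a set of known ADRs, and it scores an unobserved drug-ADR edge by the highest Jaccard similarity between that drug and any other drug already linked to the ADR. The drug, target and ADR names and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_edges(drug_features, drug_adrs):
    """Rank unobserved (drug, ADR) pairs, most likely first.

    The score of a pair is the highest feature similarity between the
    drug and any *other* drug that is already known to cause the ADR.
    This nearest-neighbour rule is an illustrative assumption only.
    """
    all_adrs = set().union(*drug_adrs.values())
    scores = {}
    for drug, feats in drug_features.items():
        # Only consider ADRs not yet recorded for this drug.
        for adr in all_adrs - drug_adrs.get(drug, set()):
            sims = [
                jaccard(feats, drug_features[other])
                for other, adrs in drug_adrs.items()
                if other != drug and adr in adrs and other in drug_features
            ]
            if sims:
                scores[(drug, adr)] = max(sims)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: feature sets and known ADRs per drug.
drug_features = {
    "drugA": {"target1", "target2"},
    "drugB": {"target1", "target2", "target3"},
    "drugC": {"target4"},
}
drug_adrs = {
    "drugA": {"nausea"},
    "drugB": {"nausea", "rash"},
    "drugC": {"headache"},
}

ranked = predict_edges(drug_features, drug_adrs)
print(ranked[0])  # the top candidate is the pair ("drugA", "rash")
```

Because the scoring only needs neighbour sets, the same rule applies to any graph, although a pairwise scan like this one would not scale to billions of elements without indexing or sampling.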
- Daniel Bean
- Richard Dobson
- Honghan Wu
Upcoming presentation at IHF Conference http://informaticsforhealth.org
KCH Patient Flow Networks
It is increasingly important to understand the factors affecting hospital efficiency, and to do so from an unbiased and global perspective. We applied graph theory to analyse the movement of patients between wards in two hospital sites (King’s College Hospital, Denmark Hill and the Princess Royal University Hospital) over an 18-month period. By representing each hospital as a dynamic, weighted, directed graph, we were able to identify a “core” sub-graph in each site that is critical to the flow of patients, and also to associate changes in flow throughout the network with extremes of A&E performance against the 4-hour waiting time target.
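As an illustration of the graph representation, the sketch below builds a weighted, directed graph from a list of ward-to-ward transfers and extracts the heaviest edges that together carry a chosen share of all transfers. This is a simplified stand-in for a "core" sub-graph and is not necessarily the definition used in the study; the ward names, the 80% threshold and the function names are assumptions. A dynamic graph could be approximated by building one such graph per time window.

```python
from collections import Counter


def build_flow_graph(transfers):
    """Return {(from_ward, to_ward): number_of_transfers}."""
    return Counter((src, dst) for src, dst in transfers)


def core_edges(graph, coverage=0.8):
    """Heaviest edges that together carry at least `coverage` of all flow.

    Edges are taken in order of decreasing weight (ties broken by name)
    until the requested share of transfers is covered.
    """
    total = sum(graph.values())
    core, carried = [], 0
    for edge, weight in sorted(graph.items(), key=lambda kv: (-kv[1], kv[0])):
        if carried >= coverage * total:
            break
        core.append(edge)
        carried += weight
    return core


# Hypothetical transfer log: one (from_ward, to_ward) tuple per patient move.
transfers = (
    [("A&E", "Acute Medical Unit")] * 6
    + [("Acute Medical Unit", "Ward 1")] * 3
    + [("Ward 1", "Discharge Lounge")] * 1
)

graph = build_flow_graph(transfers)
print(core_edges(graph))
```

With this toy log, the two heaviest edges carry 9 of the 10 transfers, so they form the core at the 80% threshold.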
- Daniel Bean
- Richard Dobson
- James Teo (KCH)
- Clive Stinger (KCH)
Blood Tests for the Early Identification of Alzheimer’s disease
It can be hard to identify individuals at the early stages of Alzheimer’s disease (AD), and yet these individuals are likely to be those who would benefit most from inclusion in trials of new treatments.
We have been investigating the potential of a blood test to help identify these individuals, so that they can be recruited into clinical trials. This has involved attempts to replicate previous findings about genetic and protein biomarkers, as well as attempts to discover novel AD markers, such as metabolites (small molecules like fats and vitamins).
- Nicola Voyle
- Steven Kiddle
- Richard Dobson
- Claire Steves
- Nicholas Ashton
- Angela Hodges
- Cristina Legido-Quigley
- Simon Lovestone
- Plasma protein biomarkers of Alzheimer’s disease endophenotypes in asymptomatic older twins: early cognitive decline and regional brain volumes. Steven Kiddle, Claire Steves, Mitul Mehta, Andrew Simmons, Xiaohui Xu, Stephen Newhouse, Martina Sattlecker, Nicholas Ashton, Chantal Bazenet, Richard Killick, Jihad Adnan, Eric Westman, Sally Nelson, Hilkka Soininen, Iwona Kloszewska, Patrizia Mecocci, Magda Tsolaki, Bruno Vellas, Charles Curtis, Gerome Breen, Steven Williams, Simon Lovestone, Tim Spector, and Richard Dobson. Translational Psychiatry
- Blood metabolite markers of Neocortical Amyloid Burden: Discovery and enrichment using candidate proteins. Nicola Voyle, Min Kim, Petroula Proitsi, Nicholas J Ashton, Alison L Baird, Chantal Bazenet, Abdul Hye, Sarah Westwood, Raymond Chung, Malcolm Ward, Gil D Rabinovici, Simon Lovestone, Gerome Breen, Cristina Legido-Quigley, Richard JB Dobson and Steven J Kiddle. Translational Psychiatry (2016) 6, e719; doi:10.1038/tp.2015.205.
- Blood protein predictors of brain amyloid for enrichment in clinical trials? Ashton*, Kiddle*, Graf, Ward, Baird, Hye, Westwood, Wong, Dobson, Rabinovici, Miller, Rosen, Torres, Zhang, Thurfjell, Covin, Hehir, Baker, Bazenet, Lovestone, AIBL. Alzheimer’s and Dementia: Diagnosis, Assessment and Disease Monitoring (2015).
- A pathway based classification method for analysing gene expression for Alzheimer’s Disease diagnosis. Nicola Voyle, Aoife Keohane, Stephen Newhouse, Katie Lunnon, Caroline Johnston, Hilkka Soininen, Iwona Kloszewska, Patrizia Mecocci, Magda Tsolaki, Bruno Vellas, Simon Lovestone on behalf of the AddNeuroMed consortium, Angela Hodges, Steven Kiddle and Richard JB Dobson. Journal of Alzheimer’s Disease. 2015 Oct 15;49(3):659-69. doi: 10.3233/JAD-150440.
- Blood protein markers of Neocortical Amyloid-β Burden: A candidate study using SOMAscan technology. Nicola Voyle, David Baker, Samantha C. Burnham, Antonia Covin, Zhanpan Zhang, Dipen P. Sangurdekar, Cristina A. Tan Hehir, Chantal Bazenet, Simon Lovestone, Steven Kiddle, Richard JB. Dobson and the AIBL research group. Journal of Alzheimer’s Disease; 46 (2): 947 – 962.
- Candidate blood proteome markers of Alzheimer’s disease onset and progression: a systematic review and replication study. Steven Kiddle*, Martina Sattlecker*, Petroula Proitsi, Andrew Simmons, Chantal Bazenet, Eric Westman, Sally Nelson, David Sterling, Stephen Williams, Angela Hodges, Caroline Johnston, Hilkka Soininen, Iwona Kłoszewska, Patrizia Mecocci, Magda Tsolaki, Bruno Vellas, Stephen Newhouse, Simon Lovestone, and Richard Dobson (* Joint first authors). Accepted to Journal of Alzheimer’s Disease
Heterogeneity of Cognitive Decline in Dementia: Taking Into Account Variable Time-zero Severity
Understanding the progression of Alzheimer’s disease (AD) is critical to the optimal design of clinical trials, and to the ability to plan an individual’s care. However, the current FDA-approved model of AD progression only takes into account age, gender and the main genetic risk factor (APOE). While this needs to be improved, this work is held back by data with short follow-up and many data analysis challenges. A vast amount of data on AD progression is available through anonymised medical records, but a key problem is that patients can be at different disease stages at first assessment.
In this project we aim to overcome this problem through the introduction of a novel approach, Temporal Clustering, that jointly learns a set of common trajectories and a new time frame in which, at first assessment, individuals are predicted to have been at a similar disease stage.
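The notion of a new time frame can be illustrated with a much simpler problem than Temporal Clustering itself. The sketch below assumes a single, already known linear trajectory of a cognitive score and estimates, for each individual, the time shift that best places their observations on that trajectory in the least-squares sense. It does not learn the trajectories or cluster individuals; the trajectory parameters, scores and function name are hypothetical.

```python
def estimate_time_shift(times, scores, intercept, slope):
    """Least-squares shift s such that score ~ intercept + slope * (time + s).

    Minimising the squared error over s gives the mean of
    (score - intercept) / slope - time across the observations.
    """
    if slope == 0:
        raise ValueError("a flat trajectory carries no timing information")
    offsets = [(y - intercept) / slope - t for t, y in zip(times, scores)]
    return sum(offsets) / len(offsets)


# Assumed common trajectory: score 30 at disease time 0, losing 2 points per year.
INTERCEPT, SLOPE = 30.0, -2.0

# Hypothetical individuals: years since first assessment and observed scores.
patient_1 = estimate_time_shift([0, 1, 2], [26, 24, 22], INTERCEPT, SLOPE)
patient_2 = estimate_time_shift([0, 1], [20, 18], INTERCEPT, SLOPE)

print(patient_1, patient_2)  # estimated years of decline before first assessment
```

Here both individuals decline at the same rate, but the second is estimated to be further along the shared trajectory at first assessment, which is the kind of misalignment that a common time frame is meant to remove.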
- Elizabeth Baker
- Richard Dobson
- Chris Wallace
- Alice Parodi
Development of a High Throughput Gene, Environment and Epigenetics Database and Analysis System for International ALS Research
A combination of rapid advances in genetic technology and a close-knit ALS research community, particularly in Europe, but also internationally, means that we are now collecting huge amounts of multilayered genetic, epigenetic, environmental and clinical data, much of which is difficult to share and therefore to analyse. As a result, we lose the main advantage that large-scale collaboration brings, partly negating the benefit of a well functioning research community. This project is a collaboration between ALS researchers and bioinformatics experts to resolve this issue and facilitate the storage, analysis and sharing of such large-scale clinical and research datasets.
The project is embedded within the bioinformatics group (http://phidatalab.org/) at the NIHR Biomedical Research Centre for Mental Health (BRC-MH), King’s College London. From here, we are collaborating with ENCALS and NEALS through multiple consortia including STRENGTH, EuroMotor, Project MinE, SOPHIA and ALSGEN. The solution will be based on dedicated open source frameworks for the integration of genetics and related data for analysis, tailored to fit a set of specific user requirements set out by a stakeholder group. Since the EuroMotor and SOPHIA projects use an existing Progeny-based clinical database, a specific component will allow a direct two-way transfer of key data between the two systems, making them compatible and avoiding duplication of effort.
We are implementing a solution that will enable the sharing of each of the four levels of omics and associated data: level 1 refers to data in its raw, non-normalised form; levels 2 and 3 represent data at different stages of processing; and level 4 represents summary results data.
The database system will combine a central data warehouse that users can access through a web browser with a federated system in which multiple storage locations are used but accessed through a single portal.
We are producing a set of standardised analysis pipelines for data processing and submission to the repository. For example, there will be a pipeline for ALS Genome Wide Association Study (GWAS) genotype imputation.
- Prof Ammar Al-Chalabi
- Dr Richard Dobson
- Prof Leonard van den Berg
- Dr Stephen J. Newhouse
- Dr Alfredo Iacoangeli
- Dr Jan H. Veldink
- Prof Vincenzo Silani
- Prof John Landers
- Prof Adriano Chio
- Prof Orla Hardiman
Large Scale Omics Data Integration for Biomarker Discovery, Drug Repositioning and Screening for New Therapeutic Targets for Alzheimer’s Disease
Alzheimer’s disease (AD) is the most common form of dementia, affecting over 44 million individuals worldwide, with numbers expected to triple by 2050. Based on the age of onset, AD can be classified as either early-onset (familial) AD, which manifests before the age of 65 and is inherited in a Mendelian dominant fashion, or the much more common late-onset (sporadic) AD, which manifests after the age of 65. Progression of this disease can only be clinically measured and monitored using neuropsychological tests such as the Mini Mental State Examination (MMSE) or neuroimaging of amyloid-β (Aβ). However, the high cost of neuroimaging and medical complications prohibit routine global use. A peripheral blood-based biomarker would offer an easily accessible, relatively cheap and time-effective approach to diagnosing and monitoring AD. At present, though, there is no clinically established diagnostic blood biomarker for AD.
Furthermore, there is no known cure or clinically approved disease modifying therapy; however, drugs are available for temporary symptomatic relief. Drug repositioning may offer an innovative approach to drug discovery and identification of new therapeutic targets for AD.
Microarray technologies can measure genome-wide gene expression, providing a comprehensive view of gene activity within biological samples. Using this technology, a gene or combination of genes found to be under- or over-expressed in AD could potentially be exploited as a biomarker for the disease. A vast quantity of gene expression data is becoming publicly available through repositories such as ArrayExpress, which is a generic database designed to store data from all microarray expression platforms. We are currently exploiting some of these databases to integrate transcriptomic data from AD, non-AD neurological and age-related disorders for a large-scale analysis using machine learning techniques, with the aim of creating an AD-specific blood diagnostic test with clinical utility in mind.
In addition, these public repositories provide transcriptomic data generated on various human brain tissues, regions where proteins involved in AD manifest. Using a meta-analysis approach and an extensive database containing compound-induced gene expression profiles (the Connectivity Map), we aim to identify potential AD intervening compounds through drug repositioning. Our initial research has incorporated 47 AD brain expression datasets to re-discover a candidate small molecule for possible AD intervention. We now know through mouse trials (Pharmidex) that this compound, when administered intravenously, is also able to cross the blood-brain barrier.
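The reasoning behind signature-based repositioning can be sketched as follows: a compound is a candidate if it shifts expression in the opposite direction to the disease, lowering genes that are raised in AD and raising genes that are lowered. The example below uses a deliberately simplified score (mean compound-induced change over disease-down genes minus the mean over disease-up genes). It is not the enrichment statistic used by the Connectivity Map, and the gene names, compound names and values are hypothetical.

```python
def reversal_score(disease_up, disease_down, compound_profile):
    """Score how strongly a compound opposes a disease signature.

    compound_profile maps gene -> log fold change induced by the compound.
    Higher scores mean disease-down genes go up and disease-up genes go down.
    Returns None if either gene set has no overlap with the profile.
    """
    up = [compound_profile[g] for g in disease_up if g in compound_profile]
    down = [compound_profile[g] for g in disease_down if g in compound_profile]
    if not up or not down:
        return None
    return sum(down) / len(down) - sum(up) / len(up)


# Hypothetical disease signature from a meta-analysis of brain expression data.
disease_up = {"GENE1", "GENE2"}
disease_down = {"GENE3"}

# Hypothetical compound-induced expression profiles (log fold changes).
profiles = {
    "compoundX": {"GENE1": -1.0, "GENE2": -0.5, "GENE3": 1.5},
    "compoundY": {"GENE1": 0.8, "GENE2": 0.4, "GENE3": -0.2},
}

scores = {
    name: reversal_score(disease_up, disease_down, profile)
    for name, profile in profiles.items()
}
best = max(scores, key=scores.get)
print(best, scores[best])
```

In this toy example compoundX reverses the signature and ranks first, while compoundY mimics the disease direction and receives a negative score.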
- Richard Dobson
- Stephen Newhouse
- Sang Hyuck Lee
- Charlie Curtis
- Gerome Breen
- Claire Troakes
- Safa Al-Sarraj
The objective of the project is to develop an image analysis pipeline that reliably identifies induced pluripotent stem cells (iPSC) which exhibit a highly variable and challenging morphology.