Information Overload
As medical technology becomes more precise and delivers ever-larger quantities
of data, analyzing it all can be overwhelming. A recent news briefing from
DarkDaily described unstructured medical laboratory data as “one significant
hurdle on the path toward the universal electronic health record” (EHR) and
reported a breakthrough by researchers at MIT in interpreting this
information.
According to the news briefing, Anna Rumshisky, PhD, Assistant Professor in
MIT’s department of computer science, noted the popularity of synoptic
pathology reports (SPR) and applied an algorithmic approach to them. The
problem with SPR is the open-ended form in which the information is recorded.
By adopting a technique based on topic modeling, an area of research that
“seeks to automatically identify the topics of documents by inferring
relationships between prominently featured words,” the MIT research team used
word-sense disambiguation (WSD) to develop a method of data analytics that
could affect EHR and electronic medical records (EMR) as a whole by enabling
“more accurate systems that require less human intervention.”
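To make the idea of word-sense disambiguation concrete, here is a minimal Python sketch. It is not the MIT team's method: instead of a learned topic model, it uses a much simpler Lesk-style context-overlap heuristic, and the sense labels, word lists, and example notes are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical sense profiles for the ambiguous word "discharge".
# Assumption: a real system would learn these word associations from
# data (for example with a topic model) rather than hard-code them.
SENSE_PROFILES = {
    "release_from_hospital": {"patient", "home", "hospital",
                              "instructions", "follow", "date"},
    "bodily_fluid": {"wound", "fluid", "drainage",
                     "purulent", "nasal", "yellow"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(note, profiles=SENSE_PROFILES):
    """Pick the sense whose profile shares the most words with the note.

    Returns (best_sense, scores), where scores maps each sense to the
    number of context-word occurrences found in its profile.
    """
    counts = Counter(tokenize(note))
    scores = {
        sense: sum(counts[word] for word in words)
        for sense, words in profiles.items()
    }
    best_sense = max(scores, key=scores.get)
    return best_sense, scores


# Two made-up clinical notes using "discharge" in different senses.
note_a = "Patient was stable at discharge and sent home with follow up instructions."
note_b = "Purulent yellow discharge noted from the wound; drainage continues."

for note in (note_a, note_b):
    sense, scores = disambiguate(note)
    print(sense, scores)
```

The surrounding words decide the sense, which is the same intuition behind the topic-based approach described above. A learned model would replace the hand-written profiles with associations inferred from large numbers of notes.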
In the US, WSD and topic modeling have proven reliable in interpreting clinical
notes, with “an average accuracy rate of 75 percent,” as opposed to the
previous 63 percent average. Across the pond, however, researchers have been
employing data visualization to interpret big data in fields like molecular
biology. A story from LaboratoryNews chronicled how data analysis has tended to
fall to specialist biostatisticians and bioinformaticians rather than to the
expert scientists and researchers who originally performed the tests, and
stated that a combined effort from both groups is necessary to produce quality
results.
“Of course,
I still think that a good collaboration between statisticians and biologists is
vital,” said Anna Andersson, a scientist studying childhood leukemia. “It is
very useful, for example, to discuss cut-offs and statistical significance with
a statistician in order to make sure that the data is not ‘over-interpreted.’ However,
with the latest data visualization tools, a biologist is now able to query the
data instantly, and to be perhaps more critical about the data, by looking at
the information in a way that is different from a statistician.”
In an effort to ensure this “partnership,” data is being displayed visually on
screen using powerful software “designed to take full advantage of the most
powerful pattern recognizer that exists: the human brain.” This approach has
made great strides in areas like genome sequencing, using visualization and
mathematical techniques such as heatmaps and principal component analysis (PCA)
to interpret large amounts of data.
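As a rough illustration of the two techniques named above, the following Python sketch draws a crude text heatmap and finds the first principal component of a tiny table. It uses only the standard library, finds the component by power iteration rather than with a dedicated statistics package, and the numbers are made up for the example and do not come from any study.

```python
import math

# Made-up measurements: 4 samples (rows) by 3 features (columns).
SAMPLES = [
    [2.0, 8.0, 1.0],
    [3.0, 7.0, 1.2],
    [8.0, 2.0, 0.9],
    [9.0, 1.0, 1.1],
]

SHADES = " .:-=+*#"  # light to dark


def text_heatmap(rows):
    """Render values as shaded characters, scaled from min to max."""
    flat = [x for row in rows for x in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid dividing by zero on constant data
    top = len(SHADES) - 1
    return "\n".join(
        " ".join(SHADES[int((x - lo) / span * top)] for x in row)
        for row in rows
    )


def center(rows):
    """Subtract each column's mean so every feature has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration.

    Assumes cov is non-zero and the start vector is not orthogonal to
    the leading eigenvector, which holds for the sample data here.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


centered = center(SAMPLES)
component = first_component(covariance(centered))
# Project each sample onto the component to get one score per sample.
scores = [sum(x * c for x, c in zip(row, component)) for row in centered]

print(text_heatmap(SAMPLES))
print("first component:", [round(c, 3) for c in component])
print("sample scores:", [round(s, 3) for s in scores])
```

In this toy table the first two samples and the last two fall on opposite sides of the first component, the kind of grouping a viewer would also pick out by eye from the heatmap. Real analyses would typically rely on established numerical and plotting libraries instead of hand-written routines.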
With the flood of information rising from improvements in technology,
simplifying big data into visual structures holds the promise of tests that can
be analyzed as quickly as they are administered. Meanwhile, the DarkDaily news
briefing cited unstructured data analysis as “one reason why clinical
laboratory managers and pathologists may want to follow further developments
with this research.” As interpretive software continues to improve, electronic
record keeping is becoming solidified as a standard in the medical industry.